Copyright notice: this is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/quiet_girl/article/details/79974788
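I. Overview
StanfordCoreNLP is a natural language processing toolkit developed by Stanford. It provides tokenization, lemmatization, part-of-speech tagging, and many other features. For details, see the official site: https://stanfordnlp.github.io/CoreNLP/. This post covers basic use of its lemmatization feature.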
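II. Download and Usage
1. Download page: https://stanfordnlp.github.io/CoreNLP/ [screenshot of the download page]
2. After downloading, extract the archive, locate the following 6 jar files among the extracted files, and add them to your Java project: [screenshot listing the 6 jar files]
3. You can then call the library directly from code.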
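Before running the pipeline, it can help to confirm that the jars really are on the project's classpath. The following is a minimal sketch using only the Java standard library; `ClasspathCheck` and `isOnClasspath` are hypothetical names introduced here, not part of CoreNLP. It only checks that the `StanfordCoreNLP` class can be located, so it does not confirm that the model files are present.

```java
// Hypothetical helper (not part of CoreNLP): checks whether a class can be located.
final class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the imports used in the code section of this post
        String pipelineClass = "edu.stanford.nlp.pipeline.StanfordCoreNLP";
        System.out.println(pipelineClass + " found: " + isOnClasspath(pipelineClass));
    }
}
```

If this prints `false`, the jar containing the pipeline class has probably not been added to the project's build path.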
III. Code
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.List;
import java.util.Properties;

/**
 * Purpose: lemmatization (reducing each word to its dictionary form).
 * Jar download: https://stanfordnlp.github.io/CoreNLP/
 * API documentation: https://stanfordnlp.github.io/CoreNLP/api.html
 */
public class StemmerTest {
    public static void main(String[] args) {
        // Set up pipeline properties
        Properties props = new Properties();
        // Annotators: tokenization, sentence splitting, POS tagging, lemmatization
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Text to process
        String text = "Franklin said, If a man empties his purse into his head,no man can take it away from him,an investment in knowledge always pays the best interest.";
        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        // The annotated document is a list of sentences, each holding its tokens
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                // Original token text
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                // Lemma of the token above, i.e. the lemmatized word we want
                String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);
                System.out.println(word + " " + lemma);
            }
        }
    }
}
The output is as follows:
Franklin Franklin
said say
, ,
If if
a a
man man
empties empty
his he
purse purse
into into
his he
head head
, ,
no no
man man
can can
take take
it it
away away
from from
him he
, ,
an a
investment investment
in in
knowledge knowledge
always always
pays pay
the the
best best
interest interest
. .
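One typical use of this output is counting occurrences by lemma, so that inflected forms such as "his" and "him" are grouped under "he". The sketch below uses only the Java standard library and assumes the "word lemma" line format printed above; `LemmaCounter` and `countLemmas` are hypothetical names introduced for illustration, and the sample lines are copied from the output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of CoreNLP): tallies lemmas from "word lemma" lines.
final class LemmaCounter {

    /** Counts how often each lemma occurs, preserving first-seen order. */
    static Map<String, Integer> countLemmas(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip lines that are not exactly "word lemma"
            }
            counts.merge(parts[1], 1, Integer::sum); // parts[1] is the lemma
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {"his he", "him he", "empties empty", "man man", "man man"};
        System.out.println(countLemmas(sample)); // prints {he=2, empty=1, man=2}
    }
}
```

In a real program the pairs would more likely be collected inside the token loop rather than parsed back from printed text; parsing lines here simply keeps the sketch independent of the CoreNLP jars.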
Appendix:
For part-of-speech tagging and other features, see the official API documentation: https://stanfordnlp.github.io/CoreNLP/api.html