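Overview
HanLP is a Java toolkit made up of a collection of models and algorithms. Its goal is to make natural language processing widely usable in production environments. It goes beyond word segmentation and provides a full range of functionality, including lexical analysis, syntactic parsing, and semantic understanding. HanLP is full-featured, fast, cleanly structured, built on up-to-date corpora, and customizable.
HanLP is fully open source, dictionaries included. It depends on no other jars. Internally it uses a set of high-speed data structures such as the double-array trie, DAWG, and AhoCorasickDoubleArrayTrie, and these basic components are open source as well.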
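To give a feel for why trie-like structures suit dictionary-based segmentation, here is a minimal sketch. It assumes a plain HashMap-based trie, which is deliberately simpler than a double-array trie and is not HanLP's code, and it uses forward maximum matching, a classic baseline that is not necessarily what HanLP.segment does internally. The class and method names (TrieSketch, add, longestMatch, segment) are hypothetical, and unspaced ASCII text stands in for Chinese.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a HashMap-based trie, not HanLP's double-array trie. */
class TrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    /** Adds one dictionary word to the trie. */
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    /** Returns the length of the longest dictionary word starting at offset, or 0 if none. */
    int longestMatch(String text, int offset) {
        Node node = root;
        int best = 0;
        for (int i = offset; i < text.length(); i++) {
            node = node.children.get(text.charAt(i));
            if (node == null) {
                break;
            }
            if (node.isWord) {
                best = i - offset + 1;
            }
        }
        return best;
    }

    /** Forward maximum matching: take the longest word at each position, else a single character. */
    List<String> segment(String text) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int len = Math.max(1, longestMatch(text, pos));
            out.add(text.substring(pos, pos + len));
            pos += len;
        }
        return out;
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        for (String w : new String[]{"bei", "beijing", "is", "capital"}) {
            trie.add(w);
        }
        // "beijing" wins over the shorter dictionary word "bei".
        System.out.println(trie.segment("beijingiscapital")); // prints [beijing, is, capital]
    }
}
```

Each lookup walks the text once from the current offset, so the cost depends on the length of the match rather than on the size of the dictionary. A double-array trie keeps that property while storing the transitions in flat arrays instead of per-node maps.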
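Through the utility class HanLP, every function can be called in a single statement; the documentation is detailed and the library works out of the box. The underlying algorithms have been carefully optimized: in the fastest segmentation mode, throughput reaches 20 million characters per second with only 120 MB of memory. On the IO side, dictionaries load very quickly, and a fast startup takes only 500 ms.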
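Throughput figures like the one above depend on hardware and input, so it can be useful to measure them locally. The following is a rough sketch of such a measurement, not a rigorous benchmark (there is no JIT warm-up and no repeated trials). The class ThroughputSketch, the method charsPerSecond, and the whitespace-splitting stand-in segmenter are all hypothetical names introduced for illustration.

```java
import java.util.function.Consumer;

/** Illustrative only: a crude characters-per-second measurement. */
class ThroughputSketch {
    /** Runs the segmenter over the text `rounds` times and returns characters processed per second. */
    static double charsPerSecond(Consumer<String> segmenter, String text, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            segmenter.accept(text);
        }
        // Guard against a zero interval on very fast runs.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        double totalChars = (double) text.length() * rounds;
        return totalChars / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Stand-in segmenter (assumption): splits on spaces instead of calling HanLP.
        Consumer<String> standIn = s -> s.split(" ");
        double rate = charsPerSecond(standIn, "goods and services", 100_000);
        System.out.printf("%.0f chars/s%n", rate);
    }
}
```

In a project with HanLP on the classpath, the stand-in could presumably be swapped for `text -> HanLP.segment(text)` to measure the real segmenter.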
POM file
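<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.iqilu</groupId>
  <artifactId>Segment</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>Hello</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.hankcs</groupId>
      <artifactId>hanlp</artifactId>
      <version>portable-1.3.2</version>
    </dependency>
  </dependencies>
</project>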
DemoSegment.java
package com.iqilu;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;

public class DemoSegment {
    public static void main(String[] args) {
        // HanLP segments Chinese text; the sentences below are English
        // renderings of the Chinese test inputs.
        String[] testCase = new String[]{
            "Goods and services",
            "The married and the not-yet-married are indeed interfering with word segmentation",
            "Buy fruit, then come to the Expo park, and finally go to the Expo park",
            "China's capital is Beijing",
            "Welcome, new and old teachers and students, to come and dine",
            "The female officer of the Industry and Information Office must personally explain the installation of technical devices such as 24-port switches whenever she passes through the subordinate departments each month",
            "From the rise of web games to their current boom, designs that rely on save files for logic judgments have decreased, but this part cannot be completely ignored.",
        };
        // Segment each sentence and print the resulting list of terms.
        for (String sentence : testCase) {
            List<Term> termList = HanLP.segment(sentence);
            System.out.println(termList);
        }
    }
}
Results
[Products/n, and/c, services/vn]
[Married/v, of/uj, and/c, not yet/d, married/v, of/uj, indeed/ad, at/p, interference/v, participle/n, ah/y]
[Buy/v, fruit/n, then/c, come/v, Expo/j, finally/f, go/v, Expo/j]
[China/ns, of/uj, capital/n, yes/v, Beijing/ns]
[Welcome/v, new/a, teacher/n, before death/t, come/v, dinner/v]
[Industry and Information Office/n, female/b, secretary/n, monthly/r, passing/p, subordinate/v, department/n, all/nr, personally/d, explain/v, 24/m, port/q, switch/n, etc/u, technical/n, device/n, of/uj, installation/v, work/vn]
[With/p, page/q, youxing/n, from/v, to/v, now/t, of/uj, page tour/nz, flourishing/an, ,/w, depend on/v, archive/vn, proceed/v, logic/n, judge/v, of/uj, design/vn, reduce/v, up/ul, ,/w, but/c, this piece of/r, also/d, cannot/v, completely/ad, ignore/v, drop/v, ./w]
A Java word segmentation tool is only one of many Java development tools, and you will come across more related material later on.
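Each entry in the results above has the form word/tag, where the tag is a part-of-speech label (for example n for nouns, v for verbs, ns for place names). When only the printed text is available, the entries can be split apart again; the sketch below shows one way to do that and to filter by tag. The names TermParseSketch, Tagged, parse, and wordsWithTag are hypothetical. With the library itself, reading the Term objects directly is the more natural route.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: parses printed "word/tag" entries back into pairs. */
class TermParseSketch {
    /** A word and its part-of-speech tag, e.g. "Beijing" and "ns". */
    static final class Tagged {
        final String word;
        final String tag;

        Tagged(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Splits one entry at the last slash, so a word that itself contains '/' survives. */
    static Tagged parse(String entry) {
        int slash = entry.lastIndexOf('/');
        if (slash < 0) {
            return new Tagged(entry, "");
        }
        return new Tagged(entry.substring(0, slash), entry.substring(slash + 1));
    }

    /** Keeps only the words whose tag starts with the given prefix (e.g. "n" for noun-like tags). */
    static List<String> wordsWithTag(List<String> entries, String tagPrefix) {
        List<String> words = new ArrayList<>();
        for (String entry : entries) {
            Tagged t = parse(entry);
            if (t.tag.startsWith(tagPrefix)) {
                words.add(t.word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("China/ns", "of/uj", "capital/n", "yes/v", "Beijing/ns");
        System.out.println(wordsWithTag(entries, "n")); // prints [China, capital, Beijing]
    }
}
```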