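Overview
Elasticsearch's default standard analyzer does not split text on underscores. Its tokenizer follows the Unicode text segmentation rules (UAX #29), which treat an underscore between letters or digits as part of the word, so an identifier joined by underscores comes back as a single token.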
GET my_index/_analyze
{
  "analyzer": "standard",
  "text": "yi_yuan_ordersvc_person_comp_inter_s1104_ISubmitProdCoSvc_prodDataSubmit"
}
# Analysis result:
{
  "tokens" : [
    {
      "token" : "yi_yuan_ordersvc_person_comp_inter_s1104_isubmitprodcosvc_proddatasubmit",
      "start_offset" : 0,
      "end_offset" : 72,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
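To see why the whole string comes back as one token, the behavior can be approximated outside Elasticsearch. The Python sketch below is only an illustration, not Elasticsearch's implementation: it assumes the regex class \w (letters, digits, and underscore) is a close enough stand-in for the standard tokenizer's word rules on plain ASCII identifiers, and the function name standard_like_tokens is hypothetical.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer on ASCII input.

    Assumption: this is NOT the real Lucene tokenizer; it only mimics the
    fact that an underscore does not end a word.
    """
    tokens = []
    # \w matches letters, digits, and "_", so an underscore never ends a token here.
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),  # the standard analyzer also lowercases
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

text = "yi_yuan_ordersvc_person_comp_inter_s1104_ISubmitProdCoSvc_prodDataSubmit"
for tok in standard_like_tokens(text):
    print(tok)
```

The real tokenizer differs in other cases (punctuation such as dots, non-ASCII scripts), so this is useful only for reasoning about underscores.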
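A custom analyzer solves this problem. A mapping char filter replaces every underscore with | before tokenization, and the standard tokenizer treats | as a word boundary.
# Custom analyzer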
PUT /my_index2
{
  "settings": {
    "analysis": {
      "char_filter": {
        "XtoS": {
          "type": "mapping",
          "mappings": ["_=>|"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["XtoS"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
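The analyzer defined above runs in three stages: the XtoS char filter rewrites the text, the tokenizer splits it, and the lowercase filter normalizes each token. A minimal Python sketch of that order follows. It is an approximation, assuming \w+ stands in for the standard tokenizer on ASCII input, and my_analyzer_like is a hypothetical name, not an Elasticsearch API.

```python
import re

def my_analyzer_like(text):
    """Approximate the char filter -> tokenizer -> token filter order."""
    # Stage 1 (char filter): map "_" to "|", as in the "XtoS" mapping.
    # One character replaces one character, so offsets are unchanged.
    filtered = text.replace("_", "|")
    # Stage 2 (tokenizer): "|" is not a word character, so it now ends a token.
    # Stage 3 (token filter): lowercase each token.
    return [
        {
            "token": m.group().lower(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": i,
        }
        for i, m in enumerate(re.finditer(r"\w+", filtered))
    ]

for tok in my_analyzer_like("s1104_ISubmitProdCoSvc_prodDataSubmit"):
    print(tok["token"], tok["start_offset"], tok["end_offset"])
```

The order matters: the char filter has to run before the tokenizer, because once the tokenizer has emitted the whole identifier as one token, a lowercase-style token filter cannot split it again.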
# Test the analyzer (against my_index2, the index where my_analyzer is defined)
GET my_index2/_analyze
{
  "analyzer": "my_analyzer",
  "text": "yi_yuan_ordersvc_person_comp_inter_s1104_ISubmitProdCoSvc_prodDataSubmit"
}
# Tokenization result
{
  "tokens" : [
    {
      "token" : "yi",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "yuan",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "ordersvc",
      "start_offset" : 8,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "person",
      "start_offset" : 17,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "comp",
      "start_offset" : 24,
      "end_offset" : 28,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "inter",
      "start_offset" : 29,
      "end_offset" : 34,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "s1104",
      "start_offset" : 35,
      "end_offset" : 40,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "isubmitprodcosvc",
      "start_offset" : 41,
      "end_offset" : 57,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "proddatasubmit",
      "start_offset" : 58,
      "end_offset" : 72,
      "type" : "<ALPHANUM>",
      "position" : 8
    }
  ]
}
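Because the _=>| mapping swaps one character for one character, each token's offsets should point back at the matching slice of the input text. The Python sketch below is one way to check a saved _analyze response against its input. check_offsets is a hypothetical helper, the sample response is shortened for illustration, and the check assumes lowercase is the only token filter.

```python
import json

def check_offsets(text, response):
    """Return the tokens whose offsets do not match the input text.

    Assumption: a token differs from its source slice only by lowercasing.
    """
    mismatches = []
    for tok in response["tokens"]:
        source = text[tok["start_offset"]:tok["end_offset"]]
        if source.lower() != tok["token"]:
            mismatches.append(tok)
    return mismatches

# Hypothetical, shortened sample in the shape the _analyze API returns.
text = "yi_yuan_ordersvc"
response = json.loads("""
{"tokens": [
  {"token": "yi", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0},
  {"token": "yuan", "start_offset": 3, "end_offset": 7, "type": "<ALPHANUM>", "position": 1},
  {"token": "ordersvc", "start_offset": 8, "end_offset": 16, "type": "<ALPHANUM>", "position": 2}
]}
""")
print(check_offsets(text, response))
```

A non-empty list means the response was produced from a different input string, or that a filter changed the token text in some other way.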