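Overview
While looking at how Elasticsearch scores similarity against an array field, I ran into a question:
How does Elasticsearch's relevance scoring for an array differ from its scoring for a document that is a single string of text?
The index (on Elasticsearch 5.4) is defined as follows: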
PUT user
{
  "mappings": {
    "app": {
      "properties": {
        "appPackageNameLists": {
          "type": "keyword",
          "index": true
        },
        "gaid": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "region": {
          "type": "keyword"
        }
      }
    }
  }
}
Index three documents:
POST user/app/4ad86eae-4477-40a1-abaf-bf45ba2dbe1c
{
  "gaid": "4ad86eae-4477-40a1-abaf-bf45ba2dbe1c",
  "appPackageNameLists": [
    "com.mi.global.shop",
    "com.nox.mopen.app",
    "com.ludashi.dualspace",
    "info.cloneapp.mochat.in.goast",
    "com.parallel.space.lite.arm64",
    "com.freecharge.android",
    "com.swiftkey.languageprovider"
  ],
  "region": "IN"
}
POST user/app/4af0a32b-6995-4c92-8189-79aad0e6fecb
{
  "gaid": "4af0a32b-6995-4c92-8189-79aad0e6fecb",
  "appPackageNameLists": [
    "com.joynow.killplane2",
    "com.google.android.youtube",
    "root.rootchecker",
    "com.mxtech.videoplayer.pro",
    "com.teslacoilsw.launcher",
    "com.whatsapp",
    "com.lbe.parallel.intl"
  ],
  "region": "IN"
}
POST user/app/4be2411e-089d-4a6c-b773-6ca31e36c675
{
  "gaid": "4be2411e-089d-4a6c-b773-6ca31e36c675",
  "appPackageNameLists": [
    "com.joynow.killplane2",
    "com.google.android.youtube",
    "root.rootchecker",
    "com.mxtech.videoplayer.pro",
    "com.teslacoilsw.launcher",
    "com.whatsapp",
    "com.lbe.parallel.intl",
    "com.outfit7.mytalkingtomfree",
    "fbs.com",
    "com.tencent.mm"
  ],
  "region": "IN"
}
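Next, look at the term vectors of the array field for one document.
Because appPackageNameLists is a keyword field, its values are not analyzed,
so the app list is simply split into its individual elements, one term per element.
For each term, only its position and its start and end offsets are recorded: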
GET /user/app/4be2411e-089d-4a6c-b773-6ca31e36c675/_termvectors?fields=appPackageNameLists
{
  "_index": "user",
  "_type": "app",
  "_id": "4be2411e-089d-4a6c-b773-6ca31e36c675",
  "_version": 1,
  "found": true,
  "took": 1,
  "term_vectors": {
    "appPackageNameLists": {
      "field_statistics": {
        "sum_doc_freq": 10,
        "doc_count": 1,
        "sum_ttf": -1
      },
      "terms": {
        "com.google.android.youtube": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 22,
              "end_offset": 48
            }
          ]
        },
        "com.joynow.killplane2": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 21
            }
          ]
        },
        "com.lbe.parallel.intl": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 6,
              "start_offset": 131,
              "end_offset": 152
            }
          ]
        },
        "com.mxtech.videoplayer.pro": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "start_offset": 66,
              "end_offset": 92
            }
          ]
        },
        "com.outfit7.mytalkingtomfree": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 7,
              "start_offset": 153,
              "end_offset": 181
            }
          ]
        },
        "com.tencent.mm": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 9,
              "start_offset": 190,
              "end_offset": 204
            }
          ]
        },
        "com.teslacoilsw.launcher": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 4,
              "start_offset": 93,
              "end_offset": 117
            }
          ]
        },
        "com.whatsapp": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 5,
              "start_offset": 118,
              "end_offset": 130
            }
          ]
        },
        "fbs.com": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 8,
              "start_offset": 182,
              "end_offset": 189
            }
          ]
        },
        "root.rootchecker": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 49,
              "end_offset": 65
            }
          ]
        }
      }
    }
  }
}
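The positions and offsets in this response can be reproduced with a few lines of Python. This is only a rough model of what the output shows, not Elasticsearch or Lucene code: the function name keyword_term_vectors and the one-character gap between consecutive elements are assumptions read off from the numbers above.

```python
def keyword_term_vectors(values, offset_gap=1):
    """Model a keyword array in _termvectors: one un-analyzed term per element.

    offset_gap=1 is an assumption inferred from the response above.
    Duplicate elements are not handled; the sample data has none.
    """
    vectors = {}
    offset = 0
    for position, value in enumerate(values):
        # No analysis happens: the whole array element is the term.
        vectors[value] = {
            "position": position,
            "start_offset": offset,
            "end_offset": offset + len(value),
        }
        # The next element starts after the assumed gap.
        offset += len(value) + offset_gap
    return vectors


app_packages = [
    "com.joynow.killplane2",
    "com.google.android.youtube",
    "root.rootchecker",
    "com.mxtech.videoplayer.pro",
    "com.teslacoilsw.launcher",
    "com.whatsapp",
    "com.lbe.parallel.intl",
    "com.outfit7.mytalkingtomfree",
    "fbs.com",
    "com.tencent.mm",
]

vectors = keyword_term_vectors(app_packages)
print(vectors["com.tencent.mm"])
# {'position': 9, 'start_offset': 190, 'end_offset': 204}
```

The array order decides the position, while the terms in the response are simply listed alphabetically.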
A match query on one complete package name returns the document that contains it:
GET user/app/_search
{
  "from": 0,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "appPackageNameLists": {
              "query": "com.outfit7.mytalkingtomfree",
              "boost": 1
            }
          }
        }
      ]
    }
  },
  "size": 100
}
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "user2",
        "_type": "app",
        "_id": "4be2411e-089d-4a6c-b773-6ca31e36c675",
        "_score": 0.2876821,
        "_source": {
          "gaid": "4be2411e-089d-4a6c-b773-6ca31e36c675",
          "appPackageNameLists": [
            "com.joynow.killplane2",
            "com.google.android.youtube",
            "root.rootchecker",
            "com.mxtech.videoplayer.pro",
            "com.teslacoilsw.launcher",
            "com.whatsapp",
            "com.lbe.parallel.intl",
            "com.outfit7.mytalkingtomfree",
            "fbs.com",
            "com.tencent.mm"
          ],
          "region": "IN"
        }
      }
    ]
  }
}
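The score 0.2876821 can be explained with the BM25 formula, which is the default similarity in Elasticsearch 5.x. The sketch below is a hand calculation, not Elasticsearch code. It assumes that the matching document is the only one on its shard (the term vectors output reports doc_count 1, and three documents are spread over five shards) and that the keyword field carries no length norms, so the array length does not enter the score. The function names are made up for this example.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 inverse document frequency, computed per shard."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm_without_norms(freq, k1=1.2):
    """BM25 term-frequency factor when the field has no length norms (assumed)."""
    return freq * (k1 + 1) / (freq + k1)


# Assumption: one document on the shard, and it contains the term once.
score = bm25_idf(doc_count=1, doc_freq=1) * bm25_tf_norm_without_norms(freq=1)
print(round(score, 7))  # 0.2876821
```

Under these assumptions the score depends only on how many documents on the shard contain the exact term, not on how many elements the array holds.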
A match query on only part of a package name, such as com, finds nothing, because each element is indexed as one whole term:
GET user/app/_search
{
  "from": 0,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "appPackageNameLists": {
              "query": "com",
              "boost": 1
            }
          }
        }
      ]
    }
  },
  "size": 100
}
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
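Both results follow from exact term lookup. The Python sketch below is a simplified model of an inverted index over a keyword array, not the real Lucene data structure. The helper names build_keyword_index and match_keyword are hypothetical, and the document ids are shortened to their first eight characters for readability.

```python
from collections import defaultdict


def build_keyword_index(docs):
    """Map each whole array element (one term) to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, values in docs.items():
        for value in values:
            index[value].add(doc_id)
    return index


def match_keyword(index, query):
    """A match on a keyword field does not split the query: it is one exact term."""
    return sorted(index.get(query, set()))


docs = {
    "4ad86eae": [
        "com.mi.global.shop", "com.nox.mopen.app", "com.ludashi.dualspace",
        "info.cloneapp.mochat.in.goast", "com.parallel.space.lite.arm64",
        "com.freecharge.android", "com.swiftkey.languageprovider",
    ],
    "4af0a32b": [
        "com.joynow.killplane2", "com.google.android.youtube", "root.rootchecker",
        "com.mxtech.videoplayer.pro", "com.teslacoilsw.launcher", "com.whatsapp",
        "com.lbe.parallel.intl",
    ],
    "4be2411e": [
        "com.joynow.killplane2", "com.google.android.youtube", "root.rootchecker",
        "com.mxtech.videoplayer.pro", "com.teslacoilsw.launcher", "com.whatsapp",
        "com.lbe.parallel.intl", "com.outfit7.mytalkingtomfree", "fbs.com",
        "com.tencent.mm",
    ],
}

index = build_keyword_index(docs)
print(match_keyword(index, "com.outfit7.mytalkingtomfree"))  # ['4be2411e']
print(match_keyword(index, "com"))                           # []
print(match_keyword(index, "com.whatsapp"))                  # ['4af0a32b', '4be2411e']
```

In this model "com" is not a key of the index at all, which mirrors the empty result above.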
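So an array that does not contain nested objects
is really not that complicated: each element of a keyword array is indexed as its own un-analyzed term, and a match query hits only when it equals one whole element.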