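I recently got confused about how Elasticsearch handles similarity scoring for arrays:
how does scoring an array field differ from scoring a document that is a single string of text?
The index (on Elasticsearch 5.4) is defined as follows: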
    PUT user
    {
      "mappings": {
        "app": {
          "properties": {
            "appPackageNameLists": {
              "type": "keyword",
              "index": true
            },
            "gaid": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "region": {
              "type": "keyword"
            }
          }
        }
      }
    }
Insert the data:
    POST user/app/4ad86eae-4477-40a1-abaf-bf45ba2dbe1c
    {
      "gaid": "4ad86eae-4477-40a1-abaf-bf45ba2dbe1c",
      "appPackageNameLists": [
        "com.mi.global.shop",
        "com.nox.mopen.app",
        "com.ludashi.dualspace",
        "info.cloneapp.mochat.in.goast",
        "com.parallel.space.lite.arm64",
        "com.freecharge.android",
        "com.swiftkey.languageprovider"
      ],
      "region": "IN"
    }

    POST user/app/4af0a32b-6995-4c92-8189-79aad0e6fecb
    {
      "gaid": "4af0a32b-6995-4c92-8189-79aad0e6fecb",
      "appPackageNameLists": [
        "com.joynow.killplane2",
        "com.google.android.youtube",
        "root.rootchecker",
        "com.mxtech.videoplayer.pro",
        "com.teslacoilsw.launcher",
        "com.whatsapp",
        "com.lbe.parallel.intl"
      ],
      "region": "IN"
    }

    POST user/app/4be2411e-089d-4a6c-b773-6ca31e36c675
    {
      "gaid": "4be2411e-089d-4a6c-b773-6ca31e36c675",
      "appPackageNameLists": [
        "com.joynow.killplane2",
        "com.google.android.youtube",
        "root.rootchecker",
        "com.mxtech.videoplayer.pro",
        "com.teslacoilsw.launcher",
        "com.whatsapp",
        "com.lbe.parallel.intl",
        "com.outfit7.mytalkingtomfree",
        "fbs.com",
        "com.tencent.mm"
      ],
      "region": "IN"
    }
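Now look at the term vectors of the array field for one document.
Because appPackageNameLists is a keyword field, it is not analyzed,
so the app list is simply split into its individual elements, one term per element.
The result records nothing more than each element's position and its start and end offsets: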
    GET /user/app/4be2411e-089d-4a6c-b773-6ca31e36c675/_termvectors?fields=appPackageNameLists

    {
      "_index": "user",
      "_type": "app",
      "_id": "4be2411e-089d-4a6c-b773-6ca31e36c675",
      "_version": 1,
      "found": true,
      "took": 1,
      "term_vectors": {
        "appPackageNameLists": {
          "field_statistics": {
            "sum_doc_freq": 10,
            "doc_count": 1,
            "sum_ttf": -1
          },
          "terms": {
            "com.google.android.youtube": {
              "term_freq": 1,
              "tokens": [
                { "position": 1, "start_offset": 22, "end_offset": 48 }
              ]
            },
            "com.joynow.killplane2": {
              "term_freq": 1,
              "tokens": [
                { "position": 0, "start_offset": 0, "end_offset": 21 }
              ]
            },
            "com.lbe.parallel.intl": {
              "term_freq": 1,
              "tokens": [
                { "position": 6, "start_offset": 131, "end_offset": 152 }
              ]
            },
            "com.mxtech.videoplayer.pro": {
              "term_freq": 1,
              "tokens": [
                { "position": 3, "start_offset": 66, "end_offset": 92 }
              ]
            },
            "com.outfit7.mytalkingtomfree": {
              "term_freq": 1,
              "tokens": [
                { "position": 7, "start_offset": 153, "end_offset": 181 }
              ]
            },
            "com.tencent.mm": {
              "term_freq": 1,
              "tokens": [
                { "position": 9, "start_offset": 190, "end_offset": 204 }
              ]
            },
            "com.teslacoilsw.launcher": {
              "term_freq": 1,
              "tokens": [
                { "position": 4, "start_offset": 93, "end_offset": 117 }
              ]
            },
            "com.whatsapp": {
              "term_freq": 1,
              "tokens": [
                { "position": 5, "start_offset": 118, "end_offset": 130 }
              ]
            },
            "fbs.com": {
              "term_freq": 1,
              "tokens": [
                { "position": 8, "start_offset": 182, "end_offset": 189 }
              ]
            },
            "root.rootchecker": {
              "term_freq": 1,
              "tokens": [
                { "position": 2, "start_offset": 49, "end_offset": 65 }
              ]
            }
          }
        }
      }
    }
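The offsets in the response above look as if the array elements were laid end to end with a gap of one between them, while the position counts up by one per element. The Python sketch below reproduces those numbers from the array. It only illustrates the pattern visible in this output and is not how Elasticsearch is implemented; the function name `keyword_array_tokens` and the `offset_gap` parameter are made up for this example.

```python
def keyword_array_tokens(values, offset_gap=1):
    """Return {term: (position, start_offset, end_offset)} for an array of
    unanalyzed values, following the pattern seen in the _termvectors output.

    offset_gap=1 is inferred from that output, not taken from any API.
    """
    tokens = {}
    start = 0
    for position, value in enumerate(values):
        end = start + len(value)  # each whole element is a single term
        tokens[value] = (position, start, end)
        start = end + offset_gap  # the next element starts after the gap
    return tokens


apps = [
    "com.joynow.killplane2",
    "com.google.android.youtube",
    "root.rootchecker",
    "com.mxtech.videoplayer.pro",
    "com.teslacoilsw.launcher",
    "com.whatsapp",
    "com.lbe.parallel.intl",
    "com.outfit7.mytalkingtomfree",
    "fbs.com",
    "com.tencent.mm",
]

tokens = keyword_array_tokens(apps)
for term in sorted(tokens):
    print(term, tokens[term])
```

For example, `tokens["com.tencent.mm"]` comes out as `(9, 190, 204)`, the same position and offsets as in the response.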
A match query on one complete package name returns the following:
    GET user/app/_search
    {
      "from": 0,
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "appPackageNameLists": {
                  "query": "com.outfit7.mytalkingtomfree",
                  "boost": 1
                }
              }
            }
          ]
        }
      },
      "size": 100
    }

    {
      "took": 0,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "user2",
            "_type": "app",
            "_id": "4be2411e-089d-4a6c-b773-6ca31e36c675",
            "_score": 0.2876821,
            "_source": {
              "gaid": "4be2411e-089d-4a6c-b773-6ca31e36c675",
              "appPackageNameLists": [
                "com.joynow.killplane2",
                "com.google.android.youtube",
                "root.rootchecker",
                "com.mxtech.videoplayer.pro",
                "com.teslacoilsw.launcher",
                "com.whatsapp",
                "com.lbe.parallel.intl",
                "com.outfit7.mytalkingtomfree",
                "fbs.com",
                "com.tencent.mm"
              ],
              "region": "IN"
            }
          }
        ]
      }
    }
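The score 0.2876821 can be reproduced by hand under a few assumptions that the post itself does not confirm: that the index uses the default BM25 similarity with k1 = 1.2, that the keyword field has norms disabled (the default for keyword), so the array length does not enter the score, and that the matching document is the only one in its shard holding this field (the request covers 5 shards and only 3 documents were inserted). The function name below is illustrative.

```python
import math


def bm25_score_no_norms(term_freq, doc_freq, doc_count, k1=1.2):
    """BM25 score of a single term on a field without norms.

    Without norms there is no length normalization, so the term-frequency
    part depends only on term_freq and k1.
    """
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = term_freq * (k1 + 1) / (term_freq + k1)
    return idf * tf_norm


# Assumed shard-level statistics: one document in the shard, and it has the term.
score = bm25_score_no_norms(term_freq=1, doc_freq=1, doc_count=1)
print(round(score, 7))  # 0.2876821
```

If these assumptions hold, a document listing 7 apps and one listing 10 apps would receive the same score for the same matched package name, which is one way a keyword array differs from a long analyzed string.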
A match query on only part of the package name, such as com, finds nothing:
    GET user/app/_search
    {
      "from": 0,
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "appPackageNameLists": {
                  "query": "com",
                  "boost": 1
                }
              }
            }
          ]
        }
      },
      "size": 100
    }

    {
      "took": 0,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
      }
    }
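Why the full package name matches while com does not can be sketched with a toy inverted index. The field is not analyzed, so each whole array element is one term, and the query string is looked up as one term as well. This is a simplified model with made-up names (`build_inverted_index`, `match_keyword`), not Elasticsearch code.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each whole array element (one unanalyzed term) to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, values in docs.items():
        for value in values:
            index[value].add(doc_id)
    return index


def match_keyword(index, query):
    """Look the query up as one exact term: no tokenization, no prefix matching."""
    return sorted(index.get(query, set()))


docs = {
    "4ad86eae-4477-40a1-abaf-bf45ba2dbe1c": [
        "com.mi.global.shop", "com.nox.mopen.app", "com.ludashi.dualspace",
        "info.cloneapp.mochat.in.goast", "com.parallel.space.lite.arm64",
        "com.freecharge.android", "com.swiftkey.languageprovider",
    ],
    "4af0a32b-6995-4c92-8189-79aad0e6fecb": [
        "com.joynow.killplane2", "com.google.android.youtube", "root.rootchecker",
        "com.mxtech.videoplayer.pro", "com.teslacoilsw.launcher", "com.whatsapp",
        "com.lbe.parallel.intl",
    ],
    "4be2411e-089d-4a6c-b773-6ca31e36c675": [
        "com.joynow.killplane2", "com.google.android.youtube", "root.rootchecker",
        "com.mxtech.videoplayer.pro", "com.teslacoilsw.launcher", "com.whatsapp",
        "com.lbe.parallel.intl", "com.outfit7.mytalkingtomfree", "fbs.com",
        "com.tencent.mm",
    ],
}

index = build_inverted_index(docs)
print(match_keyword(index, "com.outfit7.mytalkingtomfree"))  # one document
print(match_keyword(index, "com"))  # empty: no element is exactly "com"
```

In this model, as in the two searches above, a query matches only when it equals an entire array element.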
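So an array that contains no nested objects is really not that complicated:
each element of a keyword array is indexed as its own unanalyzed term, and a match query hits only when the query string equals a whole element.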