Elasticsearch结构化搜索 term filter来搜索数据

115 阅读 0 评论 76 点赞

我是靠谱客的博主精明铅笔，最近开发中收集的这篇文章主要介绍Elasticsearch结构化搜索 term filter来搜索数据，觉得挺不错的，现在分享给大家，希望可以做个参考。

概述

案例：

一. 根据用户ID、是否隐藏、帖子ID、发帖日期来搜索帖子

插入一些测试帖子数据

POST /post/_doc/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }

刚开始的时候，先用4个字段，因为整个es是支持json document格式的，所以说扩展性和灵活性非常之好。如果后续随着业务需求的增加，要在document中增加更多的field，那么我们可以很方便的随时添加field。但是如果是在关系型数据库中，比如mysql，我们建立了一个表，现在要给表中新增一些column，那就很坑爹了，必须用复杂的修改表结构的语法去执行。而且可能对系统代码还有一定的影响。

看一下es默认生成的mapping结构

GET /post/_mapping
{
"post" : {
"mappings" : {
"properties" : {
"articleID" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"hidden" : {
"type" : "boolean"
},
"postDate" : {
"type" : "date"
},
"userID" : {
"type" : "long"
}
}
}
}
}

现在es 5.2以上版本，type 等于 text时，默认会设置两个field，一个是field本身，比如articleID，就是分词的；还有一个的话，就是field.keyword，articleID.keyword，默认不分词，会最多保留256个字符

根据用户ID搜索帖子

GET /post/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"userID": 1
}
}
}
}
}

term filter/query：对搜索文本不分词，直接拿去倒排索引中匹配，你输入的是什么，就去匹配什么
比如说，如果对搜索文本进行分词的话，“helle world” --> “hello”和“world”，两个词分别去倒排索引中匹配
term。不分词则，“hello world” --> “hello world”，直接去倒排索引中匹配“hello world”

搜索没有隐藏的帖子

GET /post/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"hidden": false
}
}
}
}
}

根据发帖日期搜索帖子

GET /post/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"postDate": "2017-01-01"
}
}
}
}
}

根据帖子ID搜索帖子

GET /post/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"articleID": "XHDK-A-1293-#fJ3"
}
}
}
}
}
结果：你会发现这里没有搜索到，因为进行分词了
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
如下不分词：
GET /post/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"articleID.keyword": "XHDK-A-1293-#fJ3"
}
}
}
}
}
下面可以看到不分词：
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "post",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"articleID" : "XHDK-A-1293-#fJ3",
"userID" : 1,
"hidden" : false,
"postDate" : "2017-01-01"
}
}
]
}
}

articleID.keyword，是es新的版本内置建立的field，就是不分词的。所以一个articleID过来的时候，会建立两次索引，一次是自己本身，是要分词的，分词后放入倒排索引；另外一次是基于articleID.keyword，不分词，保留256个字符最多，直接一个字符串放入倒排索引中。

所以term filter，对text过滤，可以考虑使用内置的field.keyword来进行匹配。但是有个问题，默认就保留256个字符。所以尽可能还是自己去手动建立索引，指定not_analyzed吧。在新版本的es中，不需要指定not_analyzed也可以，将type=keyword即可。

查看分词

GET /post/_analyze
{
"field": "articleID",
"text": "XHDK-A-1293-#fJ3"
}
结果如下：可以看到字符自动过滤掉了，大写也变成小写了
{
"tokens" : [
{
"token" : "xhdk",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "a",
"start_offset" : 5,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "1293",
"start_offset" : 7,
"end_offset" : 11,
"type" : "<NUM>",
"position" : 2
},
{
"token" : "fj3",
"start_offset" : 13,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}

默认是analyzed的text类型的field，建立倒排索引的时候，就会对所有的articleID分词，分词以后，原本的articleID就没有了，只有分词后的各个word存在于倒排索引中。
term，是不对搜索文本分词的，XHDK-A-1293-#fJ3 --> XHDK-A-1293-#fJ3；但是articleID建立索引的时候，XHDK-A-1293-#fJ3 --> xhdk，a，1293，fj3

重建索引

#先删除
DELETE /post
#建立mapping
PUT /post
{
"mappings": {
"properties": {
"articleID" : {
"type": "keyword"
}
}
}
}
#重新创建数据
POST /post/_doc/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }
#重新看mapping 已经建立成功了
GET /post/_mapping
{
"post" : {
"mappings" : {
"properties" : {
"articleID" : {
"type" : "keyword"
},
"hidden" : {
"type" : "boolean"
},
"postDate" : {
"type" : "date"
},
"userID" : {
"type" : "long"
}
}
}
}
}

重新根据帖子ID和发帖日期进行搜索

GET /post/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"articleID": "XHDK-A-1293-#fJ3"
}
}
}
}
}
结果如下：发现现在可以直接搜索出来了
{
"took" : 135,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "post",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"articleID" : "XHDK-A-1293-#fJ3",
"userID" : 1,
"hidden" : false,
"postDate" : "2017-01-01"
}
}
]
}
}