【Es】Elasticsearch custom analyzers

This article was shared on 靠谱客 by the blogger 过时衬衫. It covers Elasticsearch's built-in analyzers and how to define a custom analyzer, and is offered as a reference.

1. Analyzers

Reposted from: https://blog.csdn.net/gwd1154978352/article/details/83343933

For background on analyzers, first read the article: 【Elasticsearch】Elasticsearch analyzer 分词器

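Elasticsearch ships with many built-in analyzers, for example standard (the standard analyzer), english (English text) and, in older releases, chinese (current releases provide cjk for Chinese, Japanese, and Korean text). The default is standard, which is made up of the following components: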

standard tokenizer: splits text on word boundaries
standard token filter: does nothing
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, and it
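
The chain above can be approximated in a few lines of Python. This is only a sketch: the regular expression `\w+` stands in for the real word-boundary rules of the standard tokenizer, and the function names (`standard_tokenizer`, `lowercase_filter`, `stop_filter`, `standard_analyze`) are made up for illustration, not part of any Elasticsearch client.

```python
import re


def standard_tokenizer(text):
    """Rough stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Lowercase every token, like the lowercase token filter."""
    return [token.lower() for token in tokens]


def stop_filter(tokens, stopwords=frozenset()):
    """Drop stop words; the empty default mirrors 'disabled by default'."""
    return [token for token in tokens if token not in stopwords]


def standard_analyze(text, stopwords=frozenset()):
    """Run tokenizer -> lowercase -> stop, in that order."""
    tokens = standard_tokenizer(text)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens, stopwords)


print(standard_analyze("A Dog is IN the House"))
# prints ['a', 'dog', 'is', 'in', 'the', 'house']
```

Passing a stop-word set, for example `standard_analyze("A Dog", {"a"})`, shows what changes once the stop filter is enabled.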

Changing the analyzer settings

Define an analyzer named es_std, based on standard, with the English stop-word token filter enabled:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Test request for the standard analyzer

GET /my_index/_analyze
{
  "analyzer": "standard", 
  "text": "a dog is in the house"
}

Result

{
  "tokens": [
    {
      "token": "a",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "dog",
      "start_offset": 2,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "is",
      "start_offset": 6,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "in",
      "start_offset": 9,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "the",
      "start_offset": 12,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "house",
      "start_offset": 16,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 5
    }
  ]
}

Test request for the configured es_std analyzer

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text":"a dog is in the house"
}

Result

{
  "tokens": [
    {
      "token": "dog",
      "start_offset": 2,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "house",
      "start_offset": 16,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 5
    }
  ]
}
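
Note that dog and house keep positions 1 and 5: the stop filter removes tokens but leaves their positions as gaps. A minimal Python sketch of that behaviour follows. The `\w+` tokenization is a simplification, the helper names are hypothetical, and the contents of `ENGLISH_STOPWORDS` are an assumption about what `_english_` expands to, so check the list for your version before relying on it.

```python
import re

# Assumed contents of the _english_ stop list (verify against your version).
ENGLISH_STOPWORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)


def tokenize_with_offsets(text):
    """Emit lowercase tokens with character offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


def remove_stopwords(tokens, stopwords):
    """Drop stop words without renumbering the surviving positions."""
    return [t for t in tokens if t["token"] not in stopwords]


result = remove_stopwords(
    tokenize_with_offsets("a dog is in the house"), ENGLISH_STOPWORDS
)
for token in result:
    print(token)
```

Only dog (offsets 2 to 5, position 1) and house (offsets 16 to 21, position 5) survive, which matches the response above.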

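2. Custom analyzers

The request below creates my_index again. If the index already exists from the previous section, delete it first (DELETE /my_index), otherwise the create request is rejected.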

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}

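Explanation of the settings

char_filter: html_strip removes HTML tags; the custom mapping filter &_to_and replaces & with and
tokenizer: standard splits the filtered text on word boundaries
filter: lowercase converts each token to lowercase; the custom my_stopwords filter removes the and a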

Test request

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}

Test result

{
  "tokens": [
    {
      "token": "tomandjerry",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "are",
      "start_offset": 10,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "friend",
      "start_offset": 16,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "in",
      "start_offset": 23,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "house",
      "start_offset": 30,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "haha",
      "start_offset": 42,
      "end_offset": 46,
      "type": "<ALPHANUM>",
      "position": 7
    }
  ]
}
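
To see how the three stages of my_analyzer combine, here is a simplified Python simulation. It is a sketch under assumptions: the tag-stripping regex and the `\w+` tokenizer are much cruder than the real html_strip char filter and standard tokenizer, and the helper names are invented for this example. Because lowercase is part of the filter chain, the last token comes out as haha.

```python
import re


def html_strip(text):
    """Simplified html_strip: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", text)


def mapping_char_filter(text, mappings):
    """Replace each source string with its target, like a mapping char filter."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def my_analyzer(text):
    """char filters -> tokenizer -> token filters, as configured above."""
    # 1. Character filters run on the raw string.
    text = html_strip(text)
    text = mapping_char_filter(text, {"&": "and"})
    # 2. The tokenizer splits the filtered string into tokens.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters: lowercase first, then the custom stop list.
    tokens = [token.lower() for token in tokens]
    stopwords = {"the", "a"}
    # Positions are assigned before stop words are dropped, so gaps remain.
    return [(token, pos) for pos, token in enumerate(tokens) if token not in stopwords]


print(my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!"))
# prints [('tomandjerry', 0), ('are', 1), ('friend', 3), ('in', 4), ('house', 6), ('haha', 7)]
```

The order matters: the & replacement happens before tokenization, which is why tom&jerry ends up as the single token tomandjerry instead of being split at the &.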

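Using the analyzer in a type

The request below targets a mapping type (my_type), which applies to Elasticsearch versions before 7.0. From 7.0 on, the typeless form PUT /my_index/_mapping is used instead.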

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
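
For readers who want to send this request from code rather than from a console, the following sketch builds it with Python's standard library only. The base URL `http://localhost:9200` is an assumption, `build_request` and `mapping_path` are hypothetical helpers, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def mapping_path(index, doc_type=None):
    """Return the typed path (pre-7.0) or the typeless path when doc_type is None."""
    if doc_type is None:
        return f"/{index}/_mapping"
    return f"/{index}/_mapping/{doc_type}"


def build_request(method, path, body):
    """Build (but do not send) a JSON request against the cluster."""
    data = json.dumps(body).encode("utf-8")
    return Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


mapping_body = {
    "properties": {
        "content": {"type": "text", "analyzer": "my_analyzer"}
    }
}

request = build_request("PUT", mapping_path("my_index", "my_type"), mapping_body)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Calling `mapping_path("my_index")` without a type yields the typeless path for newer versions.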

Category: Big Data - Elasticsearch
Published: 2023-09-10 18:20:32
Link: https://www.kaopuke.com/article/k-p-k_14_uzo_6_fy_13_jg2.html
