Elasticsearch Study Notes: Elasticsearch, a Distributed Search Engine

Elasticsearch: A Distributed Search Engine

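Contents

Elasticsearch: A Distributed Search Engine
- Full-text retrieval
- Introduction
- Installation
  - Traditional installation
  - Enabling remote access
  - Docker installation
- Kibana
  - Traditional installation
  - Docker installation
  - Compose installation
- Core concepts: index, mapping, document
  - Index
  - Mapping
  - Document
  - Basic operations
    - Index
    - Mapping
    - Document
- Advanced queries: Query DSL
  - Overview
  - Syntax
  - Common queries
- How indexing works
  - Inverted index
  - Index model
- Analyzers
- Filter queries
- Integrating ES with Spring Boot
  - ElasticsearchOperations
  - RestHighLevelClient
- Aggregation queries
- Putting it together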

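Full-text retrieval

In full-text retrieval, a program scans every word in a document and builds an index entry for each word, recording how many times and where the word occurs. When a user runs a query, the program looks the word up in that index, much like finding a character through the lookup table of a dictionary.

Full-text retrieval takes text as the search target and finds the texts that contain the given terms. Completeness, accuracy, and speed are the key metrics of a full-text retrieval system.

It processes text only, not semantics.
English searches are case-insensitive.
The result list is ranked by relevance.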
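
The indexing step described above can be sketched in plain Java. This is only an illustration under simplifying assumptions, not how Lucene stores its index: the class name `WordPositionIndex` is hypothetical, and words are split on whitespace and lowercased so that lookups are case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: records, for each word, the positions where it occurs.
class WordPositionIndex {

    // Maps each lowercased word to the list of its 0-based word positions.
    static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> index = new TreeMap<>();
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            index.computeIfAbsent(words[pos], w -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = build("To be or not to be");
        // The occurrence count of a word is the size of its position list.
        index.forEach((word, positions) ->
                System.out.println(word + " count=" + positions.size() + " positions=" + positions));
    }
}
```

A query for a word then becomes a single map lookup instead of a scan over the whole text.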

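Introduction

What is Elasticsearch

Elasticsearch (ES for short) is an open-source search engine built on Apache Lucene and one of the most popular enterprise search engines. Lucene is widely regarded as one of the best-performing open-source search toolkits, but its API is relatively complex and demands a solid background in search theory, which makes it hard to integrate directly into an application. ES is written in Java and exposes a simple RESTful API, so developers can build search features without dealing with Lucene's complexity.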

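Installation

Traditional installation

# 0. Prerequisites
- CentOS 7, Windows, or macOS
- JDK 11+ with its environment variables configured

# 1. Download ES
- https://www.elastic.co/cn/downloads/elasticsearch

# 2. Do not run ES as root; create a regular user

# Add the user
useradd temp
# Set its password
passwd temp

# 3. Extract the ES archive

tar -zxvf elasticsearch-8.3.2-linux-x86_64.tar.gz

# 4. Directory layout

Directory	Purpose	Notes
bin	Scripts for starting ES
config	ES configuration files
data	ES data	Created automatically after ES first starts
jdk	JDK bundled with ES
lib	Third-party libraries ES depends on
logs	ES logs
modules	ES modules
plugins	Plugins

# 5. Start ES
./elasticsearch-8.3.2/bin/elasticsearch

Note: if JAVA_HOME is already set and that JDK version does not match the one ES requires, perform step 6.

# 6. Configure the environment variable
vim /etc/profile
export ES_JAVA_HOME=<path to the jdk directory inside the ES installation>
source /etc/profile

# 7. Once ES has started, request port 9200 on the local machine. A response like the following means ES started successfully.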

{
  "name" : "SK-20211114GEEB",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "LEaWJhvHQWujCI70FaM6oQ",
  "version" : {
    "number" : "8.3.3",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "801fed82df74dbe537f89b71b098ccaff88d2c56",
    "build_date" : "2022-07-23T19:30:09.227964828Z",
    "build_snapshot" : false,
    "lucene_version" : "9.2.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

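Enabling remote access

# 1. Edit config/elasticsearch.yml in the ES installation
vim elasticsearch.yml
# 2. Listen on all interfaces
network.host: 0.0.0.0
# 3. Configure single-node startup, also in config/elasticsearch.yml
cluster.initial_master_nodes: ["node-1"]
# 4. Restart ES (on Linux the startup may fail at this point; look up the reported error message)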

Docker installation

# 1. Pull the image
docker pull elasticsearch:7.14.0
# 2. Run ES
docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.14.0
# 3. Access ES
http://[ip]:9200

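Kibana

Introduction

Kibana is an open-source analytics and visualization platform for Elasticsearch. With Kibana you can search, view, and interact with the data stored in ES indices, run advanced data analysis, and view the data as charts, tables, and maps.

Installation

Traditional installation

# 1. Download Kibana
https://www.elastic.co/cn/downloads/kibana
# 2. Extract
tar -zxvf kibana-8.3.3-linux-x86_64.tar.gz
# 3. Edit the Kibana configuration file
vim /etc/kibana/config/kibana.yml
# 4. Change the configuration
server.host: "0.0.0.0"  # allow remote access to Kibana
elasticsearch.hosts: ["http://localhost:9200"] # ES address
# 5. Start Kibana (same installation directory as in step 3)
/etc/kibana/bin/kibana
# 6. Open the web page
http://localhost:5601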

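Docker installation

# 1. Pull the image
docker pull kibana:7.14.0
# 2. Run Kibana
docker run -d --name kibana -p 5601:5601 kibana:7.14.0
# 3. Enter the container, change the ES connection settings, restart the Kibana container, then open the web page
docker exec -it [container ID] bash
http://localhost:5601
# 4. Alternatively, run with the configuration file mounted as a volume
a. Copy the Kibana configuration file out of the container
docker cp [container ID]:[source file path] [target dir path]
b. Point the configuration file at the ES server address
vim [target dir path]/kibana.yml
c. Start the container with the configuration file mounted
docker run -d -v /root/kibana.yml:/usr/share/kibana/config/kibana.yml --name kibana -p 5601:5601 kibana:7.14.0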

Compose installation

# Edit docker-compose.yml
version: "3.8"
volumes:
  data:
  config:
  plugin:
# Declare the network; services on the same network can reach each other by service name
networks:
  es:
services:
  elasticsearch:
    image: elasticsearch:7.14.0
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - "es"
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - data:/usr/share/elasticsearch/data
      - config:/usr/share/elasticsearch/config
      - plugin:/usr/share/elasticsearch/plugins
  kibana:
    image: kibana:7.14.0
    ports:
      - "5601:5601"
    networks:
      - "es"
    volumes:
      - ./kibana.yml:/usr/share/kibana/config/kibana.yml

# Kibana configuration file (kibana.yml)
server.host: "0.0.0.0"
server.shutdownTimeout: "5s"
elasticsearch.hosts: ["http://elasticsearch:9200"]
monitoring.ui.container.elasticsearch.enabled: true

# Check that docker-compose is installed (prints the version)
docker-compose -v
# Start the services
docker-compose up -d
# Stop the services
docker-compose down

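Core concepts: index, mapping, document

Index

An index is a collection of documents that share similar characteristics. An index is identified by a name, which must be all lowercase.

Mapping

A mapping defines how a document and the fields it contains are stored and indexed. By default ES creates the mapping automatically from the inserted data; you can also create it manually. A mapping mainly contains field names, field types, and similar settings.

Document

A document is a single record stored in an index, and it is the smallest unit that can be indexed.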

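Basic operations

Index

Create

# 1. Create an index
- PUT /index_name
e.g. PUT /products
- Notes
	1. Index health status: red (index unavailable), yellow (index usable, but at risk), green (healthy)
	2. By default ES creates one primary shard and one replica shard for a new index

# 2. Create an index with shard settings
- PUT /products
{
    "settings": {
        "number_of_shards": 1, # number of primary shards
        "number_of_replicas": 0 # number of replica shards
    }
}

List

# 1. List all indices
GET /_cat/indices
# 2. List all indices with column headers
GET /_cat/indices?v

Delete

# DELETE /[index name]
DELETE /products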

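Mapping

Field types

String types: keyword (not analyzed; the whole value is one term), text (analyzed full text)

Integer types: integer, long

Floating-point types: double, float

Boolean type: boolean

Date type: date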

Create

# 1. Create an index together with its mapping
PUT /products 
{
    "settings": {
         "number_of_shards": 1, # number of primary shards
         "number_of_replicas": 0 # number of replica shards
    },
    "mappings": {
        "properties":{
            "title":{
                "type": "keyword"
            },
            "price":{
                "type": "double"
            },
            "created_at":{
                "type": "date"
            }
        }
    }
}

Query

# Show the mapping of an index
GET /products/_mapping

Document

Create

# With an explicit document id
POST /products/_doc/1
{
    "title":"test",
    "price":8999.88,
    "created_at":"2021-09-15",
    "desc":"test description"
}
# With an auto-generated document id
POST /products/_doc/
{
    "title":"test",
    "price":8999.88,
    "created_at":"2021-09-15",
    "desc":"test description"
}

Get

# Get a document
GET /[index]/_doc/[document id]
e.g. GET /products/_doc/1

Delete

# Delete a document
DELETE /[index]/_doc/[document id]
e.g. DELETE /products/_doc/1

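Update

# 1. Full replacement: the original document is deleted, then the new one is indexed
PUT /[index]/_doc/[document id]
e.g. PUT /products/_doc/1
{
    [field name]: [field value],
    [field name]: [field value]
}

# 2. Partial update of the given fields (7.14.0; this typed endpoint is deprecated in 7.x)
POST /[index]/_doc/[document id]/_update
e.g. POST /products/_doc/1/_update
{
    "doc": {
         [field name]: [field value],
         [field name]: [field value]
    }
}
e.g. on 8.3.3
POST /products/_update/1/
{
  "doc": {
    "title":"updated test"
  }
}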

Bulk operations

Bulk-index two documents

POST /products/_bulk
{"index": {"_id":"1"}}
{"id":"1","title":"test","price":8999.88,"created_at":"2021-09-15","desc":"test description"}
{"index": {"_id":"2"}}
{"id":"2","title":"test","price":8999.88,"created_at":"2021-10-15","desc":"test description 2"}

Bulk update, delete, and index

POST /products/_bulk
{"update": {"_id":"1"}}
{"doc": {"title":"test","price":8999.88,"created_at":"2021-09-15","desc":"test description"}}
{"delete": {"_id":"2"}}
{"index":{}}
{"title":"test 2","price":8999.88,"created_at":"2021-10-15","desc":"test description 2"}

Note: a bulk request does not fail as a whole when one action fails. The remaining actions still run, and the response reports the status of each action.
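
A bulk request body is newline-delimited JSON: each action line is followed by its source line where the action needs one (delete has none), and every line, including the last, ends with a newline. The sketch below assembles such a body as a plain string. The class `BulkBodySketch` and its `Item` type are hypothetical names for this illustration, and the JSON fragments are assumed to be serialized already.

```java
import java.util.List;

// Hypothetical helper that assembles a bulk request body (NDJSON).
class BulkBodySketch {

    // One bulk item: an action line plus an optional source line.
    static final class Item {
        final String actionJson;
        final String sourceJson; // null for actions without a source line, such as delete

        Item(String actionJson, String sourceJson) {
            this.actionJson = actionJson;
            this.sourceJson = sourceJson;
        }
    }

    // Joins the items into NDJSON; every line ends with '\n'.
    static String toNdjson(List<Item> items) {
        StringBuilder body = new StringBuilder();
        for (Item item : items) {
            body.append(item.actionJson).append('\n');
            if (item.sourceJson != null) {
                body.append(item.sourceJson).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String body = toNdjson(List.of(
                new Item("{\"index\":{\"_id\":\"1\"}}", "{\"title\":\"test\",\"price\":8999.88}"),
                new Item("{\"delete\":{\"_id\":\"2\"}}", null)));
        System.out.print(body);
    }
}
```

Because each action sits on its own line, ES can process the actions one by one, which fits the per-action status reporting described above.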

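Advanced queries: Query DSL

Overview

ES provides a powerful way to search data called Query DSL (Domain Specific Language). Query DSL sends a JSON request body through the REST API to interact with ES.

Syntax

# Form 1 (typed endpoint, deprecated in 7.x): GET /index_name/_doc/_search {JSON request body}
# Form 2: GET /index_name/_search {JSON request body}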

Common queries

Match all (match_all)

match_all: returns every document in the index

GET /products/_search
{
    "query": {
        "match_all": {}
    }
}

Term query (term)

term: matches documents by an exact term

GET /products/_search
{
    "query": {
        "term": {
            "price": {
                "value": 4999
            }
        }
    }
}

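NOTE 1: ES uses the standard analyzer (StandardAnalyzer) by default. It splits English text into words and Chinese text into single characters.

keyword type: not analyzed, so a search must supply the entire value.

text type: when a text field is stored, ES runs the standard analyzer, splitting Chinese into single characters and English into words. Searches therefore match single Chinese characters or whole English words.

NOTE 2: keyword, integer, double, date, long, boolean, and ip fields are not analyzed when stored.

Range query (range)

range: matches documents whose field value falls within the given range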

GET /products/_search
{
    "query": {
        "range": {
            "price": {
                "gte": 1400,
                "lte": 9999
            }
        }
    }
}

Prefix query (prefix)

prefix: matches documents that contain a term starting with the given prefix

GET /products/_search
{
    "query": {
        "prefix": {
            "title": {
                "value": "ipho"
            }
        }
    }
}

Wildcard query (wildcard)

wildcard: matches terms against a pattern, where ? matches any single character and * matches zero or more characters

GET /products/_search
{
    "query": {
        "wildcard": {
            "description": {
                "value": "iphon*"
            }
        }
    }
}

Multi-id query (ids)

ids: takes an array of ids and returns the matching documents

GET /products/_search
{
    "query": {
        "ids": {
           "values": ["1","2"]
        }
    }
}

Fuzzy query (fuzzy)

fuzzy: matches documents that contain terms similar to the given term

GET /products/_search
{
    "query": {
        "fuzzy": {
            "description": "iphone"
        }
    }
}

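Note: the maximum edit distance of a fuzzy query must be between 0 and 2.

Terms of length 0 to 2 must match exactly.
Terms of length 3 to 5 allow one edit.
Terms longer than 5 allow at most two edits.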
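
The length rules above can be sketched with a plain edit-distance check. This is a simplified illustration, not ES's implementation: the class `FuzzyMatchSketch` is a hypothetical name, and it uses classic Levenshtein distance, whereas ES by default also counts swapping two adjacent characters as a single edit.

```java
// Hypothetical sketch of length-based fuzziness with Levenshtein distance.
class FuzzyMatchSketch {

    // Maximum number of edits allowed for a term, following the length rules above.
    static int maxEdits(String term) {
        int length = term.length();
        if (length <= 2) return 0;
        if (length <= 5) return 1;
        return 2;
    }

    // Classic Levenshtein distance (insert, delete, substitute) using two rows.
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) previous[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // True when the indexed term is within the edit budget of the query term.
    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return editDistance(queryTerm, indexedTerm) <= maxEdits(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("ipone", "iphone")); // one missing letter, length 5
        System.out.println(fuzzyMatches("ip", "ipx"));       // length 2 allows no edits
    }
}
```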

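Boolean query (bool)

bool: combines multiple conditions into a complex query

must: like &&, every clause must match

should: like ||, at least one clause should match

must_not: like !, none of the clauses may match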

GET /products/_search
{
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "price": {
                        "value": 4999
                    }
                }
            }]
        }
    }
}
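
The &&, ||, and ! reading of must, should, and must_not can be sketched with Java predicates. This is a simplified model under stated assumptions: `BoolQuerySketch` is a hypothetical name, documents are reduced to a single price value, and scoring is ignored. In ES itself, should clauses become optional (they only influence the score) once a must or filter clause is present, which this sketch does not reproduce.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: bool query clauses modeled as predicates over a document.
class BoolQuerySketch {

    // Every must clause holds, no must_not clause holds,
    // and if any should clauses are given, at least one of them holds.
    static <T> Predicate<T> bool(List<Predicate<T>> must,
                                 List<Predicate<T>> should,
                                 List<Predicate<T>> mustNot) {
        return doc -> must.stream().allMatch(clause -> clause.test(doc))
                && mustNot.stream().noneMatch(clause -> clause.test(doc))
                && (should.isEmpty() || should.stream().anyMatch(clause -> clause.test(doc)));
    }

    public static void main(String[] args) {
        Predicate<Integer> priceIs4999 = price -> price == 4999;
        Predicate<Integer> priceBelow1000 = price -> price < 1000;
        List<Predicate<Integer>> must = List.of(priceIs4999);
        List<Predicate<Integer>> should = List.of();
        List<Predicate<Integer>> mustNot = List.of(priceBelow1000);

        Predicate<Integer> query = bool(must, should, mustNot);
        System.out.println(query.test(4999));
        System.out.println(query.test(500));
    }
}
```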

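Multi-field query (multi_match)

GET /products/_search
{
    "query": {
        "multi_match": {
            "query": "iphone",
            "fields": ["title","description"]
        }
    }
}
Note: if a field is analyzed, the query text is analyzed before it is matched against that field; if a field is not analyzed, the query text is matched as a whole.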

Query string on a default field (query_string)

GET /products/_search
{
    "query": {
        "query_string": {
            "query": "iphone",
            "default_field": "description"
        }
    }
}
Note: if the queried field is analyzed, the query text is analyzed as well; if the field is not analyzed, the query text is used as is.

Highlighting (highlight)

Prerequisite: the searched field must be an analyzed field

highlight: highlights the matched terms in the returned documents

GET /products/_search
{
    "query": {
        "term": {
            "description":{
                "value": "iphone"
            }
        }
    },
    "highlight": {
        "fields": {
            "*": {}
        }
    }
}

Custom highlight tags: set pre_tags and post_tags inside highlight

GET /products/_search
{
    "query": {
        "term": {
            "description":{
                "value": "iphone"
            }
        }
    },
    "highlight": {
        "pre_tags": ["<span style='color:red'>"],
        "post_tags": ["</span>"],
        "fields": {
            "*": {}
        }
    }
}

Highlighting across all fields

GET /products/_search
{
    "query": {
        "term": {
            "description":{
                "value": "iphone"
            }
        }
    },
    "highlight": {
        "pre_tags": ["<span style='color:red'>"],
        "post_tags": ["</span>"],
        "require_field_match": false,
        "fields": {
            "*": {}
        }
    }
}

Limit the number of hits (size)

size: sets how many hits are returned; the default is 10

GET /products/_search
{
    "query": {
        "match_all": {
        }
    },
    "size": 5
}

Pagination (from)

from: sets the offset of the first hit to return; combine it with size to paginate

GET /products/_search
{
    "query": {
        "match_all": {
        }
    },
    "size": 5,
    "from": 0
}
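
Since from is an offset rather than a page number, a page-based UI has to convert between the two. The sketch below shows that conversion and renders the matching request body; `PaginationSketch` and its method names are hypothetical, and pages are assumed to be numbered from 1.

```java
// Hypothetical helper that converts a page number into from/size parameters.
class PaginationSketch {

    // Converts a 1-based page number into the 0-based "from" offset.
    static int fromOffset(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be at least 1");
        }
        return (pageNumber - 1) * pageSize;
    }

    // Renders the request body for one page of a match_all search.
    static String searchBody(int pageNumber, int pageSize) {
        return String.format("{\"query\":{\"match_all\":{}},\"size\":%d,\"from\":%d}",
                pageSize, fromOffset(pageNumber, pageSize));
    }

    public static void main(String[] args) {
        // Third page with five hits per page starts at offset 10.
        System.out.println(searchBody(3, 5));
    }
}
```

Note that from + size is capped by the index setting index.max_result_window (10000 by default), so very deep pages need a different mechanism.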

Sort by a field (sort)

GET /products/_search
{
    "query": {
        "match_all": {
        }
    },
    "sort": [
        {
            "price":{
                "order": "desc"
            }
        }
    ]
}

Return selected fields (_source)

_source: an array naming the fields to include in each hit

GET /products/_search
{
    "query": {
        "match_all": {
        }
    },
    "_source": ["title","description"]
}

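How indexing works

Inverted index

An inverted index is also called a reverse index, as opposed to a forward index. A forward index looks up a value by its key; an inverted index looks up the keys by a value. ES uses an inverted index under the hood when searching.

Index model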

{
    "products":{
        "mappings":{
             "properties":{
                "title":{
                    "type": "keyword"
                },
                "price":{
                    "type": "double"
                },
                "created_at":{
                    "type": "date"
                }
           }
        }
    }
}

Sample data

_id	title	price	description
1	蓝月亮洗衣液	19.9	很好用
2	iphone13	19.9	很好用
3	小浣熊干吃面	1.5	很好吃的

In ES only text fields are analyzed; other types are not.

Building the index

title field:

term	document ids
蓝月亮洗衣液	1
iphone13	2
小浣熊干吃面	3

price field:

term	document ids
19.9	[1,2]
1.5	3

description field (each posting is document id : occurrence count : field length in characters):

term	postings
很	[1:1:3, 2:1:3, 3:1:4]
好	[1:1:3, 2:1:3, 3:1:4]
用	[1:1:3, 2:1:3]
吃	[3:1:4]
的	[3:1:4]

Note: ES builds a separate inverted index for each field. Looking up a term in a field's index yields the document ids, so the matching documents can be found quickly.
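
The description-field table above can be reproduced with a small in-memory sketch. It is an illustration only, not Lucene's data structure: `InvertedIndexSketch` is a hypothetical name, English glosses stand in for the Chinese sample text (one token per word instead of one per character), and only document ids are stored, without counts or lengths.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: an inverted index for a single field.
class InvertedIndexSketch {

    // Builds term -> sorted set of ids of the documents containing that term.
    static Map<String, SortedSet<Integer>> build(Map<Integer, String> fieldValuesById) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        fieldValuesById.forEach((id, value) -> {
            for (String term : value.trim().split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        });
        return index;
    }

    public static void main(String[] args) {
        // Word-by-word glosses of the three sample descriptions.
        Map<Integer, String> descriptions = new LinkedHashMap<>();
        descriptions.put(1, "very good use");
        descriptions.put(2, "very good use");
        descriptions.put(3, "very good eat of");

        Map<String, SortedSet<Integer>> index = build(descriptions);
        // Looking up a term returns the ids of the documents that contain it.
        System.out.println(index.get("use"));
        System.out.println(index.get("eat"));
    }
}
```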

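Analyzers

Analysis and Analyzer

Analysis is the process of converting full text into a series of terms (tokens), also known as tokenization. Analysis is performed by an analyzer: the analyzer splits a document into individual terms, and each term points to the documents that contain it.

Components of an analyzer

Note: ES uses the standard analyzer (StandardAnalyzer) by default, which splits Chinese into single characters and English into words.

An analyzer is made of three kinds of components: character filters, a tokenizer, and token filters.

character filters
- Preprocess the text before it is tokenized, most commonly by stripping HTML tags (markup around hello is removed, leaving hello) or replacing characters (& becomes and, so I&you becomes I and you)
tokenizer
- Splits the text into tokens. English can be split on whitespace; Chinese is harder and may use machine-learning algorithms
token filters
- Post-process the tokens: lowercasing, removing stop words, adding synonyms

Notes:

Order: character filters -> tokenizer -> token filters
Count: character filters (zero or more) + tokenizer (exactly one) + token filters (zero or more)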
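
The three stages and their order can be sketched as a small pipeline. This is a toy model rather than ES's analysis chain: `AnalyzerPipelineSketch` is a hypothetical name, and the concrete character filter (replace & with and), tokenizer (whitespace), and token filters (lowercase, stop words) are choices made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: character filter -> tokenizer -> token filters.
class AnalyzerPipelineSketch {

    static final Set<String> STOP_WORDS = Set.of("the", "a", "is");

    // Character filter: rewrites the raw text before tokenization.
    static String replaceAmpersand(String text) {
        return text.replace("&", " and ");
    }

    // Tokenizer: splits the text on whitespace.
    static List<String> whitespaceTokenizer(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Runs the three stages in order and returns the resulting terms.
    static List<String> analyze(String text) {
        String filtered = replaceAmpersand(text);             // 1. character filters
        List<String> tokens = whitespaceTokenizer(filtered);  // 2. tokenizer
        return tokens.stream()                                // 3. token filters
                .map(token -> token.toLowerCase(Locale.ROOT))
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(analyze("I&you"));
        System.out.println(analyze("The Quick & lazy"));
    }
}
```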

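Built-in analyzers

Standard Analyzer - the default; splits English into words and lowercases them
Simple Analyzer - splits on non-letter characters (symbols are dropped) and lowercases
Stop Analyzer - lowercases and removes stop words (the, a, is)
Whitespace Analyzer - splits on whitespace, does not lowercase
Keyword Analyzer - no tokenization; the input is emitted as a single token

Testing the built-in analyzers

Standard analyzer
- Behavior: splits English into words, lowercases them, drops punctuation, and splits Chinese into single characters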

POST /_analyze
{
    "analyzer":"standard",
    "text":"la la la,啦啦啦"
}

Result

{
  "tokens": [
    {
      "token": "la",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "la",
      "start_offset": 3,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "la",
      "start_offset": 6,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "啦",
      "start_offset": 9,
      "end_offset": 10,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "啦",
      "start_offset": 10,
      "end_offset": 11,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    },
    {
      "token": "啦",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<IDEOGRAPHIC>",
      "position": 5
    }
  ]
}

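Simple analyzer
- Behavior: splits English into words, lowercases them, and drops punctuation; Chinese is split only at non-letter characters such as spaces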

POST /_analyze
{
    "analyzer":"simple",
    "text":"la la la,啦啦啦"
}

Result

{
  "tokens": [
    {
      "token": "la",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "la",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 1
    },
    {
      "token": "la",
      "start_offset": 6,
      "end_offset": 8,
      "type": "word",
      "position": 2
    },
    {
      "token": "啦啦啦",
      "start_offset": 9,
      "end_offset": 12,
      "type": "word",
      "position": 3
    }
  ]
}

Whitespace analyzer
- Behavior: splits both Chinese and English on whitespace, keeps case, and keeps punctuation

POST /_analyze
{
    "analyzer":"whitespace",
    "text":"la la La , 啦啦啦"
}

Result

{
  "tokens": [
    {
      "token": "la",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "la",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 1
    },
    {
      "token": "La",
      "start_offset": 6,
      "end_offset": 8,
      "type": "word",
      "position": 2
    },
    {
      "token": ",",
      "start_offset": 9,
      "end_offset": 10,
      "type": "word",
      "position": 3
    },
    {
      "token": "啦啦啦",
      "start_offset": 11,
      "end_offset": 14,
      "type": "word",
      "position": 4
    }
  ]
}

Setting the analyzer when creating an index

- PUT /index_name
{
    "settings": {
        "number_of_shards": 1, # number of primary shards
        "number_of_replicas": 0 # number of replica shards
    },
    "mappings": {
        "properties":{
            "title":{
                "type": "text",
                "analyzer":"whitespace"
            },
            "price":{
                "type": "double"
            },
            "created_at":{
                "type": "date"
            }
        }
    }
}

Chinese analyzers

ES supports many Chinese analyzers, such as smartCN and IK; IK is recommended here.

Installing the IK analyzer

IK on GitHub: https://github.com/medcl/elasticsearch-analysis-ik/releases

Usage blog post: https://cloud.tencent.com/developer/article/1622831

Note: the IK version must match the installed ES version.
Note: when ES runs in a Docker container, plugins are installed under /usr/share/elasticsearch/plugins

Linux installation

# Download (this is a temporary signed link that has since expired; get the zip matching your ES version from the releases page above)
wget -O elasticsearch-analysis-ik-8.3.3.zip "https://objects.githubusercontent.com/github-production-release-asset-2e65be/2993595/41176576-21ed-4526-9e2a-8e729ce2d47e?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220918%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220918T053758Z&X-Amz-Expires=300&X-Amz-Signature=57fb00fd5d07673058797bf3b7e56bade52985d85db6eb31f50ca303de0e5845&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=2993595&response-content-disposition=attachment%3B%20filename%3Delasticsearch-analysis-ik-8.3.3.zip&response-content-type=application%2Foctet-stream"
# Install unzip
yum install -y unzip
# Extract the IK archive into a directory named ik under the plugins directory of the ES installation
pwd
/usr/share/elasticsearch/plugins
unzip elasticsearch-analysis-ik-8.3.3.zip -d ik
# Delete the archive
rm -rf elasticsearch-analysis-ik-8.3.3.zip
# Restart ES
systemctl restart elasticsearch
# List the plugins installed in ES
cd /usr/share/elasticsearch
./bin/elasticsearch-plugin list
ik

Windows installation

1. Extract the IK analyzer into the plugins directory of ES
2. Restart ES

Listing ES plugins from Kibana

# List ES plugins from Kibana
GET /_cat/plugins

Using IK

IK offers two tokenization granularities:

ik_smart: coarsest-grained split (ik_smart splits "我爱你中国" into "我爱你" and "中国")
ik_max_word: finest-grained split (ik_max_word splits "我爱你中国" into "我爱你", "爱你", and "中国")

POST /_analyze
{
    "analyzer":"ik_smart",
    "text":"我爱你中国"
}

Result

{
  "tokens": [
    {
      "token": "我爱你",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中国",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}

POST /_analyze
{
    "analyzer":"ik_max_word",
    "text":"我爱你中国"
}

Result

{
  "tokens": [
    {
      "token": "我爱你",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "爱你",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "中国",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    }
  ]
}

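Extension words and stop words in IK

IK supports custom extension dictionaries and stop-word dictionaries.

Extension dictionary: words that are not recognized as terms by default but that you want ES to treat as searchable terms.
Stop-word dictionary: words that are terms but that, for business reasons, should not be searchable.

Both dictionaries are configured in IKAnalyzer.cfg.xml in the config directory of the IK analyzer.

# 1. Edit the configuration file
vim IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- custom extension dictionary -->
	<entry key="ext_dict">ext_dict.dic</entry>
	<!-- custom stop-word dictionary -->
	<entry key="ext_stopwords">ext_stop.dic</entry>
</properties>
# 2. Create ext_dict.dic in the config directory of the IK analyzer; the encoding must be UTF-8
vim ext_dict.dic   # add the extension words
# 3. Create ext_stop.dic in the config directory of the IK analyzer; the encoding must be UTF-8
vim ext_stop.dic   # add the stop words
# 4. Restart ES

Note: the dictionary files must be UTF-8 encoded, otherwise they will not take effect.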

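Filter queries

ES distinguishes two kinds of search clauses: queries and filters. Queries are everything covered above; by default they compute a score for each returned document and sort by that score. Filters only select the matching documents without computing a score, and their results can be cached. From a performance standpoint alone, filters are therefore faster than queries.

Filters suit coarse selection over large data sets, while queries suit precise matching. A common pattern is to narrow the data with filters first and then match it with queries.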

Usage

GET /test/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": {} // query clause
                }
            ],
            "filter": {}  // filter clause
        }
    }
}

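Note: when a filter is present, the filter runs first and the query runs afterwards. Filters must be used inside a bool query.

Filter types

Common filter types: term, terms, range, exists, ids, and so on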

GET /test/_search 
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name": {
              "value": "VALUE"
            }
          }
        }
      ],
      "filter": [
        {
          "ids": {
            "values": [
              "ID"
            ]
          }
        }
      ]
    }
  }
}

Integrating ES with Spring Boot

Add the dependency

 <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

Configure the client

@Configuration
public class RestConfigClient2 extends AbstractElasticsearchConfiguration {


    @Bean
    @Override
    public RestHighLevelClient elasticsearchClient() {

        ClientConfiguration configuration = ClientConfiguration
                .builder()
                .connectedTo("localhost:9200")
                .build();
        return RestClients.create(configuration).rest();
    }
}

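Client objects

ElasticsearchOperations (object-oriented, works with entity objects)
RestHighLevelClient (operates ES in a RESTful style)

ElasticsearchOperations

Characteristic: always operates ES in an object-oriented way

Annotations

# @Document: placed on a class; each instance represents one document
## indexName: the index name
## createIndex: whether to create the index

# @Id: placed on a field; maps the object's id field to the document _id in ES

# @Field: placed on a field; describes how the field is mapped
## type: the field type
## analyzer: the analyzer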

@Document(indexName = "product", createIndex = true)
public class Product {

    @Id
    private Integer id;

    @Field(type = FieldType.Keyword)
    private String title;

    @Field(type = FieldType.Double)
    private Double price;

    @Field(type = FieldType.Text)
    private String desc;


    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public Double getPrice() {
        return price;
    }

    public void setPrice(Double price) {
        this.price = price;
    }

    public String getDesc() {
        return desc;
    }

    public void setDesc(String desc) {
        this.desc = desc;
    }
}

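Test

Note: at the time of writing, Spring Data Elasticsearch did not yet support ES 8.0. Documents can be stored, but parsing the responses fails.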

package com.example;

import com.example.pojo.Product;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.Query;

import java.util.Iterator;

@SpringBootTest
class SpringbootEsApplicationTests {

    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    /**
     * Adds the document when the id does not exist,
     * updates it when the id already exists
     */
    @Test
    void contextLoads() {
        Product product = new Product();

        product.setDesc("你好中国,我爱你中国");
        product.setPrice(12.2);
        product.setId(2);
        product.setTitle("中国1");
        Product p = elasticsearchOperations.save(product);
        System.out.println(p);
    }

    /**
     * Get a document by id
     */
    @Test
    public void testSearch() {
        Product product = elasticsearchOperations.get("1", Product.class);
        System.out.println(product);
    }

    /**
     * Delete a document
     */
    @Test
    void contextLoads1() {
        Product product = new Product();

        product.setId(2);
        product.setTitle("中国1");
        elasticsearchOperations.delete(product);
    }

    /**
     * Delete all documents
     */
    @Test
    void testDeleteAll() {
        elasticsearchOperations.delete(Query.findAll(), Product.class);
    }

    @Test
    void testFindAll() {
        SearchHits<Product> search = elasticsearchOperations.search(Query.findAll(), Product.class);

        Iterator<SearchHit<Product>> iterator = search.stream().iterator();

        while (iterator.hasNext()) {
            SearchHit<Product> next = iterator.next();
            System.out.println(next);
        }
    }
}

RestHighLevelClient

Index and mapping

 /**
     * Create an index with a mapping
     */
    @Test
    public void testIndexAndMapping() {
        try {
            CreateIndexRequest createIndexRequest = new CreateIndexRequest("test_002");

            createIndexRequest.mapping("{\n" +
                    "        \"properties\":{\n" +
                    "            \"title\":{\n" +
                    "                \"type\": \"keyword\"\n" +
                    "            },\n" +
                    "            \"price\":{\n" +
                    "                \"type\": \"double\"\n" +
                    "            },\n" +
                    "            \"created_at\":{\n" +
                    "                \"type\": \"date\"\n" +
                    "            }\n" +
                    "        }\n" +
                    "    }", XContentType.JSON);
            CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
            System.out.println("Acknowledged: " + createIndexResponse.isAcknowledged());
            // release resources
            restHighLevelClient.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Delete an index
     */
    @Test
    public void testDeleteIndex() {

        try {
            AcknowledgedResponse acknowledgedResponse = restHighLevelClient.indices().delete(new DeleteIndexRequest("product"), RequestOptions.DEFAULT);
            System.out.println(acknowledgedResponse.isAcknowledged());
            restHighLevelClient.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Index and documents

package com.example;

import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;

import java.io.IOException;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.TimeZone;

public class RestHighLevelClientForDocTest extends SpringbootEsApplicationTests {

    private final RestHighLevelClient restHighLevelClient;

    @Autowired
    public RestHighLevelClientForDocTest(RestHighLevelClient restHighLevelClient) {
        this.restHighLevelClient = restHighLevelClient;
    }

    /**
     * Create a document
     */
    @Test
    public void testCreateDoc() {
        IndexResponse response = null;
        try {
            IndexRequest request = new IndexRequest("test_002");
            HashMap<String, Object> paramMap = new HashMap<>();

            paramMap.put("created_at", new Date());
            paramMap.put("price", 1.52);
            paramMap.put("title", "葵花籽");
            request.id("3")
                    .source(paramMap);
            response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
            System.out.println(response.toString());
            System.out.println(response.status());
            restHighLevelClient.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Update a document
     */
    @Test
    public void testUpdateDoc() {
        UpdateResponse response = null;
        try {
            UpdateRequest request = new UpdateRequest("test_002", "2");

            Calendar now = Calendar.getInstance();
            now.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));//important

            Date time = now.getTime();

            HashMap<String, Object> paramMap = new HashMap<>();

            paramMap.put("created_at", time);
            paramMap.put("price", 5.52);
            paramMap.put("title", "焦糖味葵花籽");
            request.doc(paramMap);
            response = restHighLevelClient.update(request, RequestOptions.DEFAULT);
            System.out.println(response.toString());
            System.out.println(response.status());
            restHighLevelClient.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Get a document by id
     */
    @Test
    public void testGetDocById() {
        GetRequest getRequest = new GetRequest("test_002", "2");
        GetResponse getResponse = null;
        try {
            getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
            String id = getResponse.getId();
            String sourceAsString = getResponse.getSourceAsString();
            restHighLevelClient.close();
            System.out.println(id);
            System.out.println(sourceAsString);
        } catch (IOException e) {
            e.printStackTrace();
        }


    }

    /**
     * Delete a document
     */
    @Test
    public void testDelDoc() {
        try {
            DeleteRequest deleteRequest = new DeleteRequest("test_002", "2");
            DeleteResponse response = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
            System.out.println(response.status());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Index and document search

package com.example;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;

import java.io.IOException;
import java.util.Map;

public class RestHighLevelClientForDocSearchTest extends SpringbootEsApplicationTests {

    private final RestHighLevelClient restHighLevelClient;

    @Autowired
    public RestHighLevelClientForDocSearchTest(RestHighLevelClient restHighLevelClient) {
        this.restHighLevelClient = restHighLevelClient;
    }

    /**
     * Match all documents
     */
    @Test
    public void testSearch() {

        SearchRequest searchRequest = new SearchRequest("test_002");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchAllQuery());
        searchRequest.source(sourceBuilder);
        SearchResponse response = null;
        try {
            response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            SearchHit[] hits = response.getHits().getHits();
            // results
            for (SearchHit hit : hits) {
                System.out.println(hit.getId() + ":" + hit.getSourceAsString());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Pagination, sorting, source filtering, highlighting, and filtering
     */
    @Test
    public void testSearchByPageLimit() {

        SearchRequest searchRequest = new SearchRequest("test_002");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder
                .requireFieldMatch(false)
                .field("title")
                .field("price")
                .preTags("<span style='color:red'>")
                .postTags("</span>");
        sourceBuilder.query(QueryBuilders.termQuery("title", "花"))
                .from(0)
                .size(10).sort("price", SortOrder.DESC)
                .fetchSource(new String[]{"title"}, new String[]{})
                .highlighter(highlightBuilder)
                .postFilter(QueryBuilders.rangeQuery("price").gt(1.5).lt(1.56))
        ;
        searchRequest.source(sourceBuilder);
        SearchResponse response = null;
        try {
            response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            SearchHit[] hits = response.getHits().getHits();
            // results
            for (SearchHit hit : hits) {
                System.out.println(hit.getId() + ":" + hit.getSourceAsString());
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                highlightFields.forEach((k, v) -> {
                    System.out.println(k + ":" + v.getFragments()[0]);
                });
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Term, ids, range, and prefix queries
     */
    @Test
    public void testQuery() {
        // term query
        query(QueryBuilders.termQuery("title", "葵花籽"));
        // ids query
        query(QueryBuilders.idsQuery().addIds("2"));
        // range query
        query(QueryBuilders.rangeQuery("price").gt(1.5).lt(1.6));
        // prefix query
        query(QueryBuilders.prefixQuery("title", "葵花"));
    }

    public void query(QueryBuilder queryBuilder) {
        SearchRequest searchRequest = new SearchRequest("test_002");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(queryBuilder);
        searchRequest.source(sourceBuilder);
        SearchResponse response = null;
        try {
            response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            SearchHit[] hits = response.getHits().getHits();
            // Print the id and source of each hit
            for (SearchHit hit : hits) {
                System.out.println(hit.getId() + ":" + hit.getSourceAsString());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
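
The paging test above passes `from(0)` and `size(10)` directly. As a minimal sketch, those two values can be derived from a 1-based page number. The `PageWindow` class and its method names are hypothetical helpers, not part of the Elasticsearch client, and the 10,000 limit assumes the index keeps the default `index.max_result_window` setting.

```java
// Hypothetical helper (not part of the Elasticsearch client): converts a
// 1-based page number into the from/size pair used by SearchSourceBuilder.
class PageWindow {

    // Assumption: the index keeps the default index.max_result_window of 10000.
    static final int MAX_RESULT_WINDOW = 10_000;

    final int from;
    final int size;

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    static PageWindow of(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        // Offset of the first hit on the requested page
        long from = (long) (page - 1) * size;
        // Fail early when from + size would exceed the assumed result window
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds " + MAX_RESULT_WINDOW);
        }
        return new PageWindow((int) from, size);
    }

    public static void main(String[] args) {
        PageWindow first = PageWindow.of(1, 10);
        PageWindow third = PageWindow.of(3, 10);
        System.out.println(first.from + "," + first.size);
        System.out.println(third.from + "," + third.size);
    }
}
```

With such a helper, the builder call would read `sourceBuilder.from(window.from).size(window.size)`.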

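Aggregation Queries

Introduction

Aggregations are the feature Elasticsearch provides, alongside search, for statistical analysis of the data it stores. Aggregation is an important capability of any database, and ES, which serves as both a search engine and a data store, offers strong aggregation support as well. An aggregation runs over the documents matched by the query, groups them into buckets, and computes metrics over them, much like GROUP BY combined with aggregate functions in SQL.

Note: `text` fields do not support aggregations by default; aggregate on a `keyword` or numeric field instead.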
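
To make the GROUP BY comparison concrete, here is a minimal in-memory sketch in plain Java. It does not call Elasticsearch at all; it only mimics the two steps, bucketing and then computing a metric per bucket. The `BucketAnalogy` class, the `Item` shape, and the `category` values are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory analogy only: mimics "bucket, then compute" without Elasticsearch.
class BucketAnalogy {

    // Hypothetical document shape; the category values are invented for this sketch.
    static final class Item {
        final String category;
        final double price;

        Item(String category, double price) {
            this.category = category;
            this.price = price;
        }

        String category() { return category; }
        double price() { return price; }
    }

    // Bucketing step, comparable to SELECT category, COUNT(*) ... GROUP BY category.
    static Map<String, Long> countByCategory(List<Item> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Item::category, TreeMap::new, Collectors.counting()));
    }

    // Metric per bucket, comparable to SELECT category, AVG(price) ... GROUP BY category.
    static Map<String, Double> avgPriceByCategory(List<Item> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Item::category, TreeMap::new,
                        Collectors.averagingDouble(Item::price)));
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("snack", 2.0),
                new Item("snack", 4.0),
                new Item("drink", 3.0));
        System.out.println(countByCategory(items));
        System.out.println(avgPriceByCategory(items));
    }
}
```

In Elasticsearch terms, the first method plays the role of a bucket aggregation and the second adds a metric computed inside each bucket.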

# Create the index and its mapping
PUT /fruit
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "desc": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}
# Load the sample data (one action line followed by one document line)
PUT /fruit/_bulk
{"index": {}}
{"title":"面包","price":8999.88,"desc":"面包真好吃"}
{"index": {}}
{"title":"忘崽牛奶","price":456.88,"desc":"忘崽牛奶真好喝，一盒牛奶换一崽"}
{"index": {}}
{"title":"小辣条","price":4.88,"desc":"小辣条真好吃，就是有点辣嗓子"}
{"index": {}}
{"title":"透心凉","price":456.78,"desc":"透心凉，心飞扬"}

Group by field

GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "price_group": {
      "terms": {
        "field": "price",
        "size": 10
      }
    }
  }
}

Maximum

GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "max_group": {
      "max": {
        "field": "price"
      }
    }
  }
}

Minimum

GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "min_group": {
      "min": {
        "field": "price"
      }
    }
  }
}

Average

# Average of the price field
GET /fruit/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
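
When trying these metric aggregations, it can help to know what values to expect. The sketch below is a local cross-check only: it computes max, min, average, and sum in memory over the four prices loaded via `_bulk` earlier, without contacting Elasticsearch. The class name `PriceMetricsCheck` is hypothetical, and floating-point results returned by the server may differ in the last digits.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Local cross-check only: computes the metrics in memory, without Elasticsearch.
class PriceMetricsCheck {

    // Prices copied from the four sample documents loaded via _bulk.
    static final double[] PRICES = {8999.88, 456.88, 4.88, 456.78};

    // Collects count, min, max, average, and sum in a single pass.
    static DoubleSummaryStatistics summarize(double[] prices) {
        return DoubleStream.of(prices).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = summarize(PRICES);
        System.out.println("max=" + stats.getMax());
        System.out.println("min=" + stats.getMin());
        System.out.println("avg=" + stats.getAverage());
        System.out.println("sum=" + stats.getSum());
    }
}
```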

Application Integration

package com.example;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedDoubleTerms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.ParsedSum;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;

import java.io.IOException;
import java.util.List;

public class RestHighLevelClientForAggreationTest extends SpringbootEsApplicationTests {

    private final RestHighLevelClient restHighLevelClient;

    @Autowired
    public RestHighLevelClientForAggreationTest(RestHighLevelClient restHighLevelClient) {
        this.restHighLevelClient = restHighLevelClient;
    }

    /**
     * Group by a field (terms aggregation)
     */
    @Test
    public void testAggregationByGroup() {

        SearchRequest searchRequest = new SearchRequest("fruit");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchAllQuery()).aggregation(AggregationBuilders.terms("price_group").field("price"));
        searchRequest.source(sourceBuilder);
        SearchResponse response = null;
        try {
            response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

            ParsedDoubleTerms parsedDoubleTerms = response.getAggregations().get("price_group");

            List<? extends Terms.Bucket> buckets = parsedDoubleTerms.getBuckets();
            for (Terms.Bucket bucket : buckets) {
                System.out.println(bucket.getKeyAsString() + ":" + bucket.getDocCount());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Sum
     */
    @Test
    public void testAggregationBySum() {

        SearchRequest searchRequest = new SearchRequest("fruit");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchAllQuery()).aggregation(AggregationBuilders.sum("sum_group").field("price"));
        searchRequest.source(sourceBuilder);
        SearchResponse response = null;
        try {
            response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

            ParsedSum parsedSum = response.getAggregations().get("sum_group");
            System.out.println(parsedSum.getValue());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Category: elasticsearch
Views: 65
Published: 2024-08-15 15:40:02
Link: https://www.kaopuke.com/article/k-p-k_13_u_7_o_14_fw_13_z_22_4.html
