ElasticSearch Study Notes


I'm 舒心滑板, a blogger on 靠谱客. This post shares my ElasticSearch study notes; I hope they are useful as a reference.

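1. ElasticSearch Overview

2. Differences Between ES and Solr

2.1. Introduction to Solr

2.2. Introduction to Lucene

2.3. ES vs. Solr

3. Installing ElasticSearch

4. Installing Kibana

5. ES Core Concepts

6. IK Analyzer

7. RESTful Style

8. Basic Document Operations

9. Spring Boot Integration

10. Hands-on: Simulating Full-Text Search (JD Search)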

1. ElasticSearch Overview

2. Differences Between ES and Solr

2.1. Introduction to Solr

2.2. Introduction to Lucene

2.3. ES vs. Solr

3. Installing ElasticSearch

Downloads (Huawei Cloud mirrors)

ElasticSearch: https://mirrors.huaweicloud.com/elasticsearch/?C=N&O=D
logstash: https://mirrors.huaweicloud.com/logstash/?C=N&O=D
kibana: https://mirrors.huaweicloud.com/kibana/?C=N&O=D

Directory layout

Test access

The head plugin: I have not tested this yet and will come back to it later.

https://blog.csdn.net/weixin_43824233/article/details/109552172

4. Installing Kibana

Works out of the box

Configuration file

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://192.168.1.30:9201"]
kibana.index: ".kibana"
i18n.locale: "zh-CN"			# Chinese localization

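Test access

5. ES Core Concepts

Index
Field types (mapping)
Document

6. IK Analyzer

Download link
Unzip it into the plugins directory of your ES installation
Restart ES and watch the startup log: the ik plugin is loaded

You can also run elasticsearch-plugin list from the bin directory to see which plugins are loaded

Test with Kibana

ik_smart: coarsest segmentation (fewest splits)

ik_max_word: finest-grained segmentation; it exhausts every combination the dictionary allows.

Adding your own configuration to the IK analyzer: make sure the file encoding is consistently UTF-8

Restart ES and Kibana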

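7. RESTful Style

Basic tests

Create an index

PUT /index_name/type_name/document_id   (type_name is deprecated)
{request body}

# PUT creates the document: test1 is the index, type1 the type, 1 the id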
PUT test1/type1/1
{
  "name": "xiaofan",
  "age": 28
}

# Response
# Warning: specifying types in document index requests is deprecated;
# use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test1",	# index
  "_type" : "type1",	# type (deprecated)
  "_id" : "1",			# id
  "_version" : 1,		# version
  "result" : "created",	# operation result
  "_shards" : {			# shard information
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

Specify field types (create a mapping)

Get the mapping of a specific index

# GET test2

{
  "test2" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1599708623941",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "ANWnhwArSMSl8k8iipgH1Q",
        "version" : {
          "created" : "7080099"
        },
        "provided_name" : "test2"
      }
    }
  }
}

# Check the default (dynamic) mapping
PUT /test3/_doc/1
{
  "name": "狂神说Java",
  "age": 28,
  "birthday": "1997-01-05"
}

# GET test3

{
  "test3" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1599708906181",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "LzPLCDgeQn6tdKo3xBBpbw",
        "version" : {
          "created" : "7080099"
        },
        "provided_name" : "test3"
      }
    }
  }
}

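Update a document with POST

# Only the specified fields are changed; all other content stays the same
# (typed path, deprecated; the typeless form in 7.x is POST /test3/_update/1)
POST /test3/_doc/1/_update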
{
  "doc": {
    "name":"暴徒狂神"
  }
}

# GET test3/_doc/1

{
  "_index" : "test3",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "暴徒狂神",
    "age" : 28,
    "birthday" : "1997-01-05"
  }
}
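
The partial update above changes only the fields listed under "doc". For a flat document this can be pictured as a shallow map merge. The following is a minimal sketch in plain Java, not the Elasticsearch implementation; the class name PartialUpdateSketch and the method applyDoc are hypothetical names chosen for illustration, and nested objects are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of partial-update semantics on a flat document.
class PartialUpdateSketch {

    // Returns a copy of source in which every field present in doc is overwritten.
    static Map<String, Object> applyDoc(Map<String, Object> source, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(source);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("name", "狂神说Java");
        source.put("age", 28);
        source.put("birthday", "1997-01-05");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "暴徒狂神");

        // name is replaced; age and birthday are carried over unchanged
        System.out.println(applyDoc(source, doc));
    }
}
```

The original map is left untouched, which mirrors the fact that the update produces a new document version rather than editing the stored one in place.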

8. Basic Document Operations

Basic operations (simple queries)

PUT /kuangshen/user/1
{
  "name": "狂神说",
  "age": 23,
  "desc": "一顿操作猛如虎，一看工资2500",
  "tags": ["码农", "技术宅", "直男"]
}

PUT /kuangshen/user/2
{
  "name": "张三",
  "age": 28,
  "desc": "法外狂徒",
  "tags": ["旅游", "渣男", "交友"]
}

PUT /kuangshen/user/3
{
  "name": "李四",
  "age": 30,
  "desc": "不知道怎么描述",
  "tags": ["旅游", "靓女", "唱歌"]
}

GET kuangshen/user/1


GET kuangshen/user/_search?q=name:狂神

Complex operations (sorting, pagination, highlighting, fuzzy queries, standard queries)

# Fuzzy (full-text match) query
GET kuangshen/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  }
}

# Filter which fields are returned
GET kuangshen/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  },
  "_source": ["name", "desc"]
}

# Sort
GET kuangshen/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  },
  "sort":[{
    "age": "asc"
  }]
}

# Pagination
GET kuangshen/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  },
  "sort":[{
    "age": "asc"
  }], 
  "from": 0,
  "size": 2
}
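
In the pagination request, "from" is the zero-based offset of the first hit to return, not a page number, and "size" is the number of hits per page. A minimal sketch of the conversion from a 1-based page number, in plain Java; the class PageOffset and the method fromOffset are hypothetical helpers, not part of any Elasticsearch API.

```java
// Hypothetical helper: converts a 1-based page number into the zero-based
// "from" offset expected by a from/size search request.
class PageOffset {

    static int fromOffset(int pageNo, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int safePage = Math.max(pageNo, 1); // treat 0 or negative pages as page 1
        return (safePage - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 2)); // 0: page 1 starts at the first hit
        System.out.println(fromOffset(3, 2)); // 4: page 3 skips the first two pages
    }
}
```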

Boolean queries

# Multi-condition query: must is equivalent to AND
GET kuangshen/user/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "狂神"
        }},
        {"match": {
          "age": 23
        }}
      ]
    }
  }
}

# Multi-condition query: should is equivalent to OR
GET kuangshen/user/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "name": "狂神说"
        }},
        {"match": {
          "age": 25
        }}
      ]
    }
  }
}

# Multi-condition query: must_not is equivalent to NOT
GET kuangshen/user/_search
{
  "query": {
    "bool": {
      "must_not": [
        {"match": {
          "age": 25
        }}
      ]
    }
  }
}


# Filter 1: age > 24
GET kuangshen/user/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "狂神"
        }}
      ],
      "filter": [
        {"range": {
          "age": {
            "gt": 24
          }
        }}
      ]
    }
  }
}

# Filter 2: 22 < age < 30
GET kuangshen/user/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "狂神"
        }}
      ],
      "filter": [
        {"range": {
          "age": {
            "lt": 30,
            "gt": 22
          }
        }}
      ]
    }
  }
}
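
The bool clauses above behave like AND (must), OR (should), and NOT (must_not), with filter adding a further condition that does not affect scoring. A rough in-memory analogy in plain Java, assuming only the three sample users indexed earlier; the match on name is approximated by a substring test and scoring is ignored, so this is a sketch of the logic, not of how Elasticsearch evaluates queries. The class BoolQuerySketch and its members are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// In-memory analogy for bool queries; not how Elasticsearch evaluates them.
class BoolQuerySketch {

    static final class User {
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    static final List<User> USERS = Arrays.asList(
            new User("狂神说", 23), new User("张三", 28), new User("李四", 30));

    // Names of the users accepted by the predicate, in insertion order.
    static List<String> names(Predicate<User> p) {
        return USERS.stream().filter(p).map(u -> u.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Predicate<User> nameHasKuangshen = u -> u.name.contains("狂神");
        Predicate<User> ageIs23 = u -> u.age == 23;
        Predicate<User> ageIs25 = u -> u.age == 25;
        Predicate<User> ageOver24 = u -> u.age > 24;

        System.out.println(names(nameHasKuangshen.and(ageIs23)));   // must: both conditions hold
        System.out.println(names(nameHasKuangshen.or(ageIs25)));    // should: either condition holds
        System.out.println(names(ageIs25.negate()));                // must_not: matching users are excluded
        System.out.println(names(nameHasKuangshen.and(ageOver24))); // must plus range filter age > 24
    }
}
```

With these three users the last combination is empty, because the only user whose name contains "狂神" is 23.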

Matching multiple terms (separated by spaces)

GET kuangshen/user/_search
{
  "query": {
    "match": {
      "tags": "技术 男"
    }
  }
}

Fields of type keyword are not processed by the analyzer

term: exact match

# Define the field types
PUT xiaofan_test_db
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "desc": {
        "type": "keyword"
      }
    }
  }
}


PUT /xiaofan_test_db/_doc/1
{
  "name": "小范说Java Name",
  "desc": "小范说Java Desc"
}

PUT /xiaofan_test_db/_doc/2
{
  "name": "小范说Java Name",
  "desc": "小范说Java Desc 2"
}

# Exact match on the keyword field
GET xiaofan_test_db/_search
{
  "query": {
    "term": {
      "desc": "小范说Java Desc"
    }
  }
}
# Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_index" : "test_db",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6931471,
        "_source" : {
          "name" : "小范说Java Name",
          "desc" : "小范说Java Desc"
        }
      }
    ]
  }
}

# Match on the text field
GET xiaofan_test_db/_search
{
  "query": {
    "term": {
      "name": "小"
    }
  }
}

# Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "test_db",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.18232156,
        "_source" : {
          "name" : "小范说Java Name",
          "desc" : "小范说Java Desc"
        }
      },
      {
        "_index" : "test_db",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.18232156,
        "_source" : {
          "name" : "小范说Java Name",
          "desc" : "小范说Java Desc 2"
        }
      }
    ]
  }
}
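
The two results above differ because a text field is split into terms at index time, while a keyword field is stored as one single term, and a term query only matches a term that exists in the index exactly. The toy model below illustrates this in plain Java. Its tokenizer is only a rough approximation of the default standard analyzer (one term per Chinese character, ASCII words lowercased); the class TermLookupSketch and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of how text and keyword fields produce index terms.
// The tokenizer approximates the standard analyzer; it is not its real implementation.
class TermLookupSketch {

    // text field: one term per CJK character, ASCII letter/digit runs lowercased as one term.
    static List<String> analyzeText(String value) {
        List<String> terms = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c < 128 && Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
                continue;
            }
            if (word.length() > 0) {
                terms.add(word.toString());
                word.setLength(0);
            }
            if (Character.isLetter(c)) {
                terms.add(String.valueOf(c)); // each ideograph becomes its own term
            }
        }
        if (word.length() > 0) {
            terms.add(word.toString());
        }
        return terms;
    }

    // keyword field: the whole value is a single term.
    static List<String> analyzeKeyword(String value) {
        return Collections.singletonList(value);
    }

    // A term query matches only if the exact term is present in the index.
    static boolean termMatches(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        List<String> name = analyzeText("小范说Java Name");
        List<String> desc = analyzeKeyword("小范说Java Desc");
        System.out.println(name);                                  // terms of the text field
        System.out.println(termMatches(name, "小"));               // single character is a term
        System.out.println(termMatches(desc, "小"));               // keyword has no such term
        System.out.println(termMatches(desc, "小范说Java Desc"));  // only the full value matches
    }
}
```

Under the same model, a term query for "Java" on the text field would not match, because the indexed term is the lowercased "java".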

Exact query matching multiple values

PUT /test_db/_doc/3
{
  "t1": "22",
  "t2": "2020-09-10"
}

PUT /test_db/_doc/4
{
  "t1": "33",
  "t2": "2020-09-11"
}

GET test_db/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "t1": "22"
          }
        },
         {
          "term": {
            "t1": "33"
          }
        }
      ]
    }
  }
}

Highlight query

GET kuangshen/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>", 
    "fields": {
      "name": {}
    }
  }
}

# Result:
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.3862942,
    "hits" : [
      {
        "_index" : "kuangshen",
        "_type" : "user",
        "_id" : "1",
        "_score" : 1.3862942,
        "_source" : {
          "name" : "狂神说",
          "age" : 23,
          "desc" : "一顿操作猛如虎，一看工资2500",
          "tags" : [
            "码农",
            "技术宅",
            "直男"
          ]
        },
        "highlight" : {
          "name" : [
            "<p class='key' style='color:red'>狂</p><p class='key' style='color:red'>神</p>说"
          ]
        }
      },
      {
        "_index" : "kuangshen",
        "_type" : "user",
        "_id" : "4",
        "_score" : 1.0892314,
        "_source" : {
          "name" : "狂神说前端",
          "age" : 25,
          "desc" : "大王叫我来巡山",
          "tags" : [
            "码农1",
            "技术宅1",
            "直男1"
          ]
        },
        "highlight" : {
          "name" : [
            "<p class='key' style='color:red'>狂</p><p class='key' style='color:red'>神</p>说前端"
          ]
        }
      }
    ]
  }
}
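
In the result above, 狂 and 神 are wrapped in separate tags. That is consistent with the default analyzer indexing each Chinese character as its own term, so each matching term is highlighted individually. The sketch below reproduces that output shape in plain Java. It handles only single-character terms and is an illustration, not the Elasticsearch highlighter; HighlightSketch and highlight are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Toy per-character highlighter; illustrates the output shape only.
class HighlightSketch {

    // Wraps every character of text that also occurs in query with the given tags.
    static String highlight(String text, String query, String preTag, String postTag) {
        Set<Character> queryTerms = new HashSet<>();
        for (char c : query.toCharArray()) {
            queryTerms.add(c);
        }
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (queryTerms.contains(c)) {
                out.append(preTag).append(c).append(postTag);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pre = "<p class='key' style='color:red'>";
        System.out.println(highlight("狂神说前端", "狂神", pre, "</p>"));
    }
}
```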

9. Spring Boot Integration

Official site

Add the dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

Custom configuration

package com.xiaofan.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(
                new HttpHost("192.168.1.30", 9201, "http")
            )
        );

        return client;
    }


}

Write a test class

package com.xiaofan;

import com.alibaba.fastjson.JSON;
import com.xiaofan.pojo.User;
import org.apache.http.entity.ContentType;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

@SpringBootTest
class EsApiApplicationTests {

	public static final String INDEX = "xiaofan_test_index";

	@Autowired
	@Qualifier(value = "restHighLevelClient")
	private RestHighLevelClient client;

	// Create an index
	@Test
	void testCreateIndex() throws IOException {
		// 1. Build the create-index request
		CreateIndexRequest request = new CreateIndexRequest(INDEX);
		// 2. Execute the request through the client's IndicesClient and read the response
		CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
		System.out.println(createIndexResponse);
	}

	// Check whether the index exists
	@Test
	void testExistsIndex() throws IOException {
		GetIndexRequest request = new GetIndexRequest(INDEX);
		boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
		System.out.println(exists);
	}

	// Delete the index
	@Test
	void testDeleteIndex() throws IOException {
		DeleteIndexRequest request = new DeleteIndexRequest(INDEX);
		AcknowledgedResponse acknowledgedResponse = client.indices().delete(request, RequestOptions.DEFAULT);
		System.out.println(acknowledgedResponse.isAcknowledged());
	}

	// Add a document
	@Test
	void testAddDocument() throws IOException {
		User user = new User("狂神说", 28);
		IndexRequest request = new IndexRequest(INDEX);
		// Equivalent to: PUT /index/_doc/1
		request.id("1");
		request.timeout(TimeValue.timeValueSeconds(1));
		// Put the data into the request as JSON
		request.source(JSON.toJSONString(user), XContentType.JSON);
		IndexResponse response = client.index(request, RequestOptions.DEFAULT);
		System.out.println(response.toString());
		System.out.println(response.status());
	}

	// Check whether a document exists: GET /index/_doc/1
	@Test
	void testIsExists() throws IOException {
		GetRequest request = new GetRequest(INDEX, "1");
		// Skip fetching the _source; only existence matters here
		request.fetchSourceContext(new FetchSourceContext(false));
		request.storedFields("_none_");

		boolean exists = client.exists(request, RequestOptions.DEFAULT);
		System.out.println(exists);
	}

	// Get a document

	/**
	 * Output:
	 * {"age":28,"name":"狂神说"}
	 * {"_index":"xiaofan_test_index","_type":"_doc","_id":"1","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"age":28,"name":"狂神说"}}
	 */
	@Test
	void testGetDocument() throws IOException {
		GetRequest request = new GetRequest(INDEX, "1");
		GetResponse response = client.get(request, RequestOptions.DEFAULT);
		System.out.println(response.getSourceAsString());
		System.out.println(response);
	}

	// Update a document
	@Test
	void testUpdateDocument() throws IOException {
		UpdateRequest request = new UpdateRequest(INDEX, "1");
		request.timeout("1s");

		User user = new User("小范说Java", 18);
		request.doc(JSON.toJSONString(user), XContentType.JSON);

		UpdateResponse updateResponse = client.update(request, RequestOptions.DEFAULT);
		System.out.println(updateResponse);
	}

	// Delete a document
	@Test
	void testDeleteDocument() throws IOException {
		DeleteRequest request = new DeleteRequest(INDEX, "1");
		request.timeout("1s");

		DeleteResponse deleteResponse = client.delete(request, RequestOptions.DEFAULT);
		System.out.println(deleteResponse);

	}

	// Bulk insert (updates and deletes work the same way)
	@Test
	void testBulkRequest() throws IOException {
		BulkRequest request = new BulkRequest();
		request.timeout("10s");

		ArrayList<User> users = new ArrayList<>();
		users.add(new User("kuangshen1", 21));
		users.add(new User("kuangshen2", 22));
		users.add(new User("kuangshen3", 23));
		users.add(new User("xiaofan1", 18));
		users.add(new User("xiaofan2", 19));

		// Bulk request: for updates or deletes, add the corresponding request type here instead
		for (int i = 0; i < users.size(); i++) {
			request.add(new IndexRequest(INDEX)
					.id(String.valueOf(i + 1))
					.source(JSON.toJSONString(users.get(i)), XContentType.JSON));
		}

		BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
		// hasFailures() returns false when every operation succeeded
		System.out.println(bulkResponse.hasFailures());
	}

	// Search documents
	@Test
	void testSearch() throws IOException {
		SearchRequest searchRequest = new SearchRequest(INDEX);
		// Build the search conditions
		SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

		// Query conditions can be built with the QueryBuilders utility class
		// QueryBuilders.termQuery: exact match
		// QueryBuilders.matchAllQuery(): match all documents
		TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "kuangshen1");
		// MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
		sourceBuilder.query(termQueryBuilder);
		sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

		searchRequest.source(sourceBuilder);

		SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
		System.out.println(JSON.toJSON(searchResponse.getHits()));
		System.out.println("======================================");
		for (SearchHit documentFields : searchResponse.getHits().getHits()) {
			System.out.println(documentFields.getSourceAsMap());
		}

	}

}

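10. Hands-on: Simulating Full-Text Search (JD Search)

First, decide where the data comes from

From a database
From a message queue
From a web crawler
......

Crawling data: fetch the page returned by the request, then extract the information we want.

Import the jsoup dependency to parse web pages (for crawling movies or music, use Tika instead)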

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.1</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.newer</groupId>
    <artifactId>xxh-es-jd</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>xxh-es-jd</name>
    <description>Demo project for Spring Boot</description>

    <properties>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.60</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

Test the crawler

package com.newer.utils;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;

public class HtmlParseUtil {
    public static void main(String[] args) throws IOException {
        // Request URL: https://search.jd.com/Search?keyword=java
        // Prerequisite: network access is required; content loaded via AJAX cannot be fetched
        String url = "https://search.jd.com/Search?keyword=java";

        // Timeout: 30 s
        int timeOut = 30000;

        // Parse the page; Document corresponds to the browser's Document object
        Document document = Jsoup.parse(new URL(url), timeOut);
        // The DOM lookup methods familiar from JS are available here as well
        Element element = document.getElementById("J_goodsList");
//        System.out.println(element.html());
        // Get all li elements
        Elements elements = element.getElementsByTag("li");
        // Read the content of each element; el is one li tag
        for (Element el : elements) {
            // On image-heavy sites like this one, images are lazy-loaded
            // JD keeps the image URL in the data-lazy-img attribute
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();

            System.out.println("===================================================");
            System.out.println(img);
            System.out.println(price);
            System.out.println(title);
        }

    }
}

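Note: JD appears to apply anti-crawling measures to images; the attribute name we need to read differs from the one we see when debugging in the browser.

Extract a utility class

package com.newer.utils;

import com.newer.pojo.Content;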
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HtmlParseUtil {
    public static void main(String[] args) throws IOException {
        // Symbols must be escaped in a URL
//        HtmlParseUtil.parseJD("C%2B%2B").forEach(System.out::println);
        // Chinese keywords need URL encoding
//        HtmlParseUtil.parseJD("心理学").forEach(System.out::println);
        HtmlParseUtil.parseJD("C++").forEach(System.out::println);
    }

    public static List<Content> parseJD(String keywords) throws IOException {
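        // Symbols and Chinese characters must be URL-encoded.
        // Encode the keyword first, then concatenate; encoding the whole URL would also escape its own delimiters and make it unusable.
        String urlKeywords = URLEncoder.encode(keywords, "UTF-8");

        // Request URL, e.g. https://search.jd.com/Search?keyword=java
        // Prerequisite: network access is required; content loaded via AJAX cannot be fetched
        String url = "https://search.jd.com/Search?keyword=" + urlKeywords + "&enc=utf-8";

        // Timeout: 30 s
        int timeOut = 30000;

        // Parse the page; Document corresponds to the browser's Document object
        Document document = Jsoup.parse(new URL(url), timeOut);
        // The DOM lookup methods familiar from JS are available here as well
        Element element = document.getElementById("J_goodsList");
        // Get all li elements
        Elements elements = element.getElementsByTag("li");

        List<Content> goodsList = new ArrayList<>();
        // Read the content of each element; el is one li tag
        for (Element el : elements) {
            // On image-heavy sites like this one, images are lazy-loaded
            // JD keeps the image URL in the data-lazy-img attribute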
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();

            Content content = new Content();
            content.setImg(img)
                    .setPrice(price)
                    .setTitle(title);
            goodsList.add(content);
        }
        return goodsList;
    }
}

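Notes

Symbols and Chinese characters are escaped when a URL is parsed, so a keyword that contains them must be URL-encoded before it is sent.
Do not encode the URL after concatenation: that also escapes the URL's own delimiters and makes it unusable. Encode the keyword first, then concatenate it.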
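
The difference between the two orders can be checked with the standard library alone. This is a minimal sketch; the class KeywordEncodingSketch, the constant BASE, and the method buildSearchUrl are hypothetical names, while the search URL itself is the one used in the utility class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo: encode the keyword first, then concatenate.
class KeywordEncodingSketch {

    static final String BASE = "https://search.jd.com/Search?keyword=";

    // Encodes only the keyword and appends it to the already well-formed base URL.
    static String buildSearchUrl(String keyword) {
        try {
            return BASE + URLEncoder.encode(keyword, "UTF-8") + "&enc=utf-8";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(buildSearchUrl("C++"));    // keyword becomes C%2B%2B
        System.out.println(buildSearchUrl("心理学")); // keyword becomes %E5%BF%83%E7%90%86%E5%AD%A6
        // Encoding the whole URL also escapes ':', '/', '?' and '=', so it is no longer a usable URL.
        System.out.println(URLEncoder.encode(BASE + "C++", "UTF-8"));
    }
}
```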

Write the entity class and the service layer

package com.newer.pojo;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.experimental.Accessors;

@Data
@AllArgsConstructor
@NoArgsConstructor
@Accessors(chain = true)
public class Content {
    private String img;
    private String price;
    private String title;
}

Service layer. One small pitfall: this class is managed by Spring Boot, and autowired (injected) fields cannot be used from static methods.

package com.newer.service;

import com.alibaba.fastjson.JSON;
import com.newer.pojo.Content;
import com.newer.utils.HtmlParseUtil;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;



    // Parse the data and put it into the ES index
    public Boolean parseContent(String keywords) throws IOException {
        List<Content> contents = HtmlParseUtil.parseJD(keywords);
        // Store the crawled data in ES
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout(TimeValue.timeValueMinutes(2L));

        for (int i = 0; i < contents.size(); i++) {
            bulkRequest.add(new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }

        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulk.hasFailures();
    }

    // Read the data back from the ES index to implement search
    public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1) {
            pageNo = 1;
        }
        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Full-text match on title (matchQuery analyzes the keyword; use termQuery for exact matching)
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("title", keyword);
        sourceBuilder.query(matchQueryBuilder)
                .timeout(TimeValue.timeValueMinutes(1L));
        // Pagination: from is a zero-based offset, so convert the 1-based page number
        sourceBuilder.from((pageNo - 1) * pageSize)
                .size(pageSize);
        // Execute the search
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            list.add(hit.getSourceAsMap());
        }
        return list;
    }

    // Search with highlighting
    public List<Map<String, Object>> searchPageHighLightBuilder(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1) {
            pageNo = 1;
        }
        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Full-text match on title (matchQuery analyzes the keyword; use termQuery for exact matching)
        MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("title", keyword);
        sourceBuilder.query(matchQueryBuilder)
                .timeout(TimeValue.timeValueMinutes(1L));
        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        // Define the field to highlight and the tags that wrap each match
        highlightBuilder.field("title")
                .preTags("<span style='color:red'>")
                .postTags("</span>")
                .requireFieldMatch(false);          // false: highlight every listed field, not only the queried one
        sourceBuilder.highlighter(highlightBuilder);
        // Pagination: from is a zero-based offset, so convert the 1-based page number
        sourceBuilder.from((pageNo - 1) * pageSize)
                .size(pageSize);
        // Execute the search
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {

            // Read the highlighted field
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();  // the original source (without highlighting)

            if (title != null) {
                Text[] fragments = title.fragments();
                StringBuilder nTitle = new StringBuilder();
                for (Text text : fragments) {
                    nTitle.append(text);
                }
                // Replace the plain title with the highlighted one
                sourceAsMap.put("title", nTitle.toString());
            }

            list.add(sourceAsMap);

        }
        return list;
    }


}

Page routing (controller)

package com.newer.controller;

import com.newer.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }

    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize) throws IOException {
        return contentService.searchPageHighLightBuilder(keyword, pageNo, pageSize);
    }


}

Front end

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">

<head>
    <meta charset="utf-8"/>
    <title>狂神说Java-ES仿京东实战</title>
    <link rel="stylesheet" th:href="@{/css/style.css}"/>
    <!-- The front end uses Vue to separate front end and back end -->
    <script src="https://unpkg.com/axios/dist/axios.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/vue/dist/vue.js"></script>
</head>

<body class="pg">
<div class="page" id="app">
    <div id="mallPage" class=" mallist tmall- page-not-market ">

        <!-- Header search -->
        <div id="header" class=" header-list-app">
            <div class="headerLayout">
                <div class="headerCon ">
                    <!-- Logo-->
                    <h1 id="mallLogo">
                        <img th:src="@{/images/jdlogo.png}" alt="">
                    </h1>

                    <div class="header-extra">

                        <!-- Search -->
                        <div id="mallSearch" class="mall-search">
                            <form name="searchTop" class="mallSearch-form clearfix">
                                <fieldset>
                                    <legend>Tmall Search</legend>
                                    <div class="mallSearch-input clearfix">
                                        <div class="s-combobox" id="s-combobox-685">
                                            <div class="s-combobox-input-wrap">
                                                <input v-model="keyword" type="text" autocomplete="off" value="dd" id="mq"
                                                       class="s-combobox-input" aria-haspopup="true">
                                            </div>
                                        </div>
                                        <button type="submit" @click.prevent="searchKey" id="searchbtn">Search</button>
                                    </div>
                                </fieldset>
                            </form>
                            <ul class="relKeyTop">
                                <li><a>Kuangshen Java</a></li>
                                <li><a>Kuangshen Front End</a></li>
                                <li><a>Kuangshen Linux</a></li>
                                <li><a>Kuangshen Big Data</a></li>
                                <li><a>Kuangshen on Personal Finance</a></li>
                            </ul>
                        </div>
                    </div>
                </div>
            </div>
        </div>

        <!-- Product details page -->
        <div id="content">
            <div class="main">
                <!-- Brand categories -->
                <form class="navAttrsForm">
                    <div class="attrs j_NavAttrs" style="display:block">
                        <div class="brandAttr j_nav_brand">
                            <div class="j_Brand attr">
                                <div class="attrKey">
                                    Brand
                                </div>
                                <div class="attrValues">
                                    <ul class="av-collapse row-2">
                                        <li><a href="#"> Kuangshen </a></li>
                                        <li><a href="#"> Java </a></li>
                                    </ul>
                                </div>
                            </div>
                        </div>
                    </div>
                </form>

                <!-- Sort options -->
                <div class="filter clearfix">
                    <a class="fSort fSort-cur">Overall<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Popularity<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">New<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Sales<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Price<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
                </div>

                <!-- Product list -->
                <div class="view grid-nosku">

                    <div class="product" v-for="result in results">
                        <div class="product-iWrap">
                            <!-- Product cover image -->
                            <div class="productImg-wrap">
                                <a class="productImg">
                                    <img :src="result.img">
                                </a>
                            </div>
                            <!-- Price -->
                            <p class="productPrice">
                                <em>{{result.price}}</em>
                            </p>
                            <!-- Title: the back end sends an HTML string, so it is rendered with v-html -->
                            <p class="productTitle">
                                <a v-html="result.title"></a>
                            </p>
                            <!-- Shop name -->
                            <div class="productShop">
                                <span>Shop: Kuangshen Java </span>
                            </div>
                            <!-- Sales information -->
                            <p class="productStatus">
                                <span>Monthly sales <em>999 orders</em></span>
                                <span>Reviews <a>3</a></span>
                            </p>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>



<script>
    new Vue({
        el: '#app',
        data: {
            keyword: '',    // search keyword
            results: []      // search results
        },
        methods: {
            searchKey() {
                let keyword = this.keyword;
                console.log(keyword);
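                // Call the back-end search endpoint; encode the keyword so that
                // characters such as spaces, '?' or '#' do not break the URL path
                axios.get('search/' + encodeURIComponent(keyword) + '/1/20').then(response => {
                    console.log(response.data);
                    this.results = response.data;    // bind the results to the view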
                })
            }
        }
    })

</script>

</body>
</html>

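Note: the back end returns the highlighted title as an HTML string, so it must be rendered as HTML. Plain text interpolation with {{ }} would display the tags literally, which is why the title uses v-html.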
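
To make the point concrete, here is a small sketch of the kind of string the back end ends up returning. It is only an illustration in plain Java, not how Elasticsearch implements highlighting: `HighlightDemo`, `highlight` and `escapeHtml` are hypothetical names, and the `<span style='color:red'>` wrapper is an assumed choice of pre/post tags. Because `v-html` inserts the string into the page as raw HTML, the sketch escapes everything except the highlight tags themselves.

```java
class HighlightDemo {

    // Escape the characters that are significant in HTML text and attribute values.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap every occurrence of keyword in the (assumed) highlight tags and
    // escape all other text, so only our own tags reach the browser as markup.
    static String highlight(String title, String keyword) {
        if (keyword.isEmpty()) {
            return escapeHtml(title);
        }
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = title.indexOf(keyword, from)) >= 0) {
            sb.append(escapeHtml(title.substring(from, hit)));
            sb.append("<span style='color:red'>")
              .append(escapeHtml(keyword))
              .append("</span>");
            from = hit + keyword.length();
        }
        sb.append(escapeHtml(title.substring(from)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: <span style='color:red'>Java</span> &lt;b&gt;in&lt;/b&gt; Action
        System.out.println(highlight("Java <b>in</b> Action", "Java"));
    }
}
```

Rendered with {{ }}, this string would show the `<span>` tag as visible text; rendered with `v-html`, the keyword appears in red and the escaped `<b>` stays harmless text. When the highlighting is done by Elasticsearch itself, the highlight request's `encoder` setting with the value `html` is meant to serve a similar purpose, though it is worth verifying against the version in use.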

The material above is based on Kuangshen's ElasticSearch video course, with reference to several blog posts by other developers.


Category: ElasticSearch
Views: 198
Published: 2023-12-18 00:10:23
Link: https://www.kaopuke.com/article/k-p-k_13_u_23_o_2_fz_14__7__6_1.html
