[ElasticSearch Series-08] ElasticSearch handles the relationship between objects

ElasticSearch series overall column

Content	Link address
[1] ElasticSearch download and installation	https://zhenghuisheng.blog.csdn.net/article/details/129260827
[2] ElasticSearch concepts and basic operations	https://blog.csdn.net/zhenghuishengq/article/details/134121631
[3] ElasticSearch’s advanced query Query DSL	https://blog.csdn.net/zhenghuishengq/article/details/134159587
[4] Aggregation query operation of ElasticSearch	https://blog.csdn.net/zhenghuishengq/article/details/134159587
[5] SpringBoot integrates elasticSearch	https://blog.csdn.net/zhenghuishengq/article/details/134212200
[6] The construction of Es cluster architecture and the core concepts of clusters	https://blog.csdn.net/zhenghuishengq/article/details/134258577
[7] ES development scenarios and index sharding setting and optimization	https://blog.csdn.net/zhenghuishengq/article/details/134302130
[8] ElasticSearch Processing relationships between objects	https://blog.csdn.net/zhenghuishengq/article/details/134327295

Es handles the relationships between objects

1. Es handles the relationships between objects
- 1. Object type
  - 1.1, kibana operation of object type
  - 1.2, Java operations of object types
- 2. Nested type
  - 2.1, Nested type kibana operation
  - 2.2, Java operations of nested types
- 3. Parent-child type
  - 3.1, Kibana operation of parent-child type
  - 3.2, Java code of parent-child type

1. Es handles the relationship between objects

ES is a non-relational, NoSQL data store, and it is generally not good at handling relationships between data. A relational database such as MySQL, by contrast, handles these relationships through normalization.

Normalization reduces data redundancy, saves storage space, and makes overall maintenance easier. However, queries often need multiple steps, and join queries increase the total query time. Denormalization stores redundant copies of the data instead, so there are no associations to resolve and no joins to perform, and reads are faster. The drawback is equally clear: updates are more troublesome, because changing a single field may require modifying many documents.
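To make the update cost of denormalization concrete, here is a minimal sketch in plain Java. It does not talk to ElasticSearch; the class DenormalizedUpdateDemo, its methods, and the flat map layout are hypothetical and only mimic articles that each carry their own copy of the user name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: every article stores its own copy of the author's user name.
class DenormalizedUpdateDemo {

    /** Renames a user in every article copy and returns how many documents had to be rewritten. */
    static int renameUser(List<Map<String, String>> articles, String oldName, String newName) {
        int rewritten = 0;
        for (Map<String, String> article : articles) {
            if (oldName.equals(article.get("user.username"))) {
                article.put("user.username", newName);
                rewritten++;
            }
        }
        return rewritten;
    }

    /** Builds three sample articles written by the same user. */
    static List<Map<String, String>> sampleArticles() {
        List<Map<String, String>> articles = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            Map<String, String> article = new HashMap<>();
            article.put("title", "article-" + i);
            article.put("user.username", "zhenghuisheng");
            articles.add(article);
        }
        return articles;
    }

    public static void main(String[] args) {
        int rewritten = renameUser(sampleArticles(), "zhenghuisheng", "zhs");
        // One logical change (a new user name) touches every article of that user.
        System.out.println("documents rewritten: " + rewritten);
    }
}
```

In a normalized design the user name would live in one row, so the same change would be a single update.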

ElasticSearch, as a non-relational store, mainly leans toward denormalization. There are four main ways to handle related data: the object type, the nested type, the parent-child relationship type, and application-side association.

1, object type

1.1, kibana operation of object type

For example, a document can contain a field whose value is an object. In the article index below, the user field is an object, so each article carries the information of one user. The statement to create the index is as follows

PUT /article
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "createTime": { "type": "date" },
      "user": {
        "properties": {
          "username": { "type": "keyword" },
          "age": { "type": "long" },
          "sex": { "type": "text" }
        }
      }
    }
  }
}

Then insert a piece of data into this index and set the user’s information

PUT /article/_doc/1
{
  "title": "ElasticSearch learning",
  "createTime": "2023-11-09T00:00:00",
  "user": {
    "username": "zhenghuisheng",
    "age": 18,
    "sex": "male"
  }
}

The query is as follows. It searches by user name; a sub-field can be addressed directly as object.property

GET /article/_search
{
  "query": {
    "match": {
      "user.username": "zhenghuisheng"
    }
  }
}

1.2, java operations of object types

Before creating an index, you need to obtain the es connection through configuration. The configuration class is as follows

@Bean
public RestHighLevelClient esRestClient() {
    RestHighLevelClient client = new RestHighLevelClient(
            RestClient.builder(new HttpHost("xxx.33.xxx.xxx", 9200, "http")));
    return client;
}

The Java code to insert data into the article index is as follows. The client used here is the one configured in the SpringBoot integration chapter.

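//Insert data (needs java.util.Map and java.util.HashMap)
IndexRequest userIndex = new IndexRequest("article");
User user = new User();
user.setUsername("zhenghuisheng");
user.setAge(18);
user.setSex("male");
//Wrap the user in the article document so that it is stored under the "user" field
Map<String, Object> article = new HashMap<>();
article.put("title", "ElasticSearch learning");
article.put("createTime", "2023-11-09T00:00:00");
article.put("user", user);
//adding data
userIndex.source(JSON.toJSONString(article), XContentType.JSON);
//client is the client integrated with SpringBoot earlier, injected through @Resource
client.index(userIndex, ElasticSearchConfig.COMMON_OPTIONS);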

The query is as follows. A sub-field is addressed directly by joining the names with a dot, as in user.username

SearchRequest request = new SearchRequest("article");
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchQuery("user.username", "zhenghuisheng"));
request.source(builder);
SearchResponse search = client.search(request, RequestOptions.DEFAULT);
System.out.println(search);

2, nested types

2.1, nested type kibana operation

A nested object is an object inside an array of objects that can be indexed and queried independently of the others. With the plain object type, ES flattens an array of objects internally: the values of each field are collected into one list, and the association between the fields of a single object is lost. This can produce incorrect results. For example, suppose the array holds two names, each with a firstName and a lastName: zhan san and li si. A query for firstName zhan and lastName si matches the document, even though no such person exists in it.

To solve this problem, use the nested type. Internally, each nested object is indexed as a separate hidden Lucene document stored next to its enclosing document, and the two are joined at query time
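The difference between the flattened object array and the nested type can be mimicked in plain Java. This is only a sketch of the matching semantics, not of how ElasticSearch is implemented; the class FlattenedVsNestedDemo and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of object-array flattening versus nested matching.
class FlattenedVsNestedDemo {

    /** One author object with a first name and a last name. */
    static final class Author {
        final String firstName;
        final String lastName;

        Author(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Object-array behavior: values are collected per field, so the pairing inside one object is lost. */
    static boolean flattenedMatch(List<Author> authors, String first, String last) {
        List<String> firstNames = new ArrayList<>();
        List<String> lastNames = new ArrayList<>();
        for (Author author : authors) {
            firstNames.add(author.firstName);
            lastNames.add(author.lastName);
        }
        return firstNames.contains(first) && lastNames.contains(last);
    }

    /** Nested behavior: both conditions must hold inside the same author object. */
    static boolean nestedMatch(List<Author> authors, String first, String last) {
        for (Author author : authors) {
            if (author.firstName.equals(first) && author.lastName.equals(last)) {
                return true;
            }
        }
        return false;
    }

    static List<Author> sampleAuthors() {
        return Arrays.asList(new Author("zhan", "san"), new Author("li", "si"));
    }

    public static void main(String[] args) {
        List<Author> authors = sampleAuthors();
        System.out.println("flattened zhan + si: " + flattenedMatch(authors, "zhan", "si")); // true
        System.out.println("nested zhan + si: " + nestedMatch(authors, "zhan", "si"));       // false
        System.out.println("nested li + si: " + nestedMatch(authors, "li", "si"));           // true
    }
}
```

The flattened check reports a match for zhan si although no such author exists, which is exactly the false positive that the nested type avoids.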

As in the following case, create the article index again, this time with an author field that holds the authors' information. If the article index from the previous section still exists, delete it first.

PUT /article
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": {
        "type": "nested",
        "properties": {
          "first_name": { "type": "keyword" },
          "last_name": { "type": "keyword" }
        }
      }
    }
  }
}

Insert a document into this index, as shown below. It has two authors, stored as an array.

POST /article/_doc/1
{
  "title": "ElasticSearch Teaching",
  "author": [
    { "first_name": "zheng", "last_name": "huisheng" },
    { "first_name": "li", "last_name": "si" }
  ]
}

Then, when querying, wrap the conditions in a nested query. The path parameter is the nested field to query. Because each author is matched as a whole, unrelated combinations are no longer returned: the query below (first_name zheng together with last_name si) finds nothing, since no single author has both values.

GET /article/_search
{
  "query": {
    "nested": {    //Fixed keyword, followed by path and query
      "path": "author",
      "query": {
        "bool": {
          "must": [
            { "match": { "author.first_name": "zheng" } },
            { "match": { "author.last_name": "si" } }
          ]
        }
      }
    }
  }
}

When aggregating, you also need a nested aggregation, with path set to the nested field.

GET /article/_search
{
  "aggs": {
    "author": {
      "nested": {
        "path": "author"
      }
    }
  }
}

2.2, Java operations of nested types

The java code corresponding to the nested type is as follows. First, create an index and set the parameters.

@Test
public void createIndex() throws Exception {
    XContentBuilder mapping = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("properties")
                    .startObject("title")
                        .field("type", "text")
                    .endObject()
                    .startObject("author")
                        .field("type", "nested")
                        .startObject("properties")
                            .startObject("first_name")
                                .field("type", "keyword")
                            .endObject()
                            .startObject("last_name")
                                .field("type", "keyword")
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();
    CreateIndexRequest request = new CreateIndexRequest("article")
            .settings(Settings.builder()
                    .put("number_of_shards", 3)     //Set the number of shards
                    .put("number_of_replicas", 1)   //Set the number of replicas
                    .build())
            .mapping(mapping);
    //Execute creation
    CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println("The execution result is " + response);
}

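The data is inserted as follows, mirroring the Kibana operation above: one document with two authors. This assumes that the Author class serializes firstName and lastName as first_name and last_name (for example through fastjson's @JSONField(name = "first_name")); otherwise the JSON field names will not match the mapping.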

//Insert data
IndexRequest userIndex = new IndexRequest("article");
List<Author> list = new ArrayList<>();
Author author1 = new Author();
author1.setFirstName("zheng");
author1.setLastName("huisheng");
Author author2 = new Author();
author2.setFirstName("li");
author2.setLastName("si");
list.add(author1);
list.add(author2);
Article article = new Article();
article.setTitle("ElasticSearch Teaching");
article.setAuthor(list);
//adding data
userIndex.source(JSON.toJSONString(article), XContentType.JSON);
client.index(userIndex, ElasticSearchConfig.COMMON_OPTIONS);

The next step is to query the data by building a NestedQueryBuilder

//Query data
SearchRequest request = new SearchRequest("article");
String path = "author";
QueryBuilder builder = new NestedQueryBuilder(path,
        QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("author.first_name", "zheng"))
                .must(QueryBuilders.matchQuery("author.last_name", "si")),
        ScoreMode.None);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(builder);
request.source(searchSourceBuilder);
SearchResponse search = client.search(request, RequestOptions.DEFAULT);
System.out.println(search);

3, parent-child type

3.1, kibana operation of parent-child type

ElasticSearch also supports documents with a parent-child relationship. Parent and child documents are linked through a join field, and they remain independent documents: updating a parent document does not require touching its child documents. With the object type above, by contrast, changing one piece of embedded data may mean updating a large number of documents. Because the documents are independent, operations on one document do not affect the others.

The disadvantage is that join queries over large data sets are slow; the advantage is that updates are more efficient.

When using parent-child documents, you need to define the relationship between parent and child documents in the mapping when building the index. As shown below, a field of type join declares the relationship. In relations, the key is the name of the parent relation and the value is the name of the child relation.

"teacher_student_relation": {
  "type": "join",            //Specify the type
  "relations": {             //Confirm the relationship
    "teacher": "student"     //teacher is the parent document and student is the child document
  }
}

For example, create a user index that holds both teacher and student documents. Set the number of shards to 3, with teacher as the parent document and student as the child document.

PUT /user
{
  "settings": {
    "number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "relation": {
        "type": "join",
        "relations": {
          "teacher": "student"
        }
      },
      "username": { "type": "keyword" },
      "sex": { "type": "text" }
    }
  }
}

Next, insert a parent document. It still needs the relation field, with name set to teacher to mark it as a parent document.

PUT /user/_doc/1
{
  "username": "Tom",
  "sex": "male",
  "relation": {
    "name": "teacher"    //Indicates that it is the parent document
  }
}

Next, insert the child document. A child document must be stored on the same shard as its parent so that the join can be resolved, so use the routing parameter with the parent's id to route both to the same shard. In the relation field, besides the child's relation name, you also need to specify the id of the parent document in parent

PUT /user/_doc/student1?routing=1
{
  "username": "zhenghuisheng",
  "sex": "male",
  "relation": {
    "name": "student",
    "parent": "1"
  }
}
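Why the routing value matters can be sketched in a few lines of Java. This is a simplified model, not ElasticSearch's real code: the actual shard selection uses a Murmur3 hash and extra routing factors, whereas the hypothetical RoutingDemo below just takes String.hashCode() modulo the number of primary shards. The ids 1 and student1 and the shard count 3 mirror the requests above.

```java
// Hypothetical, simplified model of routing-based shard selection.
class RoutingDemo {

    /**
     * Picks a shard from a routing value.
     * Assumption: String.hashCode() stands in for the Murmur3 hash that ElasticSearch really uses.
     */
    static int shardFor(String routing, int primaryShards) {
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    /** When no routing value is given, the document id is used as the routing value. */
    static int shardForDocument(String id, String routing, int primaryShards) {
        return shardFor(routing != null ? routing : id, primaryShards);
    }

    public static void main(String[] args) {
        int shards = 3; // number_of_shards of the user index
        int parentShard = shardForDocument("1", null, shards);
        int childRouted = shardForDocument("student1", "1", shards);
        int childUnrouted = shardForDocument("student1", null, shards);
        System.out.println("parent shard: " + parentShard);
        System.out.println("child shard with routing=1: " + childRouted);
        System.out.println("child shard without routing: " + childUnrouted);
    }
}
```

With routing=1 the child is always hashed with the parent's id, so it lands on the parent's shard; without routing it is hashed with its own id and may end up elsewhere.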

Here are some query methods. Querying by id is as follows

//Query based on parent document id
GET /user/_doc/1
//Query a child document by id, with the same routing value
GET /user/_doc/student1?routing=1

You can also query parent documents by their children: has_child returns the parent documents that have at least one child matching the inner query. Note that the parameter naming the child relation is type

//Query the parents whose child documents contain certain data
GET /user/_search
{
  "query": {
    "has_child": {
      "type": "student",
      "query": {
        "match": {
          "username": "zhenghuisheng"
        }
      }
    }
  }
}

Conversely, has_parent returns the child documents whose parent matches the inner query. Here the parameter naming the parent relation is parent_type

GET /user/_search
{
  "query": {
    "has_parent": {
      "parent_type": "teacher",
      "query": {
        "match": {
          "username": "Tom"
        }
      }
    }
  }
}
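Which side each query returns is easy to mix up, so here is a small in-memory sketch of the semantics in plain Java. It does not call ElasticSearch; the class JoinQueryDemo, its Doc type, and its methods are hypothetical and only model the teacher and student documents used above, with the child pointing at the parent's id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of has_child and has_parent semantics.
class JoinQueryDemo {

    /** A document with its join information: relation name and, for children, the parent's id. */
    static final class Doc {
        final String id;
        final String username;
        final String relationName;
        final String parentId;

        Doc(String id, String username, String relationName, String parentId) {
            this.id = id;
            this.username = username;
            this.relationName = relationName;
            this.parentId = parentId;
        }
    }

    /** has_child: returns ids of PARENT documents that have a matching child of the given type. */
    static List<String> hasChild(List<Doc> docs, String childType, String username) {
        Set<String> parentIds = new LinkedHashSet<>();
        for (Doc doc : docs) {
            if (childType.equals(doc.relationName) && username.equals(doc.username) && doc.parentId != null) {
                parentIds.add(doc.parentId);
            }
        }
        List<String> result = new ArrayList<>();
        for (Doc doc : docs) {
            if (parentIds.contains(doc.id)) {
                result.add(doc.id);
            }
        }
        return result;
    }

    /** has_parent: returns ids of CHILD documents whose parent of the given type matches. */
    static List<String> hasParent(List<Doc> docs, String parentType, String username) {
        Set<String> matchingParents = new LinkedHashSet<>();
        for (Doc doc : docs) {
            if (parentType.equals(doc.relationName) && username.equals(doc.username)) {
                matchingParents.add(doc.id);
            }
        }
        List<String> result = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.parentId != null && matchingParents.contains(doc.parentId)) {
                result.add(doc.id);
            }
        }
        return result;
    }

    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("1", "Tom", "teacher", null),
                new Doc("student1", "zhenghuisheng", "student", "1"));
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        System.out.println("has_child(student, zhenghuisheng) -> " + hasChild(docs, "student", "zhenghuisheng")); // [1]
        System.out.println("has_parent(teacher, Tom) -> " + hasParent(docs, "teacher", "Tom"));                   // [student1]
    }
}
```

The condition is written against one side of the relationship, and the hits come from the other side.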

3.2, java code of parent-child type

First, create the index and set the shard and replica settings.

//The join field is named "relation" so that it matches the relation property of the User object indexed below
XContentBuilder mapping = XContentFactory.jsonBuilder()
        .startObject()
            .startObject("properties")
                .startObject("relation")
                    .field("type", "join")
                    .startObject("relations")
                        .field("teacher", "student")
                    .endObject()
                .endObject()
                .startObject("username")
                    .field("type", "keyword")
                .endObject()
                .startObject("sex")
                    .field("type", "text")
                .endObject()
            .endObject()
        .endObject();
CreateIndexRequest request = new CreateIndexRequest("user")
        .settings(Settings.builder()
                .put("number_of_shards", 3)
                .put("number_of_replicas", 1)
                .build())
        .mapping(mapping);
CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
System.out.println("The execution result is " + response);

Then insert the data, starting with the parent document. Set the index name and, preferably, an explicit document id: the child document later has to reference this id as its parent and use it as its routing value, so it is best to choose it yourself.

//Specify the index and the document id
IndexRequest request = new IndexRequest("user");
request.id("teacher1");
User user = new User();
user.setUsername("Tom");
user.setSex("male");
Relation relation = new Relation();
relation.setName("teacher");
user.setRelation(relation);
request.source(JSON.toJSONString(user), XContentType.JSON);
IndexResponse response = client.index(request, ElasticSearchConfig.COMMON_OPTIONS);
System.out.println(response);

Then insert the child document. A routing value must be specified: by routing on the parent's id, the child document is placed on the same shard as its parent, which the join query relies on. In addition, set parent to the id of the parent document

//Specify index and routing (the parent's id)
IndexRequest request = new IndexRequest("user").routing("teacher1");
User user = new User();
user.setUsername("zhenghuisheng");
user.setSex("male");
Relation relation = new Relation();
relation.setName("student");
//parent is the id of the parent document, not the relation name
relation.setParent("teacher1");
user.setRelation(relation);
request.source(JSON.toJSONString(user), XContentType.JSON);
IndexResponse response = client.index(request, ElasticSearchConfig.COMMON_OPTIONS);
System.out.println(response);

The main differences between nested documents and parent-child documents are as follows. Nested documents are implemented with the nested type and are stored together with their enclosing document in a redundant manner: read performance is relatively high, but update performance is low, because changing one nested object means rewriting the whole document. Parent-child documents are implemented with the join type: parent and child documents are independent, but the parent-child relationship needs additional maintenance, and read performance is relatively poor

Nested documents suit data that is mainly queried and rarely updated; parent-child documents suit cases where the child documents may be updated frequently.