ElasticSearch – Operating index libraries and documents with JavaRestClient

Table of Contents

1. Operating the index library with RestClient

1.1. What is RestClient?

1.2. JavaRestClient implements creation and deletion of index libraries

1.2.1. Preface

1.2.2. Initialize JavaRestClient

1.2.3. Create index library

1.2.4. Determine whether the index library exists

1.2.5. Delete the index library

1.3. JavaRestClient implements CRUD of documents

1.3.1. Initialize JavaRestClient

1.3.2. Add documents (hotel data) to the index library

1.3.3. Query hotel data based on id

1.3.4. Modify hotel data based on id

1.3.5. Delete document data based on id

1.3.6. Import documents in batches


1. Operating the index library with RestClient

1.1. What is RestClient?

We have already learned how to operate the index libraries and documents of es with DSL statements. However, as Java programmers we will ultimately need to operate es from Java code, and to do that we use the RestClient officially provided by es.

RestClient is the set of official es clients for various languages. Its role is to help us assemble DSL statements and send them to the es server as HTTP requests. We only need to hand the request to the client from Java code, and the client takes care of the rest.

Official document address: Elasticsearch Clients | Elastic

1.2. JavaRestClient implements creation and deletion of index libraries

1.2.1. Preface

Here I will use a hotel demo project to demonstrate the operation of JavaRestClient.

Specifically, the data describes hotels, and the SQL used to create the table is as follows:

CREATE TABLE `tb_hotel` (
  `id` bigint(20) NOT NULL COMMENT 'hotel id',
  `name` varchar(255) NOT NULL COMMENT 'Hotel name; example: 7 Days Hotel',
  `address` varchar(255) NOT NULL COMMENT 'Hotel address; example: Hangtou Road',
  `price` int(10) NOT NULL COMMENT 'Hotel price; example: 329',
  `score` int(2) NOT NULL COMMENT 'Hotel rating; for example: 45, which is 4.5 points',
  `brand` varchar(32) NOT NULL COMMENT 'Hotel brand; for example: Home Inn',
  `city` varchar(32) NOT NULL COMMENT 'City; example: Shanghai',
  `star_name` varchar(16) DEFAULT NULL COMMENT 'Hotel star rating, from low to high: 1 star to 5 stars, 1 diamond to 5 diamonds',
  `business` varchar(255) DEFAULT NULL COMMENT 'Business district; example: Hongqiao',
  `latitude` varchar(32) NOT NULL COMMENT 'Latitude; example: 31.2497',
  `longitude` varchar(32) NOT NULL COMMENT 'Longitude; example: 120.3925',
  `pic` varchar(255) DEFAULT NULL COMMENT 'Hotel picture; example:/img/1.jpg',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

When we create the index library later, we need to consider mapping constraints based on the above SQL data.

1.2.2. Initialize JavaRestClient

a) Introduce the RestHighLevelClient dependency of es

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

b) Since the ES version managed by Spring Boot defaults to 7.6.2, we need to override it to match our server (7.12.1).

Just add the following version property to the <properties> section of pom.xml.

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>

c) Initialize RestHighLevelClient.

Here we create a test class HotelIndexTest to demonstrate the related methods of RestClient operation.

@SpringBootTest
class HotelIndexTest {

    private RestHighLevelClient client;

    @BeforeEach
    public void setUp() {
        client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http//cloud server ip:9200")
                //If it is a cluster in the future, you can continue to connect multiple nodes through HttpHost.create
        ));
    }

    @AfterEach
    public void tearDown() throws IOException {
        client.close();
    }

}

1.2.3. Create index library

Here you need to consider how the mapping should be designed based on the table structure shown earlier.

Specific considerations include: field name, data type, whether the field participates in search, whether it needs word segmentation, and if so, which analyzer to use.

We can first write the DSL in Kibana:

PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        // The database defines id as bigint, but the document id in es is special:
        // es treats _id as a string, and it will never be segmented, so keyword is used.
        // id will definitely participate in CRUD later, so index keeps its default of true.
        "type": "keyword"
      },
      "name": {
        //The name of the hotel needs to be searched and segmented.
        "type": "text",
        "analyzer": "ik_max_word", "copy_to": "all"
      },
      "address": {
        // Sometimes hotels are searched by address (for example "No. 58, Lane 315, Longhua West Road, Xuhui"), so word segmentation is needed
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "price": {
        //In the future, hotels will be filtered based on price range, so search is needed and word segmentation is not necessary.
        "type": "integer"
      },
      "score": {
        //This is the same as price
        "type": "integer"
      },
      "brand": {
        //The hotel brand definitely does not need word segmentation, but it must participate in the search.
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        //The city name does not require word segmentation, but it needs to participate in the search
        "type": "keyword",
        "copy_to": "all"
      },
      "star_name": {
        //One star, two stars, three stars... segmenting these values is meaningless; only the whole value makes sense.
        //Some users only want to stay in 5-star hotels, so the field must participate in the search.
        "type": "keyword"
      },
      "business": {
        //Business districts such as: Hongqiao, the Bund... These definitely do not require word segmentation, but they must participate in the search.
        "type": "keyword",
        "copy_to": "all"
      },
      "pic": {
        //The picture here is a url path, no word segmentation is needed, and no one will search for this url.
        //So this url can be treated as a keyword.
        "type": "keyword",
        "index": false
      },
      "location": {
        //es has two special types for geographic coordinates:
        //"geo_point": a single point on the map
        //"geo_shape": an area on the map made up of multiple points
        //A hotel is a point (from the earth's perspective, however big the hotel is, it is still just a point)
        //geo_point is written as a single string that combines latitude and longitude
        "type": "geo_point"
      },
      "all": {
        //name, address, brand... will all need to participate in the search,
        //which would mean searching across several fields for every keyword a user enters.
        //Searching several fields is clearly less efficient than searching one field (just as in a database).
        //We also want users to find hotels whether they type a name, a brand, and so on.
        //es provides the "copy_to" attribute, which copies the value of the current field into the specified field.
        //By copying every searchable field into this all field,
        //one field can be searched in place of many.
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

Interpretation of the custom all field:

Fields such as name, address, and brand will most likely all need to participate in the search, which means that for every keyword a user enters, the backend would have to search across several fields. As with a database, searching several fields is clearly less efficient than searching a single field.

More importantly, we want users to find relevant hotels whether they type a hotel name, a brand, and so on. es provides the "copy_to" attribute, which copies the value of the current field into a specified field. By copying every searchable field into the all field, we can search the contents of many fields through one field.

es also optimizes this: copy_to does not actually duplicate the data in the document, it only indexes the copied values under the target field. When you retrieve a document later you will not see the all field, but you can still search on it (similar to finding the data through a pointer).

The code to create the index library is as follows:

 @Test
    public void testCreateHotelIndex() throws IOException {
        //1.Create Request object
        CreateIndexRequest request = new CreateIndexRequest("hotel");
        //2. Write request parameters (MAPPING_TEMPLATE is a static constant, the content is the DSL statement to create the index library)
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        //3. Initiate a request
        client.indices().create(request, RequestOptions.DEFAULT);
    }
  • The constructor parameter of CreateIndexRequest is the name of the index library to be created.
  • MAPPING_TEMPLATE: a custom static constant whose content is the DSL statement for creating the index library (see the sketch after this list).
  • client.indices(): returns an object (indices is the plural of index) that contains all the methods for operating index libraries.
  • RequestOptions.DEFAULT: use the default request options.
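For reference, here is a minimal sketch of how the MAPPING_TEMPLATE constant could be defined. The class name HotelConstants is an assumption; the constant simply holds the DSL written in Kibana above, of which only the id, name and all fields are reproduced here:

public class HotelConstants {
    // A sketch: the mapping DSL from the Kibana example above, stored as a static constant.
    // Only the id, name and all fields are shown; the remaining fields follow the same pattern.
    public static final String MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\": { \"type\": \"keyword\" },\n" +
            "      \"name\": { \"type\": \"text\", \"analyzer\": \"ik_max_word\", \"copy_to\": \"all\" },\n" +
            "      \"all\": { \"type\": \"text\", \"analyzer\": \"ik_max_word\" }\n" +
            "    }\n" +
            "  }\n" +
            "}";
}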

After executing the test, the operation succeeds.

Then run GET /hotel in Kibana Dev Tools and you can see the new index library.

1.2.4. Determine whether the index library exists

The code to determine whether the index library exists is as follows:

 @Test
    public void testExistsHotelIndex() throws IOException {
        //1.Create Request object
        GetIndexRequest request = new GetIndexRequest("hotel");
        //2.Send request
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

A useful tip: you can first type client.indices().exists in the IDE to see what parameters the method needs.

After running, the test passes and prints true, because the index library created in the previous step exists.

1.2.5. Delete the index library

The code to delete the index library is as follows:

 @Test
    public void testDeleteHotelIndex() throws IOException {
        //1.Create Request object
        DeleteIndexRequest request = new DeleteIndexRequest("hotel");
        //2.Send request
        client.indices().delete(request, RequestOptions.DEFAULT);
    }

Query it again afterwards and the index library can no longer be found, indicating that the deletion was successful.
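If you prefer to verify this in code rather than in Kibana, you can reuse the exists check from the previous section right after the delete call. A minimal sketch:

 @Test
    public void testDeleteAndVerifyHotelIndex() throws IOException {
        // A sketch: delete the index library, then confirm it no longer exists
        client.indices().delete(new DeleteIndexRequest("hotel"), RequestOptions.DEFAULT);
        boolean exists = client.indices().exists(new GetIndexRequest("hotel"), RequestOptions.DEFAULT);
        System.out.println(exists); // expected: false
    }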

1.3. JavaRestClient implements CRUD of documents

1.3.1. Initialize JavaRestClient

The initialization here is the same as for the index library (essentially just creating the client connection).

@SpringBootTest
class HotelDocumentTest {

    private RestHighLevelClient client;

    @BeforeEach
    public void setUp() {
        client = new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://cloud server ip:9200")
        ));
    }

    @AfterEach
    public void tearDown() throws IOException {
        client.close();
    }

}

1.3.2. Add documents (hotel data) to the index library

Ps: You need to create the corresponding index library before operating the document.

Here I first fetch the data from the database through MyBatis-Plus, and then add the documents.

The entity class is as follows (a custom constructor is added mainly for the location attribute, which combines latitude and longitude into a single string):

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
    }
}

@NoArgsConstructor: Generates a no-argument constructor.
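Besides the RestHighLevelClient, the document tests below also rely on two helpers that are not shown in the initialization code: hotelService (a MyBatis-Plus service for the tb_hotel table) and objectMapper (Jackson, used to convert HotelDoc to JSON). A minimal sketch of these fields, assuming the service interface is named IHotelService:

@SpringBootTest
class HotelDocumentTest {

    @Autowired
    private IHotelService hotelService; // MyBatis-Plus service for tb_hotel (the interface name is assumed)

    private final ObjectMapper objectMapper = new ObjectMapper(); // Jackson mapper for JSON conversion

    private RestHighLevelClient client;

    // setUp() and tearDown() are the same as in 1.3.1
}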

Write the code to add the document:

 @Test
    public void testAddDocument() throws IOException {
        //1. Get hotel data
        Hotel hotel = hotelService.getById(5865979L);
        //2. Convert documents (mainly geographical location)
        HotelDoc hotelDoc = new HotelDoc(hotel);
        //3. Convert to JSON format
        String hotelJson = objectMapper.writeValueAsString(hotelDoc);
        //4.Construct request
        IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
        //5. Add request parameters (json format)
        request.source(hotelJson, XContentType.JSON);
        //6.Send request
        client.index(request, RequestOptions.DEFAULT);
    }

After running, the test passes.

Query the document in Kibana and you can see the corresponding data.

1.3.3. Query hotel data based on id

Note that client.get returns a GetResponse object, from which the original document data still needs to be extracted.

The code is as follows:

 @Test
    public void testGetDocument() throws IOException {
        //1.Construct request
        GetRequest request = new GetRequest("hotel").id("5865979");
        //2.Send request
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        //3. Convert to json
        String json = response.getSourceAsString();
        System.out.println(json);
    }

After running, you can get the corresponding data
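If you want the result back as a HotelDoc object rather than a raw JSON string, you can deserialize it with the same Jackson ObjectMapper used when indexing. A small sketch that could be appended to the test above:

        // A sketch: convert the _source JSON back into the HotelDoc entity
        HotelDoc hotelDoc = objectMapper.readValue(json, HotelDoc.class);
        System.out.println(hotelDoc.getName());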

1.3.4. Modify hotel data based on id

There are two ways to modify document data (as mentioned before):

  • Full update (the same operation as adding a document, demonstrated above): writing a document with an existing id deletes the old document and adds the new one.
  • Partial update (demonstrated here): only some of the fields are updated. The code is as follows:

 @Test
    public void testUpdateDocument() throws IOException {
        //1.Construct request
        UpdateRequest request = new UpdateRequest("hotel", "5865979");
        //2. Fill in the parameters
        request.doc(
            "name", "The Strongest Hotel on Earth",
                "price", "99999"
        );
        //3.Send request
        client.update(request, RequestOptions.DEFAULT);
    }

Query it via GET in Kibana and you can see that the name and price fields have been updated.

1.3.5. Delete document data based on id

The code to delete the document is as follows:

 @Test
    public void testDeleteDocument() throws IOException {
        //1.Construct request
        DeleteRequest request = new DeleteRequest("hotel", "5865979");
        //2.Send request
        client.delete(request, RequestOptions.DEFAULT);
    }

1.3.6. Import documents in batches

For example, to import all the hotel data from the database, the code is as follows:

 @Test
    public void testBulkDocument() throws IOException {
        //1. Get all hotel data
        List<Hotel> hotelList = hotelService.list();
        //2.Construct request
        BulkRequest request = new BulkRequest();
        //3. Prepare parameters
        for(Hotel hotel : hotelList) {
            //Convert to document (mainly geographical location)
            HotelDoc hotelDoc = new HotelDoc(hotel);
            String json = objectMapper.writeValueAsString(hotelDoc);
            request.add(new IndexRequest("hotel").id(hotel.getId().toString()).source(json, XContentType.JSON));
        }
        //4.Send request
        client.bulk(request, RequestOptions.DEFAULT);
    }

After running, the test passes.

Afterwards, query a few hotel documents at random in Kibana and you will see that they all exist.
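If you also want to check the result in code, client.bulk returns a BulkResponse that reports per-item failures. A small sketch that could replace step 4 above:

        //4. Send the request and inspect the response for per-item failures
        BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
        if (response.hasFailures()) {
            System.out.println(response.buildFailureMessage());
        }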