6. Configuring Solr DataImport to import index data, and IK word-segmentation queries

Table of Contents


1. Introduction, download and installation of Apache Solr

2. Core kernel instance, IK tokenizer, Solr (stand-alone, cluster)

3. Solr basic commands (start, stop, system information)

4. Solr’s solrconfig.xml configuration and managed.schema mode

5. Solr Admin UI operations (XML, JSON add|modify|delete|query index)

6. Configuring Solr DataImport to import index data, and IK word-segmentation queries

7. Use Solr in Java, historical version (after 7.0.0, 5.0.0~6.6.6, before 4.10.4)

8. Traditional Spring integration with Solr

9. Spring Boot integrates Solr


Configuring Solr DataImport to import index data

  • Table of contents
  • Configuring Solr DataImport to import index data
    • 1. Add the required jar packages (3)
    • 2. Modify and add core instance configuration files (three places need to change)
      • (1) In the solrconfig.xml file, add the requestHandler configuration
      • (2) Create a new data-config.xml file and add configuration
        • <dataSource>
        • <document>
        • <entity> child element <field>
      • (3) Modify the schema.xml file and add the fields defined in data-config
    • 3. Import data in full or incrementally
      • Full import
      • Incremental import
    • 4. After clicking Execute
  • Word segmentation query

Configuring Solr DataImport to import index data

DataImportHandler provides a configurable way to import data into Solr, either all at once or incrementally. It can also define task scheduling declaratively, so that data is synchronized from a relational database to the Solr server at regular intervals.

1. Add the required jar packages (3)

solr-dataimporthandler-extras-<version>.jar and solr-dataimporthandler-<version>.jar (both found in the dist directory of the Solr distribution)

mysql-connector-java-<version>.jar (download from the Maven repository)

Copy the above three jars into \server\solr-webapp\webapp\WEB-INF\lib
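
As an alternative to copying the jars into WEB-INF\lib, solrconfig.xml also supports <lib> directives that load jars by pattern. A minimal sketch, assuming the default directory layout of the Solr distribution (the driver directory below is only an example; adjust both paths to where your jars actually live):

<!-- load the two DataImportHandler jars straight from the distribution's dist directory -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<!-- the MySQL driver can be loaded the same way from wherever you placed it -->
<lib dir="/opt/jdbc-drivers/" regex="mysql-connector-java-.*\.jar" />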

2. Modify and add core instance configuration files (three places need to change)

(1) In the solrconfig.xml file, add the requestHandler configuration

It points to a custom file, data-config.xml, which, as the name suggests, configures the related data sources.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

(2) Create a new data-config.xml file and add configuration

Create the file under \Solr\solr-8.5.1\server\solr\mycore\conf

MySQL 5 uses the driver class com.mysql.jdbc.Driver, while MySQL 6 and later use com.mysql.cj.jdbc.Driver. If the class does not match the driver version, an error is reported saying the driver class is deprecated.
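
For reference, the MySQL 8 driver is available from the Maven repository under the following coordinates (the version shown is only an example; pick the one matching your server):

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.20</version>
</dependency>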

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource driver="com.mysql.cj.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/yiyun_mall?autoReconnect=true&amp;useSSL=false&amp;characterEncoding=utf-8&amp;serverTimezone=UTC"
              user="root"
              password="root"/>

  <document name="salesDoc">
    <entity name="o_item" query="select id,title,sell_point,price,image,cid,created,updated from yy_item">
      <field name="id" column="id" />
      <field name="title" column="title" />
      <field name="sell_point" column="sell_point" />
      <field name="price" column="price" />
      <field name="image" column="image" />
      <field name="cid" column="cid" />
      <field name="created" column="created" />
      <field name="updated" column="updated" />
    </entity>
  </document>
</dataConfig>

<dataSource>

parameter   description
name        Name of the data source; one configuration file may define several dataSource elements, distinguished by name
type        Data source type, e.g. JdbcDataSource
driver      JDBC driver class (the driver jar must be placed in the lib directory beforehand)
url         Database connection URL
user        Database username
password    Database password
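
For example, when two databases are involved, each dataSource gets a name and each entity selects one through its dataSource attribute. A minimal sketch (the second database and its table are hypothetical):

<dataConfig>
  <dataSource name="ds-mall" type="JdbcDataSource" driver="com.mysql.cj.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/yiyun_mall" user="root" password="root"/>
  <dataSource name="ds-user" type="JdbcDataSource" driver="com.mysql.cj.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/yiyun_user" user="root" password="root"/>
  <document>
    <!-- each entity picks its data source by name -->
    <entity name="item" dataSource="ds-mall" query="select id,title from yy_item"/>
    <entity name="user" dataSource="ds-user" query="select id,name from yy_user"/>
  </document>
</dataConfig>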

<document>

The <document> element configures how data read from the database is assembled into document objects; it consists mainly of one or more <entity> elements.

<entity> child element attributes

parameter         description
name              Entity name
dataSource        Name of the dataSource to use
query             SQL that fetches all of the data
deltaImportQuery  SQL used to fetch the changed rows during an incremental import
deltaQuery        SQL that fetches the primary keys (pk) of the changed rows
parentDeltaQuery  SQL that fetches the primary keys (pk) of the parent entity

<field> child element

parameter   description
name        Field name as defined in schema.xml
column      Column name from the database query

(3) Modify the schema.xml file and add the fields defined in data-config

If a field such as id or name already exists, it does not need to be added again.

<!-- document ID -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="title" type="text_ik" indexed="true" stored="true"/>
<field name="sell_point" type="string" indexed="true" stored="true"/>
<field name="price" type="pdouble" indexed="true" stored="true"/>
<field name="image" type="string" indexed="true" stored="true"/>
<field name="cid" type="pint" indexed="true" stored="true"/>
<field name="created" type="pdate" indexed="true" stored="true"/>
<field name="updated" type="pdate" indexed="true" stored="true"/>

3. Import data in full or incrementally

Start Solr, select mycore in the Admin UI, and open the Dataimport page.

parameter   description
clean       Whether to delete the existing index before building the new one
commit      Whether to commit after this operation
optimize    Whether to optimize the index after this operation

You can choose between a full and an incremental (delta) import, and select the configured entity by its name.
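
Besides the Admin UI, an import can be triggered directly over HTTP against the /dataimport handler configured above. A sketch, assuming the core is named mycore and Solr listens on the default port:

Full import (clear the old index, rebuild, commit):
http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true&commit=true

Incremental import (only changed rows, commit):
http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false&commit=true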

Full import

A full import reads all the data to be imported from the database, submits it to the Solr server, and deletes all existing index data of the specified core before rebuilding it. Full imports are generally performed when data is imported for the first time, or when restoring from a backup.

How full import works:
(1) Execute the entity's query to obtain all the data
(2) For each row, take its primary key (pk) and assemble the sub-entity's query
(3) Execute the sub-entity's query to obtain the sub-entity's data

Incremental import

To use incremental import, the table must contain two fields: a deletion flag (tombstone) field such as isdeleted, and a creation-time field such as create_date. The fields do not have to use these exact names, but both must be present. By comparing each row's timestamp with the time of the last incremental import, a SQL statement can select the rows that need to be imported; rows marked as deleted can be selected via the isdeleted field. The primary keys of the deleted rows must be passed to Solr so that it can delete the corresponding documents from the index, keeping it in sync. If rows are physically deleted from the table and there is no logical flag field, the deleted data cannot easily be identified, which is exactly why the logical deletion flag is needed.
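
A sketch of what the yy_item entity from data-config.xml could look like with delta support, assuming the table has been extended with isdeleted and updated columns (the column names are this example's assumption); ${dataimporter.last_index_time} and ${dataimporter.delta.id} are variables supplied by DataImportHandler, and deletedPkQuery is its attribute for removing documents from the index:

<entity name="o_item"
        query="select id,title,sell_point,price from yy_item where isdeleted = 0"
        deltaQuery="select id from yy_item
                    where updated &gt; '${dataimporter.last_index_time}' and isdeleted = 0"
        deltaImportQuery="select id,title,sell_point,price from yy_item
                          where id = '${dataimporter.delta.id}'"
        deletedPkQuery="select id from yy_item
                        where isdeleted = 1 and updated &gt; '${dataimporter.last_index_time}'">
</entity>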

How delta import works:
(1) Find the child entities, descending until there are no more
(2) Execute the entity's deltaQuery to obtain the primary keys of the changed rows
(3) Merge in the primary keys produced by the sub-entities' parentDeltaQuery
(4) For each primary-key row, assemble the parent entity's parentDeltaQuery
(5) Execute parentDeltaQuery to obtain the parent entity's primary keys
(6) Execute deltaImportQuery to fetch the entity's own data
(7) If there is no deltaImportQuery, assemble one from the query

4. After clicking Execute

Wait a moment and refresh; the imported documents can then be seen on the Overview page.
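
The current import status can also be polled over HTTP (again assuming the core is named mycore):

http://localhost:8983/solr/mycore/dataimport?command=status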

Word segmentation query

A field's type must specify a tokenizer (here text_ik, backed by the IK tokenizer); otherwise queries against it are not word-segmented.
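
For reference, the text_ik field type used above can be declared in the schema along these lines; the factory class matches the widely used ik-analyzer builds for Solr, so adjust it to the IK jar you actually installed:

<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index">
    <!-- fine-grained segmentation at index time -->
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
  </analyzer>
  <analyzer type="query">
    <!-- smart (coarser) segmentation at query time -->
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
  </analyzer>
</fieldType>

With this in place, queries against title are tokenized by IK, so searching for a Chinese phrase matches documents containing its individual words.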