Table of Contents
1. Introduction, download and installation of Apache Solr
2. Core kernel instance, IK tokenizer, Solr (stand-alone, cluster)
3. Solr basic commands (start, stop, system information)
5. Solr’s solrconfig.xml configuration and managed-schema mode
5. Solr Admin UI operations (XML, JSON add|modify|delete|query index)
6. Solr configures DataImport to import index data and IK word segmentation query
7. Use Solr in Java, historical version (after 7.0.0, 5.0.0~6.6.6, before 4.10.4)
8. Traditional Spring integration with Solr
9. Spring Boot integrates Solr
Solr configures DataImport to import index data
- Table of contents
- Solr configures DataImport to import index data
  - 1. Configure the jar packages (3)
  - 2. Modify and add the core instance configuration files (three locations need to be changed)
    - (1) solrconfig.xml file: add a requestHandler configuration
    - (2) Create a new data-config.xml file and add configuration
      - `<dataSource>`
      - `<document>`
      - `<entity>`
      - `<field>` child element
    - (3) Modify the schema.xml file and add the fields defined in data-config
  - 3. Import data in full or incrementally
    - Full import
    - Incremental import
  - 4. After Execute
    - Word segmentation query
Solr configures DataImport to import index data
DataImportHandler (DIH) provides a configurable way to import data into Solr, either all at once or incrementally. It can also declare a configurable task schedule, so that data is pulled from a relational database into the Solr server at regular intervals.
1. Configure the jar packages (3)
solr-dataimporthandler-extras-version.jar and solr-dataimporthandler-version.jar (in the dist directory)
mysql-connector-java-version.jar (download from the Maven repository)
Copy the above three jars into \server\solr-webapp\webapp\WEB-INF\lib
2. Modify and add the core instance configuration file (three locations need to be changed)
(1) solrconfig.xml file, add requestHandler configuration
It points to a custom file, data-config.xml, which, as the name suggests, configures the related data sources.
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>
(2) Create a new data-config.xml file and add configuration
\Solr\solr-8.5.1\server\solr\mycore\conf
MySQL 5 uses the driver class com.mysql.jdbc.Driver, while MySQL 6 and later use com.mysql.cj.jdbc.Driver. If the driver class does not match the version, an error is reported that the driver class is deprecated.
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource driver="com.mysql.cj.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/yiyun_mall?autoReconnect=true&amp;useSSL=false&amp;characterEncoding=utf-8&amp;serverTimezone=UTC"
                user="root"
                password="root"/>
    <document name="salesDoc">
        <entity name="o_item"
                query="select id,title,sell_point,price,image,cid,created,updated from yy_item">
            <field name="id" column="id"/>
            <field name="title" column="title"/>
            <field name="sell_point" column="sell_point"/>
            <field name="price" column="price"/>
            <field name="image" column="image"/>
            <field name="cid" column="cid"/>
            <field name="created" column="created"/>
            <field name="updated" column="updated"/>
        </entity>
    </document>
</dataConfig>
The `<dataSource>` element attributes:

parameter | description |
---|---|
name | Name of the dataSource; a configuration file can have multiple data sources, distinguished by name |
type | Data source type, such as JDBC |
driver | Database driver class; put the driver jar into the lib directory in advance |
url | Database connection url |
user | Database username |
password | Database password |
The `<document>` element configures how a document object is built from the imported data; it consists mainly of one or more `<entity>` child elements, whose attributes are:
parameter | description |
---|---|
name | Entity name |
dataSource | dataSource name |
query | SQL that fetches all data (used for full import) |
deltaImportQuery | SQL used to fetch changed rows during incremental import |
deltaQuery | SQL that fetches the pks of changed rows |
parentDeltaQuery | SQL that fetches the pk of the parent Entity |
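To illustrate how these attributes fit together, here is a sketch of an entity configured for both full and incremental import. It relies on the `updated` column from the yy_item query above; `${dataimporter.last_index_time}` and `${dataimporter.delta.id}` are built-in DIH variables.

```xml
<!-- Sketch only: uses yy_item's `updated` timestamp column -->
<entity name="o_item"
        pk="id"
        query="select id,title,price from yy_item"
        deltaQuery="select id from yy_item
                    where updated &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select id,title,price from yy_item
                          where id='${dataimporter.delta.id}'">
    <field name="id" column="id"/>
    <field name="title" column="title"/>
    <field name="price" column="price"/>
</entity>
```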
`<field>` child element
parameter | description |
---|---|
name | Field name defined in schema.xml |
column | Column name in the database query |
(3) Modify the schema.xml file and add the fields defined in data-config
If a field already exists (such as id), you do not need to add it again.
<!-- own ID -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="title" type="text_ik" indexed="true" stored="true"/>
<field name="sell_point" type="string" indexed="true" stored="true"/>
<field name="price" type="pdouble" indexed="true" stored="true"/>
<field name="image" type="string" indexed="true" stored="true"/>
<field name="cid" type="pint" indexed="true" stored="true"/>
<field name="created" type="pdate" indexed="true" stored="true"/>
<field name="updated" type="pdate" indexed="true" stored="true"/>
3. Import data in full or incrementally
Start Solr, open the Admin UI, select mycore, and open the Dataimport page.
command parameters | description |
---|---|
clean | Whether to delete the existing index before building the new one |
commit | Whether to commit after this operation |
optimize | Whether to optimize the index after this operation |
Choose full or incremental import, and select the configured entity (by entity name).
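The Execute button in the Admin UI simply issues an HTTP request to the handler configured in step (1), so the same import can be triggered directly. A sketch, assuming Solr runs on the default port 8983 and the core is named mycore:

```text
# Full import, clearing the old index and committing afterwards
http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true&commit=true

# Incremental import
http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false&commit=true

# Check import status
http://localhost:8983/solr/mycore/dataimport?command=status
```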
Full import
A full import reads all the data to be imported from the database, submits it to the Solr server, and deletes all existing index data of the specified core so that the index is rebuilt. Full import is generally performed when data is imported for the first time or when restoring from a backup.
How Full Import works
(1) Execute the Query of this Entity to get all the data
(2) For each row of data Row, get the pk, and assemble the Query of the sub-Entity
(3) Execute the Query of the sub-Entity to obtain the data of the sub-Entity
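The steps above can be sketched with a nested entity. This is an illustration only, assuming a hypothetical yy_item_desc table whose item_id column references yy_item.id; `${o_item.id}` refers to the pk of the current parent row:

```xml
<entity name="o_item" query="select id,title from yy_item">
    <!-- Child entity: its query runs once per parent row, using the parent's pk -->
    <entity name="o_item_desc"
            query="select item_desc from yy_item_desc where item_id='${o_item.id}'">
        <field name="item_desc" column="item_desc"/>
    </entity>
</entity>
```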
Incremental import
To use incremental import, the table must have two fields: a deletion-flag (tombstone) field, e.g. isdeleted, and a data-creation-time field, e.g. create_date. The field names do not have to be isdeleted and create_date, but fields with these two meanings must exist. By comparing the creation time with the time of the last incremental import, a SQL statement can select the rows that need to be imported incrementally; the rows marked as deleted can be selected by the isdeleted field. The primary keys of those rows are passed to Solr so that it can delete the corresponding documents from the index, keeping the index in sync. If rows are physically deleted and there is no logical flag field, the deleted data cannot easily be found; this is why the logical-deletion flag field is needed.
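As a sketch, DIH can pick up the tombstone rows through the deletedPkQuery attribute, alongside deltaQuery for new rows. This assumes the isdeleted and create_date fields described above; in practice an update timestamp is usually compared instead of the creation time, so that modified rows are also picked up:

```xml
<entity name="o_item"
        pk="id"
        query="select id,title from yy_item where isdeleted=0"
        deltaQuery="select id from yy_item where isdeleted=0
                    and create_date &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select id,title from yy_item where id='${dataimporter.delta.id}'"
        deletedPkQuery="select id from yy_item where isdeleted=1
                        and create_date &gt; '${dataimporter.last_index_time}'">
    <field name="id" column="id"/>
    <field name="title" column="title"/>
</entity>
```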
How Delta Import Works
(1) Descend into the child Entities until there are none left;
(2) Execute the Entity’s deltaQuery to obtain the pks of the changed data;
(3) Merge in the pks obtained from the sub-Entity’s parentDeltaQuery;
(4) For each pk row, assemble the parent Entity’s parentDeltaQuery;
(5) Execute parentDeltaQuery to obtain the pk of the parent Entity;
(6) Execute deltaImportQuery to obtain the Entity’s own data;
(7) If there is no deltaImportQuery, the Query is assembled automatically
4. After Execute
Wait a moment and refresh; the result can be viewed on the Overview page.
Word segmentation query
The field’s type must specify a tokenizer.
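As a sketch, a text_ik field type for the IK tokenizer might be declared in the schema as follows. The factory class name org.wltea.analyzer.lucene.IKTokenizerFactory is an assumption and depends on the IK Analyzer build you installed:

```xml
<fieldType name="text_ik" class="solr.TextField">
    <!-- Assumed factory class; check the IK Analyzer jar you installed -->
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true"/>
    </analyzer>
</fieldType>
```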