Background: Our project is deploying a graph database for the first time, so the existing business data and relationships must be initialized into Neo4j when the system goes live. The development environment already holds millions of records, and the production environment holds even more.
When I first started development, I did not know much about Neo4j, so my first idea was to assemble CREATE statements in code and use them to create all the nodes and relationships.
Business description: The system has many entity tables, each holding its own data, and the relationships between different entities are maintained in relationship tables.
My development idea: 1. Read all the data from each table and use it as nodes. 2. Look up the relationships of that data in the relationship table, then assemble the statements and add the data to Neo4j.
The specific code is as follows (Spring Boot project, version 2.2.5.RELEASE):
pom.xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-neo4j</artifactId>
</dependency>
The configuration file is as follows:
spring:
  data:
    neo4j:
      uri: bolt://localhost:7687
      username: neo4j
      password: neo4j
Use Java code to assemble the CQL statements and run them through the native session.
Neo4jConfig.java
import org.neo4j.ogm.session.SessionFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.neo4j.transaction.Neo4jTransactionManager;

@Configuration
public class Neo4jConfig {

    @Value("${spring.data.neo4j.uri}")
    private String uri;

    @Value("${spring.data.neo4j.username}")
    private String userName;

    @Value("${spring.data.neo4j.password}")
    private String password;

    // OGM configuration built from the connection settings in the configuration file.
    @Bean
    public org.neo4j.ogm.config.Configuration getConfiguration() {
        org.neo4j.ogm.config.Configuration configuration = new org.neo4j.ogm.config.Configuration.Builder()
                .uri(uri)
                .connectionPoolSize(100)
                .credentials(userName, password)
                .withBasePackages("com.troy.keeper.desc.repository")
                .build();
        return configuration;
    }

    @Bean
    public SessionFactory sessionFactory() {
        return new SessionFactory(getConfiguration());
    }

    @Bean("neo4jTransaction")
    public Neo4jTransactionManager neo4jTransactionManager(SessionFactory sessionFactory) {
        return new Neo4jTransactionManager(sessionFactory);
    }
}
Interface entry point: Controller.java
@GetMapping("initDataToNeo4j")
public void initDataToNeo4j() {
    service.initDataToNeo4j();
}
Service.java
// Node data, assembled according to your own business. Here it covers every table, because all
// tables in my business have essentially the same structure, so all nodes share the same attributes.
// The key is the table name, used as the node label; the value is the rows of that table.
Map<String, List<NodeData>> nodeDataMap;
// Relationship data: each relationship between table rows becomes one RelationData entity.
List<RelationData> relationDatas;
// After the data has been assembled, create the nodes.
neo4jUtil.creatNode(nodeDataMap);
// Then bind the relationships.
neo4jUtil.bindRelation(relationDatas);
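The assembly step that produces nodeDataMap is not shown in the service snippet. A minimal sketch of it, assuming the rows have already been read into NodeData objects: the NodeGrouping class and its groupByTable method are hypothetical names, and NodeData is reduced here to a constructor and getters so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the NodeData entity used in this article.
class NodeData {
    private final String id;
    private final String name;
    private final String table;

    NodeData(String id, String name, String table) {
        this.id = id;
        this.name = name;
        this.table = table;
    }

    String getId() { return id; }
    String getName() { return name; }
    String getTable() { return table; }
}

// Hypothetical helper, not part of the original project.
class NodeGrouping {
    // Groups rows by table name so that each key can serve as a node label.
    static Map<String, List<NodeData>> groupByTable(List<NodeData> rows) {
        Map<String, List<NodeData>> byTable = new LinkedHashMap<>();
        for (NodeData row : rows) {
            byTable.computeIfAbsent(row.getTable(), k -> new ArrayList<>()).add(row);
        }
        return byTable;
    }
}
```

The resulting map has the shape that creatNode expects: one entry per table, with the table name as the key.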
NodeData.java
private String id;    // id attribute
private String name;  // name attribute
private String table; // table name, used as the node label
RelationData.java
// Relationship id
private String id;
// Relationship name
private String relationName;
// My relationships span entities, so the label of the end node must be specified
private String endLableName;
// My relationships span entities, so the label of the start node must be specified
private String startLableName;
// id value of the start node
private String startValue;
// id value of the end node (read through getEndValue() in Neo4jUtil)
private String endValue;
Neo4jUtil.java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.annotation.Resource;
import org.neo4j.ogm.model.Result;
import org.neo4j.ogm.session.Session;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils; // assumed; the original does not show which StringUtils it uses

@Component
public class Neo4jUtil {

    @Resource
    private Session session;

    /**
     * Deletes all nodes under a label, together with the relationships attached to them.
     *
     * @param lableName the label whose nodes are deleted
     * @return the number of nodes deleted
     */
    public Integer deleteByLable(String lableName) {
        if (StringUtils.isEmpty(lableName)) {
            return 0;
        }
        String cypherSql = String.format("MATCH (r:`%s`) DETACH DELETE r ", lableName);
        Result query = session.query(cypherSql, new HashMap<>(16));
        session.clear();
        return query.queryStatistics().getNodesDeleted();
    }

    // Create nodes: one CREATE statement per row.
    public void creatNode(Map<String, List<NodeData>> nodeDataMap) {
        if (nodeDataMap == null) {
            return;
        }
        for (String key : nodeDataMap.keySet()) {
            List<NodeData> data = nodeDataMap.get(key);
            if (StringUtils.isEmpty(key)) {
                continue;
            }
            // If the table has no data, create a single node with no attributes.
            if (data == null || data.isEmpty()) {
                String sql = String.format("create (:`%s`)", key);
                session.query(sql, new HashMap<>(16));
                continue;
            }
            // This is a full import, so all nodes and relationships under the label are deleted first.
            // Adjust this to your own business requirements.
            deleteByLable(key);
            for (NodeData nodeData : data) {
                // Backticks keep labels containing Chinese or special characters valid.
                String labels = ":`" + key + "`";
                String id = nodeData.getId();
                String name = nodeData.getName();
                String property = String.format("{id:'%s',name:'%s'} ", id, name);
                String sql = String.format("create (%s%s)", labels, property);
                session.query(sql, new HashMap<>(16));
            }
        }
    }

    // Bind relationships: one MATCH ... CREATE statement per relationship.
    public void bindRelation(List<RelationData> relations) {
        if (relations == null) {
            return;
        }
        for (RelationData relation : relations) {
            String id = relation.getId();
            String relationName = relation.getRelationName();
            String startLableName = relation.getStartLableName();
            String endLableName = relation.getEndLableName();
            String startValue = relation.getStartValue();
            String endValue = relation.getEndValue();
            String property = String.format("{id:'%s',name:'%s'} ", id, relationName);
            String cypherSql = String.format(
                    "MATCH (n:`%s`),(m:`%s`) where n.id ='%s' and m.id= '%s' CREATE (n)-[ r:%s%s]->(m)",
                    startLableName, endLableName, startValue, endValue, relationName, property);
            session.query(cypherSql, new HashMap<>(16));
        }
    }
}
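The statements in Neo4jUtil splice id and name directly into the Cypher text, so a value containing a single quote would break the query. A minimal sketch of one alternative, assuming the same session.query(String, Map) call is kept: the property values go into the parameter map as $id and $name instead of an empty HashMap. The label stays in the statement text because, as far as I know, Cypher in the Neo4j versions of this era does not accept a label as a parameter. The class CypherStatements and its methods are hypothetical names, and doubling a backtick inside a quoted label is the escaping rule as I understand it from the Cypher manual, so verify it against your server version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original project.
class CypherStatements {
    // Wraps a label in backticks, doubling any backtick it contains (assumed escaping rule).
    static String quoteLabel(String label) {
        return "`" + label.replace("`", "``") + "`";
    }

    // Statement text with $id and $name placeholders instead of inlined values.
    static String createNodeCypher(String label) {
        return "CREATE (n:" + quoteLabel(label) + " {id: $id, name: $name})";
    }

    // Parameter map to pass alongside the statement.
    static Map<String, Object> nodeParams(String id, String name) {
        Map<String, Object> params = new HashMap<>();
        params.put("id", id);
        params.put("name", name);
        return params;
    }
}
```

Inside creatNode this would be called as session.query(CypherStatements.createNodeCypher(key), CypherStatements.nodeParams(id, name)). It still sends one statement per node, so it addresses quoting only, not the number of round trips.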
Then call the controller endpoint to extract the data and import it into Neo4j. The environment I developed against has about 70,000 nodes and 1.2 million relationships. The import took more than two hours against a local Neo4j instance, and 8 hours against a server deployment (cross-region).
Too slow
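To put those timings in perspective, the figures above can be turned into a rough per-statement cost. This is back-of-the-envelope arithmetic only: it assumes exactly 2 and 8 hours (the local run actually took longer) and one statement per node or relationship, as in the code above. ImportTiming is a hypothetical name.

```java
// Hypothetical helper for rough estimates only.
class ImportTiming {
    // Average milliseconds spent per statement over the whole run.
    static double millisPerStatement(long statements, double hours) {
        return hours * 3600.0 * 1000.0 / statements;
    }

    public static void main(String[] args) {
        long statements = 70_000L + 1_200_000L; // nodes + relationships
        System.out.printf("local:  %.1f ms per statement%n", millisPerStatement(statements, 2));
        System.out.printf("remote: %.1f ms per statement%n", millisPerStatement(statements, 8));
    }
}
```

Under these assumptions the average is roughly 5.7 ms per statement locally and 22.7 ms cross-region. That suggests the total time is driven by per-statement overhead multiplied by 1.27 million statements, not by the size of any single statement.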
Later I read that CREATE is suited to small amounts of data, and that neo4j-admin import can be used for importing large amounts. The next step is to use neo4j-admin import; see Java initialization data into Neo4j (Part 2).