1. Set up Spark project structure
Add the following dependencies to the pom.xml file of the SparkProject module and wait for the dependency packages to download, as shown below.
<!-- Spark and Scala version numbers -->
<properties>
    <scala.version>2.11</scala.version>
    <spark.version>2.1.1</spark.version>
</properties>

<!-- Dependencies of each Spark component -->
<dependencies>
    <!-- https://mvnrepository.com/artifact/com.thoughtworks.paranamer/paranamer -->
    <dependency>
        <groupId>com.thoughtworks.paranamer</groupId>
        <artifactId>paranamer</artifactId>
        <version>2.8</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Kafka integrations: keep these at the same Spark version as spark-core
         to avoid binary incompatibilities -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-8_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>net.jpountz.lz4</groupId>
        <artifactId>lz4</artifactId>
        <version>1.3.0</version>
    </dependency>
    <!-- MySQL JDBC driver -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.18</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flume.flume-ng-clients</groupId>
        <artifactId>flume-ng-log4jappender</artifactId>
        <version>1.7.0</version>
    </dependency>
    <!--
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-flume-sink_2.10</artifactId>
        <version>1.5.2</version>
    </dependency>
    -->
    <!-- Use the same Scala suffix and Spark version as the other Spark artifacts;
         the original 2.12 / 2.4.8 coordinates are binary-incompatible with Scala 2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>

<!-- Configure the Maven compiler and packaging plugins -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.8.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>
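Note that the assembly plugin above is configured with the jar-with-dependencies descriptor but is not bound to a lifecycle phase, so its single goal has to be invoked explicitly when packaging. A minimal build command, run from the SparkProject module directory (an assumption; adjust to your layout):

mvn clean package assembly:single

This compiles the project and produces both the plain jar and a *-jar-with-dependencies.jar under target/; the latter is the fat jar you would hand to spark-submit.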
2. Solve the problem of being unable to create Scala files
If IntelliJ IDEA offers no option to create Scala classes in the module, add Scala framework support (right-click the project, choose Add Framework Support..., and tick Scala) and make sure a Scala SDK is configured for the module.
3. Write the LoggerLevel trait
Add the following line inside the trait body:
Logger.getLogger("org").setLevel(Level.ERROR)
This line needs an import: bring in the Log4j Level and Logger classes.
The complete code is as follows:
import org.apache.log4j.{Level, Logger}

// Raises the log level of the "org" logger hierarchy to ERROR,
// which silences Spark's verbose INFO/WARN console output
trait LoggerLevel {
  Logger.getLogger("org").setLevel(Level.ERROR)
}
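To use the trait, mix it into a driver object; the setLevel call runs when the object is initialized, before any Spark code in main. A minimal sketch (the WordCountApp object and its tiny job are hypothetical, for illustration only):

import org.apache.spark.sql.SparkSession

// Hypothetical driver mixing in LoggerLevel to silence Spark's INFO logging
object WordCountApp extends LoggerLevel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCountApp").master("local[2]").getOrCreate()
    spark.range(1, 11).show() // only ERROR-level log lines appear around this output
    spark.stop()
  }
}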
4. Write the getLocalSparkSession methods in a SparkUnit object
Here is the complete code:
import org.apache.spark.sql.SparkSession

/**
 * Factory methods for creating and stopping SparkSession instances.
 */
object SparkUnit {

  /** Local session (two worker threads) identified by appName. */
  def getLocalSparkSession(appName: String): SparkSession = {
    SparkSession.builder().appName(appName).master("local[2]").getOrCreate()
  }

  /** Local session; pass support = true to enable Hive support. */
  def getLocalSparkSession(appName: String, support: Boolean): SparkSession = {
    if (support) SparkSession.builder().master("local[2]").appName(appName).enableHiveSupport().getOrCreate()
    else getLocalSparkSession(appName)
  }

  /** Session with an explicit master URL. */
  def getLocalSparkSession(appName: String, master: String): SparkSession = {
    SparkSession.builder().appName(appName).master(master).getOrCreate()
  }

  /** Session with an explicit master URL and optional Hive support. */
  def getLocalSparkSession(appName: String, master: String, support: Boolean): SparkSession = {
    if (support) SparkSession.builder().appName(appName).master(master).enableHiveSupport().getOrCreate()
    else getLocalSparkSession(appName, master)
  }

  /** Stop a session if it was created. */
  def stopSpark(ss: SparkSession): Unit = {
    if (ss != null) {
      ss.stop()
    }
  }
}
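A short usage sketch combining SparkUnit with the LoggerLevel trait from step 3 (the demo object and its query are hypothetical, for illustration only):

// Hypothetical demo of the SparkUnit helpers
object SparkUnitDemo extends LoggerLevel {
  def main(args: Array[String]): Unit = {
    // Local session without Hive support; pass support = true to enable it
    val ss = SparkUnit.getLocalSparkSession("SparkUnitDemo")
    ss.range(1, 6).toDF("n").show()
    SparkUnit.stopSpark(ss)
  }
}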