2. Using the Go language, for a hash table that resolves collisions by chaining, describe the process of inserting the keys 5, 28, 19, 15, 20, 33, 12, 17, 10 into the table. Assume the table has 9 slots, and let its hash function be h(k) = k mod 9. Wen Xin Yi Yan's answer: In this problem, we use the Go language […]
Tag: spark
20 | Spark performance optimization case analysis (Part 2)
In the last issue, we said that software performance optimization must be guided by performance testing and grounded in an understanding of the software's architecture and technology. Today, we use several Spark performance optimization cases to see how those principles are applied in practice. If you have forgotten the principles of performance optimization, you can […]
15 | Representatives of streaming computing: Storm, Flink, Spark Streaming
The big data technologies introduced earlier mainly process and compute large-scale data residing on storage media. This kind of computation is also called big data batch computing: as the name implies, the data is computed in batches, such as one day's access logs or all historical order data. Such data are usually stored on the […]
7. Spark SQL programming
Contents: Overview; The difference between RDDs, Datasets and DataFrames; Datasets, DataFrames and RDDs; Getting started (people.json); SparkSession; Creating DataFrames; DataFrame operations; Running SQL queries programmatically; Creating Datasets; Converting DataFrames to RDDs and back; Inferring the schema using reflection; Encoding issues; Programmatically specifying the schema; The problem of incomplete code in the official documentation; Conclusion. Overview: The Spark version is […]
13 | With the same essence, why can Spark be more efficient?
In the last issue, we discussed Spark's programming model; in this issue, we talk about Spark's architectural principles. Like MapReduce, Spark follows the basic principle of big data computing that moving the computation is cheaper than moving the data. However, compared with MapReduce's rigid two-stage Map and Reduce computation, Spark's computing framework […]
Wen Xin Yi Yan vs. iFlytek Spark vs. ChatGPT (128): Introduction to Algorithms, Section 11.1, 3 questions
3. Using the Go language, explain how to implement a direct-address table in which the keys of stored elements need not be distinct and each element can carry satellite data. All three dictionary operations (INSERT, DELETE and SEARCH) should run in O(1) time (don't forget that DELETE deals with the […]
From spark.sql reads to LightGBM model storage
Summary: This article walks through the steps from reading data with spark.sql to storing a LightGBM model. Overall workflow: import the necessary toolkits, read the data, preprocess it, build the model, evaluate it, filter fields, and store the model. Technical details: 1. Import the necessary tool packages: from pyspark.conf import SparkConf # SparkConf holds the configuration parameters for a Spark cluster from pyspark.sql import […]
Error when compiling Spark source code locally in IDEA
First, the error output: [INFO] Scanning for projects… [INFO] ------------------------ [INFO] Detecting the operating system and CPU architecture [INFO] ------------------------ [INFO] os.detected.name: osx [INFO] os.detected.arch: x86_64 [INFO] os.detected.version: 10.15 [INFO] os.detected.version.major: 10 [INFO] os.detected.version.minor: 15 [INFO] os.detected.classifier: osx-x86_64 [INFO] ------------------------ [INFO] Reactor Build Order: [INFO] [INFO] Spark Project Parent POM [pom] […]
Spark memory management
Introduction: Since there is a spill mechanism, why does OOM still occur? What are these two kinds of memory used for? set spark.executor.memory = 4g; set spark.executor.memoryOverhead = 3g; Could memory management simply be left to JVM garbage collection? 1. Problems the memory management mechanism must solve: big data processing frameworks such as […]
Run a Spark program in IDEA (building a Spark development environment)
It is recommended that you build Hadoop's fully distributed cluster environment and the Spark cluster environment on Linux. The IDEA environment built below is only for developing and learning Spark programs on Windows. You do not need to install Hadoop and Spark on Windows; the Spark program can […]