6. Using the Go language, suppose n keys are stored in a hash table of size m, with collisions resolved by chaining, and suppose the length of each chain is known, including the length L of the longest chain. Describe a procedure for selecting an element uniformly at random […]
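The standard answer to this exercise is rejection sampling over the m×L grid of (slot, position) pairs; each stored element is hit with probability 1/(mL), so the result is uniform. The sketch below assumes the chains are given as slices (the exercise itself is framed more abstractly):

```go
package main

import (
	"fmt"
	"math/rand"
)

// selectUniform returns an element chosen uniformly at random from a
// chaining hash table. It repeatedly picks a random slot j and a random
// position i in [0, L) and retries until the pair lands on an actual
// element. Every element is selected with probability 1/(m*L) per trial,
// so the output is uniform; the expected number of trials is m*L/n.
func selectUniform(chains [][]int, maxLen int) int {
	m := len(chains)
	for {
		j := rand.Intn(m)      // random slot in [0, m)
		i := rand.Intn(maxLen) // random position in [0, L)
		if i < len(chains[j]) {
			return chains[j][i]
		}
	}
}

func main() {
	chains := [][]int{{1, 2}, {}, {3}, {4, 5, 6}}
	fmt.Println(selectUniform(chains, 3))
}
```

With load factor α = n/m, the expected running time works out to O(L · (1 + 1/α)).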
Tag: spark
SparkSQL’s Analyzed LogicalPlan generation process
After AstBuilder processing, an Unresolved LogicalPlan is obtained. Two kinds of unresolved nodes remain in the logical operator tree: UnresolvedRelation and UnresolvedAttribute. The Analyzer's main role is to resolve these nodes and expressions into typed objects, drawing on Catalog information in the process. Because it inherits from the […]
Wenxin Yiyan vs. iFlytek Spark vs. ChatGPT (133) — Introduction to Algorithms, Exercise 11.2-5
5. Using the Go language, suppose a set of n keys is stored in a hash table of size m. Show that if the keys are drawn from a universe U with |U| > nm, then U contains a subset of size n whose keys all hash to the same slot, […]
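The claim follows from a pigeonhole argument, which can be sketched as follows (h is any fixed hash function):

```latex
\text{Fix any hash function } h : U \to \{0, 1, \dots, m-1\}.
\text{The } m \text{ slots partition } U, \text{ so some slot } j \text{ satisfies}
\bigl|\{\,k \in U : h(k) = j\,\}\bigr| \;\ge\; \left\lceil \frac{|U|}{m} \right\rceil \;>\; \frac{nm}{m} \;=\; n.
```

Choosing any n keys from that slot's preimage gives the required subset; if exactly those keys are stored, every search on slot j takes Θ(n) time, which is the worst case for chaining.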
Wenxin Yiyan vs. iFlytek Spark vs. ChatGPT (132) — Introduction to Algorithms, Exercise 11.2-4
4. Using the Go language, explain how to allocate and release the storage occupied by elements inside the hash table itself, by linking all unused slots into a free list. Assume each slot can hold a flag and either an element plus one or two pointers. All dictionary and free-list operations should […]
Writing Spark SQL programs to compute simple metrics
Table of contents. Data preparation: member data from an aerospace company serves as the dataset for this project; the details are as follows. Required metrics: to visualize the distribution of members across cities in each province, the cities ranked by the number of aerospace-company […]
WSL + VS Code: a one-stop setup for a pseudo-distributed Hadoop + Spark environment
WSL + VS Code: a one-stop setup for a Hadoop + Spark environment. To build a Linux, Hadoop, or Spark environment, the common practice today is to install a virtual machine in VMware, VirtualBox, or similar software. This article shows how to build such an environment on the Windows subsystem (Windows Subsystem for […]
9. Spark adaptive query execution: AQE's dynamic adjustment of the join strategy
Contents: overview; dynamically adjusting the join strategy (principle, hands-on); dynamically optimizing skewed joins (principle, hands-on). Overview: a broadcast hash join is analogous to the broadcast variable among Spark's shared variables; when a join can adopt this strategy, join performance is at its best. Adaptive Query Execution (AQE) dynamically adjusts the join strategy: principle and hands-on […]
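The AQE behaviors described above are driven by Spark SQL configuration properties. A sketch of the relevant knobs for Spark 3.x (values are illustrative, close to the shipped defaults):

```properties
# Enable Adaptive Query Execution (on by default since Spark 3.2)
spark.sql.adaptive.enabled=true
# A join side whose runtime size falls below this threshold can be
# converted to a broadcast hash join
spark.sql.autoBroadcastJoinThreshold=10MB
# Prefer local shuffle readers after a sort-merge join is demoted
spark.sql.adaptive.localShuffleReader.enabled=true
# Split skewed shuffle partitions during a join
spark.sql.adaptive.skewJoin.enabled=true
spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=256MB
```

Consult the Spark SQL performance tuning guide for your exact Spark version, since defaults have shifted between 3.0 and 3.2.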
React + iFlytek Spark LLM: building a contextual AI Q&A page (extensible)
Foreword: the core features of the open-source project I recently wrote run smoothly, and two days ago I had a sudden idea: could the project bring in a large language model to help users work with the platform? So I looked into the recently popular domestic LLM, the iFlytek Spark model. Getting the LLM API: Console […]
Data statistics with Spark and Scala
Table of contents: preface; Spark; Scala; data sources; process; preparation (download the plugin, create a new ordinary Scala project, upload the jsonl file to Hadoop); code (five metric requirements). 1. Cinemas with attendance rates above 50%, with the running results as follows. 2. Counting how many cinemas share the same name. The running […]
Hive3 on Spark3 configuration
1. Software environment. 1.1 Big data components: Hive 3.1.2; Spark spark-3.0.0-bin-hadoop3.2. 1.2 Operating systems: macOS Monterey 12.1; Linux CentOS 7.6. 2. Building the big data components. 2.1 Hive environment setup. 1) Hive on Spark explained: Hive's execution engines include the default MR, Spark, and Tez. Hive on Spark: Hive […]
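Switching Hive's execution engine to Spark comes down to a few hive-site.xml properties. A minimal sketch (memory sizes are illustrative; paths and the cluster manager depend on your deployment):

```xml
<!-- hive-site.xml: run Hive queries on Spark instead of MapReduce -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>2g</value>
</property>
```

Note that Hive on Spark requires a Spark build compatible with your Hive version; the version pairing between Hive 3.1.2 and Spark 3.0.0 mentioned above typically needs a Spark built without Hive jars.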