SparkSQL’s Analyzed LogicalPlan generation process

AstBuilder produces the Unresolved LogicalPlan. Two kinds of objects in this logical operator tree are still unresolved: UnresolvedRelation and UnresolvedAttribute. The Analyzer's main role is to resolve these nodes and expressions into typed objects, which requires information from the Catalog. Because it inherits from the […]
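
The resolution step described in this excerpt can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's Catalyst code: the classes below merely borrow Catalyst's node names, and `CATALOG`, `resolve_relation`, and `analyze` are hypothetical helpers invented for the illustration. It assumes a plan consisting of a single projection over a single table.

```python
from dataclasses import dataclass

# Toy stand-ins for Catalyst nodes; these are NOT Spark's real classes.
@dataclass(frozen=True)
class UnresolvedAttribute:
    name: str

@dataclass(frozen=True)
class AttributeReference:
    name: str
    data_type: str

@dataclass(frozen=True)
class UnresolvedRelation:
    table: str

@dataclass(frozen=True)
class ResolvedRelation:
    table: str
    output: tuple  # tuple of AttributeReference

@dataclass(frozen=True)
class Project:
    project_list: tuple
    child: object

# Hypothetical catalog: table name -> {column name: type name}.
CATALOG = {"student": {"id": "int", "name": "string"}}

def resolve_relation(node, catalog):
    """Replace an UnresolvedRelation with a relation that carries a typed schema."""
    if isinstance(node, UnresolvedRelation):
        if node.table not in catalog:
            raise LookupError(f"table not found: {node.table}")
        schema = catalog[node.table]
        output = tuple(AttributeReference(n, t) for n, t in schema.items())
        return ResolvedRelation(node.table, output)
    return node

def analyze(plan, catalog):
    """Resolve the relation first, then bind each attribute to the relation's output."""
    child = resolve_relation(plan.child, catalog)
    by_name = {attr.name: attr for attr in child.output}
    resolved = []
    for expr in plan.project_list:
        if isinstance(expr, UnresolvedAttribute):
            if expr.name not in by_name:
                raise LookupError(f"column not found: {expr.name}")
            expr = by_name[expr.name]
        resolved.append(expr)
    return Project(tuple(resolved), child)

# Roughly the shape of: SELECT name FROM student
unresolved = Project((UnresolvedAttribute("name"),), UnresolvedRelation("student"))
analyzed = analyze(unresolved, CATALOG)
```

The relation is resolved before the attributes because an attribute's type is only known once the catalog has supplied the table's schema.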

WSL + VS Code: a one-stop setup for a pseudo-distributed Hadoop + Spark environment

If you want to set up an environment with Linux, Hadoop, Spark, and so on, the common practice today is to install a virtual machine in VM software such as VirtualBox. This article introduces how to build the environment on the Windows subsystem instead (Windows Subsystem for […]

9. Spark Adaptive Query Execution (AQE): dynamically adjusting the join strategy

Contents: Overview; Dynamically adjusting the join strategy (principle, in practice); Dynamically optimizing skewed joins (principle, in practice). Overview: a broadcast hash join works much like a broadcast variable, one of Spark's shared variables. When Spark can use this strategy for a join, it is typically the fastest option because the larger table does not need to be shuffled. Adaptive Query Execution (AQE): dynamically adjusting the join strategy; principle; in practice […]
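
As a rough illustration of the idea in this excerpt, the toy function below picks a join strategy from table sizes. It is a sketch in plain Python and not Spark's planner: `choose_join_strategy` and the example sizes are hypothetical, and the 10 MB threshold only mirrors the default of Spark's `spark.sql.autoBroadcastJoinThreshold` setting. The point is that the same rule can give a different answer once runtime sizes replace planning-time estimates, which is what AQE (enabled with `spark.sql.adaptive.enabled`) takes advantage of.

```python
MB = 1024 * 1024

# Assumed threshold, mirroring the default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * MB

def choose_join_strategy(left_bytes, right_bytes, threshold=BROADCAST_THRESHOLD_BYTES):
    """Toy rule: broadcast the smaller side if it fits under the threshold."""
    if min(left_bytes, right_bytes) <= threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"

# Planning time: the optimizer only has an estimate for the filtered table.
estimated = choose_join_strategy(left_bytes=500 * MB, right_bytes=50 * MB)

# Run time: after the filter has executed, the actual size is much smaller,
# so re-planning with real statistics can switch to a broadcast join.
actual = choose_join_strategy(left_bytes=500 * MB, right_bytes=2 * MB)

print(estimated, "->", actual)
```

Real Spark weighs more than one size comparison (join type, hints, shuffle statistics), so treat this only as a picture of why runtime statistics matter.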

Data statistics with Spark and Scala

Table of Contents: Preface; Spark; Scala; Data sources; Process; Preparation; Download the plugin; Create a new plain Scala project; Upload the jsonl file to Hadoop; Code (five metric requirements). 1. Count the cinemas with an attendance rate higher than 50%. The results are as follows: 2. Count how many cinemas share the same name. The running […]

Hive3 on Spark3 configuration

1. Software environment. 1.1 Big data component environment (component: version): Hive: 3.1.2; Spark: spark-3.0.0-bin-hadoop3.2. 1.2 Operating system environment (OS: version): macOS Monterey: 12.1; Linux (CentOS): 7.6. 2. Building the big data components. 2.1 Hive environment setup. 1) About Hive on Spark: Hive's execution engines include mr (the default), spark, and Tez. Hive on Spark: Hive […]