Hadoop MapReduce API calls: WordCount code for local and cluster runs

Run the code locally:

package com.example.hadoop.api.mr;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCount {
    /**
     * Text: Hadoop's Writable type for strings (it plays the role a "StringWritable" would)
     * (LongWritable, Text): the map-side input types. With the default TextInputFormat these two parameters never change: Text is the line of text, LongWritable is its offset (offset […]
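The excerpt stops before the Mapper and Reducer bodies, so the following is only a hypothetical plain-Java simulation of the data flow WordCount follows: map emits (word, 1), shuffle groups values by key in sorted order, reduce sums each group. It uses no Hadoop classes; the class name `LocalWordCount` and its methods are invented for illustration and are not the author's code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of WordCount's map -> shuffle -> reduce flow (no Hadoop dependency).
class LocalWordCount {

    // Map phase: emit (word, 1) for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group values by key; TreeMap keeps keys sorted, like Hadoop's sort-by-key.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the counts collected for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs all three phases over a list of input lines.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

In the real job the same three roles are played by a `Mapper<LongWritable, Text, Text, IntWritable>`, the framework's shuffle/sort, and a `Reducer<Text, IntWritable, Text, IntWritable>`.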

MapReduce programming: data filtering and saving, UID deduplication

Contents: 1. Experimental goals; 2. Experimental requirements and precautions; 3. Experimental content and steps; Appendix: articles in this series. 1. Experimental goals: Be proficient in writing the Mapper class, the Reducer class, and the main function; be proficient in local testing methods; be proficient in […]

Big data computing: the MapReduce framework

1. Implementing file merging and deduplication. Given two input files, A and B, write a MapReduce program that merges them and removes duplicate content to produce a new output file, C. Sample input and output files are shown below for reference. (The input file is […]
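The usual way to deduplicate with MapReduce is to make each whole line the map output key, so the shuffle collapses identical lines into one group and the reducer writes each key once. The sketch below simulates that idea in plain Java; `MergeDedup` and the sample lines are hypothetical, and a real job would typically emit `Text` keys with `NullWritable` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical local simulation of the merge-and-deduplicate job (no Hadoop dependency).
class MergeDedup {
    static List<String> run(List<String> fileA, List<String> fileB) {
        List<String> all = new ArrayList<>(fileA);
        all.addAll(fileB);
        // Map: each line becomes a key; the value carries no information
        // (in Hadoop code this would typically be NullWritable).
        // Shuffle: identical keys land in one group; TreeMap also sorts them, like Hadoop's sort phase.
        TreeMap<String, Integer> groups = new TreeMap<>();
        for (String line : all) {
            groups.merge(line, 1, Integer::sum);
        }
        // Reduce: emit each distinct key exactly once, regardless of how many values it had.
        return new ArrayList<>(groups.keySet());
    }

    public static void main(String[] args) {
        List<String> c = run(Arrays.asList("x", "y", "x"), Arrays.asList("y", "z"));
        System.out.println(c); // prints [x, y, z]
    }
}
```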

MapReduce programming: join operations and aggregation operations

Contents: 1. Experimental goals; 2. Experimental requirements and precautions; 3. Experimental content and steps; Appendix: articles in this series. 1. Experimental goals: Understand the distributed processing workflow of the MapReduce computing framework; master the use of the MapReduce computing framework to implement […]
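The excerpt does not show which join strategy the article uses. As a point of reference, a common approach is the reduce-side join: the mapper keys every record by the join field and tags it with its source table, and the reducer combines the tagged records that share a key. The sketch below simulates that in plain Java; the record layouts (`orderId,productId,amount` and `productId,productName`) and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of a reduce-side join (no Hadoop dependency).
// Assumed inputs: orders "orderId,productId,amount"; products "productId,productName".
class ReduceSideJoin {
    static List<String> join(List<String> orders, List<String> products) {
        // Map phase: key every record by productId and tag it with its source table.
        TreeMap<String, List<String[]>> groups = new TreeMap<>();
        for (String line : orders) {
            String[] f = line.split(",");
            groups.computeIfAbsent(f[1], k -> new ArrayList<>()).add(new String[] {"order", f[0], f[2]});
        }
        for (String line : products) {
            String[] f = line.split(",");
            groups.computeIfAbsent(f[0], k -> new ArrayList<>()).add(new String[] {"product", f[1]});
        }
        // Reduce phase: within each productId group, attach the product name to every order.
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> g : groups.entrySet()) {
            String name = null;
            List<String[]> orderRows = new ArrayList<>();
            for (String[] tagged : g.getValue()) {
                if (tagged[0].equals("product")) {
                    name = tagged[1];
                } else {
                    orderRows.add(tagged);
                }
            }
            if (name == null) {
                continue; // inner join: drop orders with no matching product
            }
            for (String[] o : orderRows) {
                joined.add(o[1] + "," + name + "," + o[2]);
            }
        }
        return joined;
    }
}
```

In Hadoop the source tag is often derived from the input file name of the current split, and the reducer has to buffer one side of each group in memory, which is the main cost of this strategy.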

Experiment 1. Basic programming methods of MapReduce

1. Purpose: Understand the MapReduce workflow; master the basic programming methods of MapReduce. 2. Platform: operating system: Linux (Ubuntu 16.04 recommended); Hadoop version: 2.7.1; JDK version: 1.7 or later; Java IDE: IntelliJ IDEA. 3. Content: (1) Word deduplication: remove duplicate words from a file and output each distinct word once. (1) Write the MapReduce code (2) […]

How does the number of MapReduce tasks affect execution efficiency? Performance optimization starts here

Before reading on, try this question: the input consists of three files, a.txt (300 MB), b.txt (100 MB), and c.txt (58 MB). If MapReduce's example program is used to run WordCount on them, how many MapTasks should there be? A. 5 B. 4 C. 3 D. 2 This is a very simple question among the MapReduce knowledge points. The knowledge […]
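One MapTask is started per input split, so the question reduces to counting splits. The sketch below is modeled on the split loop in Hadoop's `FileInputFormat.getSplits`: split size is `max(minSize, min(maxSize, blockSize))`, and a new split is cut only while the remaining bytes exceed 1.1 times the split size. It assumes a 128 MB block size (the Hadoop 2.x default) and default `mapreduce.input.fileinputformat.split.minsize`/`maxsize` settings, and treats c.txt as about 58 MB; under those assumptions a.txt yields 3 splits and the total is 5. The class name is hypothetical, and zero-length or non-splittable files are ignored.

```java
// Hypothetical sketch of the split-count rule, modeled on Hadoop's FileInputFormat.getSplits.
class SplitCounter {
    static final double SPLIT_SLOP = 1.1; // a tail of up to 10% over splitSize stays in the last split
    static final long MB = 1024L * 1024L;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts splits for one splittable, non-empty file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the leftover bytes form the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long size = splitSize(128 * MB, 1, Long.MAX_VALUE);
        long[] files = {300 * MB, 100 * MB, 58 * MB};
        int total = 0;
        for (long f : files) {
            total += countSplits(f, size);
        }
        System.out.println(total); // prints 5
    }
}
```

The slop factor is why a file of, say, 140 MB still produces a single split: 140 / 128 is about 1.09, which does not exceed 1.1.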

Supplement: MapReduce TopN case

1. Requirement: Process the output of the traffic-statistics case and output the information for the top 10 users by traffic usage. 2. Requirement analysis 3. Code (1) FlowBean: building on the original version, FlowBean now implements WritableComparable and its compareTo method:

package com.wolf.mr.topn;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
[…]
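The article ranks users through `FlowBean.compareTo` and Hadoop's sort; its full code is cut off here. To show only the selection logic, the sketch below swaps in a different, plain-Java technique: a bounded min-heap that keeps the N largest records and then sorts them in descending order of traffic. `UserFlow` and `TopN` are hypothetical stand-ins, not the author's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical record standing in for one line of the traffic-statistics output.
class UserFlow {
    final String phone;
    final long sumFlow;

    UserFlow(String phone, long sumFlow) {
        this.phone = phone;
        this.sumFlow = sumFlow;
    }
}

class TopN {
    // Keeps the n records with the largest sumFlow using a min-heap capped at size n.
    static List<UserFlow> topN(List<UserFlow> records, int n) {
        PriorityQueue<UserFlow> heap = new PriorityQueue<>((a, b) -> Long.compare(a.sumFlow, b.sumFlow));
        for (UserFlow r : records) {
            heap.offer(r);
            if (heap.size() > n) {
                heap.poll(); // evict the current smallest
            }
        }
        List<UserFlow> result = new ArrayList<>(heap);
        // Descending by total traffic, i.e. the order a reversed compareTo would produce.
        result.sort((a, b) -> Long.compare(b.sumFlow, a.sumFlow));
        return result;
    }
}
```

For the top-10 requirement this would be called as `TopN.topN(records, 10)`; memory stays bounded by N regardless of input size, which is the same reason TopN reducers usually keep a small sorted buffer instead of all records.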

05 - MapReduce (2): Serialization

Contents: 1. Serialization overview 2. Implementing the serialization interface (Writable) in a custom bean 3. Serialization case study: (1) Requirements (2) Requirement analysis (3) Writing the MapReduce program: 1) Create the package 2) Write FlowBean.java 3) Write FlowCountMapper.java 4) Write FlowCountReducer.java 5) Write FlowCountDriver.java 6) Test the program 7) Run on the distributed cluster. 1. Serialization overview 2. The […]
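A custom bean becomes serializable for Hadoop by implementing `org.apache.hadoop.io.Writable`, whose two methods are `write(DataOutput)` and `readFields(DataInput)`. The sketch below mirrors that contract using only `java.io`, so it runs without Hadoop; the fields `upFlow`, `downFlow`, `sumFlow` and the class name are hypothetical. To use it in a job you would add `implements Writable`. The key rules it illustrates: keep a no-arg constructor, and read fields in exactly the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical bean mirroring the Writable contract (write/readFields) without depending on Hadoop.
class FlowBeanSketch {
    private long upFlow;
    private long downFlow;
    private long sumFlow;

    // Hadoop instantiates beans by reflection before calling readFields, so a no-arg constructor is required.
    public FlowBeanSketch() {}

    public FlowBeanSketch(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Deserialize in exactly the same order as write().
    public void readFields(DataInput in) throws IOException {
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    public long getSumFlow() {
        return sumFlow;
    }

    // TextOutputFormat writes value.toString(), so this controls how the bean appears in output files.
    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }

    // Serializes a bean to bytes and reads it back, to check that write/readFields agree.
    static FlowBeanSketch roundTrip(FlowBeanSketch bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bean.write(new DataOutputStream(bytes));
            FlowBeanSketch copy = new FlowBeanSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```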