Redis: HyperLogLog application

This article first reviews common statistical strategies for billion-scale data, then introduces the concepts and commands related to HyperLogLog, and finally walks through an application code example.

Table of Contents

Statistics on billions of records

HyperLogLog related concepts

Why use HyperLogLog for deduplication statistics?

Related commands

Application code example


Statistics on billions of records

Programs typically need four kinds of statistics over their data:

  • Aggregation statistics: computing the result of operations across several collections (intersection, union, difference), such as common friends or multi-criteria filtering.
  • Sorting statistics: keeping data in order, such as latest-item lists and rankings.
  • Binary statistics: collections whose values are only 0 or 1, such as check-in and clock-in records.
  • Cardinality statistics: counting the distinct elements in a collection, such as UV statistics.
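
As a rough illustration of the four categories, the sketch below uses plain Java collections rather than Redis. The class and method names (`StatisticKinds`, `commonFriends`, and so on) are made up for this example. In Redis, these needs are usually served by Set, Sorted Set, Bitmap, and HyperLogLog respectively.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class StatisticKinds {
    // Aggregation: friends that two users have in common (set intersection).
    static Set<String> commonFriends(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Sorting: scores in descending order, as on a leaderboard.
    static List<Integer> ranking(List<Integer> scores) {
        List<Integer> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    // Binary: one bit per day of the month; returns the number of distinct check-in days.
    static int checkInDays(int... daysOfMonth) {
        BitSet days = new BitSet(31);
        for (int day : daysOfMonth) {
            days.set(day - 1);
        }
        return days.cardinality();
    }

    // Cardinality: number of distinct visitors among all recorded visits.
    static int uniqueVisitors(List<String> visits) {
        return new HashSet<>(visits).size();
    }

    public static void main(String[] args) {
        List<String> visits = List.of("1.1.1.1", "2.2.2.2", "1.1.1.1");
        System.out.println("common friends: "
                + commonFriends(Set.of("ann", "bob", "cy"), Set.of("bob", "cy", "dan")));
        System.out.println("ranking: " + ranking(List.of(70, 95, 82)));
        System.out.println("check-in days: " + checkInDays(1, 2, 2, 15));
        System.out.println("PV = " + visits.size() + ", UV = " + uniqueVisitors(visits));
    }
}
```

The exact approach in the last method keeps every element in memory, which is the cost that HyperLogLog avoids.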

HyperLogLog related concepts

HyperLogLog is typically used for cardinality statistics, such as counting UV, DAU, and MAU, which in turn help evaluate how much a feature is used. The common traffic metrics are:

UV: Unique Visitor. The number of distinct visitors, usually identified by client IP, so duplicates must be removed.

PV: Page View. The total number of page views; duplicates are not removed.

DAU: Daily Active Users. Often used to reflect how a website or Internet application is performing.

MAU: Monthly Active Users.

HyperLogLog only supports counting; the elements themselves cannot be retrieved. It therefore suits scenarios where the main goal is to count large volumes efficiently and the stored values themselves do not matter, such as the number of IPs registering per day, the number of distinct IPs visiting per day, deduplicated real-time page visits, and the number of unique visitors (UV).

Note also that HyperLogLog is not exact: it returns an estimate with a standard error of about 0.81%. (Source: "Redis new data structure: the HyperLogLog")
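
The 0.81% figure can be reproduced from the usual HyperLogLog standard-error approximation, 1.04 / sqrt(m), where m is the number of registers; Redis uses m = 16384. The class name `HllStandardError` and the sample count of one million are assumptions for illustration only.

```java
class HllStandardError {
    // Approximate relative standard error of a HyperLogLog with the given number of registers.
    static double standardError(int registers) {
        return 1.04 / Math.sqrt(registers);
    }

    public static void main(String[] args) {
        double error = standardError(16384); // Redis uses 16384 registers per key
        long trueCount = 1_000_000L;         // hypothetical true cardinality
        System.out.println("relative standard error: " + error * 100 + " %");
        System.out.println("one standard error at " + trueCount + " elements: about "
                + Math.round(trueCount * error) + " elements");
    }
}
```

In other words, for a million distinct visitors the reported value is usually off by a few thousand at most, which is acceptable for metrics such as UV.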

Why should we use HyperLogLog for deduplication statistics?
  • MySQL: as a common rule of thumb, a table needs to be split across databases and tables once it passes roughly five million rows, so MySQL is a poor fit for deduplicating billions of records.
  • Redis hash structure: a dotted IPv4 string takes up to 15 bytes, so 100 million distinct IPs per day for one week comes to 7 × 15 B × 100,000,000 ≈ 10.5 GB of raw data, before any per-entry overhead. With more than 100 million IPs per day, or a full month of data, this no longer fits in memory, and a single hash of that size also causes BigKey problems.
  • HyperLogLog: each key needs at most 12 KB to estimate the cardinality of a set with up to 2^64 elements.
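
The storage comparison can be checked with a few lines of arithmetic. This sketch assumes 15 bytes per address string and 100 million distinct IPs per day, as in the comparison above, and takes the 12 KB figure from Redis' dense encoding of 16384 registers at 6 bits each. The class and method names are hypothetical.

```java
class StorageEstimate {
    // Raw bytes needed to keep every distinct IP string, ignoring data-structure overhead.
    static long rawIpBytes(long distinctIpsPerDay, int days, int bytesPerIp) {
        return distinctIpsPerDay * days * bytesPerIp;
    }

    // Bytes used by the registers of one dense HyperLogLog.
    static long hllBytes(int registers, int bitsPerRegister) {
        return (long) registers * bitsPerRegister / 8;
    }

    public static void main(String[] args) {
        long exact = rawIpBytes(100_000_000L, 7, 15);
        long perKey = hllBytes(16384, 6);
        System.out.println("storing every IP for a week: " + exact + " bytes");
        System.out.println("one HyperLogLog key:         " + perKey + " bytes");
        System.out.println("seven daily HyperLogLogs:    " + 7 * perKey + " bytes");
    }
}
```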
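
Related commands

  1. PFADD key element [element ...]: Adds the specified elements to the HyperLogLog stored at key.
  2. PFCOUNT key [key ...]: Returns the estimated cardinality of the given HyperLogLog (or of the union, if several keys are given).
  3. PFMERGE destkey sourcekey [sourcekey ...]: Merges multiple HyperLogLogs into one HyperLogLog stored at destkey.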
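
To make the semantics of the three commands more concrete, here is a minimal in-memory sketch of a textbook HyperLogLog with 16384 registers. It is not Redis' implementation (Redis uses its own hash function, encodings, and bias handling), and the class name `MiniHyperLogLog`, the SHA-256 based hash, and the method names are choices made for this example only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MiniHyperLogLog {
    private static final int P = 14;      // number of hash bits used as register index
    private static final int M = 1 << P;  // 16384 registers
    private final byte[] registers = new byte[M];

    // 64-bit hash built from the first 8 bytes of a SHA-256 digest (an arbitrary choice).
    private static long hash64(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Like PFADD: returns true if a register changed.
    boolean add(String element) {
        long h = hash64(element);
        int index = (int) (h >>> (64 - P));  // top 14 bits select the register
        long rest = h << P;                  // remaining 50 bits, left-aligned
        // Rank = 1-based position of the first 1-bit in the remaining 50 bits (51 if none).
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
            return true;
        }
        return false;
    }

    // Like PFCOUNT: harmonic-mean estimate with the small-range (linear counting) correction.
    long count() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double estimate = alpha * M * M / sum;
        if (estimate <= 2.5 * M && zeros > 0) {
            estimate = M * Math.log((double) M / zeros);
        }
        return Math.round(estimate);
    }

    // Like PFMERGE: the merged sketch keeps the maximum of each register.
    static MiniHyperLogLog merge(MiniHyperLogLog... sources) {
        MiniHyperLogLog dest = new MiniHyperLogLog();
        for (MiniHyperLogLog src : sources) {
            for (int i = 0; i < M; i++) {
                dest.registers[i] = (byte) Math.max(dest.registers[i], src.registers[i]);
            }
        }
        return dest;
    }

    public static void main(String[] args) {
        MiniHyperLogLog monday = new MiniHyperLogLog();
        MiniHyperLogLog tuesday = new MiniHyperLogLog();
        for (int i = 0; i < 60_000; i++) monday.add("visitor-" + i);
        for (int i = 30_000; i < 90_000; i++) tuesday.add("visitor-" + i);
        System.out.println("monday  ~ " + monday.count());
        System.out.println("tuesday ~ " + tuesday.count());
        System.out.println("both    ~ " + merge(monday, tuesday).count());
    }
}
```

Because merging only takes the per-register maximum, visitors seen on both days are counted once in the merged result, which is what makes PFMERGE useful for combining daily counts into weekly or monthly ones.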

Application code example

The example simulates users visiting a web page from different IPs: a client starts three threads that continuously send random IP addresses to the server, and the server records them in a HyperLogLog so that the UV count can be read back on request.

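  1. Create a new Java program to simulate sending requests:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Random;

    public class Main2 {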
        public static void main(String[] args) throws InterruptedException {
            new Main2().createSendThread();
            Thread.sleep(1000000);
        }
    
        public void createSendThread() {
            System.out.println("------Three threads start sending requests, each request uses a random IP address--------");
            new Thread(new SendThread()).start();
            new Thread(new SendThread()).start();
            new Thread(new SendThread()).start();
        }
    
        public class SendThread implements Runnable {
            @Override
            public void run() {
                while (true){
                    try {
                        Random r = new Random();
                        String ip = r.nextInt(256) + "." + r.nextInt(256) + "." + r.nextInt(256) + "." + r.nextInt(256);
                        String postData = "ip=" + ip;
    
                        URL url = new URL("http://192.168.146.1:8080/sendIP");// Target URL
                        HttpURLConnection connection = (HttpURLConnection) url.openConnection();//Open the connection
    
                        connection.setRequestMethod("POST");//Set the request method to POST
                        connection.setDoOutput(true);//Allow output data
                        //Set request headers
                        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
                        connection.setRequestProperty("Content-Length", String.valueOf(postData.length()));
                        // Get the output stream and write the request body data
                        OutputStream os = connection.getOutputStream();
                        os.write(postData.getBytes());
                        os.flush();
                        os.close();
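                        // getOutputStream() above already opened the connection, so connect() is a no-op here;
                        // the request is actually sent when getResponseCode() is called below
                        connection.connect();
                        System.out.println("Request sent from IP: " + ip);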
                        // Get response
                        int responseCode = connection.getResponseCode();
                        if (responseCode == HttpURLConnection.HTTP_OK) {
                            // read response
                            BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
                            String inputLine;
                            StringBuilder response = new StringBuilder();
    
                            while ((inputLine = in.readLine()) != null) {
                                response.append(inputLine);
                            }
    
                            // print response
                            System.out.println("Response: " + response.toString());
    
                            in.close();
                        } else {
                            System.out.println("POST request failed with response code: " + responseCode);
                        }
                        // Disconnect
                        connection.disconnect();
                        //Add appropriate thread sleep time to control request frequency
                        Thread.sleep(10); // Sleep for 10 milliseconds, which can be adjusted as needed
                    } catch (IOException | InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
  2. A Spring Boot project receives the requests and records each IP in a Redis HyperLogLog:

Controller:

@RestController
public class ReceiveIPController {

    @Autowired
    ReceiveIPService service;

    @PostMapping("/sendIP")
    public void receiveIP(@RequestParam String ip){
        System.out.println("Receive request:" + ip);
        service.saveIP(ip);
    }

    @GetMapping("/countIP")
    public String countIP(){
        return service.countIP();
    }
}

Service:

@Service
public class ReceiveIPService {

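    // StringRedisTemplate stores keys and values as plain strings; a raw RedisTemplate would JDK-serialize them
    @Autowired
    StringRedisTemplate redisTemplate;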

    public void saveIP(String ip){
        redisTemplate.opsForHyperLogLog().add("ip",ip);
    }

    public String countIP() {
        Long ip2 = redisTemplate.opsForHyperLogLog().size("ip");
        return String.valueOf(ip2);
    }
}
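
The service above writes everything to a single key, `ip`. For DAU or MAU style figures, one possible approach (an assumption here, not something the code above does) is to write each day's visitors to its own key and then count or merge the daily keys for longer periods. The sketch below only builds the key names; the `uv:` prefix and the `UvKeys` helper are hypothetical.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

class UvKeys {
    static final String PREFIX = "uv:"; // hypothetical key prefix

    // Key for one day, using the ISO-8601 date, e.g. uv:2024-02-29.
    static String dailyKey(LocalDate day) {
        return PREFIX + day;
    }

    // All daily keys of a month, to be passed together to PFCOUNT or PFMERGE.
    static List<String> keysForMonth(YearMonth month) {
        List<String> keys = new ArrayList<>();
        for (int day = 1; day <= month.lengthOfMonth(); day++) {
            keys.add(dailyKey(month.atDay(day)));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(dailyKey(LocalDate.of(2024, 2, 29)));
        System.out.println(keysForMonth(YearMonth.of(2024, 2)).size() + " keys for February 2024");
    }
}
```

With Spring Data Redis, the daily key would replace `"ip"` in `opsForHyperLogLog().add(...)`, and since `size(...)` accepts several keys, passing all keys of a month returns the estimated number of distinct visitors across those days.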

After starting both programs and letting them run for a while, you will find:

The HyperLogLog has recorded more than 60,000 IPs, yet it occupies only a little over 10,000 bytes (roughly 12 KB).
