[Redis extension] Redis two advanced data structures – HyperLogLog, BitMap

1. HyperLoglog-cardinality statistics

1. HyperlogLog data type characteristics

  • Redis HyperLogLog is an algorithm used for cardinality statistics to complete the statistics of independent totals
  • The advantage of HyperLogLog is that when the number or volume of input elements is very, very large, the space required to calculate the cardinality is always fixed and small. With a cost of 12 KB of memory, the cardinality of nearly 2^64 different elements can be calculated.
  • Because HyperLogLog only calculates the cardinality based on the input elements, and does not store the input elements themselves, HyperLogLog cannot return the individual elements of the input like a set.
  • Its bottom layer uses the string data type
  • It is an imprecise statistical algorithm with a standard error of 0.81%

What is cardinality
The number of unique elements in the dataset.

2. Application scenarios

  • Page visits (UV): A user who visits multiple times can only be counted as one person.
    The traditional implementation stores the user’s id and compares each time. This method wastes space when there are more users, and our purpose is just to count, Hyperloglog can help us use the smallest space to complete.

  • That is, if fault tolerance is allowed or the accuracy requirements are not so high, then Hyperloglog must be used!

  • If fault tolerance is not allowed, just use set or your own data type!

3. Commonly used API and use Spring client test

redis native API

# Add elements and statistics
127.0.0.1:6379> PFADD myelemx a b c d e f g h i j k # add element
(integer) 1
127.0.0.1:6379> type myelemx # The bottom layer of hyperloglog uses String
string
127.0.0.1:6379> PFCOUNT myelemx # Estimate the cardinality of myelemx
(integer) 11
127.0.0.1:6379> PFADD myelemy i j k z m c b v p q s
(integer) 1
127.0.0.1:6379> PFCOUNT myelemy
(integer) 11

# merge
127.0.0.1:6379> PFMERGE myelemz myelemx myelemy # Merge myelemx and myelemy to become myelemz
OK
127.0.0.1:6379> PFCOUNT myelemz # estimate cardinality
(integer) 17

Spring client operation API

opsForHyperLogLog().add(pfKey, i)-add data

opsForHyperLogLog().size(pfKey) – the cardinality of statistical data

/**
 * Test the operation of HyperlogLog
 */
@Test
public void testHyperLogLog()
    // Add 100 00 unique numbers, 100 00 repeated numbers - a total of 200,000 numbers
    String pfKey = "test:hll:01";
    for (int i = 0; i < 10000; i ++ ) {<!-- -->
        redisTemplate.opsForHyperLogLog().add(pfKey, i);
    }
    for (int i = 0; i < 10000; i ++ ) {<!-- -->
        int r = (int)(Math. random() * 10000);
        redisTemplate.opsForHyperLogLog().add(pfKey, r);
    }
    // Count the number of all unique bases in the specified key
    long size = redisTemplate.opsForHyperLogLog().size(pfKey);
    System.out.println(size);
}

opsForHyperLogLog().union(unionKey, pfKey2, pfKey3, pfKey4) Merge the numbers of pfKey2, pfKey3, pfKey4 into unionKey

/**
 * Merge data - and count the combined cardinality
 */
@Test
public void testHyperLogLogUnion() {<!-- -->
    String pfKey2 = "test:hll:02";
    String pfKey3 = "test:hll:03";
    String pfKey4 = "test:hll:04";
    for (int i = 0; i < 10000; i ++ ) {<!-- -->
        redisTemplate.opsForHyperLogLog().add(pfKey2, i);
    }
    for (int i = 5000; i < 15000; i ++ ) {<!-- -->
        redisTemplate.opsForHyperLogLog().add(pfKey3, i);
    }
    for (int i = 10000; i < 20000; i ++ ) {<!-- -->
        redisTemplate.opsForHyperLogLog().add(pfKey4, i);
    }
    // Merge three sets of numbers
    String unionKey = "test:hll:union";
    redisTemplate.opsForHyperLogLog().union(unionKey, pfKey2, pfKey3, pfKey4);

    // Count the combined cardinality
    long size = redisTemplate.opsForHyperLogLog().size(unionKey);
    System.out.println(size);
}

2. BitMap-bitmap

1. BitMap data structure characteristics

  • With bit storage, information states are only 0 and 1
    • It can be seen as a Byte array
    • Boolean values that can store large amounts of continuous data

2. Application scenarios

  • Check-in statistics, status statistics

Statistical user information, active, inactive! Log in , Not log in! Check in, 365 check in! Both states can be used
Bitmaps!

3. Commonly used API and Spring client testing

redis native API

command description
pfadd key element1 [elememt2…] Add the specified element to the HyperLogLog
pfcount key [key] Returns the cardinality estimate for the given HyperLogLog.
pfmerge mergekey sourcekey [sourcekey…] Merge multiple HyperLogLogs into one HyperLogLog
Command Description
setbit key offset value Set the value for the offset bit of the specified key
getbit key offset Get The value of the offset bit
bitcount key [start end] The number of bits whose statistical string is set to 1, you can also specify the statistical range by byte
bitop operation destkey key[key…] Perform bit operation on one or more string keys that save binary bits, and save the result to destkey.
BITPOS key bit [start] [end] Returns the first bit in the string that is set to 1 or 0. start and end can only be byte-wise, not bit-wise
# Add data and get data
127.0.0.1:6379> setbit sign 0 1 # Set the 0th bit of sign to 1
(integer) 0
127.0.0.1:6379> setbit sign 2 1 # Set the second bit of sign to 1, if not set, the default is 0
(integer) 0
127.0.0.1:6379> setbit sign 3 1
(integer) 0
127.0.0.1:6379> setbit sign 5 1
(integer) 0
127.0.0.1:6379> type sign # The bottom layer is also of type String
string

127.0.0.1:6379> getbit sign 2 # Get the value of the second bit
(integer) 1
127.0.0.1:6379> getbit sign 3
(integer) 1
127.0.0.1:6379> getbit sign 4 # If not set, the default is 0- and false
(integer) 0

# The number of 1 in the statistical data - that is, the number of true
127.0.0.1:6379> BITCOUNT sign # Count the digits of 1 in the sign
(integer) 4

spring client operation API

  • opsForValue().setBit(bitKey, 1, true)– set state
    The default status of each bit is false
  • opsForValue().getBit(bitKey, 0))– query status
  • redisConnection.bitCount(bitKey.getBytes())– count the number of true status
/**
 * Test the operation of BitMaps
 * Records - query and statistics
 */
@Test
public void testBitMap() {<!-- -->
    String bitKey = "test:bit:01";
    // record data status - default false
    redisTemplate.opsForValue().setBit(bitKey, 1, true);
    redisTemplate.opsForValue().setBit(bitKey, 4, true);
    redisTemplate.opsForValue().setBit(bitKey, 7, true);

    // Inquire
    System.out.println(redisTemplate.opsForValue().getBit(bitKey, 0));
    System.out.println(redisTemplate.opsForValue().getBit(bitKey, 1));
    System.out.println(redisTemplate.opsForValue().getBit(bitKey, 2));

    // statistics
    Object execute = redisTemplate. execute(new RedisCallback() {<!-- -->
        @Override
        public Object doInRedis(RedisConnection redisConnection) throws DataAccessException {<!-- -->
            return redisConnection. bitCount(bitKey. getBytes());
        }
    });

    System.out.println(execute);
}

connection.bitOp()– bit operation

/**
 * OR operation
 * Count the Boolean values of 3 sets of data, and perform OR operation on these 3 sets of data.
 */
@Test
public void testBitMapOperation() {<!-- -->
    String bitKey2 = "test:bm:02";
    redisTemplate.opsForValue().setBit(bitKey2, 0, true);
    redisTemplate.opsForValue().setBit(bitKey2, 1, true);
    redisTemplate.opsForValue().setBit(bitKey2, 2, true);

    String bitKey3 = "test:bm:03";
    redisTemplate.opsForValue().setBit(bitKey3, 2, true);
    redisTemplate.opsForValue().setBit(bitKey3, 3, true);
    redisTemplate.opsForValue().setBit(bitKey3, 4, true);

    String bitKey4 = "test:bm:04";
    redisTemplate.opsForValue().setBit(bitKey4, 4, true);
    redisTemplate.opsForValue().setBit(bitKey4, 5, true);
    redisTemplate.opsForValue().setBit(bitKey4, 6, true);

    // Merge processing
    String bitKeyOR = "test:bm:or";
    Object obj = redisTemplate. execute(new RedisCallback() {<!-- -->
        @Override
        public Object doInRedis(RedisConnection connection) throws DataAccessException {<!-- -->
            connection.bitOp(RedisStringCommands.BitOperation.OR,
                    bitKeyOR.getBytes(), bitKey2.getBytes(), bitKey3.getBytes(), bitKey4.getBytes());
            return connection. bitCount(bitKeyOR. getBytes());
        }
    });

    System.out.println(obj); // The number of statistics

    // After merging, the status of each bit
    System.out.println(redisTemplate.opsForValue().getBit(bitKeyOR, 0));
    System.out.println(redisTemplate.opsForValue().getBit(bitKeyOR, 1));
    System.out.println(redisTemplate.opsForValue().getBit(bitKeyOR, 2));
    System.out.println(redisTemplate.opsForValue().getBit(bitKeyOR, 3));
    System.out.println(redisTemplate.opsForValue().getBit(bitKeyOR, 4));
    System.out.println(redisTemplate.opsForValue().getBit(bitKeyOR, 5));
    System.out.println(redisTemplate.opsForValue().getBit(bitKeyOR, 6));
}