The road to growth if you want to become proficient in algorithms and SQL – the longest substring with at least K repeating characters

The road to growth if you want to master algorithms and SQL – the longest substring with at least K repeating characters

  • Preface
  • 1. The longest substring with at least K repeating characters
    • 1.1 The premise of sliding window: two-stage nature
    • 1.2 Manually add restrictions to make them two-stage
    • 1.3 Complete code (sliding window)
    • 1.4 Another way to solve the problem (recursion)

Foreword

The path to growth if you want to be proficient in algorithms and SQL – Series Navigation

1. The longest substring with at least K repeating characters

Original title link

When we see questions of this type Finding the length of an interval, our first reaction may be to use a sliding window to complete it. But what are the important points to consider with sliding windows?

  • Under what conditions can the right boundary of the sliding window be moved (expanded)?
  • Under what conditions can the left border of the sliding window be moved (shrunk)?

1.1 The premise of sliding window: two-stage property

For this question: We assume that the length of the sliding window interval is n, and try to assume that the characters in this interval must meet the requirements of the question. So when the sliding window moves to the right, after the character at the n + 1 position is added to the window, will it also meet the question requirements?

  • If the character at the new position has appeared in the original interval, then the number of occurrences of the characters in the window interval has been satisfied > k, and it must still be satisfied at this time.
  • If the character at the new position has not appeared in the original interval, after the new character is added, the elements in the new window must not satisfy: the number of occurrences of all characters > k.

When we use the sliding window, if we get an optimal solution, we know:

  1. Characters on the left side of the left boundary of the optimal solution must not appear in the optimal solution substring.
  2. The characters on the right side of the right boundary of the optimal solution must not appear in the optimal solution substring.
  3. If it occurs, then the optimal solution is no longer the optimal solution.

It seems like nonsense, but actually what I want to express is: The characters at both ends of the left and right boundaries of the optimal solution, if they are added to the optimal solution, will definitely not meet the conditions. The optimal solution part must meet the conditions.

This is a prerequisite for the use of sliding windows: Bi-stage property. After the above analysis, it can be seen that this question does not have two stages under conventional practice. Therefore we need to manually add a limit.

1.2 Manually add restrictions to make them two-stage

We noticed that there are 26 types of characters in the question, so we can start from this starting point:

  • We assume that the number of character types in the local optimal solution is charCountLimit.
  • Then while traversing the string, we need to maintain the number of character types in the current window < charCountLimit. Under this premise, we can continuously expand the window (right pointer moves).
  • When the number of character types > the limit of charCountLimit, we can shrink the window (move the left pointer).
  • During the traversal process, we maintain the number of occurrences of each character and two attributes within the current window:
  • totalChar: The number of different character types in the current window.
  • sumChar: The number of character types in the window that meet the number of occurrences >=k.
  • In the end, as long as totalChar == sumChar, it means that the condition is met. All characters in the window meet the number of occurrences > k, and the maximum substring length can be updated.

1.3 Complete code (sliding window)

code show as below:

public int longestSubstring(String s, int k) {<!-- -->
    char[] cs = s.toCharArray();
    int res = 0, len = cs.length;
    int[] csCount = new int[26];
    //The current number of string elements: curCharCount
    for (int charCountLimit = 1; charCountLimit <= 26; charCountLimit + + ) {<!-- -->
        //Reset the number of occurrences of each element
        Arrays.fill(csCount, 0);
        // totalChar: the number of different character types in the current window. sumChar: The number of character types in the window that satisfy the number of occurrences >=k
        int left = 0, right = 0, totalChar = 0, sumChar = 0;
        // Count the number of occurrences of each element, charIndex: the subscript corresponding to the character
        while (right < len) {<!-- -->
            //Character corresponding to the right boundary (expressed in array form)
            int rightCharIndex = cs[right] - 'a';
            csCount[rightCharIndex] + + ;
            //The first time it appears, increase the total number of characters in the interval
            if (csCount[rightCharIndex] == 1) {<!-- -->
                totalChar + + ;
            }
            // If the condition is met for the first time (if there are the same characters later, it will definitely be met), then count the number of characters that meet the condition.
            if (csCount[rightCharIndex] == k) {<!-- -->
                sumChar + + ;
            }
            // If the current total number of character types > our limit, we move the left window, hoping to reduce the total number of character types
            while (totalChar > charCountLimit) {<!-- -->
                //Characters corresponding to the left boundary (expressed in array form)
                int leftCharIndex = cs[left] - 'a';
                if (csCount[leftCharIndex] == 1) {<!-- -->
                    totalChar--;
                }
                if (csCount[leftCharIndex] == k) {<!-- -->
                    sumChar--;
                }
                csCount[leftCharIndex]--;
                //Move the left window
                left + + ;
            }
            if (totalChar == sumChar) {<!-- -->
                res = Math.max(res, right - left + 1);
            }
            //Move the right window
            right + + ;
        }
    }
    return res;
}

1.4 Another way to solve problems (recursion)

  1. We directly count the number of occurrences of each character in the entire string.
  2. The question requirement is: each string of the optimal solution appears at least k times. Then we go the other way. We only need to select and delete the characters that do not meet the k times. Will the remaining characters meet the conditions? Then the length of the longest substring required by the question is the total length of characters after deletion.
  3. We assume that a certain character c does not meet the requirements of the question, so we split the string according to c. For each substring (which must not contain the character c after splitting), we can then find the length of the corresponding longest substring.
public int longestSubstring(String s, int k) {<!-- -->
    if (s.length() < k) {<!-- -->
        return 0;
    }
    HashMap<Character, Integer> count = new HashMap();
    for (int i = 0; i < s.length(); i + + ) {<!-- -->
        count.put(s.charAt(i), count.getOrDefault(s.charAt(i), 0) + 1);
    }
    for (char c : count.keySet()) {<!-- -->
        // Find unsatisfied string c
        if (count.get(c) < k) {<!-- -->
            int res = 0;
            // Split according to c, each substring after split does not contain c
            for (String t : s.split(String.valueOf(c))) {<!-- -->
                // Find the maximum substring length
                res = Math.max(res, longestSubstring(t, k));
            }
            return res;
        }
    }
    // If we get to this step, the string here must meet the question requirements, because we have deleted all the characters that do not meet the adjustment
    return s.length();
}