Intensive reading of “Algorithm Questions – Minimum Covering Substring”

Today we look at a leetcode hard difficulty problem: the smallest covering substring.

Title

Given a string s and a string t . Returns the smallest substring in s covering all characters in t. Returns the empty string "" if no substring covering all characters of t exists in s .

Notice:

For a repeated character in t, the number of characters in the substring we are looking for must be no less than the number of characters in t. If such a substring exists in s , we guarantee that it is the only answer.

Example 1:

Input: s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"
Explanation: The minimal covering substring "BANC" contains 'A', 'B' and 'C' from string t. 

Thinking

The easiest way to think about it is that s judges whether the conditions are met one by one from the substrings formed by subscripts 0~n, such as:

  • ADOBEC..

  • DOBE CO..

  • OBECOD..

Because the minimum covering substring is continuous, this method can guarantee to traverse all the substrings satisfying the condition. code show as below:

function minWindow(s: string, t: string): string {
  // t The total length of remaining matches
  let tLeftSize = t.length
  // t Each letter corresponds to the number of occurrences table
  const tCharCountMap = {}

  for (const char of t) {
    if (!tCharCountMap[char]) {
      tCharCountMap[char] = 0
    }
    tCharCountMap[char]++
  }

  let globalResult = ''

  for (let i = 0; i < s.length; i ++ ) {
    let currentResult = ''
    let currentTLeftSize = tLeftSize
    const currentTCharCountMap = { ...tCharCountMap }

    // Find the string that starts with the i subscript and satisfies the condition
    for (let j = i; j < s. length; j ++ ) {
      currentResult += s[j]
      
      // if this item exists in t, decrement it by 1
      if (currentTCharCountMap[s[j]] !== undefined & amp; & amp; currentTCharCountMap[s[j]] !== 0) {
        currentTCharCountMap[s[j]]--
        currentTLeftSize--
      }

      // matched
      if (currentTLeftSize === 0) {
        if (globalResult === '') {
          globalResult = currentResult
        } else if (currentResult. length < globalResult. length) {
          globalResult = currentResult
        }
        break
      }
    }
  }

  return globalResult
};

We use tCharCountMap to store the number of occurrences of each character in t, and subtract 1 every time we find a character that has appeared during traversal until tLeftSize becomes 0, indicating that s completely covers t.

Because this method is executed n + n-1 + n-2 + … + 1 times, the time complexity is O(n2), which cannot be AC, so we need to find a faster solution.

Sliding window

The performance-seeking degradation solution is sliding window or dynamic programming. This topic calculates strings, which is not suitable for dynamic programming.

Is the sliding window suitable?

The problem to be calculated is the substring that satisfies the condition. The substring must be continuous. The sliding window will not miss the result in the continuous substring matching problem, so this solution can definitely be used.

The idea is also very easy to think, that is: If the current string covers t, the left pointer moves to the right, otherwise the right pointer moves to the right. Just like a window scan to see if the condition is met, you need to move the right pointer to the right to judge whether the condition is met. After the condition is met, it may not be optimal, and you need to continue to move the left pointer to the right to find other answers.

One difficulty here is how to efficiently judge whether the string in the current window covers t. There are three ideas:

The first idea is to make a counter for each character, and then make a total counter. Whenever a character is matched, the current character counter and the total counter + 1, so that the total counter can be used to judge directly. But there is a loophole in this method, that is, the total counter does not contain character types, such as matching 100 b in a row, and the total counter is + 1. At this time, c is actually missing. Then when c is matched, the value of the total counter cannot be judged to be covered.

The optimized version of the first method may be binary, such as represented by 26 01s, but unfortunately each character will appear more than 1 times, and it is not a Boolean type, so it is not acceptable to use this method as a trick.

The second method is a stupid method. Every time it recurses, it is judged whether the current number of each character collected in the s string exceeds the number of occurrences of each character in the t string. The disadvantage is that each recursion loops at most 25 times.

The third method that the author thinks of is that a counter is still needed, but this counter notCoverChar is a Set type, which records whether each char is not ready, the so-called ready means the number of times the char appears in the current window >= the number of times the char appears in the t string. At the same time, sCharMap and tCharMap are needed to record the number of occurrences of each character of the two strings. When the right pointer moves to the right, sCharMap corresponds to char counts up, if the char occurs more than t the char occurs, from notCoverChar ; when the left pointer moves to the right, sCharMap corresponding char count decreases, if the char occurs less than t The number of occurrences of the char that the char re-places in notCoverChar.

code show as below:

function minWindow(s: string, t: string): string {
  // s table of occurrences of each letter
  const sCharMap = {}
  // t Each letter corresponds to the number of occurrences table
  const tCharMap = {}
  // What characters are not covered
  const notCoverChar = new Set<string>()

  // Calculate the number of occurrences of each character in t
  for (const char of t) {
    if (!tCharMap[char]) {
      tCharMap[char] = 0
    }
    tCharMap[char] + +
    notCoverChar.add(char)
  }

  let leftIndex = 0
  let rightIndex = -1
  let result = ''
  let currentStr = ''

  // leftIndex | rightIndex will stop if it exceeds the limit
  while (leftIndex < s.length & amp; & amp; rightIndex < s.length) {
    // Uncovered condition: notCoverChar length > 0
    if (notCoverChar. size > 0) {
      // There is no cover t in the window at this time, rightIndex moves to the right to find
      rightIndex++
      const nextChar = s[rightIndex]
      currentStr += nextChar
      if (sCharMap[nextChar] === undefined) {
        sCharMap[nextChar] = 0
      }
      sCharMap[nextChar] + +
      // If tCharMap has this nextChar, and the collected quantity exceeds the quantity in t, this char is ready
      if (
        tCharMap[nextChar] !== undefined & amp; & amp;
        sCharMap[nextChar] >= tCharMap[nextChar]
      ) {
        notCoverChar.delete(nextChar)
      }
    } else {
      // At this time, the window just covers t, record the shortest result
      if (result === '') {
        result = currentStr
      } else if (currentStr. length < result. length) {
        result = currentStr
      }
      // leftIndex is about to move to the right, and the number of corresponding chars in sCharMap will be reduced by 1
      const previousChar = s[leftIndex]
      sCharMap[previousChar]--
      // If the number of previousChar in sCharMap is less than the number of tCharMap, it cannot be covered
      if (sCharMap[previousChar] < tCharMap[previousChar]) {
        notCoverChar.add(previousChar)
      }
      // leftIndex move right
      leftIndex++
      currentStr = currentStr. slice(1, currentStr. length)
    }
  }
  
  return result
};

Some small caches are also used, such as currentStr to record the string in the current window, so that when t can be overwritten, the current string can be obtained at any time without the need to The left and right pointers retraverse.

Summary

For this question, first of all, dynamic programming must be ruled out, and based on the characteristics of continuous substrings, it is first thought that the sliding window can cover all possibilities.

After thinking about the sliding window scheme, you need to think about how to judge with high performance that the string in the current window can cover t, notCoverChar is a good idea.

The discussion address is: Intensive reading of “Algorithm – Minimum Covering Substring” Issue #496 dt-fe/weekly

If you want to join the discussion, please click here, there are new topics every week, released on weekends or Mondays. Front-end intensive reading – help you filter reliable content.

Copyright Notice: Reprint freely – noncommercial – nonderivative – attribution maintained (Creative Commons 3.0 license)

The knowledge points of the article match the official knowledge files, and you can further learn relevant knowledge algorithm skill treeHome pageOverview 50836 people are studying systematically