The essence of the cache and memory function memoize

I am participating in the “Nuggets·Starting Plan”

Foreword

Memory functions, that is, functions with caching capabilities.

For ordinary calculation functions, more space is exchanged for time, which has certain advantages in a large number of complex calculation scenarios (long recursive or long iterative operations)

Just look at an example

function add(a, b) {<!-- -->
    return a + b;
}

// Assuming memoize can implement function memoization
var memoizedAdd = memoize(add);

memoizedAdd(1, 2) // 3
memoizedAdd(1, 2) // The same parameters, the second time the data is called from the cache, instead of recalculating

Analysis

  • Definition: Function memory refers to caching the last calculation result. When calling next time, if the same parameter is encountered, the data in the cache will be returned directly.
  • Principle: It is very simple to implement such a memoize function. In principle, it only needs to store the parameters and corresponding result data in an object. When calling, judge whether the data corresponding to the parameters is If it exists, the corresponding result data will be returned if it exists.

In fact, it looks a bit like a higher-order function. Our memoize function receives a function as a parameter, and internally processes the function parameters. If the parameters are the same We directly return the cached result, otherwise execute the function once, put the result into the cache object, and finally return the processed function.

However, when Lenovo uses React.mome, this function can receive the second parameter. Its function is to customize the comparison method to decide whether to update or use the cache. Later, we can also provide this capability for User-defined use.

Here we treat add function with memoize and give memoizedAdd as a new function. In terms of use, we can use it normally, but currently the same function will only cache the last running result

Think about the handling of function parameters, running results, and unique key values.

Simple try

According to the above ideas, we try to implement a memoize function, mainly considering the cache object and parameter judgment

Example: A simple multiplication operation

function getSquare(x) {<!-- -->
  return x * x
}

Implement caching capabilities

// Step 1: Add memo cache object
const memo = {<!-- -->}

// Step 2: Parameter and key processing
function getSquare(x) {<!-- -->
  // exist and return directly
  if(memo[x]){<!-- -->
      return memo[x]
  }

  // Does not exist, cache first, then return
  memo[x] = x * x

  return memo[x]
}

The judgment of memo[x] can be optimized to memo.hasOwnProperty(x), excluding the case where the value is undefined

Xiaolu Yishou

First of all, there are multiple function parameters, and the key value processing needs to be adjusted. Finally, the package abstracts the memoize function for easy use

Example: a division operation

const getDivision = (a, b) => a / b

target expectation

const getDivision = (a, b) => a / b
const getKey = (a, b) => `${a}_${b}` // custom key

const memoGetDivision = memoize(getDivision, getKey)

memoGetDivision(4, 2) // 2
memoGetDivision(4, 2) // 2 is not counted, return the result directly

When there are two or more parameters, it is necessary to combine all parameters to generate a unique key, such as: key =${a}_${b}, however, if two numbers are multiplied by a * b When adjusting the order of parameters to b * a, consideration should be given to judging the writing method, but currently we cannot predict whether the function and its parameters passed in by the user will be affected by the order, so , this situation is best left to the user to write the function judgment

For the encapsulation of the memoize function, we receive two parameters and return an anonymous function

  • Parameter 1: Receive a function whose parameters are uncertain
  • Parameter 2: Receive a judgment function whether to use the cache

Memoize function implementation

function memoize(fn, getKey) {
    // cache object
    const memo = {}

    // return anonymous function
    // After wrapping the incoming function fn, the returned memory function memoGetDivision
    return function memoized(...args) { // ...args => 4, 2
        // get key
        const key = getKey(...args)

        // Whether there is cache judgment
        if (memo. hasOwnProperty(key)) {
            return memo[key]
        }

        // By using apply to execute the fn function, get the value and save it in memo
        memo[key] = fn. apply(this, args) // args => [4, 2]

        // return value
        return memo[key]
    }
}

// memo { '4_2': 2 }

Asynchronous function: event callback

A few thoughts:

  1. Several forms of asynchronous functions: callback event callback, promise
  2. The execution result of an asynchronous function is not returned immediately
  3. Situation handling when multiple asynchronous functions are executed at the same time
  4. queue

Ideas:

When an asynchronous function is called multiple times, the key is the unique identifier (that is, the parameters are the same), and all are put into the queue. When any one receives the return value, save the value to the cache, notify all in the queue, and clear it queue

queues[key] = [callback1, callback2, callback3]

Assume that an asynchronous function expensiveOperation will return a notification after 1000ms is executed, first look at the form of event callback callback

// asynchronous callback function
expensiveOperation(args, (data) => {<!-- -->
  // Do something
})
  • After adding memoize
const memo = {<!-- -->}
function memoExpensiveOperation(key, callback) {<!-- -->
  if (memo. hasOwnProperty(key)) {<!-- -->
    callback(memo[key])
    return
  }

  expensiveOperation(key, (data) => {<!-- -->
    memo[key] = data
    callback(data)
  })
}

There is a problem: when executing multiple times, the first result may not be returned, and the expensiveOperation function will be executed multiple times

How to deal with it: When the data data has not been returned, the newly added expensiveOperation function is no longer added to the execution environment, but directly put into the queue to ensure that there is only one same asynchronous function While executing and waiting for a response, all newly added expensiveOperation functions will wait for the result notification of the first execution

  • Optimized writing
const memo = {<!-- -->} // cache object
const progressQueues = {<!-- -->} // running queue

function memoExpensiveOperation(key, callback) {<!-- -->
  if (memo. hasOwnProperty(key)) {<!-- -->
    callback(memo[key])
    return
  }

  if (!progressQueues.hasOwnProperty(key)) {<!-- -->
    // When the key does not exist, add a new queue with the key as the logo
    progressQueues[key] = [callback]
  } else {<!-- -->
    // When the key exists, add it to the queue, then exit
    progressQueues[key].push(callback)
    // Exit directly without continuing to execute the expensiveOperation function, that is, the expensiveOperation function will only be executed once
    return
  }

  expensiveOperation(key, (data) => {<!-- -->
    // cache the result
    memo[key] = data
    // notify all queues
    for (let callback of progressQueues[key]) {<!-- -->
      callback(data)
    }
    // clear the queue
    delete progressQueue[key]
  })
}

The above memo is used to cache the results, progressQueues is used to save the function queue, and the key ensures that only one execution function is running, and the rest are added to the queue uniformly Receive the result of executing the function

  • The example is as follows, encapsulate the memoizeAsync function
expensiveOperation(key, (data) => {<!-- -->
  // Do something
})

const memoExpensiveOperation = memoizeAsync(expensiveOperation, (key) => key)
function memoizeAsync(fn, getKey) {<!-- -->
  const memo = {<!-- -->}
  const progressQueues = {<!-- -->}

  return function memoized(...allArgs) {<!-- -->
    // Get the event callback callback function of the function
    const callback = allArgs[allArgs. length - 1]
    // get all parameters
    const args = allArgs. slice(0, -1)
    // custom key
    const key = getKey(...args)

    if (memo. hasOwnProperty(key)) {<!-- -->
      callback(key)
      return
    }

    if (!progressQueues.hasOwnProperty(key)) {<!-- -->
      progressQueues[key] = [callback]
    } else {<!-- -->
      progressQueues[key].push(callback)
      return
    }

    // Note: Here we use call instead of apply to execute the function
    // Because you need to pass a callback function (the original function structure) at the end, you don't use the apply array to pass it, but replace it with the form of parameter list call
    fn.call(this, ...args, (data) => {<!-- -->
      // memoize result
      memo[key] = data
      // process all the enqueued items after it's done
      for (let callback of progressQueues[key]) {<!-- -->
        callback(data)
      }
      // clean up progressQueues
      delete progressQueue[key]
    })
  }
}

The main thing is to put the variable object inside the function execution environment, complete the isolation in the form of closure, and then replace fn.apply with fn.call

Asynchronous function: promise

For the promise function, consider wrapping the whole process with promise and return promise

The internal anonymous function is adjusted to new Promise((resolve, reject) => {}), and resolve, reject return the execution status result value

Suppose there is an asynchronous function fetchData(args), an example of cache capability implementation

const memo = {<!-- -->}
const progressQueues = {<!-- -->}

function memoizePromise(fn, getKey) {<!-- -->
    return new Promise((resolve, reject) => {<!-- -->
      // There is a cache, and the result value is returned by resolve
      if (memo. hasOwnProperty(key)) {<!-- -->
        resolve(memo[key])
        return
      }

      // queue
      // consider success/abnormality, put [resolve, reject] into the queue
      if (!progressQueues.hasOwnProperty(key)) {<!-- -->
        progressQueues[key] = [[resolve, reject]]
      } else {<!-- -->
        progressQueues[key].push([resolve, reject])
        return
      }

      // Execute the asynchronous function fetchData
      fetchData(key)
        .then((data) => {<!-- -->
          memo[key] = data // cache data
          // success queue notification
          for (let [resolver] of progressQueues[key]) resolver(data)
        })
        .catch((error) => {<!-- -->
          // failure queue notification
          for (let [, rejector] of progressQueues[key]) rejector(error)
        })
        .finally(() => {<!-- -->
          // clear the queue
          delete progressQueues[key]
        })
    })
}

For the cache encapsulation of promise, you can check the library p-memoize

Analysis of the implementation of memory functions by third-party libraries such as lodash.memoize and memoize-one

lodash.memoize

The source code is as follows, the writing method is relatively concise and smart, and the logic is similar

Also provide two parameters func – the original function itself, resolver custom key execution function

/**
 * @param {Function} func The function to have its output memoized.
 * @param {Function} [resolver] The function to resolve the cache key.
 * @returns {Function} Returns the new memoized function.
 */

function memoize(func, resolver) {<!-- -->
  // exception filtering
  if (typeof func !== "function" || (resolver != null & amp; & amp; typeof resolver !== "function")) {<!-- -->
    throw new TypeError("Expected a function")
  }

  // return function memoized
  const memoized = function (...args) {<!-- -->
    // Get the key: if the resolver exists, the function execution result will be taken, otherwise the first parameter will be taken
    const key = resolver ? resolver. apply(this, args) : args[0]

    // cache judgment
    // This way of writing is to directly mount the cache object to the function object attribute cache
    const cache = memoized.cache
    if (cache.has(key)) {<!-- -->
      return cache. get(key)
    }

    // Execute the function to get the result
    const result = func. apply(this, args)

    // Save to the cache object cache
    memoized.cache = cache.set(key, result) || cache
    return result
  }

  // Specify the cache instance, the default is Map
  memoized.cache = new (memoize.Cache || Map)()
  return memoized
}

// can specify the cache object instance
// Example: replace memoize.Cache = WeakMap;
memoize. Cache = Map

export default memoize

lodash’s memoize method actually does not meet our expectations for asynchronous results. It can cache the execution results of synchronous functions, but for asynchronous, due to the uncertainty of execution timing and return results, it makes We cannot directly cache the execution result of the asynchronous function (refer to the description of the asynchronous function above), so what you can do is actually cache the promise asynchronous function itself, as follows:

  • Example of a simple scenario, cache the obtained results
const usersCache = new Map();

const getUserById = async (userId) => {<!-- -->
  if (!usersCache.has(userId)) {<!-- -->
    const user = await request. get(`/api/user/${<!-- -->userId}`)
    usersCache.set(userId, user);
  }

  return usersCache. get(userId);
}
  • For example, in a concurrent scenario, the asynchronous function starts caching after the result of the first function is returned. When there are multiple requests, the idea of caching the result will not work. A trick is to directly cache the promise function, not the result
// await Promise. all([
// getUserById('user1'),
// getUserById('user2')
// ]);

const userPromisesCache = new Map();

const getUserById = (userId) => {<!-- -->
  if (!userPromisesCache.has(userId)) {<!-- -->
    // Note that async await is not used here
    const userPromise = request.get(`/api/user/${<!-- -->userId}`)
    userPromisesCache.set(userId, userPromise);
  }

  return userPromisesCache. get(userId)!;
};

// lodash.memoize writing method
const getUserById = async (useId) => {<!-- -->
  const user = await request. get(`/api/user/${<!-- -->userId}`)
  return user
}

lodash.memoize(getUserById)

The disadvantages of this writing method are obvious, including the exception handling of execution failures. It is recommended to use memoizee here, which can satisfy the support for promise. If you are interested, you can look at the source code for further research.

import memoize from 'memoizee';

const getUserById = async (useId) => {<!-- -->
  const user = await request. get(`/api/user/${<!-- -->userId}`)
  return user
}

const getUserById = memoize(getUserById, {<!-- --> promise: true});

memoize-one

The code logic is quite concise, it can be regarded as another version of lodash.memoize, which is also not friendly to asynchronous support

import areInputsEqual from './are-inputs-equal'; // Provide a parameter comparison function by default

function memoizeOne(resultFn, isEqual = areInputsEqual) {<!-- -->
  let lastThis;
  let lastArgs = [];
  let lastResult;
  let calledOnce = false;

  // breaking cache when context (this) or arguments change
  function memoized(this, ...newArgs) {<!-- -->
    if (calledOnce & amp; & amp; lastThis === this & amp; & amp; isEqual(newArgs, lastArgs)) {<!-- -->
      return lastResult;
    }

    lastResult = resultFn. apply(this, newArgs);
    calledOnce = true;
    lastThis = this;
    lastArgs = newArgs;
    return lastResult;
  }

  return memoized;
}

Summary

First of all, an idea for the cache function is actually very concise and clear, and even two lines of code can realize the basic version, but divergent thinking based on this is worth exploring~

If you don’t understand the implementation logic, it’s like a blind man touching an elephant before looking at the source code. It seems that many things are difficult and complicated. In fact, when you start to understand, you will suddenly become enlightened in the process of reading the source code, and everything is clear.

For many things in the open source library, you can actually start from the source code with a learning attitude. You don’t need to think about them too complicated. You can understand the basic idea by looking at the most original version 1.0. Follow-up things are more Most of it is the processing of details, put aside the details and look at the whole, and then look at the details after grasping the overall framework~

In general, if the requirements are not complicated, you can implement a simple version of the cache class library yourself. lodash.memoize is easy to use, and it is sufficient for general function value caching, but it does not support asynchronous good~

Reference

  • Memoizing async functions in Javascript
  • Function Memory in JavaScript Topics
  • JavaScript Memoization Function
  • lodash.memoize
  • p-memoize
  • memoize-one