A Front-End Guide to Regular Expressions (RegExp)

Introduction

Regular expressions (RegExp) describe rules for processing strings. They are typically used in two ways:

  1. Validation: checking whether a string conforms to a rule, for example validating the format of a mobile phone number;
  2. Matching/capturing: extracting the parts of a string that satisfy a rule, for example matching template expressions such as {{}} for variable substitution.

Next, let's work through front-end regular expressions, from the basics to practical applications:

  1. Regular expression syntax,
  2. Using regular expressions for validation,
  3. Using regular expressions for capturing,
  4. Common regular expression scenarios at work.

1. Regular expression syntax

1. Create a regular expression

  1. Constructor:

RegExp has its own constructor, which takes the pattern as a string and returns a RegExp instance, so the instance has access to the methods on RegExp.prototype. The following checks whether the input contains a number:

const reg = new RegExp('\\d+');
reg.test(123); // true
reg.test('abc'); // false
  2. Literal:

A regular expression can also be written between two slashes / /. The result likewise has the methods on RegExp.prototype, such as test and exec.

const reg = /\d+/;
reg.test(123); // true
reg.test('abc'); // false
  3. The difference between the two:
  • With the constructor, the pattern is a string, so every \ must be written as \\ to escape it; otherwise the string swallows the backslash and the character after it is treated as an ordinary character.
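
The escaping difference can be sketched as follows. This is only an illustration: the variable names and the keyword value are assumptions made for the example, not part of the original text.

```javascript
// Inside a string literal, '\d' is just the letter 'd', so the backslash is lost.
const wrongPattern = new RegExp('\d+');
// Doubling the backslash keeps it in the pattern.
const rightPattern = new RegExp('\\d+');

console.log(wrongPattern.source); // d+
console.log(rightPattern.source); // \d+
console.log(wrongPattern.test('123')); // false
console.log(rightPattern.test('123')); // true

// The constructor is useful when part of the pattern is only known at runtime.
const keyword = 'abc'; // hypothetical runtime value
const dynamicPattern = new RegExp('^' + keyword + '\\d+$');
console.log(dynamicPattern.test('abc2023')); // true
```

When the whole pattern is known in advance, the literal form is usually easier to read because no double escaping is needed.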

2. Components of regular expressions

Above we learned how to create a regular expression. The pattern itself goes in the constructor's argument or between the two slashes of the literal.

A regular expression consists of two parts: metacharacters and modifiers.

1. Metacharacters:

They are the core of a regular expression and define the rules for validating and matching strings. Metacharacters fall into three categories:

  • 1) Ordinary metacharacters: (purely literal; each character stands for itself)
For example, /cegz/ matches the string "cegz".
  • 2) Special metacharacters: (one or more characters that together carry a special meaning)
1. \ is the escape character: it can give an ordinary character a special meaning, and it can turn a special character back into an ordinary one
    For example, escaping the special character . so that it matches a literal dot: /2\.5/.test('2.5'); // true
2. ^ anchors the start: the string must begin with whatever follows it
    console.log(/^\d/.test('2023abcd')); // true
3. $ anchors the end: the string must end with whatever precedes it
    console.log(/\d$/.test('abcd2023')); // true
    When ^ and $ are used together, the whole string must match the rule, as when validating an 11-digit mobile phone number: /^1\d{10}$/
4. . matches any single character except the newline character \n
5. \n matches a newline character
6. \d matches any digit from 0 to 9
7. \D matches any character that is not a digit from 0 to 9, such as a letter (an uppercase class is always the opposite of its lowercase counterpart)
8. \w matches any digit, letter, or underscore (_)
9. \W matches any character that \w does not
10. \s matches a whitespace character (space, tab, newline, and so on)
11. \S matches any character that \s does not
12. \t matches a tab character (the Tab key)

13. | means "or": x|y matches either x or y
14. [] defines a character set: [xyz] matches any one of x, y, z
15. [^] negates a set: [^xy] matches any character other than x and y
16. [-] specifies a range: [a-z] matches any character from a to z

17. () groups part of a rule into a single unit, and it also captures what the group matched (group capturing);
18. (?:) groups without capturing: the group takes part in matching, but its result is not recorded
  • 3) Quantifier metacharacters: (set how many times the preceding ordinary or special metacharacter may occur)
1. * means 0 or more occurrences
2. + means 1 or more occurrences
3. ? means 0 or 1 occurrence
4. {n} (n wrapped in braces) means exactly n occurrences (n is any number you choose)
5. {n,} means n or more occurrences
6. {n,m} means n to m occurrences
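
A few of the metacharacters above can be tried out together. The test strings below are made up for illustration only.

```javascript
// Anchors plus a character set: the whole string must be lowercase letters.
console.log(/^[a-z]+$/.test('cegz')); // true

// A negated set: no x or y is allowed anywhere in the string.
console.log(/^[^xy]+$/.test('abxc')); // false

// Grouping limits the scope of |. Without the group, ^ applies only to "ab"
// and $ applies only to "cd".
console.log(/^(ab|cd)$/.test('cdzz')); // false
console.log(/^ab|cd$/.test('abzz')); // true

// Character classes with quantifiers: word characters, one whitespace, 2 to 4 digits.
console.log(/^\w+\s\d{2,4}$/.test('abc_1 2023')); // true

// ? makes the preceding character optional.
console.log(/^colou?r$/.test('color')); // true

// . does not match a newline.
console.log(/^a.c$/.test('a\nc')); // false
```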

2. Modifiers:

The three most commonly used modifiers specify the matching strategy: i, m, g

  • i (ignoreCase): ignore case when matching;
  • m (multiline): multi-line matching, so that ^ and $ match at the start and end of every line rather than only of the whole string;
  • g (global): global matching, which finds every result that satisfies the rule.

Suppose we have a string:

const str = `Google test google test google`;

Without modifiers, the regex is case-sensitive and returns only the first match:

console.log(str.match(/google/)); // ['google', index: 12]

If you add the i modifier, the regular expression will ignore case and match Google:

console.log(str.match(/google/i)); // ['Google', index: 0]

Modifiers can also be combined. The following ignores case and returns every match:

console.log(str.match(/google/gi)); // [ 'Google', 'google', 'google' ]
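
The m modifier is not exercised by the examples above, so here is a minimal sketch of its effect. The two-line string is an assumption made for the example.

```javascript
const twoLines = 'first line\nsecond line';

// Without m, ^ only matches at the very start of the string.
console.log(/^second/.test(twoLines)); // false

// With m, ^ also matches right after each newline.
console.log(/^second/m.test(twoLines)); // true

// Combined with g, the first word of every line is returned.
console.log(twoLines.match(/^\w+/gm)); // [ 'first', 'second' ]
```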

With metacharacters and modifiers understood, we can write regular expressions for validation and matching.

2. Using regular expressions for validation

Validation usually relies on RegExp.prototype.test. Every regular expression instance has this method; it takes a string to match against and returns true if the string passes validation, or false otherwise.

Let's write a regex that checks whether a mobile phone number has the correct format: 11 digits, starting with 1:

const reg = /^1\d{10}$/;
const iphone = '18933112266';
console.log(reg.test(iphone)); // true

Suppose an input box should accept digits only. With type=number, the letter e and the decimal point . can still be entered, so we can control the input strictly with a regular expression check:

<Input value={value} onChange={event => {
  const value = event.target.value;
  // Only digits are allowed, and the input may be empty
  if (/^$|^\d+$/.test(value)) {
    setValue(value);
  }
}} />
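
The same check can be tried outside a component. The sketch below assumes a hypothetical helper, nextInputValue, that decides which value the input should hold after a change; it uses /^\d*$/, which accepts the same strings as /^$|^\d+$/.

```javascript
// Hypothetical helper: keep the attempted value only if it is empty or all digits.
function nextInputValue(previous, attempted) {
  return /^\d*$/.test(attempted) ? attempted : previous;
}

console.log(nextInputValue('12', '123'));  // 123
console.log(nextInputValue('12', '12e'));  // 12 (the letter e is rejected)
console.log(nextInputValue('12', '12.5')); // 12 (the decimal point is rejected)
console.log(nextInputValue('1', '') === ''); // true (clearing the input is allowed)
```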

3. Using regular expressions for capturing

There are two ways to capture string content: the regular expression's exec method and the string's match method. Let's look at the difference between the two.

1. RegExp.exec

Regular expressions provide an exec method for capturing. It takes a string as its argument and returns:

  • null if no match is found;
  • an array if a match is found: the first element is the matched text, and the array's index property holds the position where the match starts (counting from 0).

let str = "abcd2023efgh0301";
let reg = /\d+/;
console.log(reg.exec(str));

// output:
[ '2023', index: 4, input: 'abcd2023efgh0301', groups: undefined ]

Note that exec is "lazy": each call captures only one match, and by default that is always the first one.

The reason is the regular expression's lastIndex property, which marks the index at which the next match starts. Without the g modifier, this value stays at 0 no matter how many times exec is called:

console.log(reg.lastIndex);
console.log(reg.exec(str));
console.log(reg.lastIndex);

// output:
0
[ '2023', index: 4, input: 'abcd2023efgh0301', groups: undefined ]
0

How do we get around this laziness? Setting lastIndex by hand does not work, because a regular expression without the g modifier ignores it. Add the g modifier to the regular expression for global matching, and lastIndex is updated automatically after each call to exec:

reg = /\d+/g;
console.log(reg.lastIndex);
console.log(reg.exec(str));
console.log(reg.lastIndex);

// output:
0
[ '2023', index: 4, input: 'abcd2023efgh0301', groups: undefined ]
8
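
To see how lastIndex moves across successive calls, the following sketch keeps calling exec on a global regex until it returns null. The variable names are chosen for the example only.

```javascript
const digitsGlobal = /\d+/g;
const sampleText = 'abcd2023efgh0301';

// Record [matched text, start index, lastIndex after the call] for every match.
const steps = [];
let found;
while ((found = digitsGlobal.exec(sampleText)) !== null) {
  steps.push([found[0], found.index, digitsGlobal.lastIndex]);
}

console.log(steps); // [ [ '2023', 4, 8 ], [ '0301', 12, 16 ] ]
// After the failed final call, lastIndex is reset to 0.
console.log(digitsGlobal.lastIndex); // 0
```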

However, exec still has to be called by hand for each subsequent match, so let's write an execAll helper that collects all matches:

~function () {
  function execAll(str = '') {
    if (!this.global) return this.exec(str); // no g: capture only once (a loop would never end)
    this.lastIndex = 0; // start from the beginning, whatever earlier exec calls left behind
    let ary = [], res = null;
    while ((res = this.exec(str))) {
      ary.push(res[0]);
    }
    return ary.length === 0 ? null : ary;
  }
  RegExp.prototype.execAll = execAll;
}();

console.log(reg.execAll(str));

// output:
[ '2023', '0301' ]
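
As an alternative that does not extend RegExp.prototype, the built-in String.prototype.matchAll (ES2020) iterates over every match and keeps the index of each one. It requires the g modifier and throws a TypeError without it. A minimal sketch, with illustrative variable names:

```javascript
const sourceText = 'abcd2023efgh0301';

// matchAll returns an iterator; spreading it gives one exec-style array per match.
const allMatches = [...sourceText.matchAll(/\d+/g)];

console.log(allMatches.map(m => m[0]));    // [ '2023', '0301' ]
console.log(allMatches.map(m => m.index)); // [ 4, 12 ]
```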

In practice, when we want to capture every match, we usually skip a hand-written execAll and use the other capture method: the string's match.

2. String.match

Strings provide a match method on their prototype. It takes a regular expression as its argument, and its result is very similar to that of exec:

  • null if no match is found;
  • an array if a match is found: the first element is the matched text, and the array's index property holds the position where the match starts (counting from 0).

let str = "abcd2023efgh0301";
let reg = /\d+/;
console.log(str.match(reg));

// output:
[ '2023', index: 4, input: 'abcd2023efgh0301', groups: undefined ]

If you want to match all, just add the modifier g to the regular expression:

let str = "abcd2023efgh0301";
let reg = /\d+/g;
console.log(str.match(reg));

// output:
[ '2023', '0301' ]
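
Groups change what match returns. The sketch below, using a made-up date string, shows capturing groups () appearing in the result, a non-capturing group (?:) being left out, and the g modifier returning only whole matches.

```javascript
const dateText = '2023-03-01';

// Without g, each capturing group is added after the full match.
const withGroups = dateText.match(/(\d{4})-(\d{2})/);
console.log(withGroups[0], withGroups[1], withGroups[2]); // 2023-03 2023 03

// (?:) still takes part in matching but is not recorded in the result.
const nonCapturing = dateText.match(/(?:\d{4})-(\d{2})/);
console.log(nonCapturing[1]);     // 03
console.log(nonCapturing.length); // 2

// With g, only the full matches are returned and group captures are dropped.
console.log(dateText.match(/(\d{4})-(\d{2})/g)); // [ '2023-03' ]
```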

Both methods above only return matches. The next method matches and also replaces what it matched with a new value, which is very common in business code.

3. String.replace

The string replace method performs replacement. Its first parameter is the matching rule, which can be a string literal or a regular expression; its second parameter is the replacement, which can be a string or a handler function that returns the replacement string.

  1. If the second parameter is a string, the simplest usage is:
const str = "abcd 2023 abcd";
console.log(str.replace('2023', 'abcd')); // abcd abcd abcd
  2. If the second parameter is a function, it is called for each match and must return the string to substitute:
const str = "abcd 2023 abcd";
console.log(str.replace('2023', () => {
  return 'abcd';
})); // abcd abcd abcd

Being a function, it is naturally flexible: the function body can contain complex logic. More importantly, the function's parameters give us information about the match.

When the pattern has no capture groups, the first argument is the matched text, the second is the index where the match starts, and the third is the original string. In some scenarios these arguments allow special handling.

console.log(str.replace('2023', (...args) => {
  console.log(args); // [ '2023', 5, 'abcd 2023 abcd' ]
  return 'abcd';
}));

If the first parameter is a string literal, only the first occurrence is matched and replaced, so each call performs a single replacement:

let str = "abcd 2023 abcd";
console.log(str.replace("abcd", "2023").replace("abcd", '2023'));

To replace every occurrence, use a regular expression with the g modifier as the first parameter:

const str = "abcd 2023 abcd";
console.log(str.replace(/abcd/g, (...args) => {
  console.log(args);
  return '2023';
}));

// The output is as follows:
[ 'abcd', 0, 'abcd 2023 abcd' ]
[ 'abcd', 10, 'abcd 2023 abcd' ]
2023 2023 2023
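
When the search value is a plain string, the built-in String.prototype.replaceAll (ES2021) is another way to replace every occurrence. A minimal sketch, assuming a runtime that supports it:

```javascript
const sampleStr = 'abcd 2023 abcd';

// A string search value replaces every occurrence, with no regex needed.
console.log(sampleStr.replaceAll('abcd', '2023')); // 2023 2023 2023

// A regex is also accepted, but it must carry the g modifier.
console.log(sampleStr.replaceAll(/abcd/g, '2023')); // 2023 2023 2023
```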

Regular expressions often contain groups. For example, to change the separators in a date, we can write:

const time = '2023-03-01';
const reg = /^(\d{4})-(\d{1,2})-(\d{1,2})$/;
console.log(time.replace(reg, '$1 year $2 month $3 day')); // 2023 year 03 month 01 day

A regular expression defines groups with the () metacharacters. In the replacement string, $1, $2, and so on stand for the text captured by each group, in order.

When the second parameter is a function, it receives the group captures as well:

const time = '2023-03-01';
const reg = /^(\d{4})-(\d{1,2})-(\d{1,2})$/;
console.log(time.replace(reg, (target, $1, $2, $3) => {
  console.log(target, $1, $2, $3); // 2023-03-01 2023 03 01
  return `${$1}year ${$2}month ${$3}day`;
})); // 2023year 03month 01day
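
Numbered groups become hard to follow as a pattern grows. Named capture groups (ES2018) are one option: (?<name>...) labels a group, and $<name> refers to it in the replacement string. The group names below are chosen for this sketch.

```javascript
const isoDate = '2023-03-01';
const namedDate = /^(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})$/;

// $<name> in the replacement string refers to the named group.
console.log(isoDate.replace(namedDate, '$<day>/$<month>/$<year>')); // 01/03/2023

// exec exposes the same captures on the groups property of its result.
const dateParts = namedDate.exec(isoDate).groups;
console.log(dateParts.year, dateParts.month, dateParts.day); // 2023 03 01
```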

In real projects replace can do a lot, such as parsing the template syntax {{var}} to substitute variables. Let's look at a few common scenarios.

4. Common regular usage scenarios at work

1. Converting camelCase to hyphenated (-) names

Anyone who has used React JSX knows that the style attribute of an element takes property names in lower camel case, for example fontSize: '16px'.

Native HTML, by contrast, writes inline styles on the DOM with hyphenated names such as font-size.

Suppose we now have a requirement: given React JSX nodes and their styles, generate native HTML template files for the back end to use.

The lower camel case fontSize will not work in HTML and has to be converted to font-size, which a regular expression does quickly:

'fontSize'.replace(/[A-Z]/g, val => `-${val.toLowerCase()}`); // 'font-size'
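
Applied to a whole style object, the same replacement could look like the sketch below. The helper names toKebabCase and toInlineStyle and the sample style object are assumptions for illustration.

```javascript
// Hypothetical helper: fontSize -> font-size
function toKebabCase(name) {
  return name.replace(/[A-Z]/g, letter => `-${letter.toLowerCase()}`);
}

// Hypothetical helper: turn a JSX-style object into an inline style string.
function toInlineStyle(style) {
  return Object.entries(style)
    .map(([property, value]) => `${toKebabCase(property)}: ${value}`)
    .join('; ');
}

console.log(toInlineStyle({ fontSize: '16px', backgroundColor: 'red' }));
// font-size: 16px; background-color: red
```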

2. Template variable substitution

Sometimes we need to parse a string template and replace its slots with actual values, which replace makes easy. Suppose we have this template:

let str = "{{user_name}} - {{user_sex}}";

With group capturing, we can capture each variable name and substitute it:

str = str.replace(/\{\{(\w+)\}\}/g, (content, $1) => {
  console.log(content, $1);
  return $1 === 'user_name' ? 'Mingli people' : 'male';
});
console.log(str);

The printout is as follows:

{{user_name}} user_name
{{user_sex}} user_sex
Mingli people - male
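
The hard-coded branch above can be generalized by looking each variable up in a data object. The sketch below assumes a hypothetical render helper and sample data; placeholders without a matching key are left untouched.

```javascript
// Hypothetical helper: replace {{key}} slots with values from a data object.
function render(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : placeholder
  );
}

const rendered = render('{{user_name}} - {{ user_sex }} - {{missing}}', {
  user_name: 'Tom', // sample data, made up for the example
  user_sex: 'male',
});
console.log(rendered); // Tom - male - {{missing}}
```

The \s* on both sides of the group is a design choice that tolerates spaces inside the braces.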

3. The middle 4 digits of the phone number are replaced by asterisks

In some business scenarios, to protect user privacy, the middle four digits of a mobile phone number are replaced with asterisks (*). Group capturing plus replace makes this easy:

'18712345678'.replace(/(\d{3})\d{4}(\d{4})/, '$1****$2');

// get:
'187****5678'
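
A slightly more defensive variant validates the number first, reusing the 11-digit rule from the validation section. The helper name maskPhone is an assumption; input that is not a valid number is returned unchanged.

```javascript
// Hypothetical helper: mask the middle four digits of a valid 11-digit number.
function maskPhone(phone) {
  if (!/^1\d{10}$/.test(phone)) return phone; // leave anything else untouched
  return phone.replace(/^(\d{3})\d{4}(\d{4})$/, '$1****$2');
}

console.log(maskPhone('18712345678')); // 187****5678
console.log(maskPhone('12345'));       // 12345
```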

Updating…

Closing

Thanks for reading. If you spot any shortcomings, please point them out.