Grep and regular expressions

1. grep, one of the Three Musketeers of text processing

Anyone who wants to work seriously with Linux needs to understand and master the Three Musketeers of Linux text processing: grep, sed and awk. This article covers grep.

1. Text Three Musketeers’ Grep

1. What is grep

grep stands for Global search Regular Expression and Print out the line. It searches the input for text that matches a regular expression and prints the lines that match. grep is only one member of the Unix grep family, which also includes egrep and fgrep.

1. egrep

egrep is an extended version of grep that supports more regular expression metacharacters.

2. fgrep

fgrep means fixed grep or fast grep. It treats every character in the pattern literally: regular expression metacharacters stand for themselves and have no special meaning.
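
The difference between a regular expression pattern and a fixed string can be sketched as follows. This assumes GNU grep, where grep -F is the option form of fgrep; the sample text is made up for illustration.

```shell
# Sample input (made up): one line contains a literal "a.c", the other "abc".
sample=$(printf 'a.c\nabc\n')

# As a regular expression, "." matches any single character,
# so both lines match.
regex_hits=$(printf '%s\n' "$sample" | grep 'a.c')

# With -F the pattern is a fixed string, so only the literal "a.c" matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -F 'a.c')

printf 'regex:\n%s\nfixed:\n%s\n' "$regex_hits" "$fixed_hits"
```

In the same way, grep -E is the option form of egrep.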

2. Go deeper into grep

1. The role of grep

grep is a text search tool: it checks the target text line by line against a user-specified pattern (the filter condition) and prints the lines that match.

2. grep patterns

A pattern is a filter condition written with regular expression metacharacters and ordinary text characters.

3. grep usage format

grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
    OPTIONS:
        --color=auto: color and highlight the matched text;
            alias grep='grep --color=auto'
        -i, --ignore-case: ignore the case of characters;
        -o: display only the matched string itself;
        -v, --invert-match: display the lines that the pattern does not match;
        -E: support extended regular expression metacharacters;
        -q, --quiet, --silent: silent mode, no output is printed;
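
A few of these options can be tried on a small sample, as in the sketch below. The sample text and the variable names are made up for illustration.

```shell
# Sample input (made up for illustration).
sample=$(printf 'Root login\nroot shell\nguest shell\n')

# -i: ignore case, so both "Root" and "root" match.
ci=$(printf '%s\n' "$sample" | grep -i 'root')

# -o: print only the matched text, one match per output line.
only=$(printf '%s\n' "$sample" | grep -o 'shell')

# -v: print the lines that do NOT match the pattern.
inverted=$(printf '%s\n' "$sample" | grep -v 'shell')

# -e: give several patterns; a line is printed if any of them matches.
multi=$(printf '%s\n' "$sample" | grep -e 'Root' -e 'guest')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$ci" "$only" "$inverted" "$multi"
```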

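-q is generally used in scripts. Whether a match was found is determined from the exit status "$?" (echo $?):

  • 0 indicates that a match was found
  • 1 indicates that there is no match
  • 2 indicates that an error occurred

The following options control how many lines of context are printed around each match:

  • -A #: after, also print the # lines after the match
  • -B #: before, also print the # lines before the match
  • -C #: context, also print the # lines before and after the match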
xuelong@xueba ~ cat /etc/passwd | grep -A 2 root
root:x:0:0:root:/root:/usr/bin/zsh # this is the matched line
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin

xuelong@xueba ~ cat /etc/passwd | grep -B 2 xuelong
sddm:x:118:126:Simple Desktop Display Manager:/var/lib/sddm:/bin/false
usbmux:x:119:46:usbmux daemon,,,:/var/lib/usbmux:/bin/false
xuelong:x:1000:1000:xuelong,,,:/home/xuelong:/usr/bin/zsh # this is the matched line

xuelong@xueba ~ cat /etc/passwd | grep -C 2 HPLIP
colord:x:111:121:colord color management daemon,,,:/var/lib/colord:/bin/false
dnsmasq:x:112:65534:dnsmasq,,,:/var/lib/misc:/bin/false
hplip:x:113:7:HPLIP system user,,,:/var/run/hplip:/bin/false # this is the matched line
kernoops:x:114:65534:Kernel Oops Tracking Daemon,,,:/:/bin/false
pulse:x:115:122:PulseAudio daemon,,,:/var/run/pulse:/bin/false
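
The context options and the exit status of -q can also be reproduced on a throwaway file instead of /etc/passwd. The sketch below uses a temporary file with made-up contents; the variable names are illustrative only.

```shell
# Throwaway sample file (contents are made up for illustration).
f=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$f"

# -q prints nothing; the result is reported only through the exit status.
found=0
grep -q 'three' "$f" || found=$?      # stays 0: a line matched
missing=0
grep -q 'six' "$f" || missing=$?      # becomes 1: no line matched

# Typical script usage: branch on the exit status directly.
if grep -q 'three' "$f"; then
    echo "found"
fi

# Context options: lines after, before, and around the match.
after=$(grep -A 1 'three' "$f")       # three, four
before=$(grep -B 1 'three' "$f")      # two, three
around=$(grep -C 1 'three' "$f")      # two, three, four

rm -f "$f"
```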

3. grep use case

Case 1

xuelong@xueba ~ ps -ef | grep docker
root 1176 1 0 09:53 ? 00:00:06 /usr/bin/dockerd -H fd://
root 1218 1176 0 09:53 ? 00:00:00 containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime runc
xuelong 5218 4238 0 13:42 pts/3 00:00:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn docker
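
Note that the last line of this output is the grep process itself, because its own command line contains the word docker. One common workaround, shown here only as a sketch, is to wrap one letter of the pattern in a bracket expression. To keep the example repeatable, the ps output below is simulated with shortened, made-up lines.

```shell
# Simulated `ps -ef` output (shortened, made up) so the example is repeatable.
ps_out=$(printf '%s\n' \
    'root     1176    1  0 09:53 ?     00:00:06 /usr/bin/dockerd -H fd://' \
    'xuelong  5218 4238  0 13:42 pts/3 00:00:00 grep [d]ocker')

# The pattern [d]ocker matches the text "docker", but the grep command
# line contains the text "[d]ocker", which the pattern does not match.
hits=$(printf '%s\n' "$ps_out" | grep '[d]ocker')
printf '%s\n' "$hits"
```

On a live system the equivalent command would be ps -ef | grep '[d]ocker'.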

2. Regular expression

1. Understand regular expressions

Regular expressions were an early step toward letting computers process text intelligently.

A regular expression (REGEXP) is a pattern written with special characters and ordinary text characters. Some of these characters do not stand for their literal meaning; instead they express control or wildcard functions.

Regular expressions are divided into two categories:

  • Basic regular expressions (BRE)
  • Extended regular expressions (ERE)

2. Basic regular expressions

1. Metacharacter

  • .: Matches any single character
  • []: Matches any single character within the specified range
  • [^]: Matches any single character outside the specified range

Inside [], a range can also be written with the following character classes, for example [[:digit:]]:

  • [:digit:]: all digits
  • [:lower:]: lowercase letters
  • [:upper:]: uppercase letters
  • [:alpha:]: all letters
  • [:alnum:]: all letters and digits
  • [:punct:]: punctuation characters
  • [:space:]: whitespace characters
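
The character classes above can be sketched on a small sample. Note that the class name goes inside an outer pair of brackets. The sample text and variable names are made up for illustration.

```shell
# Sample input (made up for illustration).
sample=$(printf 'abc\nABC\n123\na1 b2\n')

# Lines containing at least one digit.
digits=$(printf '%s\n' "$sample" | grep '[[:digit:]]')

# Lines that consist only of uppercase letters.
upper=$(printf '%s\n' "$sample" | grep '^[[:upper:]]*$')

# Lines containing a character that is neither a letter nor a digit
# (here: the space in "a1 b2").
other=$(printf '%s\n' "$sample" | grep '[^[:alnum:]]')

printf '%s\n---\n%s\n---\n%s\n' "$digits" "$upper" "$other"
```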

2. Matching a number of times

A quantifier is placed after the character whose number of occurrences you want to specify, and limits how many times that preceding character may occur. Quantifiers are greedy by default.

  • *: matches the preceding character any number of times: 0, 1 or more
  • .*: matches any string of any length
  • \?: matches the preceding character 0 or 1 times; that is, the preceding character is optional
  • \+: matches the preceding character one or more times; that is, the preceding character must appear at least once
  • \{m\}: matches the preceding character exactly m times
  • \{m,n\}: matches the preceding character at least m times and at most n times
    • \{0,n\}: at most n times
    • \{m,\}: at least m times
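
The quantifiers above can be compared side by side on one sample, as sketched below. This assumes GNU grep, where \? and \+ are available in basic regular expressions as extensions; the sample text is made up for illustration.

```shell
# Sample input: "a", then zero to three "b", then "c".
sample=$(printf 'ac\nabc\nabbc\nabbbc\n')

star=$(printf '%s\n' "$sample" | grep 'ab*c')          # b zero or more times
opt=$(printf '%s\n' "$sample" | grep 'ab\?c')          # b zero or one time
plus=$(printf '%s\n' "$sample" | grep 'ab\+c')         # b one or more times
exact=$(printf '%s\n' "$sample" | grep 'ab\{2\}c')     # b exactly twice
range=$(printf '%s\n' "$sample" | grep 'ab\{2,3\}c')   # b two or three times

printf 'star:\n%s\nopt:\n%s\nplus:\n%s\n' "$star" "$opt" "$plus"
printf 'exact:\n%s\nrange:\n%s\n' "$exact" "$range"
```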

3. Position anchoring

Anchors match a specific position rather than a character.

  • ^: beginning-of-line anchor; used at the leftmost side of the pattern
  • $: end-of-line anchor; used at the rightmost side of the pattern
  • ^PATTERN$: PATTERN must match the entire line
    • ^$: empty line
    • ^[[:space:]]*$: empty line or line containing only whitespace characters
  • Word: a run of consecutive non-special characters (letters, digits and underscores) is called a word
  • \< or \b: word-start anchor, used on the left side of the word pattern; for example, \<root matches only words beginning with root
  • \> or \b: word-end anchor, used on the right side of the word pattern; for example, root\> matches only words ending in root
  • \<PATTERN\>: matches a complete word; for example, \<root\> matches exactly the word root
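
The anchors above can be checked by counting matching lines in a small file. This is a sketch assuming GNU grep (the \< and \> anchors are GNU extensions); the file contents are made up for illustration.

```shell
# Throwaway sample file: includes one empty line and one spaces-only line.
f=$(mktemp)
printf 'root\nrooted\nchroot\n\n   \nroot:x:0\n' > "$f"

whole_line=$(grep '^root$' "$f")         # only the line that is exactly "root"
word_start=$(grep -c '\<root' "$f")      # root, rooted, root:x:0
word_end=$(grep -c 'root\>' "$f")        # root, chroot, root:x:0
whole_word=$(grep -c '\<root\>' "$f")    # root, root:x:0
empty=$(grep -c '^$' "$f")               # the empty line
blank=$(grep -c '^[[:space:]]*$' "$f")   # the empty and the spaces-only line

echo "$whole_line $word_start $word_end $whole_word $empty $blank"
rm -f "$f"
```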

4. Grouping

Grouping bundles one or more characters together so that they are processed as a whole; a group is written between \( and \).

Example

\(xy\)*ab: the string xy as a whole can appear any number of times before ab

Note

The content matched by the pattern inside the grouping parentheses is automatically recorded by the regular expression engine in internal variables, named \1, \2, …:

  • \1: counting from the left of the pattern, the text matched by the pattern between the first left parenthesis and its matching right parenthesis
  • \2: counting from the left of the pattern, the text matched by the pattern between the second left parenthesis and its matching right parenthesis
  • Back reference: a reference to the text matched by an earlier group, not to the pattern of that group

For example, given a file lovers.txt with the following content:

He loves his lover.
He likes his lover.
She likes her liker.
She loves her liker.

the following command prints only the lines in which the four characters matched by l..e appear again later in the line, that is, the first and third lines:

~]# grep "\(l..e\).*\1" lovers.txt
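
The lovers.txt example can be run in a scratch directory as sketched below. The temporary directory and the variable names are assumptions made for the sketch; the file content and the pattern are the ones given above.

```shell
# Create lovers.txt in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/lovers.txt" <<'EOF'
He loves his lover.
He likes his lover.
She likes her liker.
She loves her liker.
EOF

# \(l..e\) captures "love" or "like"; \1 requires the SAME text to occur
# again later in the line, so "likes ... lover" does not match.
matched=$(grep '\(l..e\).*\1' "$dir/lovers.txt")
printf '%s\n' "$matched"

rm -rf "$dir"
```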

3. Extended regular expression

egrep supports extended regular expressions and provides text filtering similar to grep.

Usage format: egrep [OPTIONS] PATTERN [FILE...]
    Options:
        -i: ignore case
        -o: display only the matched string itself
        -v: invert the match, displaying the lines that do not match
        -q: silent mode, no output is printed
        -A #: also print the # lines after the matched line
        -B #: also print the # lines before the matched line
        -C #: also print the # lines before and after the matched line
        -G: interpret the pattern as a basic regular expression

1. Metacharacters of extended regular expressions

  • .: any single character
  • []: Any single character within the specified range
  • [^]: Any single character outside the specified range

2. Matching a number of times

  • *: any number of times: 0, 1 or more
  • ?: 0 or 1 times; the preceding character is optional
  • +: the preceding character at least 1 time
  • {m}: the preceding character exactly m times
  • {m,n}: at least m times, at most n times
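
Compared with basic regular expressions, the extended quantifiers are written without backslashes. The sketch below runs the same match in both syntaxes on a made-up sample; the variable names are illustrative.

```shell
# Sample input: "a", then zero to three "b", then "c".
sample=$(printf 'ac\nabc\nabbc\nabbbc\n')

# Basic syntax: braces must be escaped.
bre=$(printf '%s\n' "$sample" | grep 'ab\{2,3\}c')

# Extended syntax (-E): the same quantifiers without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E 'ab{2,3}c')
ere_plus=$(printf '%s\n' "$sample" | grep -E 'ab+c')
ere_opt=$(printf '%s\n' "$sample" | grep -E 'ab?c')

printf 'bre:\n%s\nere:\n%s\n' "$bre" "$ere"
```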

3. Position anchoring

  • ^: beginning-of-line anchor
  • $: end-of-line anchor
  • \<, \b: word-start anchor
  • \>, \b: word-end anchor

4. Grouping and Reference

  • (): grouping; the text matched by the pattern in parentheses is recorded in internal variables of the regular expression engine
  • Back references: \1, \2, …
  • a|b: a or b
  • C|cat: C or cat; note that | applies to the entire expression on its left and on its right
  • (c|C)at: cat or Cat
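
The difference between C|cat and (c|C)at can be made visible with -o, which prints only the matched text. This is a sketch on a made-up sample; the variable names are illustrative.

```shell
# Sample input (made up for illustration).
sample=$(printf 'cat\nCat\nC\nbat\n')

# Without grouping, | separates the whole left side from the whole right
# side: the pattern matches either "C" or "cat".
loose=$(printf '%s\n' "$sample" | grep -E -o 'C|cat')

# With grouping, only the first letter alternates: "cat" or "Cat".
grouped=$(printf '%s\n' "$sample" | grep -E -o '(c|C)at')

printf 'C|cat:\n%s\n(c|C)at:\n%s\n' "$loose" "$grouped"
```

For the line Cat, the first pattern matches only the single letter C, while the second matches the whole word.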

5. Case

Case 1: Find all lines starting with uppercase or lowercase S in the /proc/meminfo file; there are at least three ways to achieve this

~]# grep -i "^s" /proc/meminfo
~]# grep "^[sS]" /proc/meminfo
~]# grep -E "^(s|S)" /proc/meminfo

Case 2: Display relevant information of root, centos or user1 users on the current system

~]# grep -E "^(root|centos|user1)\>" /etc/passwd

Case 3: Find the lines in the /etc/rc.d/init.d/functions file that contain a word followed by a pair of parentheses;

~]# grep -E -o "[_[:alnum:]]+\(\)" /etc/rc.d/init.d/functions

Case 4: Use the echo command to output an absolute path, and use egrep to extract its base name; (the pattern is anchored at the end of the line, so matching effectively works from the right)

~]# echo /etc/sysconfig/ | grep -E -o "[^/]+/?$"

Further: obtain its directory part, similar to the result of running the dirname command on it;
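
One possible answer to this follow-up is sketched below; it is not necessarily the solution the exercise intends. It assumes GNU grep and relies on the word-start anchor \<, so it only works when the base name begins with a letter, digit or underscore, and unlike dirname it keeps the trailing slash.

```shell
# Base name, as in Case 4: the last path component, with an optional
# trailing slash.
base_part=$(echo /etc/sysconfig/ | grep -E -o '[^/]+/?$')

# Directory part: everything from the leading "/" up to the last "/"
# that is immediately followed by the start of a word.
dir_part=$(echo /etc/sysconfig/ | grep -E -o '^/.*/\<')

printf 'base: %s\ndir:  %s\n' "$base_part" "$dir_part"
```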

Case 5: Find the values between 1 and 255 in the output of the ifconfig command;

~]# ifconfig | grep -E -o "\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"

Case 6: Find the IP addresses in the output of the ifconfig command;

~]# ifconfig | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}"
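
The Case 6 pattern also accepts strings such as 999.1.1.1, because each field only has to be one to three digits. A stricter variant can be sketched by reusing the octet idea from Case 5, extended to allow 0. The sample lines and the variable names below are made up for illustration, and \< and \> assume GNU grep.

```shell
# Sample input (made up) standing in for ifconfig output.
sample=$(printf 'inet 192.168.1.10 netmask 255.255.255.0\nbogus 999.1.1.1\n')

# Loose pattern from Case 6: any four groups of 1-3 digits.
loose=$(printf '%s\n' "$sample" | grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}')

# Stricter sketch: each octet must be a number from 0 to 255.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
strict=$(printf '%s\n' "$sample" | grep -E -o "\<($octet\.){3}$octet\>")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```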

Case 7: Add the users bash, testbash, basher and nologin (the shell of nologin is /sbin/nologin); then find the lines in the /etc/passwd file where the user name is the same as the shell name;

~]# grep -E "^([^:]+\>).*\1$" /etc/passwd
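
The Case 7 pattern can be tried without creating real users by running it against a file in passwd format. The entries below (UIDs, home directories) are made up for illustration, and back references in extended regular expressions assume GNU grep.

```shell
# Sample file in /etc/passwd format (entries are made up).
f=$(mktemp)
cat > "$f" <<'EOF'
bash:x:1001:1001::/home/bash:/bin/bash
testbash:x:1002:1002::/home/testbash:/bin/bash
basher:x:1003:1003::/home/basher:/bin/bash
nologin:x:1004:1004::/home/nologin:/sbin/nologin
EOF

# ([^:]+\>) captures the whole user name; \1$ requires the line to end
# with that same text. Without \>, the prefix "bash" of "basher" could be
# captured and the basher line would match as well.
same=$(grep -E '^([^:]+\>).*\1$' "$f")
printf '%s\n' "$same"

rm -f "$f"
```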
