Linux regular expressions and grep

What is bash

  • Bash is a command processor that runs in a text window (terminal) and executes commands typed directly by the user
  • Bash can also read Linux commands from files, called scripts
  • Bash supports wildcards, pipes, command substitution, conditionals, and other control-flow statements

bash features

  • Command line expansion
[root@chaogelinux ~]# echo {tom,bob,chaoge,jerry}
tom bob chaoge jerry

[root@chaogelinux ~]# echo chaoge{666,888}
chaoge666 chaoge888

[root@chaogelinux ~]# echo chaoge{1..5}
chaoge1 chaoge2 chaoge3 chaoge4 chaoge5

[root@chaogelinux ~]# echo chaoge{1..10..2}
chaoge1 chaoge3 chaoge5 chaoge7 chaoge9

[root@chaogelinux ~]# echo chaoge{01..10..2}
chaoge01 chaoge03 chaoge05 chaoge07 chaoge09
  • Command aliases
alias, unalias
  • Command history
history
!n rerun command number n from the history list
!! rerun the last command
  • Shortcut keys
ctrl + a moves the cursor to the beginning of the line
ctrl + e moves the cursor to the end of the line
ctrl + u deletes everything from the cursor back to the beginning of the line
ctrl + k deletes everything from the cursor to the end of the line
ctrl + l clears the terminal screen, the same as clear
  • Command completion
The tab key completes commands found in $PATH
  • File path completion
The tab key also completes file paths, such as /opt/chaoge/linux_study

Linux regular expression

Regular expression, abbreviated REGEXP

A regular expression is a pattern written with a mix of special characters and ordinary text characters. Some of these characters do not stand for their literal meaning; instead they act as control characters or wildcards.

Regular expressions are divided into two categories:

Basic regular expressions: BRE
Extended regular expressions: ERE

Why regular expressions matter

  • Handling large numbers of strings
  • Processing text

With the help of a few special symbols, Linux administrators can quickly filter, replace, and process the strings and text they need, which makes their work much more efficient.

Day-to-day Linux operations work involves large amounts of string content, such as

  • Configuration files
  • Code
  • Command output
  • Log files

We often need to find specific strings in this kind of content, and that is the problem regular expressions solve.

  • Regular expressions are a set of rules and methods for describing text
  • Regular expressions work line by line, processing one line at a time
  • Regular expressions replace complicated manual searching with a compact pattern and improve work efficiency
  • On the Linux command line, regular expressions are used mainly through the Three Musketeers (sed, awk, grep); most everyday commands such as ls and cp only understand shell wildcards

Regular expressions are also widely used in programming languages such as Python, Java, and Perl. Most ordinary Linux commands do not interpret regular expressions, so on the command line you will mostly use them through the Three Musketeers.

Wildcards are expanded by the shell for most commands and are used to match file or directory names, while regular expressions are used to filter the content of files (or data streams) through the Three Musketeers.
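
The difference between a wildcard and a regular expression can be sketched with a small experiment. The scratch directory and the file names a.txt, b.txt, and notes.log are hypothetical and exist only for this illustration.

```shell
# Scratch directory with three hypothetical files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/notes.log"

# Wildcard: the shell expands *.txt into file names before ls even runs.
glob_matches=$(cd "$demo_dir" && ls *.txt)

# Regular expression: grep filters the text printed by ls, line by line.
regex_matches=$(ls "$demo_dir" | grep '\.txt$')

echo "$glob_matches"
echo "$regex_matches"
rm -r "$demo_dir"
```

Both commands list the same two files here, but the first matches file names through the shell while the second matches lines of text through grep.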

The Linux Three Musketeers

All three are text processing tools, and all of them support regular expressions

  • grep: text filtering tool that searches for lines matching a pattern
  • sed: stream editor, used for editing text non-interactively
  • awk: text report generator (formatted text output); on Linux, awk is usually gawk

Classification of regular expressions

The regular expressions used by the Linux Three Musketeers fall into two categories

  • Basic regular expressions (BRE)
The BRE metacharacters are ^ $ . [ ] *
  • Extended regular expressions (ERE)
ERE builds on BRE and adds the metacharacters ( ) { } ? + |
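
A minimal sketch of the BRE/ERE difference, using two made-up sample lines: the same pattern text behaves differently depending on whether grep is run with -E.

```shell
# Two sample lines: an ordinary word and a string with a literal plus sign.
sample=$(printf 'good\ngo+d\n')

# BRE (plain grep): + is an ordinary character, so only the literal "go+d" matches.
bre_result=$(printf '%s\n' "$sample" | grep 'go+d')

# ERE (grep -E): + means "one or more of the previous character", so "good" matches.
ere_result=$(printf '%s\n' "$sample" | grep -E 'go+d')

echo "BRE: $bre_result"
echo "ERE: $ere_result"
```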

Basic regular expression (BRE) set

  • Matching characters
  • Number of matches
  • Position anchoring

Symbol    Function
^         Caret; used at the left edge of a pattern. For example, "^oldboy" matches lines that start with oldboy
$         Dollar sign; used at the right edge of a pattern. For example, "oldboy$" matches lines that end with oldboy
^$        Combination; matches an empty line
.         Matches exactly one character of any kind; it cannot match an empty line
\         Escape character; removes the special meaning of the next character so it is treated literally. For example, \. matches a literal dot
*         Matches the previous character 0 or more times in a row. Because 0 repetitions matches the empty string, a pattern such as "a*" on its own matches every line
.*        Combination; matches any sequence of characters of any length
^.*       Combination; matches from the start of the line through any number of characters
.*$       Combination; matches any number of characters up to the end of the line
[abc]     Matches any single character in the set: a, b, or c. A range can be written as [a-c]
[^abc]    Matches any single character that is NOT a, b, or c; a ^ directly after [ negates the set
\< \>     (GNU grep) Match a complete word by anchoring its left and right edges. For example, "\<chao\>" finds "The chao ge" but not "yuchao"
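
Word anchoring is not exercised in the practice section later on, so here is a small sketch of it. It assumes GNU grep (the \< and \> anchors are an extension, not part of POSIX BRE) and uses the two sample strings from the table.

```shell
# The two sample strings mentioned in the table above.
sample=$(printf 'The chao ge\nyuchao\n')

# Plain substring search: both lines contain "chao".
substring_hits=$(printf '%s\n' "$sample" | grep -c 'chao')

# Word-anchored search: only the line where chao stands alone as a word.
word_hits=$(printf '%s\n' "$sample" | grep -c '\<chao\>')

echo "substring: $substring_hits, whole word: $word_hits"
```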

Extended regular expression (ERE) set

With grep, extended regular expressions take effect only when you use grep -E

Character   Function
+           Matches the previous character 1 or more times; it must appear at least once
[:/]+       Matches a run of 1 or more ":" or "/" characters (any character listed in the brackets)
?           Matches the previous character 0 or 1 times; it may or may not be present
|           Vertical bar; alternation, matches any one of several strings
()          Grouping; the enclosed content is treated as a single unit
a{n,m}      Matches the previous character at least n times and at most m times
a{n,}       Matches the previous character at least n times
a{n}        Matches the previous character exactly n times
a{,m}       Matches the previous character at most m times

Tip:

grep needs the -E option to support extended regular expressions
egrep is deprecated; use grep -E instead
Without -E, GNU grep treats the extended characters as metacharacters only when each one is preceded by a backslash "\" (for example \+ or \{2,3\})
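
The tip about backslashes can be sketched as follows. The three sample words are made up for the illustration; both commands express the same interval, "two or three o characters between g and d".

```shell
# Sample words with zero, two, and four o characters.
sample=$(printf 'gd\ngood\ngooood\n')

# BRE: the braces must be escaped to act as a repetition count.
bre_hits=$(printf '%s\n' "$sample" | grep 'go\{2,3\}d')

# ERE: with -E the braces are metacharacters without any backslash.
ere_hits=$(printf '%s\n' "$sample" | grep -E 'go{2,3}d')

echo "BRE: $bre_hits"
echo "ERE: $ere_hits"
```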

grep

Full name: Global search REgular expression and Print out the line.

Function: a text search tool that checks the target text line by line against the "pattern" (filter condition) given by the user and prints the lines that match

Pattern: a filter condition written with regular expression metacharacters and ordinary text characters

Syntax:
grep [options] pattern [file...]
That is: the command, its options, the match pattern, and the file(s) to search
                -i, --ignore-case: ignore the case of characters;
                -o, --only-matching: display only the matched string itself;
                -v, --invert-match: display the lines that the pattern does NOT match;
                -E, --extended-regexp: enable extended regular expression metacharacters;
                -q, --quiet, --silent: silent mode, nothing is printed; only the exit status reports whether a match was found;

The grep command is one of the most important commands on a Linux system. It filters matching lines out of text files or piped data streams. Combined with regular expressions it becomes very powerful, and it is a must-know command for Linux operations staff.

The pattern given to grep is what you want to find. It can be ordinary text or a regular expression.

Option          Explanation
-v              Exclude matching lines
-n              Display matching lines with their line numbers
-i              Ignore case
-c              Only print the count of matching lines
-E              Use extended regular expressions (what egrep used to do)
--color=auto    Highlight the matched text in color
-w              Match whole words only
-o              Only output the matched part of each line
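
The -q option from the earlier option list is mostly useful in scripts, where only the exit status matters. A minimal sketch, using a made-up passwd-style line as input:

```shell
# Hypothetical passwd-style sample line.
sample='root:x:0:0:root:/root:/bin/bash'

# grep -q prints nothing; the if statement reacts to its exit status.
if printf '%s\n' "$sample" | grep -q '^root:'; then
    root_status="present"
else
    root_status="absent"
fi

echo "root entry is $root_status"
```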

Examples

cat /etc/passwd > /tmp/test_grep.txt

grep -n "login" /tmp/test_grep.txt # Lines containing login, with line numbers
grep -n -v "login" /tmp/test_grep.txt # Lines that do not contain login
grep -i "ROOT" /tmp/test_grep.txt # Ignore case and find the lines containing root
grep -E --color=auto "root|sync" /tmp/test_grep.txt # Lines containing root or sync
grep -c "login" /tmp/test_grep.txt # Count the matching lines
grep -n -o "login" /tmp/test_grep.txt # Only output the matched text

grep -w "oldboy" /tmp/test_grep.txt # Match oldboy only as a whole word
grep -Ev "^#|^$" /tmp/test_grep.txt # Drop comment lines and empty lines
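
The last command is a common way to see only the effective settings in a configuration file. A small sketch on a made-up configuration snippet (the keys port and level are hypothetical):

```shell
# Hypothetical configuration text with comments and an empty line.
config=$(printf '# listen port\nport=8080\n\n# log level\nlevel=info\n')

# -v inverts the match, so lines starting with # and empty lines are dropped.
effective=$(printf '%s\n' "$config" | grep -Ev '^#|^$')

echo "$effective"
```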

Regular expression grep practice

Prepare test files
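
One way to create a test file that is consistent with most of the numbered outputs in this section is sketched below. The content is reconstructed from those outputs, so treat it as an assumption rather than the author's exact file; the /tmp location is also hypothetical (the transcripts run inside a directory named data).

```shell
# Hypothetical location for the reconstructed test file.
test_file=/tmp/luffy.txt

# Twelve lines in total; lines 4, 6, 8, 10, 11 and 12 are empty.
cat > "$test_file" <<'EOF'
I am oldboy teacher
I teach linux.
I like python.

My qq is 877348180.

My name is chaotic.

Our school website is http://oldboyedu.com



EOF

total_lines=$(wc -l < "$test_file" | tr -d ' ')
blank_lines=$(grep -c '^$' "$test_file")
echo "$total_lines lines, $blank_lines blank"
```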

The ^ symbol

1. Output all lines starting with m

[root@pylinux data]# grep -i -n "^m" luffy.txt # -i ignores case, -n displays line numbers
5:My qq is 877348180.
7:My name is chaotic.

2. Output all lines starting with i

[root@pylinux data]# grep -i -n "^i" luffy.txt
1:I am old boy teacher
2:I teach linux.
3:I like python.

The $ symbol

1. Output all lines ending with r

[root@pylinux data]# grep -i -n "r$" luffy.txt
1:I am oldboy teacher

2. Output all lines ending with m

[root@pylinux data]# grep -i -n "m$" luffy.txt
9:Our school website is http://oldboyedu.com

TIP

Note that on Linux every line of a text file ends with a newline character, which cat -A displays as a $ symbol
You can use cat -A to view these line endings in a file
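
A quick sketch of what cat -A shows, assuming GNU coreutils (the -A option is not available in every cat implementation). The input is two sample lines, the second of which is empty.

```shell
# cat -A marks the end of every line with $, including the empty one.
visible=$(printf 'I teach linux.\n\n' | cat -A)

echo "$visible"
```

The lone $ on the second output line is the empty line, which is why the pattern ^$ finds empty lines.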

3. Output all lines ending with ".", taking care to escape the dot

1. First look at the result without the escape character. In a regular expression "." matches any character, so grep reads ".$" as "any character at the end of the line" and prints every non-empty line.

[root@pylinux data]# grep -i -n ".$" luffy.txt
1:I am old boy teacher
2:I teach linux.
3:I like python.
5:My qq is 877348180.
7:My name is chaotic.
9:Our school website is http://oldboyedu.com


2. Add the escape character so that the dot is treated as an ordinary literal dot.
[root@pylinux data]# grep -i -n "\.$" luffy.txt
2:I teach linux.
3:I like python.
5:My qq is 877348180.
7:My name is chaotic.

The ^$ combination

1. Find the empty lines in the file and their line numbers
[root@pylinux data]# grep -n "^$" luffy.txt
4:
6:
8:
10:
11:
12:

The . (dot) symbol

A dot "." matches exactly one character of any kind. Because it needs a character to match, it never matches an empty line.

[root@pylinux data]# grep -i -n "." luffy.txt
1:I am old boy teacher
2:I teach linux.
3:I like python.
5:My qq is 877348180.
7:My name is chaotic.
9:Our school website is http://oldboyedu.com

Match ".ac": any single character followed by ac, three characters in total

[root@pylinux data]# grep -i -n ".ac" luffy.txt
1:I am old boy teacher
2:I teach linux.

The \ escape character

1. Find all lines containing a literal dot "."

[root@pylinux data]# grep "\." luffy.txt
I teach linux.
I like python.
My qq is 877348180.
My name is chaotic.
Our school website is http://oldboyedu.com

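The * symbol

1. The * matches the previous character 0 or more times. The pattern "i*" looks for 0 or more occurrences of i, and because 0 occurrences also counts as a match, every line is printed, including the empty ones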

[root@pylinux data]# grep -n "i*" luffy.txt
1:I am teacher
2:I teach linux.
3:I like python.
4:
5:My qq is 283178231.
6:
7:My name is haoge.
8:
9:Our school website is http://www.baidu.com
10:
11:
12:
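
Why "i*" matches every line can be checked on a few made-up sample lines by counting matches with -c. Writing "ii*" (one i followed by zero or more i) is the usual BRE way to require at least one occurrence.

```shell
# Four sample lines: one i, no i, an empty line, and three i characters.
sample=$(printf 'hi\nbye\n\niii\n')

# i* also matches the empty string, so all four lines count as matches.
star_count=$(printf '%s\n' "$sample" | grep -c 'i*')

# ii* needs at least one i, so only "hi" and "iii" match.
at_least_one=$(printf '%s\n' "$sample" | grep -c 'ii*')

echo "i*: $star_count lines, ii*: $at_least_one lines"
```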

The .* combination

. means any single character and * means the previous character repeated 0 or more times, so together .* matches any text at all, including spaces and empty lines.

[root@pylinux data]# grep '.*' luffy.txt
I am old boy teacher
I teach linux.
I like python.

My qq is 877348180.

My name is chaotic.

Our school website is http://oldboyedu.com

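The ^.*o pattern

^ anchors the match at the start of the line

. matches any single character

.* matches any run of 0 or more characters

o is an ordinary character, so the match ends at a letter o

Because .* takes as many characters as possible, the match runs to the last o on the line rather than the first one. This behavior is called greedy matching. The example below uses the literal letter I in place of ^.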

[root@chaogelinux data]# grep "I.*o" luffy.txt
I am old boy teacher
I like python.
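
Greedy matching is easier to see with -o, which prints only the matched text. The sketch below uses one sample line; the second pattern, with [^o]* in place of .*, is a common workaround (not part of the original examples) for stopping at the first o.

```shell
line='I am old boy teacher'

# Greedy: .* runs as far as it can, so the match ends at the LAST o (in "boy").
greedy_match=$(printf '%s\n' "$line" | grep -o 'I.*o')

# [^o]* cannot cross an o, so the match ends at the FIRST o (in "old").
first_o_match=$(printf '%s\n' "$line" | grep -o 'I[^o]*o')

echo "greedy:  $greedy_match"
echo "first o: $first_o_match"
```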

[abc] square brackets

A bracket expression such as [abc] matches any single character listed inside the square brackets: a, b, or c. Common forms are

  • [a-z] matches any single lowercase letter
  • [A-Z] matches any single uppercase letter
  • [a-zA-Z] matches any single letter, uppercase or lowercase
  • [0-9] matches any single digit
  • [a-zA-Z0-9] matches any single letter or digit
[root@pylinux data]# grep '[a-z]' luffy.txt
I am old boy teacher
I teach linux.
I like python.
My qq is 877348180.
My name is chaotic.
Our school website is http://oldboyedu.com

[root@pylinux data]# grep '[abcd]' luffy.txt
I am old boy teacher
I teach linux.
My name is chaotic.
Our school website is http://oldboyedu.com

The grep -o option

With the "-o" option, grep displays only the matched text instead of the whole line.

Count how many times the letter a occurs in the file

[root@pylinux data]# grep -o 'a' luffy.txt |wc -l
5
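
The reason for piping -o into wc -l instead of using -c is that -c counts matching lines, not individual matches. A sketch on two made-up words:

```shell
# "banana" contains three a characters, "apple" contains one.
sample=$(printf 'banana\napple\n')

# -c counts lines that contain at least one a.
line_count=$(printf '%s\n' "$sample" | grep -c 'a')

# -o prints every match on its own line, so wc -l counts occurrences.
occurrence_count=$(printf '%s\n' "$sample" | grep -o 'a' | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $occurrence_count"
```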

[^abc] negated bracket expressions

In patterns such as [^abc] or [^a-c], a "^" in the first position inside the square brackets means exclusion: the expression matches any single character except a, b, or c

A caret at the start of a bracket expression indicates negation

1. Find lines that contain at least one character other than a lowercase letter

[root@pylinux data]# grep '[^a-z]' luffy.txt
I am old boy teacher
I teach linux.
I like python.
My qq is 877348180.
My name is chaotic.
Our school website is http://oldboyedu.com
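
Every non-empty line is printed above because each one contains at least one space, uppercase letter, digit, or punctuation mark. Adding -o shows exactly which characters matched. The sketch uses one sample line and sets LC_ALL=C as a precaution, since the meaning of a range like a-z can depend on the locale.

```shell
# Print each character that is NOT a lowercase letter on its own line.
non_lower=$(printf 'I teach linux.\n' | LC_ALL=C grep -o '[^a-z]')

# The matches are: "I", two spaces, and the final dot.
non_lower_count=$(printf '%s\n' "$non_lower" | wc -l | tr -d ' ')

echo "$non_lower_count characters are not lowercase letters"
```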

Extended regular expression practice

The examples here use grep -E for extended regular expressions. egrep is officially deprecated.

The + symbol

The + sign matches the previous character one or more times. With grep you must use -E for it to work as an extended regular expression.

[root@pylinux data]# grep -E 'l+' luffy.txt
I am old boy teacher
I teach linux.
I like python.
Our school website is http://oldboyedu.com

The ? symbol

Matches the previous character 0 or 1 times

1. Find the line containing gd or god in the file

[root@pylinux data]# grep -E 'go?d' luffycity.txt
god #The letter o appears once
gd #The letter o appears 0 times
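
The same pattern can be tried without the luffycity.txt file. In this sketch the three sample words are made up; "good" is included to show that two o characters are too many for o?.

```shell
# Sample words with zero, one, and two o characters.
words=$(printf 'gd\ngod\ngood\n')

# o? allows zero or one o between g and d, so "good" is not matched.
matched=$(printf '%s\n' "$words" | grep -E 'go?d')

echo "$matched"
```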

The | symbol

The vertical bar | means "or" in regular expressions

1. Find .txt files on the system whose path contains the letter a or b

[root@pylinux data]# find / -maxdepth 3 -name "*.txt" |grep -i -E "a|b"
/data/luffycity.txt
/data/luffy.txt
/test_find/chaoge.txt
/test_find/alex.txt
/opt/all.txt
/opt/_book/123.txt
/opt/Python-3.7.3/pybuilddir.txt
/opt/alltxt.txt
/opt/s15oldboy/qiong.txt
/opt/IIS/keystorePass.txt

() Parentheses

Parentheses bundle one or more characters together so that they are treated as a single unit.

  • Grouping: the content inside the parentheses is handled as a whole
  • Back-references: the text matched inside () can be referenced later in the same regular expression as \n, where n is a number indicating which pair of parentheses is meant

    • \1: the text matched by the pattern in the first pair of parentheses, counting from the left
    • \2: the text matched by the pattern in the second pair of parentheses, counting from the left

1. Find the lines containing good or glad

[root@pylinux data]# grep -E 'goo|lad' luffycity.txt # Not what we want: this means "goo" or "lad"
good
goooood
goooooood
glad

[root@pylinux data]# grep -E 'good|glad' luffycity.txt # This is the match we want
good
glad

[root@pylinux data]# grep -E 'g(oo|la)d' luffycity.txt # Same result, with the alternation grouped
good
glad

Back-references after grouping

[root@chaogelinux data]# cat lovers.txt
I like my lover.
I love my lover.
He likes his lovers.
He loves his lovers.

[root@chaogelinux data]# grep -E '(l..e).*\1' lovers.txt
I love my lover.
He loves his lovers.

[root@chaogelinux data]# grep -E '(r..t).*\1' /etc/passwd # Example 2
root:x:0:0:root:/root:/bin/bash
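
A back-reference repeats the text that the group actually matched, not the group's pattern. The sketch below contrasts the two on the four lines of lovers.txt shown above. It assumes GNU grep, where back-references in -E patterns are supported as an extension.

```shell
# The four lines of lovers.txt from the transcript above.
lovers=$(printf 'I like my lover.\nI love my lover.\nHe likes his lovers.\nHe loves his lovers.\n')

# \1 must repeat the same text: "love ... love" matches, "like ... love" does not.
backref_count=$(printf '%s\n' "$lovers" | grep -cE '(l..e).*\1')

# Writing the pattern twice only requires the same shape, so all four lines match.
repeat_count=$(printf '%s\n' "$lovers" | grep -cE '(l..e).*l..e')

echo "back-reference: $backref_count lines, repeated pattern: $repeat_count lines"
```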

The clearest explanation of grouping

{n,m} repetition counts

Repeats the previous character a specified number of times. Use the -o option to show exactly which text was matched.
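
A minimal sketch of an interval combined with -o, using a made-up sample line. The pattern o{2,3} takes up to three o characters at a time, so a run of five is reported as a match of three followed by a match of two.

```shell
# Sample line with runs of five, two, and one o characters.
sample='goooood good god'

# -o prints each match on its own line; the single o in "god" is too short.
runs=$(printf '%s\n' "$sample" | grep -oE 'o{2,3}')

echo "$runs"
```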
