The three musketeers of the shell: grep, sed and awk

Table of Contents

1. Regular expressions

1. Overview

2. Basic metacharacters of regular expressions

3. Extended metacharacters of regular expressions

2. grep and egrep

1. Basic examples

3. sed (stream editor)

sed stream editor usage and analysis

4. awk text editor tool

awk commonly used built-in variables

Example


1. Regular expressions

1. Overview

A regular expression (RE) is a character pattern used to match specified characters during a search. In many programs (for example sed, awk and vim), a regular expression is written between two forward slashes, as in /pattern/. Regular expressions are usually used to find, and optionally replace, text that matches a certain pattern (rule), and many programming languages support string manipulation with them. A regular expression is a logical formula for string operations: predefined special characters, and combinations of them, form a "regular string" that expresses a filtering rule for strings.

2. Basic metacharacters of regular expressions

The difference between BRE and ERE

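Basic regular expressions (BRE) and extended regular expressions (ERE) differ only in which characters are treated as metacharacters:
BRE: only ^ $ . * [ ] are metacharacters on their own; grouping and intervals must be written with a backslash, \( \) and \{ \} (GNU grep and sed also accept \+ \? \|)
ERE: ^ $ . [ ] * + ( ) { } ? | are all metacharacters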

^: line start locator, such as: ^root matches lines starting with root
$: end-of-line locator, such as: world$ matches lines ending with world
. : Matches any single character, such as: r..t
*: Matches the preceding character 0 or more times; grep "o*" /etc/passwd matches every line, because o may appear 0 times
.*: Matches any number of any characters (greedy matching)
[] : Matches any one character in the square brackets
[^]: Matches any one character not in the brackets, such as: [^0-9] matches any character except 0-9
\: Escape character, removes the special meaning of the next character, such as: \. , \$
\<: Word-start anchor (a word consists of letters, digits and underscores), such as: \<root
\> : Word-end anchor, such as: root\>
\(\): Capture group; the text it matches can be referenced later as \1, \2, ... Example in vim:
:%s@\(zhangsan\) \(lisi\) \(wangwu\)@\3 \1 \2@g
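You can try the anchors, bracket expressions and capture groups above on a few throwaway lines. This is a minimal sketch. It assumes GNU grep and GNU sed, because \< and \> are GNU extensions. The function name `sample` and its lines are hypothetical test data, not taken from the original article.

```shell
#!/bin/sh
# Hypothetical sample lines, one per line, used only for this demo
sample() {
    printf '%s\n' 'root' 'rabt' 'Root' 'rt' '2024' 'chroot jail'
}

sample | grep '^r'          # ^    starts with r          -> root, rabt, rt
sample | grep 't$'          # $    ends with t            -> root, rabt, Root, rt
sample | grep 'r..t'        # .    r, two chars, t        -> root, rabt, chroot jail
sample | grep 'ro*t'        # *    r, zero or more o, t   -> root, rt, chroot jail
sample | grep '[Rr]oot'     # []   R or r, then oot       -> root, Root, chroot jail
sample | grep '^[^0-9]*$'   # [^]  no digit anywhere      -> every line except 2024
sample | grep '\<root\>'    # \<\> root as a whole word   -> root only

# \( \) captures text; \2 \1 re-insert the groups in swapped order
echo 'zhangsan lisi' | sed 's/\(zhangsan\) \(lisi\)/\2 \1/'   # -> lisi zhangsan
```

Note that "chroot" does not match `\<root\>`, because the "h" before "root" is a word character.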

3. Extended metacharacters of regular expressions

(In bash's [[ ]] test: = equal to, != not equal to, =~ matches an extended regular expression)
+: Matches the preceding character one or more times, such as: [a-z]+oot
?: Matches the preceding character zero or one time
a|b: Matches a or b
(): Parentheses enclose part of a regular expression to form a unit (a group), so a quantifier can apply to the whole unit, such as: (oo)+ matches "oo" one or more times
x{m}: Character x repeated exactly m times
x{,n}: Character x repeated at most n times (GNU extension)
x{m,}: Character x repeated at least m times
x{m,n}: Character x repeated m to n times
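Here is a small sketch of the extended metacharacters with grep -E. It assumes GNU grep, and the sample lines are hypothetical. The last command shows that a BRE needs backslashes around the interval braces to get the same result.

```shell
#!/bin/sh
# Hypothetical sample lines for the demo
sample() {
    printf '%s\n' 'rt' 'rot' 'root' 'rooot' 'boot' 'foo' 'bar'
}

sample | grep -E 'ro+t'          # +      one or more o        -> rot, root, rooot
sample | grep -E '^ro?t$'        # ?      zero or one o        -> rt, rot
sample | grep -E '^(foo|bar)$'   # | ()   foo or bar           -> foo, bar
sample | grep -E 'o{2}'          # {m}    two o's in a row     -> root, rooot, boot, foo
sample | grep -E '^ro{2,}t$'     # {m,}   at least two o's     -> root, rooot
sample | grep -E '^ro{1,2}t$'    # {m,n}  one or two o's       -> rot, root

# In a BRE the interval braces must be escaped to get the same result
sample | grep '^ro\{2,\}t$'      # -> root, rooot
```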

2. grep and egrep

egrep (or grep -E) supports the extended regular expression metacharacters. Recent GNU grep releases warn that egrep is obsolescent, so grep -E is the preferred spelling.

Common grep options

-E: Use extended regular expressions; grep -E is equivalent to egrep
-n: Display the line number of each matching line
-q: Quiet mode: prints nothing to standard output and exits immediately with status 0 as soon as a match is found
-v: Invert the match (print the lines that do not match)
-w: Match whole words only
-R: Search directories recursively
-i: Ignore case
-o: Print only the matched part of each line
-c: Print the number of matching lines
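The following sketch exercises the options above on standard input instead of a real file. It assumes GNU grep. The passwd-style lines in `sample` are hypothetical, and -R is left out because it needs a directory tree.

```shell
#!/bin/sh
# Hypothetical passwd-style lines used instead of a real file
sample() {
    printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
                  'operator:x:11:0:operator:/root:/sbin/nologin' \
                  'Alice:x:1000:1000::/home/alice:/bin/bash'
}

sample | grep -n 'root'          # -n  line numbers: 1:root:... and 2:operator:...
sample | grep -c 'bash'          # -c  number of matching lines -> 2
sample | grep -v 'root'          # -v  lines without root -> the Alice line
sample | grep -i '^alice'        # -i  ignore case -> the Alice line
sample | grep -o '/bin/[a-z]*'   # -o  only the matched text -> /bin/bash (twice)
sample | grep -w 'ali' || echo '(no whole-word match for ali)'   # -w: ali is only part of alice

# -q prints nothing; only the exit status tells whether something matched
if sample | grep -q 'nologin'; then
    echo 'at least one nologin account'
fi
```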

1. Basic examples

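* matches the preceding character 0 or more times ('ro*' is r followed by any number of o, so any line containing r matches)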
[root@slave2 ~]# grep 'ro*' passwd
root:x:0:0:root:/root:/bin/bash
adm:x:3:4:adm:/var/adm:/sbin/nologin

\< word-start anchor, \> word-end anchor
[root@newrain ~]# cat jack.txt
Jack JACK JAck jackly
:%s/\<[Jj]ack\>/123/g      (in vim; the line becomes: 123 JACK JAck jackly)
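You can do the same word-boundary replacement outside vim. This sketch feeds the jack.txt line to GNU sed and GNU grep. It assumes the GNU tools, because \< and \> are GNU extensions there.

```shell
#!/bin/sh
# The jack.txt line from the example, processed with sed/grep instead of vim
echo 'Jack JACK JAck jackly' | sed 's/\<[Jj]ack\>/123/g'   # -> 123 JACK JAck jackly
echo 'Jack JACK JAck jackly' | grep -o '\<[Jj]ack\>'       # -> Jack
```

"jackly" is left alone because \> requires the word to end right after "ack".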

$ anchors the end of a line (here: lines ending with bash)
[root@newrain ~]# grep 'bash$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
confluence:x:1000:1000:Atlassian Confluence:/home/confluence:/bin/bash
to:x:1003:1003::/home/to:/bin/bash

. matches any single character
[root@newrain ~]# grep 'r..t' /etc/passwd
root:x:0:0:root:/root:/bin/bash

[] matches any character in square brackets
[root@newrain ~]# grep 'Root' /etc/passwd
[root@newrain ~]# grep '[Rr]oot' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
dockerroot:x:998:995:Docker User:/var/lib/docker:/sbin/nologin

Extended regular expressions

1. + matches the preceding character one or more times
[root@newrain ~]# egrep 'ro+t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
dockerroot:x:998:995:Docker User:/var/lib/docker:/sbin/nologin
2. ? matches the preceding character zero or one time
[root@newrain ~]# egrep 'ro?t' /etc/passwd
abrt:x:1041:1041::/home/abrt:/bin/bash

3. a|b matches a or b
[root@newrain ~]# netstat -anlp|egrep ':80|:22'
[root@newrain ~]# egrep 'root|alice' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
dockerroot:x:998:995:Docker User:/var/lib/docker:/sbin/nologin

4. x{m}: character x repeated exactly m times; x{m,}: at least m times; x{m,n}: m to n times
[root@newrain ~]# cat a.txt
love
love.
loove
looooove
[root@newrain ~]# egrep 'o{2}' a.txt
loove
looooove
[root@newrain ~]# egrep 'o{2,}' a.txt
loove
looooove
[root@newrain ~]# egrep 'o{6,7}' a.txt
(no output: no line contains six or more o characters in a row)

3. sed (stream editor)

sed is a stream editor: it reads the input line by line, applies the editing commands to each line, and writes the result to standard output. It is a very handy text-processing tool and works well with regular expressions.

sed [options] 'command' file

Options:

-e Add the given editing commands (script); -e can be repeated to combine several scripts
-f Read the editing commands from a rule file; useful when the rules are complex or there is a lot to match
-i Modify the file content directly instead of printing to the terminal; with -i.bak, the original file is first backed up as file.bak
-r Use extended regular expressions (same as -E), so + ? | ( ) { } work without backslashes
-n Suppress the automatic printing of every input line; only results printed explicitly by the script (for example with p) are shown

Commands:

s  substitute (replace)
g  flag of s: replace every match on the line (a number N instead replaces only the Nth match)
d  delete
p  print
a  append a line after the current line
i  insert a line before the current line

[root@web-server ~]# cat rule.sed
/root/d
[root@web-server ~]# sed -f rule.sed passwd #-f specifies the rule file
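A rule file can hold several commands, one per line, and sed applies them to every input line in order. This minimal sketch assumes GNU sed. The rule file (a temporary file) and the config lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical rule file: several sed commands, one per line
rules=$(mktemp)
cat > "$rules" <<'EOF'
/^#/d
/^$/d
s/^PermitRootLogin yes/PermitRootLogin no/
EOF

# Hypothetical config lines piped through the rule file
result=$(printf '%s\n' '# comment' '' 'PermitRootLogin yes' 'Port 22' | sed -f "$rules")
printf '%s\n' "$result"    # -> PermitRootLogin no, then Port 22
rm -f "$rules"
```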

sed stream editor usage and analysis

1. Delete command d

Remove comments and empty lines from the sshd configuration file

sed '/^#.*/d;/^$/d' /etc/ssh/sshd_config
sed '1d' passwd        # delete the first line of the file
sed '1,2d' passwd      # delete lines 1 to 2
sed '2,$d' passwd      # delete from line 2 to the last line
sed '/root/d' passwd   # delete every line matching root
sed '/root/,2d' passwd # delete from the line matching root through line 2 (only the matching line if it is at or past line 2)
sed '1~2d' passwd      # delete odd-numbered lines (GNU first~step address)
sed '0~2d' passwd      # delete even-numbered lines
sed '1d;2d' passwd     # delete the first and second lines (two commands separated by ;)
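Numbered input makes it easy to see which lines each delete address removes. This sketch uses seq as throwaway input and assumes GNU sed, because the first~step form is a GNU extension.

```shell
#!/bin/sh
# seq 6 prints the numbers 1..6, one per line, as throwaway input
seq 6 | sed '1d'       # -> 2 3 4 5 6
seq 6 | sed '2,$d'     # -> 1
seq 6 | sed '/4/,$d'   # -> 1 2 3      (from the line matching 4 to the end)
seq 6 | sed '1~2d'     # -> 2 4 6      (GNU first~step: odd lines removed)
seq 6 | sed '0~2d'     # -> 1 3 5      (even lines removed)
seq 6 | sed '1d;3d'    # -> 2 4 5 6    (two commands separated by ;)

# Regex start, numeric end: delete from the line matching root through line 3
printf '%s\n' x root y z | sed '/root/,3d'   # -> x z
```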

2. Add commands: a (append after the line) and i (insert before the line)

[root@web-server ~]# sed '4a hello' passwd   # add hello after the fourth line
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
hello
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync

Add multiple lines (end every added line except the last with a backslash):
[root@web-server ~]# cat -n passwd | sed -e '2a hello nginx\
> nginx\
> mysql'
     1 root:x:0:0:root:/root:/bin/bash
     2 bin:x:1:1:bin:/bin:/sbin/nologin
hello nginx
nginx
mysql
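The a and i commands differ only in whether the text lands after or before the addressed line. This sketch uses seq as input. It assumes GNU sed for the one-line `2a text` form. The last command uses the backslash-newline form, which also works in other seds.

```shell
#!/bin/sh
seq 3 | sed '2a after-two'    # -> 1 2 after-two 3
seq 3 | sed '2i before-two'   # -> 1 before-two 2 3
seq 3 | sed '$a the-end'      # -> 1 2 3 the-end
# Multi-line form: each added line except the last ends with a backslash
seq 3 | sed '/3/i\
line A\
line B'                       # -> 1 2 line A line B 3
```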

3. Substitute (s/aaa/bbb/g)

[root@web-server ~]# sed 's/root/ROOT/g' passwd ##Replace all roots with ROOT
ROOT:x:0:0:ROOT:/ROOT:/bin/bash

[root@web-server ~]# sed 's/pattern/replace_string/gi' filename   # case-insensitive replacement (i flag, GNU sed)

The delimiter of the s command does not have to be /: whatever character immediately follows s is used as the delimiter. This is convenient when the pattern contains slashes, such as file paths.

sed 's:text:replace_text:'
sed 's|text|replace_text|'
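This sketch shows why a different delimiter helps with paths. It also shows the numeric and case-insensitive flags of s. It assumes GNU sed, because the i flag is a GNU extension, and the path string is a hypothetical example.

```shell
#!/bin/sh
path='/usr/local/bin:/usr/bin'
echo "$path" | sed 's|/usr/local|/opt|'        # | as delimiter -> /opt/bin:/usr/bin
echo "$path" | sed 's/\/usr\/local/\/opt/'     # same edit with /, every / escaped
echo 'aaa' | sed 's/a/b/2'                     # number flag: only the 2nd match -> aba
echo 'Root root ROOT' | sed 's/root/user/gi'   # GNU i flag: ignore case -> user user user
```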

4. Print: option -n with command p

Without -n, sed prints every input line automatically, so the lines selected by p appear twice.

[root@web-server ~]# sed -n '1p' passwd ## Print the first line
root:x:0:0:root:/root:/bin/bash

[root@web-server ~]# sed -n '1,4p' passwd ##Print lines 1 to 4
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
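Here is a short sketch of p with and without -n, using seq as throwaway input. It assumes GNU sed, though these particular commands are standard.

```shell
#!/bin/sh
seq 5 | sed -n '2p'        # -> 2
seq 5 | sed -n '$p'        # -> 5 (the last line)
seq 5 | sed -n '/[24]/p'   # -> 2 4 (lines matching a regex)
seq 3 | sed '2p'           # no -n: line 2 printed twice -> 1 2 2 3
```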

5. In-place editing with -i (modify the file content directly instead of printing to the terminal)

[root@web-server ~]# sed -i '1a hello shell ' passwd ##Append a line after line 1, writing directly into the file
[root@web-server ~]# head -5 passwd
root:x:0:0:root:/root:/bin/bash
hello shell
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin

Because -i overwrites the file, it is risky. Attaching a suffix such as .bak (-i.bak) makes sed save the original as filename.bak first, so a mistake can be undone.

sed -i.bak 's/pattern/replace_string/' filename
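This sketch checks the backup behaviour in a temporary directory. It assumes GNU sed, and the sshd_config content is a hypothetical sample, not a real system file.

```shell
#!/bin/sh
# Hypothetical config file in a temporary directory
dir=$(mktemp -d)
printf '%s\n' 'Port 22' 'PermitRootLogin yes' > "$dir/sshd_config"

# Edit in place; the untouched original is saved as sshd_config.bak
sed -i.bak 's/^PermitRootLogin yes/PermitRootLogin no/' "$dir/sshd_config"

edited=$(cat "$dir/sshd_config")
backup=$(cat "$dir/sshd_config.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -r "$dir"
```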

4. awk text editor tool

Introduction:
awk processes its input line by line. Unlike screen-oriented editors, it does not load the whole file into memory, so even huge files can be processed without running out of memory or slowing down. It is usually used to format and report on text.

awk [options] 'BEGIN{what to do before processing} {what to do for each line} END{what to do after processing}'

BEGIN{} runs before the first input line is read, and only once
{} runs while the text is processed, once for every input line
awk 'BEGIN{i=1}{print i++}' /etc/passwd    # prints 1, 2, 3, ... one number per line of /etc/passwd
END{} runs after the last line has been processed, and only once
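The three parts of an awk program can be seen together in one run. In this minimal sketch, BEGIN prints a header, the main block sums a column and echoes each line, and END prints an average. The "name age" input lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical "name age" lines
out=$(printf '%s\n' 'alice 30' 'bob 25' 'carol 35' | awk '
    BEGIN { print "name age" }               # once, before the first line
    { total += $2; print }                   # once for every input line
    END { print "average:", total / NR }     # once, after the last line
')
printf '%s\n' "$out"
# -> name age / alice 30 / bob 25 / carol 35 / average: 30
```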

How awk works
awk -F":" '{print $1,$3}' /etc/passwd
(1) awk reads one line of input and assigns it to the variable $0. Each line is called a record; by default a record ends with a newline character.
(2) The line is then split into fields, and each field is stored in a numbered variable starting with $1 ($2, $3, ...).
(3) awk knows how to split the fields from the built-in variable FS (field separator). By default FS is a single space, which means fields are separated by runs of spaces or tabs; -F":" sets FS to a colon.
(4) When print outputs $1,$3, it puts a separator between them because of the comma: the comma is replaced by the value of the output field separator OFS, which defaults to a single space.
(5) awk then reads the next line, stores it in $0 (overwriting the previous content), splits it into fields and processes it in the same way. This repeats until the whole file has been processed.
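You can check the splitting rules in steps (2) to (4) on a single passwd-style line. This sketch uses only POSIX awk features, and the input line is a sample.

```shell
#!/bin/sh
line='root:x:0:0:root:/root:/bin/bash'
echo "$line" | awk -F':' '{ print $1, $3 }'              # comma -> OFS (a space): root 0
echo "$line" | awk -F':' '{ print $1 $3 }'               # no comma, no separator: root0
echo "$line" | awk -F':' -v OFS='|' '{ print $1, $3 }'   # custom OFS: root|0
echo "$line" | awk -F':' '{ print NF, $NF }'             # 7 fields, last one: 7 /bin/bash
printf '  a   b\tc \n' | awk '{ print NF }'              # default FS: runs of blanks/tabs -> 3
```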

awk commonly used built-in variables
NF Number of fields in the current record ($NF is therefore the last field)
FS Input field separator: how input lines are split into fields (default: blanks and tabs)
OFS Output field separator: what print puts between comma-separated items (default: a space)
NR Number of records read so far, i.e. the current line number counted across all input files
FNR Record number within the current file (restarts at 1 for each new file)
RS Input record separator (default: newline)
ORS Output record separator, appended after each print (default: newline)
Example
FS (input field separator)
awk 'BEGIN{FS=":"}{print $1,$2}' passwd ##Print the first and second fields, using : as the field separator
root x
hello shell

OFS (output field separator)
awk 'BEGIN{FS=":";OFS="##"}{print $1,$2}' passwd ##Print the first and second fields, joined by ## in the output
root##x
hello shell ##
bin##x
daemon##x

NR is the record number; since each line is a record, it equals the current line number.
awk -F: '{print NR,$0}' passwd
1 root:x:0:0:root:/root:/bin/bash
2 hello shell
3 bin:x:1:1:bin:/bin:/sbin/nologin

FNR is also the record number, but it restarts at 1 for each input file.
awk '{print FNR,$0}' passwd aaa
1 root:x:0:0:root:/root:/bin/bash
2 hello shell
3 bin:x:1:1:bin:/bin:/sbin/nologin
4 daemon:x:2:2:daemon:/sbin:/sbin/nologin
5 adm:x:3:4:adm:/var/adm:/sbin/nologin
1 lisi
2 wangwu
3 zhangsan
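Printing NR and FNR side by side makes the difference visible. In this sketch, the two small input files are hypothetical and are created in a temporary directory.

```shell
#!/bin/sh
# Two hypothetical input files in a temporary directory
dir=$(mktemp -d)
printf '%s\n' a b > "$dir/one"
printf '%s\n' x y z > "$dir/two"

out=$(awk '{ print NR, FNR, $0 }' "$dir/one" "$dir/two")
printf '%s\n' "$out"
# NR keeps counting across files; FNR restarts at 1 for the second file:
# 1 1 a / 2 2 b / 3 1 x / 4 2 y / 5 3 z
rm -r "$dir"
```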


ORS (output record separator)
[root@web-server ~]# cat passwd
root:x:0:0:root:/root:/bin/bash
hello shell
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@web-server ~]# awk 'BEGIN{ORS=""}{print $0}' passwd
root:x:0:0:root:/root:/bin/bashhello shell bin:x:1:1:bin:/bin:/sbin/nologindaemon:x:2:2:daemon:/sbin:/sbin/nologinadm:x:3:4:adm:/var/adm:/sbin/nologin
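ORS controls what print appends after each record, and RS controls how the input is cut into records. In this sketch, ORS="," joins lines and RS="," splits a comma-separated string into records. The input strings are hypothetical, and the input to the RS example deliberately has no trailing newline.

```shell
#!/bin/sh
joined=$(seq 3 | awk 'BEGIN { ORS = "," } { print $0 }')
echo "$joined"                                    # -> 1,2,3,  (ORS replaces each newline)
split=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')
printf '%s\n' "$split"                            # -> 1: a / 2: b / 3: c
```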