Table of Contents
1. Regular expressions
   1. Overview
   2. Basic metacharacters of regular expressions
   3. Extended metacharacters of regular expressions
2. grep and egrep
   1. Basic example
3. sed (stream editor)
   sed stream editor usage and analysis
4. awk text editor tool
   awk commonly used built-in variables
   Example
1. Regular expressions
1. Overview
A regular expression (RE) is a character pattern used to match specified text during a search. In many tools (sed, awk, vim) a regular expression is written between two forward slashes, for example /root/. Regular expressions are usually used to find and replace text that matches a certain pattern (rule), and many programming languages support string manipulation with them. A regular expression is a logical formula for string operations: it combines predefined special characters into a "rule string" that expresses a filtering logic for strings.
2. Basic metacharacters of regular expressions
The difference between BRE and ERE
Basic regular expressions (BRE) and extended regular expressions (ERE) differ only in which characters are treated as metacharacters.
BRE: only ^ $ . * [ ] are metacharacters.
ERE: ^ $ . * [ ] + ( ) { } ? | are all metacharacters.
In BRE the additional characters are literal unless escaped with a backslash, which is why grouping is written \(\) in BRE.
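The BRE/ERE difference can be seen with a two-line sample. This is a minimal sketch; the sample text and variable names are made up for illustration.

```shell
# Hypothetical sample: one line with a literal plus sign, one with repeated "a".
sample=$(printf 'a+b\naab\n')

# BRE (plain grep): "+" is an ordinary character, so only the literal "a+b" matches.
bre_match=$(printf '%s\n' "$sample" | grep 'a+b')

# ERE (grep -E): "+" means "one or more of the preceding item", so "aab" matches.
ere_match=$(printf '%s\n' "$sample" | grep -E 'a+b')

echo "BRE: $bre_match"
echo "ERE: $ere_match"
```

The same pattern text selects different lines depending on which syntax grep is told to use.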
^ : start-of-line anchor, e.g. ^root matches lines starting with root
$ : end-of-line anchor, e.g. world$ matches lines ending with world
. : matches any single character, e.g. r..t
* : matches the preceding character zero or more times, e.g. grep "o*" /etc/passwd matches every line (o appears 0 or more times)
.* : matches any number of characters (greedy matching)
[] : matches any one character listed in the square brackets
[^] : matches any one character not in the group, e.g. [^0-9] matches any character except 0-9
\ : escape character, cancels the special meaning of a symbol, e.g. \!, \$
\< : start-of-word anchor (a word consists of letters, digits or underscores), e.g. \<root
\> : end-of-word anchor, e.g. root\>
\(\) : capture group, referenced later as \1, \2, ...
# Test in vim
:%s@\(张三\) \(李四\) \(王五\)@\3 \1 \2@g
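The capture groups and anchors above can also be tried outside vim. The sketch below assumes GNU grep and sed (the \< and \> word anchors are a GNU extension); the sample strings are hypothetical.

```shell
# Hypothetical sample data, not taken from the article.
names='zhang li wang'

# \(...\) captures each word; \3 \1 \2 writes the captured groups back in a new order.
reordered=$(echo "$names" | sed 's/\(zhang\) \(li\) \(wang\)/\3 \1 \2/')
echo "$reordered"

# ^ anchors the match to the start of the line, so "myroot:x" is not counted.
line_start=$(printf 'root:x\nmyroot:x\n' | grep -c '^root')
echo "$line_start"

# \< and \> require word boundaries, so "rooter" is not counted.
whole_word=$(printf 'root\nrooter\n' | grep -c '\<root\>')
echo "$whole_word"
```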
3. Extended metacharacters of regular expressions
Related comparison operators (as used in bash [[ ]] tests): = equal to, != not equal to, =~ regular expression match
+ : matches the preceding character one or more times, e.g. [a-z]+oot
? : matches the preceding character zero or one time
a|b : matches a or b
() : parentheses enclose part of a regular expression to form a unit (a group); a quantifier can then apply to the whole unit, e.g. (oo)+ matches oo repeated one or more times
x{m} : character x repeated m times
x{,n} : character x repeated at most n times, e.g. o{,3}
x{m,} : character x repeated at least m times
x{m,n} : character x repeated m to n times
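A short sketch of the extended quantifiers, using made-up word lists. One point worth noticing: without anchors, o{2} also matches lines that contain more than two o's, because the pattern only needs to match somewhere in the line.

```shell
# Hypothetical word list for the interval examples.
words=$(printf 'love\nloove\nlooove\n')

# Unanchored o{2}: any line containing at least two consecutive o's (loove, looove).
two_or_more=$(printf '%s\n' "$words" | grep -Ec 'o{2}')

# Anchored: exactly two o's between "l" and "ve".
exactly_two=$(printf '%s\n' "$words" | grep -E '^lo{2}ve$')

# ? makes the preceding character optional: rt and rot match, root does not.
optional=$(printf 'rt\nrot\nroot\n' | grep -Ec '^ro?t$')

# (oo)+ repeats the whole group: loot and loooot match, lot does not.
grouped=$(printf 'lot\nloot\nloooot\n' | grep -Ec '^l(oo)+t$')

# a|b matches either alternative.
either=$(printf 'cat\ndog\nbird\n' | grep -Ec 'cat|dog')

echo "$two_or_more $exactly_two $optional $grouped $either"
```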
2. grep and egrep
egrep supports the extended metacharacters of regular expressions (it is equivalent to grep -E).
grep common options
-E: enable extended regular expressions; grep -E is equivalent to egrep
-n: display the line number of each matching line
-q: quiet mode; prints nothing to standard output and exits with status 0 as soon as a match is found
-v: invert the match (print the lines that do not match)
-w: match whole words only
-R: search directories recursively
-i: ignore case
-o: print only the matched part of each line
-c: print the number of matching lines
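The options above can be compared side by side on a small temporary file. This is a sketch with invented sample lines; -w and -o are assumed to be available, as they are in GNU grep.

```shell
# Hypothetical sample file created just for this demonstration.
tmp=$(mktemp)
printf 'root:x:0\nRoot user\nbin:x:1\nchroot:x:2\n' > "$tmp"

n_out=$(grep -n 'bin' "$tmp")                   # line number plus the matching line
c_out=$(grep -c 'root' "$tmp")                  # count of lines containing "root"
i_out=$(grep -ci 'root' "$tmp")                 # same count, ignoring case
w_out=$(grep -w 'root' "$tmp")                  # whole word only, so "chroot" is skipped
v_out=$(grep -vc 'root' "$tmp")                 # count of lines NOT containing "root"
o_out=$(grep -o 'x:[0-9]' "$tmp" | head -n 1)   # only the matched text, first hit

# -q prints nothing; the exit status alone says whether a match exists.
if grep -q 'bin' "$tmp"; then q_status=found; else q_status=missing; fi

rm -f "$tmp"
echo "$n_out | $c_out | $i_out | $w_out | $v_out | $o_out | $q_status"
```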
1. Basic example
* matches the preceding character 0 or more times
[root@slave2 ~]# grep 'ro*' passwd
root:x:0:0:root:/root:/bin/bash
adm:x:3:4:adm:/var/adm:/sbin/nologin
\< start-of-word anchor, \> end-of-word anchor
[root@newrain ~]# cat jack.txt
Jack JACK JAck jackly
In vim:
:%s/\<[Jj]ack\>/123/g
$ matches the end of a line
[root@newrain ~]# grep 'bash$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
confluence:x:1000:1000:Atlassian Confluence:/home/confluence:/bin/bash
to:x:1003:1003::/home/to:/bin/bash
. matches a single character
[root@newrain ~]# grep 'r..t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
[] matches any one character in the square brackets
[root@newrain ~]# grep 'Root' /etc/passwd
[root@newrain ~]# grep '[Rr]oot' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
dockerroot:x:998:995:Docker User:/var/lib/docker:/sbin/nologin
Extended regular expressions
1. + matches the preceding character one or more times
[root@newrain ~]# egrep 'ro+t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
dockerroot:x:998:995:Docker User:/var/lib/docker:/sbin/nologin
2. ? matches the preceding character zero or one time
[root@newrain ~]# egrep 'ro?t' /etc/passwd
abrt:x:1041:1041::/home/abrt:/bin/bash
3. a|b matches a or b
[root@newrain ~]# netstat -anlp | egrep ':80|:22'
[root@newrain ~]# egrep 'root|alice' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
dockerroot:x:998:995:Docker User:/var/lib/docker:/sbin/nologin
4. x{m} character x repeated m times
[root@newrain ~]# cat a.txt
love love. loove looooove
[root@newrain ~]# egrep 'o{2}' a.txt
loove looooove
[root@newrain ~]# egrep 'o{2,}' a.txt
loove looooove
[root@newrain ~]# egrep 'o{6,7}' a.txt
3. sed (stream editor)
sed is a stream editor. It is a very capable tool for text processing and works well together with regular expressions.
sed [options] 'command' file
Options:
-e | Process the input with the script given on the command line |
-f | Read the commands from a script file; useful when the rules are complex or there are many patterns to match |
-i | Edit the file in place instead of printing to the terminal. With -i.bak, the original file is first saved as a backup. |
-r | Use extended regular expressions, so the ERE metacharacters work without backslashes |
-n | Suppress the automatic printing of every input line; only lines explicitly printed by the script are shown |
Commands:
1 s: substitute
2 g: flag for s, replace every match in the line (a number instead replaces only that occurrence)
3 d: delete
4 p: print
5 a: append after the line
6 i: insert before the line
[root@web-server ~]# cat rule.sed
/root/d
[root@web-server ~]# sed -f rule.sed passwd #-f specifies the rule file
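A rule file may hold several commands, one per line, and the same commands can be passed with repeated -e options instead. The sketch below uses a hypothetical temporary rule file and made-up input lines.

```shell
# Hypothetical rule file: delete lines matching root, then uppercase "bin".
rules=$(mktemp)
printf '/root/d\ns/bin/BIN/\n' > "$rules"

# -f reads the commands from the file.
from_file=$(printf 'root:x:0\nbin:x:1\n' | sed -f "$rules")

# The same two commands given directly with -e.
from_args=$(printf 'root:x:0\nbin:x:1\n' | sed -e '/root/d' -e 's/bin/BIN/')

rm -f "$rules"
echo "$from_file"
echo "$from_args"
```

Keeping the rules in a file tends to pay off once the script grows beyond two or three commands.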
sed stream editor usage and analysis
1. Delete mode d
Remove comments and empty lines from the sshd configuration file
sed '/^#.*/d;/^$/d' /etc/ssh/sshd_config
sed '1d' passwd         # delete the first line of the file
sed '1,2d' passwd       # delete lines 1 to 2
sed '2,$d' passwd       # delete from line 2 to the last line
sed '/root/d' passwd    # delete every line matching root
sed '/root/,2d' passwd  # delete from the first line matching root through line 2
sed '1~2d' passwd       # delete odd-numbered lines
sed '0~2d' passwd       # delete even-numbered lines
sed '1d;2d' passwd      # delete the first and second lines
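The address forms are easier to follow on numbered input. This sketch assumes GNU sed (the first~step address is a GNU extension) and uses seq output in place of a real file.

```shell
# Join output lines with spaces so each result fits on one line.
join_lines() { paste -s -d ' ' -; }

odd_deleted=$(seq 1 6 | sed '1~2d' | join_lines)     # start at line 1, every 2nd line
even_deleted=$(seq 1 6 | sed '0~2d' | join_lines)    # lines 2, 4, 6
to_end=$(seq 1 6 | sed '2,$d' | join_lines)          # line 2 through the last line
regex_to_num=$(seq 1 6 | sed '/3/,5d' | join_lines)  # from the line matching 3 through line 5

# When the end line number is at or before the matching line, only that one line is deleted.
end_before_start=$(seq 1 6 | sed '/4/,2d' | join_lines)

echo "$odd_deleted | $even_deleted | $to_end | $regex_to_num | $end_before_start"
```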
2. Add lines: a (append after the line), i (insert before the line)
# Add hello after the fourth line
[root@web-server ~]# sed '4a hello' passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
hello
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
# Add multiple lines (each continued line ends with a backslash)
[root@web-server ~]# cat -n passwd | sed -e '2a hello nginx \
> nginx\
> mysql'
1 root:x:0:0:root:/root:/bin/bash
2 bin:x:1:1:bin:/bin:/sbin/nologin
hello nginx
nginx
mysql
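The difference between a and i shows up clearly on a two-line input. The sketch assumes GNU sed, which accepts the text on the same line as the command; the input is invented.

```shell
# Join output lines with spaces so each result fits on one line.
join_lines() { paste -s -d ' ' -; }

# a puts the new line after the addressed line.
appended=$(printf 'one\ntwo\n' | sed '1a hello' | join_lines)

# i puts the new line before the addressed line.
inserted=$(printf 'one\ntwo\n' | sed '1i hello' | join_lines)

# The address can also be a pattern instead of a line number.
by_pattern=$(printf 'one\ntwo\n' | sed '/two/i hello' | join_lines)

echo "$appended | $inserted | $by_pattern"
```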
3. Replace (s/aaa/bbb/g)
# Replace every root with ROOT
[root@web-server ~]# sed 's/root/ROOT/g' passwd
ROOT:x:0:0:ROOT:/ROOT:/bin/bash
# Replace while ignoring case
[root@web-server ~]# sed 's/pattern/replace_string/gi' filename
The delimiter in sed can be replaced with another character, because whatever character follows the s command is taken as the delimiter:
sed 's:text:replace_text:'
sed 's|text|replace_text|'
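Changing the delimiter is most useful when the pattern itself contains slashes. A minimal sketch with a hypothetical path:

```shell
path='/usr/local/bin'

# With / as the delimiter, every slash in the pattern must be escaped.
escaped=$(echo "$path" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter, the slashes can be written as they are.
piped=$(echo "$path" | sed 's|/usr/local|/opt|')

echo "$escaped"
echo "$piped"
```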
4. Print: option -n with command p
Without -n, sed prints every input line automatically, so a line selected by p appears twice.
# Print the first line
[root@web-server ~]# sed -n '1p' passwd
root:x:0:0:root:/root:/bin/bash
# Print lines 1 to 4
[root@web-server ~]# sed -n '1,4p' passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
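The effect of -n is easy to check on numbered input. A small sketch, using seq output in place of a file:

```shell
# Join output lines with spaces so each result fits on one line.
join_lines() { paste -s -d ' ' -; }

# Without -n every line is printed automatically, and p prints line 2 a second time.
without_n=$(seq 1 3 | sed '2p' | join_lines)

# With -n only the lines selected by p are printed.
with_n=$(seq 1 3 | sed -n '2p')
range=$(seq 1 5 | sed -n '2,4p' | join_lines)

echo "$without_n | $with_n | $range"
```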
5. Edit in place with -i (modify the file directly without printing to the terminal)
# Add a line to the file itself, after the first line
[root@web-server ~]# sed -i '1a hello shell ' passwd
[root@web-server ~]# head -5 passwd
root:x:0:0:root:/root:/bin/bash
hello shell
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
Because -i overwrites the file, it is risky. Appending a suffix such as .bak to the option makes sed keep a backup copy of the original file, so the change can be undone:
sed -i.bak 's/pattern/replace_string/' filename
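The backup behaviour can be checked safely in a scratch directory. The file name and contents below are hypothetical.

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
file="$workdir/demo.txt"
printf 'hello world\n' > "$file"

# Edit in place; the original content is saved as demo.txt.bak.
sed -i.bak 's/world/shell/' "$file"

edited=$(cat "$file")
backup=$(cat "$file.bak")

rm -rf "$workdir"
echo "edited: $edited"
echo "backup: $backup"
```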
4. awk text editor tool
Introduction:
awk is a line processor. Compared with full-screen (interactive) editing, its advantage is that it does not run out of memory or slow down when processing huge files. It is usually used to format text information.
awk [options] 'BEGIN{what to do before processing} {what to do for each line} END{what to do after processing}' file
BEGIN{} runs before any line is processed and is executed only once.
{} runs while the text is processed and is executed once for every input line.
awk 'BEGIN{i=1}{print i++}' /etc/passwd
END{} runs after all lines have been processed and is executed only once.
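The three blocks can be seen working together in one short program. This is a sketch with made-up numeric input: BEGIN prints a header, the main block adds up the first field of every line, and END reports the totals.

```shell
# BEGIN runs once before input, the middle block once per line, END once after input.
result=$(printf '3\n4\n5\n' | awk '
  BEGIN { sum = 0; print "start" }
        { sum += $1 }
  END   { print "lines=" NR, "sum=" sum }
')
echo "$result"
```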
How awk works
awk -F":" '{print $1,$3}' /etc/passwd
(1) awk reads one line of input and assigns it to the variable $0. Each line is called a record and ends with a newline character.
(2) The line is then split at each : into fields, and each field is stored in a numbered variable, starting with $1.
(3) How does awk know where to split the fields? The built-in variable FS holds the field separator. By default FS is whitespace (spaces or tabs); here -F":" sets it to a colon.
(4) awk prints the fields with the print statement. It puts a space between the printed fields because there is a comma between $1 and $3. The comma is special: it stands for the output field separator OFS, which defaults to a space.
(5) After printing, awk reads the next line from the file, stores it in $0 (overwriting the previous content), splits the new string into fields and processes it. This continues until the whole file has been processed.
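The steps above can be sketched on a single record. The sample line is the familiar root entry; the variable names are only for illustration.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# The comma in print inserts OFS, which is a single space by default.
with_comma=$(echo "$line" | awk -F: '{print $1, $3}')

# Without the comma the two fields are simply concatenated.
no_comma=$(echo "$line" | awk -F: '{print $1 $3}')

# Setting OFS changes what the comma inserts.
custom_ofs=$(echo "$line" | awk -F: -v OFS='-' '{print $1, $3}')

# NF holds the number of fields the record was split into.
field_count=$(echo "$line" | awk -F: '{print NF}')

echo "$with_comma | $no_comma | $custom_ofs | $field_count"
```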
awk commonly used built-in variables
NF | Number of fields in the current record (the number of columns in the current line) |
FS | Input field separator (which character splits a line into fields) |
OFS | Output field separator (which separator is printed between fields) |
NR | Number of records read so far (the current line number across all input) |
FNR | Number of records read from the current file (restarts at 1 for each file) |
RS | Input record separator |
ORS | Output record separator |
Example
FS (input field separator)
# Print the first and second columns, using : as the input separator
awk 'BEGIN{FS=":"}{print $1,$2}' passwd
root x
hello shell
OFS (output field separator)
# Print the first and second columns, joined with ##
awk 'BEGIN{FS=":";OFS="##"}{print $1,$2}' passwd
root##x
hello shell ##
bin##x
daemon##x
NR is the record number. Because each record is one line by default, it equals the current line number.
awk -F: '{print NR,$0}' passwd
1 root:x:0:0:root:/root:/bin/bash
2 hello shell
3 bin:x:1:1:bin:/bin:/sbin/nologin
FNR is the record number within the current file; it is counted separately for each file.
awk '{print FNR,$0}' passwd aaa
1 root:x:0:0:root:/root:/bin/bash
2 hello shell
3 bin:x:1:1:bin:/bin:/sbin/nologin
4 daemon:x:2:2:daemon:/sbin:/sbin/nologin
5 adm:x:3:4:adm:/var/adm:/sbin/nologin
1 lisi
2 wangwu
3 zhangsan
ORS (output record separator)
[root@web-server ~]# cat passwd
root:x:0:0:root:/root:/bin/bash
hello shell
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
[root@web-server ~]# awk 'BEGIN{ORS=""}{print $0}' passwd
root:x:0:0:root:/root:/bin/bashhello shell bin:x:1:1:bin:/bin:/sbin/nologindaemon:x:2:2:daemon:/sbin:/sbin/nologinadm:x:3:4:adm:/var/adm:/sbin/nologin
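The examples above do not cover RS or the $NF idiom, so here is a small sketch of both, together with NR and FNR printed side by side. The file names and contents are hypothetical and live in a scratch directory.

```shell
# Join output lines with commas so each result fits on one line.
join_lines() { paste -s -d ',' -; }

workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/first.txt"
printf 'c\n' > "$workdir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for the second file.
counters=$(awk '{print NR, FNR, $0}' "$workdir/first.txt" "$workdir/second.txt" | join_lines)

# RS=";" makes each semicolon-separated chunk its own record.
records=$(printf 'a;b;c' | awk 'BEGIN{RS=";"} {print NR ": " $0}' | join_lines)

# $NF is the last field, because NF is the number of fields.
last_field=$(echo 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{print $NF}')

rm -rf "$workdir"
echo "$counters"
echo "$records"
echo "$last_field"
```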