Awk Overview
How awk works:
- awk reads text line by line, splits each line into fields (spaces or tabs are the default separator), stores the fields in built-in variables, and runs the given action on the lines that match the pattern or condition.
- awk splits each line into multiple “fields” before processing it.
- Because input is read line by line, the action runs once per line; use the print function to output field data and display the result.
- Inside an awk command, the logical operators are “&&” for “and”, “||” for “or”, and “!” for “not”. Simple arithmetic is also available: +, -, *, /, %, and ^ mean addition, subtraction, multiplication, division, remainder, and exponentiation, respectively.
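As a rough illustration of the operators listed above, the sketch below feeds five numbered lines (generated with printf as stand-in input, not a file from this article) through a condition that combines “&&”, “||”, and “!”, then evaluates a few arithmetic expressions in a BEGIN block. The variable names `picked` and `math` are only for illustration.

```shell
# Keep lines that are (between 2 and 4) OR (not below 5)
picked=$(printf '1\n2\n3\n4\n5\n' | awk '(NR>=2 && NR<=4) || !(NR<5) {print $0}')
echo "$picked"

# Remainder, exponentiation, and grouped arithmetic
math=$(awk 'BEGIN {print 7%3, 2^10, (1+2)*3}')
echo "$math"
```

Only line 1 is filtered out here, because it satisfies neither side of the “||”.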
Command format of awk:
awk option 'pattern or condition {action}' file1 file2...
awk -f script_file file1 file2
Notes:
- The program must be enclosed in single quotes: 'pattern or condition {action}'
- Specify the condition outside { }, and specify the action inside { }.
- Use a comma between two conditions to select a consecutive range of lines, and || to select non-consecutive lines. && means “and”.
- Do not enclose built-in variables in double quotes; otherwise awk treats them as literal strings.
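The last note can be checked with a two-line sample (hypothetical input generated with printf). In this sketch, NR outside quotes is evaluated as the line number, while "NR" in double quotes is printed literally.

```shell
# NR is the built-in variable; "NR" is just a two-character string
out=$(printf 'a\nb\n' | awk '{print NR, "NR", $0}')
echo "$out"
```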
Common built-in variables of awk (can be used directly):
Built-in variable | Meaning
---|---
$0 | The entire content of the line currently being processed
$n | The nth field (nth column) of the line currently being processed
NR | The line number (ordinal number) of the line currently being processed
NF | The number of fields in the line currently being processed. $NF represents the last field
FS | Column separator. Specifies the field separator for each line of text; defaults to spaces or tabs. Same effect as the “-F” option
FILENAME | The name of the file being processed
RS | Line (record) separator. When awk reads data from a file, it cuts the data into records according to RS and reads one record at a time for processing. The default value is “\n”
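To see several of these variables side by side, the sketch below prints NR, NF, $1, and $NF for two colon-separated records. The records are made-up sample data, and the second one deliberately has an extra field so that NF and $NF change.

```shell
# Two sample records with 4 and 5 colon-separated fields
out=$(printf 'root:x:0:0\nbin:x:1:1:extra\n' | awk -F: '{print NR, NF, $1, $NF}')
echo "$out"
```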
Basic usage of awk
Output content by line
Example 1: $0 represents the entire content of the current line
1. Output all content. awk reads line by line, and print with no arguments is equivalent to print $0.
[root@yuji ~]# awk '{print $0}' ff.txt    # output all content
one
two
three
four
five
six
seven
eight
nine
ten
[root@yuji ~]# awk '{print}' ff.txt    # output all content
one
two
three
four
five
six
seven
eight
nine
ten
Use NR to specify the line number
NR represents the line number (ordinal number) of the currently processed line.
1. Print lines 1 to 3. A comma selects a consecutive range of lines; && means “and”.
[root@yuji ~]# awk 'NR==1,NR==3 {print $0}' ff.txt    # print lines 1 to 3
one
two
three
[root@yuji ~]# awk '(NR>=1)&&(NR<=3) {print}' ff.txt
one
two
three
Note the symbol “||” (for “or”)
[root@yuji ~]# awk '(NR>=1)||(NR<=3) {print}' ff.txt    # prints all lines
one
two
three
four
five
six
seven
eight
nine
ten
[root@yuji ~]# awk '(NR==1)||(NR==3) {print}' ff.txt    # print line 1 and line 3
one
three
Print odd lines; print even lines. (Divide the line number by 2 and take the remainder: a remainder of 1 means an odd line, and a remainder of 0 means an even line.)
[root@yuji ~]# awk '(NR%2)==1 {print}' ff.txt    # print odd lines
one
three
five
seven
nine
[root@yuji ~]# awk '(NR%2)==0 {print}' ff.txt    # print even lines
two
four
six
eight
ten
$n represents the nth field of the current processing line
[root@yuji ~]# ifconfig ens33
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.72.192 netmask 255.255.255.0 broadcast 192.168.72.255
        inet6 fe80::fb25:1441:ffa2:fe13 prefixlen 64 scopeid 0x20<link>
        ether 00:0c:29:09:56:c0 txqueuelen 1000 (Ethernet)
        RX packets 941 bytes 95889 (93.6 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 652 bytes 98855 (96.5 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@yuji ~]# ifconfig ens33 | awk 'NR==2{print $2}'
192.168.72.192
Filter lines by text pattern (matching string)
Combine with regular expressions to filter out the desired lines by matching strings.
1. Output the line containing root.
[root@yuji ~]# awk '/root/ {print}' pass.txt    # use the awk command
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@yuji ~]# sed -n '/root/p' pass.txt    # use the sed command
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
#Use the awk command
[root@yuji ~]# awk '/^root/ {print}' pass.txt    # output lines starting with root
root:x:0:0:root:/root:/bin/bash
[root@yuji ~]# awk '/bash$/ {print}' pass.txt    # output lines ending with bash
root:x:0:0:root:/root:/bin/bash
yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
nancy:x:1021:1021::/home/nancy:/bin/bash
helen:x:1022:1022::/home/helen:/bin/bash

#Use the sed command
[root@yuji ~]# sed -n '/^root/p' pass.txt    # output lines starting with root
root:x:0:0:root:/root:/bin/bash
[root@yuji ~]# sed -n '/bash$/p' pass.txt    # output lines ending with bash
root:x:0:0:root:/root:/bin/bash
yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
nancy:x:1021:1021::/home/nancy:/bin/bash
helen:x:1022:1022::/home/helen:/bin/bash
BEGIN mode
Format
awk 'BEGIN{...};{...};END{...}' file

Processing steps:
1. Before awk processes the specified text, it executes the commands in the BEGIN{...} block.
2. The {...} in the middle holds the commands that actually process the file, line by line.
3. The commands in the END{...} block are executed after awk finishes processing the file. Statements such as printing results are often placed in the END{ } block.
Count lines ending in bash.
- BEGIN is the operation performed before processing the file, and END is the operation performed after processing the file.
- /bash$/ is a condition that needs to be met.
- First specify a variable x=0 in the BEGIN block; then process the file, execute x=x + 1 every time a line ending with bash is retrieved; finally execute the command in the END block and print the value of x.
[root@yuji ~]# awk '/bash$/{print}' pass.txt
root:x:0:0:root:/root:/bin/bash
yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
nancy:x:1021:1021::/home/nancy:/bin/bash
helen:x:1022:1022::/home/helen:/bin/bash
[root@yuji ~]# awk 'BEGIN{x=0};/bash$/{x++};END{print x}' pass.txt
4
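The same BEGIN / main block / END flow works for any running total. As a minimal sketch on made-up input (three name/value lines), the following initializes a sum in BEGIN, accumulates the second field for every line, and reports the line count, the sum, and the average in END. The variable name `sum` is illustrative.

```shell
# BEGIN runs once before input, the middle block once per line, END once after input
out=$(printf 'a 10\nb 20\nc 30\n' | awk 'BEGIN {sum=0} {sum += $2} END {print NR, sum, sum/NR}')
echo "$out"
```

In END, NR still holds the number of the last line read, which is why it can serve as the divisor.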
Output content by field (column)
Two ways to specify the column separator: use -F or use the built-in variable FS.
Example 1: Use -F to specify the delimiter.
[root@yuji ~]# awk -F ':' '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2
adm 3
lp 4
sync 5
shutdown 6
halt 7
mail 8
…
Reassign the built-in variable FS.
The built-in variable FS holds the column separator; the default is a space or a tab. Assign a new value to FS (in a BEGIN block, so that it applies from the first line) to change the separator.
[root@yuji ~]# awk 'BEGIN {FS=":"};{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2
adm 3
lp 4
sync 5
shutdown 6
halt 7
mail 8
…
Print users with UID greater than 500, and users with UID less than or equal to 500 (! negates)
Print the username and UID in both cases.
! means negation: not greater than 500, that is, less than or equal to 500.
#Using a colon as the separator, select the lines whose third field is greater than 500, then print the first and third fields
[root@yuji ~]# awk -F: '$3>500 {print $1,$3}' /etc/passwd

#Using a colon as the separator, select the lines whose third field is less than or equal to 500, then print the first and third fields
[root@yuji ~]# awk -F: '!($3>500) {print $1,$3}' /etc/passwd
Use an if statement
When using an if statement, put the condition in ( ) and wrap the whole statement in { }.
The outer { } makes the entire if statement the action, and the inner { } holds the commands that run when the condition is true, so the braces are nested.
[root@yuji ~]# awk -F: '{if($3>500) {print $1,$3}}' /etc/passwd
polkitd 999
libstoragemgmt 998
colord 997
saslauth 996
setroubleshoot 995
chrony 994
geoclue 993
sssd 992
nfsnobody 65534
gnome-initial-setup 991
yuji2 1000
nancy 1021
helen 1022
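An if statement can also carry an else branch. The sketch below is an illustration on two made-up passwd-style records; the labels "regular" and "system" are assumptions chosen for this example, not terms from the original commands.

```shell
# UID above 500 -> "regular", otherwise -> "system" (labels are illustrative)
out=$(printf 'root:x:0:0\nyuji2:x:1000:1000\n' |
  awk -F: '{ if ($3 > 500) { print $1, "regular" } else { print $1, "system" } }')
echo "$out"
```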
Ternary operator
In Java: (conditional expression) ? (expression or value A) : (expression or value B)
- When the conditional expression is true, the value A before the colon is taken.
- When the conditional expression is false, the value B after the colon is taken.

In the shell: [ conditional expression ] && A || B
- When the conditional expression is true, A in front of || is taken.
- When the conditional expression is false, B after || is taken.

awk uses the same (condition)?A:B form as Java. The following command takes the larger of the third and fourth fields:
[root@yuji ~]# awk -F: '{max=($3>=$4)?$3:$4;{print max,$1}}' /etc/passwd
0 root
1 bin
2 daemon
4 adm
7 lp
5 sync
6 shutdown
7 halt
12 mail
…
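To compare the awk and shell forms directly, here is a small sketch that picks the larger of two numbers both ways. The values 3 and 7 and the variable names `max`, `shmax`, `a`, and `b` are made up for the illustration.

```shell
# awk ternary: (condition) ? A : B
max=$(echo '3 7' | awk '{m = ($1 >= $2) ? $1 : $2; print m}')

# Shell equivalent: [ condition ] && A || B
a=3; b=7
[ "$a" -ge "$b" ] && shmax=$a || shmax=$b

echo "$max $shmax"
```

One difference worth keeping in mind: in the shell form, B also runs if A itself fails, so it is only a safe substitute when A cannot fail, as with the plain assignment here.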
$NF, $n~, $n!~, $n==, $n!=
$NF    //represents the last field

~ means contains (matches), !~ means does not contain, == means equal, != means not equal. The string on the right of ~ or !~ is treated as a regular expression.

$n> < ==    //used to compare values
$n~"string"    //the nth field contains the string
$n!~"string"    //the nth field does not contain the string
$n=="string"    //the nth field is exactly the string
$n!="string"    //the nth field is not the string
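The difference between “contains” and “is exactly” is easiest to see on a few records. This sketch uses three passwd-style lines as sample data; the variable names `data`, `contains`, `excludes`, and `exact` are illustrative.

```shell
# Three sample records in /etc/passwd format
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync'

# ~  : field 7 contains "sh"
contains=$(printf '%s\n' "$data" | awk -F: '$7 ~ "sh" {print $1}')
# !~ : field 7 does not contain "nologin"
excludes=$(printf '%s\n' "$data" | awk -F: '$7 !~ "nologin" {print $1}')
# == : field 7 is exactly "/bin/sync"
exact=$(printf '%s\n' "$data" | awk -F: '$7 == "/bin/sync" {print $1}')

printf '%s\n' "$contains" "$excludes" "$exact"
```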
When the seventh field is required to be not equal to “/bin/bash” and not equal to “/sbin/nologin”, output the first field, the third field, and the last field
[root@yuji ~]# awk -F: '($7!="/bin/bash")&&($NF!="/sbin/nologin") {print $1,$3,$NF}' /etc/passwd
sync 5 /bin/sync
shutdown 6 /sbin/shutdown
halt 7 /sbin/halt
Advanced usage of awk
RS specifies the line separator
When awk reads data from a file, it cuts the data into records according to RS and processes one record at a time. The default value of the built-in variable RS is “\n”.
Specify a colon as the record separator, then print the record number and the entire content of each record.
[root@yuji ~]# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
[root@yuji ~]# echo $PATH |awk 'BEGIN{RS=":"};{print NR,$0}'    # colon as the separator; print the record number and the whole record
1 /usr/local/sbin
2 /usr/local/bin
3 /usr/sbin
4 /usr/bin
5 /root/bin

[root@yuji ~]#
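The empty line after the last record in the output above comes from the newline that echo appends: with RS set to a colon, that newline is no longer a separator and stays inside the final record. The sketch below demonstrates this on a short made-up string by printing the length of each record; the variable names are illustrative.

```shell
# echo adds "\n" after the last item, so record 3 is "c\n" (length 2)
with_newline=$(echo 'a:b:c' | awk 'BEGIN {RS=":"} {print NR, length($0)}')
# printf '%s' adds no newline, so record 3 is just "c" (length 1)
without_newline=$(printf '%s' 'a:b:c' | awk 'BEGIN {RS=":"} {print NR, length($0)}')

printf '%s\n---\n%s\n' "$with_newline" "$without_newline"
```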
Pipeline the results of other commands
Count of rows
Method 1: Use the built-in variable NR to print the last line number
After all the text has been processed, NR holds the number of the last line, which equals the total number of lines. (This method counts every line that awk reads, so it only gives the total number of lines in the input, not the number of lines matching a condition.)
[root@yuji ~]# echo $PATH |awk 'BEGIN{RS=":"};{print NR,$0}'
1 /usr/local/sbin
2 /usr/local/bin
3 /usr/sbin
4 /usr/bin
5 /root/bin

[root@yuji ~]# echo $PATH |awk 'BEGIN{RS=":"};END{print NR}'
5
Use “wc -l” to count lines
Use the pipe character to pass the command result to “wc -l”. If you pipe to “wc -l” inside { }, the command must be enclosed in double quotes.
[root@yuji ~]# awk -F: '/bash$/{print}' pass.txt
root:x:0:0:root:/root:/bin/bash
yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
nancy:x:1021:1021::/home/nancy:/bin/bash
helen:x:1022:1022::/home/helen:/bin/bash
[root@yuji ~]# awk -F: '/bash$/{print}' pass.txt |wc -l
4
[root@yuji ~]# awk -F: '/bash$/{print |"wc -l"}' pass.txt
4
Use “grep -c” to count the number of matching lines.
[root@yuji ~]# grep -c "bash$" pass.txt
4
Calculate memory usage
The free command shows memory usage.
Usage = amount of memory used / total amount of memory.
int() truncates the value to an integer.
[root@yuji ~]# free
              total        used        free      shared  buff/cache   available
Mem:         999696      310872      319996        7400      368828      491760
Swap:       2097148           0     2097148
[root@yuji ~]# free | awk '/Mem:/ {print $3/$2}'
0.311223
[root@yuji ~]# free | awk '/Mem:/ {print $3/$2*100}'
31.1223
[root@yuji ~]# free | awk '/Mem:/ {print int($3/$2*100)}'
31
[root@yuji ~]# free | awk '/Mem:/ {print int($3/$2*100)"%"}'
31%
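If one decimal place is wanted instead of a truncated integer, awk's printf can format the ratio. This is a sketch that uses a single "Mem:" line with the sample values shown above as fixed input, rather than calling free, so the result is reproducible.

```shell
# $2 is total, $3 is used; %.1f rounds to one decimal, %% prints a literal percent sign
out=$(echo 'Mem: 999696 310872 319996 7400 368828 491760' |
  awk '/Mem:/ {printf "%.1f%%\n", $3/$2*100}')
echo "$out"
```

On a live system the same awk program can be fed by `free |` in place of the echo.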
Filter the CPU idle rate
The top command can check the CPU usage.
top -b -n 1 runs top in batch mode and outputs the result once instead of refreshing it. Each run takes a new sample, so the idle values in the commands below differ from run to run.
[root@yuji ~]# top -b -n 1
top - 18:45:59 up 4:46, 3 users, load average: 0.04, 0.08, 0.06
Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 6.2 sy, 0.0 ni, 93.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 s
KiB Mem : 999696 total, 317456 free, 311324 used, 370916 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 491288 avail Mem
[root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}'
 100.0 id
[root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}' |awk '{print $1}'
100.0
[root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}' |awk '{print int($1)}'
85
[root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}' |awk '{print int($1)"%"}'
86%
Specify the column separator when outputting
FS: column separator on input.
OFS: column separator on output. ($1=$1 forces awk to rebuild $0 using OFS; without it, print $0 outputs the line unchanged and OFS does not take effect.)
[root@yuji ~]# echo "A B C D"
A B C D
[root@yuji ~]# echo "A B C D" | tr " " "|"
A|B|C|D
[root@yuji ~]# echo "A B C D" | sed 's/ /|/g'
A|B|C|D
[root@yuji ~]# echo "A B C D" |awk 'BEGIN{OFS="|"};{$1=$1;print $0}'
A|B|C|D
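A quick way to see when OFS applies is to run the same input three ways. In this sketch (variable names are illustrative), printing $0 untouched keeps the original spaces, assigning $1=$1 first rebuilds the line with OFS, and printing fields separated by commas always uses OFS.

```shell
# $0 printed as-is: OFS is not applied
untouched=$(echo 'A B C D' | awk 'BEGIN {OFS="|"} {print $0}')
# $1=$1 rebuilds $0 from its fields, joined with OFS
rebuilt=$(echo 'A B C D' | awk 'BEGIN {OFS="|"} {$1=$1; print $0}')
# Comma-separated print arguments are joined with OFS
fields=$(echo 'A B C D' | awk 'BEGIN {OFS="|"} {print $1, $2}')

printf '%s\n' "$untouched" "$rebuilt" "$fields"
```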
Use awk to deduplicate (awk array feature)
Arrays can be defined in awk. The subscript of an array element can be a number or a string (a string subscript must be in double quotation marks), and if an element's value is a string, it must be quoted as well.
1. Define an array.
[root@yuji ~]# awk 'BEGIN{a[0]=10 ;a[1]=20; print a[0]}'
10
[root@yuji ~]# awk 'BEGIN{a[0]=10 ;a[1]=20; print a[1]}'
20
[root@yuji ~]# awk 'BEGIN{a["abc"]=10 ;a["xyz"]=20; print a["abc"]}'
10
Print all element values and corresponding subscript values in the array.
[root@yuji ~]# awk 'BEGIN{a[0]=10;a[1]=20;a[2]=30; for(i in a){print i,a[i]} }'
0 10
1 20
2 30
Print each distinct line in the file together with the number of times it occurs.
[root@yuji ~]# cat test.txt
aaa
aaa
bbb
ccc
aaa
bbb
aaa
[root@yuji ~]# awk '{a[1]++; print a[1]}' test.txt
1
2
3
4
5
6
7
[root@yuji ~]# awk '{a[$1]++};END{for(i in a){print i,a[i]}}' test.txt
aaa 4
ccc 1
bbb 2
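The same counting array can also remove duplicates while keeping the first occurrence of each line in its original order. The sketch below uses the common `!seen[$0]++` idiom on the sample data from test.txt, fed through printf; the array name `seen` is an arbitrary choice.

```shell
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is true only once per line
out=$(printf 'aaa\naaa\nbbb\nccc\naaa\nbbb\naaa\n' | awk '!seen[$0]++')
echo "$out"
```

Because no action is given, awk applies the default action, print, to each line for which the condition is true.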
Case
By analyzing the log /var/log/secure, check which hosts are brute-forcing the local service. If password verification fails more than three times (not necessarily consecutively), add the IP to the blacklist /etc/hosts.deny.
#Filter out the lines containing "Failed password", print the 11th field, and sort numerically
[root@yuji ~]# awk '/Failed password/{print $11}' /var/log/secure |sort -n
192.168.72.10
192.168.72.10
192.168.72.10
192.168.72.10
192.168.72.192
192.168.72.192
192.168.72.192

#Count the number of duplicate lines
[root@yuji ~]# awk '/Failed password/{print $11}' /var/log/secure |sort -n |uniq -c
4 192.168.72.10
3 192.168.72.192

#If the count is greater than 3, add "sshd:" before the IP and append it to the /etc/hosts.deny file
[root@yuji ~]# awk '/Failed password/{print $11}' /var/log/secure |sort -n |uniq -c| awk '$1>3 {print "sshd:"$2}' >>/etc/hosts.deny

#View the /etc/hosts.deny file
[root@yuji ~]# cat /etc/hosts.deny
#
# hosts.deny This file contains access rules which are used to
# deny connections to network services that either use
# the tcp_wrappers library or that have been
# started through a tcp_wrappers-enabled xinetd.
#
# The rules in this file can also be set up in
# /etc/hosts.allow with a 'deny' option instead.
#
# See 'man 5 hosts_options' and 'man 5 hosts_access'
# for information on rule syntax.
# See 'man tcpd' for information on tcp_wrappers
#
sshd:192.168.72.10

#Alternatively, count the failures per IP with an awk array instead of sort and uniq
[root@yuji ~]# awk '/Failed password/{a[$11]++};END{for(i in a){print i,a[i]}}' /var/log/secure
192.168.72.10 4
192.168.72.192 3
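The counting and the threshold check can also be combined in a single awk pass. The following is only a sketch: it builds a hypothetical log excerpt in the sshd "Failed password for root from IP" layout (host name, process IDs, times, and ports are invented) and prints the deny entries to standard output rather than appending to /etc/hosts.deny. It assumes the IP is the 11th field, as in the commands above; lines for invalid users have extra words and would shift the field position.

```shell
# Hypothetical sample: 4 failures from 192.168.72.10, 3 from 192.168.72.192
log=$(
  for i in 1 2 3 4; do
    echo "Mar  1 10:00:0$i yuji sshd[1000]: Failed password for root from 192.168.72.10 port 5000$i ssh2"
  done
  for i in 1 2 3; do
    echo "Mar  1 10:01:0$i yuji sshd[1001]: Failed password for root from 192.168.72.192 port 5100$i ssh2"
  done
)

# Count failures per IP, then emit "sshd:IP" only for counts greater than 3
deny=$(printf '%s\n' "$log" |
  awk '/Failed password/ {a[$11]++} END {for (ip in a) if (a[ip] > 3) print "sshd:" ip}')
echo "$deny"
```

To apply it for real, the awk program would read /var/log/secure and its output would be appended with >> /etc/hosts.deny, as in the pipeline above.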