The awk command: one of the "Three Musketeers" of text processing

Awk Overview

How awk works:

  • awk reads text line by line, splits each line into fields (using spaces or tabs as the separator by default), saves the fields in built-in variables, and runs the given action on the lines that match the pattern or condition.
  • awk typically splits a line into multiple "fields" before processing it.
  • Input is read line by line, and results are displayed by printing field data with print.
  • In awk conditions you can use the logical operators "&&" for "and", "||" for "or", and "!" for "not". Simple arithmetic is also available: +, -, *, /, %, and ^ stand for addition, subtraction, multiplication, division, remainder, and exponentiation, respectively.
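
The points above can be sketched with a small example. The sample data (a name, a quantity, and a unit price per line) is made up for illustration and is not from any real file.

```shell
# Hypothetical sample input: name, quantity, unit price (whitespace-separated).
data='apple 3 2
pear 10 5
plum 7 4'

# awk fills $1, $2, $3 for every line. Print the name and quantity*price
# for lines whose quantity is greater than 2 AND whose price is less than 5.
totals=$(printf '%s\n' "$data" | awk '$2 > 2 && $3 < 5 {print $1, $2 * $3}')
echo "$totals"
# apple 6
# plum 28

# Remainder and exponentiation on the quantity of the first line.
math=$(printf '%s\n' "$data" | awk 'NR == 1 {print $2 % 2, $2 ^ 2}')
echo "$math"
# 1 9
```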

Command format of awk:

 awk option 'pattern or condition {action}' file1 file2...
 awk -f scriptfile file1 file2...

Notes:

  • The program must be enclosed in single quotes: 'pattern or condition {action}'
  • Specify the condition outside { }, and specify the action inside { }.
  • Use a comma between two conditions to select a range of consecutive lines, and || to select non-consecutive lines. && means "and".
  • Do not enclose built-in variables in double quotes; otherwise awk treats them as literal strings.
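
The last note is easy to check. This is a minimal sketch with a made-up input line:

```shell
line='alice 30'

# Unquoted $1 is the first field of the line.
field=$(echo "$line" | awk '{print $1}')
echo "$field"      # alice

# "$1" inside the awk program is only the literal two-character string $1.
literal=$(echo "$line" | awk '{print "$1"}')
echo "$literal"    # $1
```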

Common built-in variables of awk (can be used directly):

Built-in variable   Meaning
$0                  The entire content of the line currently being processed
$n                  The nth field (nth column) of the current line
NR                  The number (ordinal) of the line currently being processed
NF                  The number of fields in the current line. $NF represents the last field
FS                  Column separator. Specifies the field separator for each line of text; defaults to spaces or tabs. Same as the "-F" option
FILENAME            The name of the file being processed
RS                  Line (record) separator. When awk reads data from a file, it splits the data into records according to RS and processes one record at a time. The default value is "\n"
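
A quick way to see NR, NF, and $NF together, using two made-up input lines:

```shell
# For each line print: line number, number of fields, last field.
report=$(printf 'a b c\nd e\n' | awk '{print NR, NF, $NF}')
echo "$report"
# 1 3 c
# 2 2 e
```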

Basic usage of awk

Output content by line

Example 1: $0 represents the entire content of the current line

Output all content. awk reads the file line by line, and $0 holds the whole current line; print with no arguments prints $0.

[root@yuji ~]# awk '{print $0}' ff.txt    # output all content
 one
 two
 three
 four
 five
 six
 seven
 eight
 nine
 ten
 [root@yuji ~]# awk '{print}' ff.txt    # output all content
 one
 two
 three
 four
 five
 six
 seven
 eight
 nine
 ten

Use NR to specify the line number

NR represents the line number (ordinal number) of the currently processed line.

Print lines 1 to 3. A comma between two conditions selects a range of consecutive lines; && means "and".

[root@yuji ~]# awk 'NR==1,NR==3 {print $0}' ff.txt    # print lines 1 to 3
 one
 two
 three
 [root@yuji ~]# awk '(NR>=1)&&(NR<=3) {print}' ff.txt
 one
 two
 three

The "||" operator means "or". Note that (NR>=1)||(NR<=3) is true for every line, so the first command below prints the whole file.

[root@yuji ~]# awk '(NR>=1)||(NR<=3) {print}' ff.txt    # print all lines
 one
 two
 three
 four
 five
 six
 seven
 eight
 nine
 ten
 [root@yuji ~]# awk '(NR==1)||(NR==3) {print}' ff.txt    # print line 1 and line 3
 one
 three

Print odd lines; print even lines. (Divide the line number by 2 and take the remainder: a remainder of 1 means an odd line, and a remainder of 0 means an even line.)

[root@yuji ~]# awk '(NR%2)==1 {print}' ff.txt    # print odd lines
 one
 three
 five
 seven
 nine
 [root@yuji ~]# awk '(NR%2)==0 {print}' ff.txt    # print even lines
 two
 four
 six
 eight
 ten

$n represents the nth field of the line currently being processed

[root@yuji ~]# ifconfig ens33
 ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
         inet 192.168.72.192 netmask 255.255.255.0 broadcast 192.168.72.255
         inet6 fe80::fb25:1441:ffa2:fe13 prefixlen 64 scopeid 0x20<link>
         ether 00:0c:29:09:56:c0 txqueuelen 1000 (Ethernet)
         RX packets 941 bytes 95889 (93.6 KiB)
         RX errors 0 dropped 0 overruns 0 frame 0
         TX packets 652 bytes 98855 (96.5 KiB)
         TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
 [root@yuji ~]# ifconfig ens33| awk 'NR==2{print $2}'
 192.168.72.192

Filter lines by text pattern (matching string)

Combine with regular expressions to filter out the desired lines by matching strings.

Output the lines containing root.

[root@yuji ~]# awk '/root/ {print}' pass.txt    # using awk
 root:x:0:0:root:/root:/bin/bash
 operator:x:11:0:operator:/root:/sbin/nologin
 [root@yuji ~]# sed -n '/root/p' pass.txt    # using sed
 root:x:0:0:root:/root:/bin/bash
 operator:x:11:0:operator:/root:/sbin/nologin

 # Using awk
 [root@yuji ~]# awk '/^root/ {print}' pass.txt    # output lines starting with root
 root:x:0:0:root:/root:/bin/bash
 [root@yuji ~]# awk '/bash$/ {print}' pass.txt    # output lines ending with bash
 root:x:0:0:root:/root:/bin/bash
 yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
 nancy:x:1021:1021::/home/nancy:/bin/bash
 helen:x:1022:1022::/home/helen:/bin/bash
 # Using sed
 [root@yuji ~]# sed -n '/^root/p' pass.txt    # output lines starting with root
 root:x:0:0:root:/root:/bin/bash
 [root@yuji ~]# sed -n '/bash$/p' pass.txt    # output lines ending with bash
 root:x:0:0:root:/root:/bin/bash
 yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
 nancy:x:1021:1021::/home/nancy:/bin/bash
 helen:x:1022:1022::/home/helen:/bin/bash

BEGIN and END patterns

Format

 awk 'BEGIN{...};{...};END{...}' file
 # Processing order:
 1. Before awk reads any of the specified text, it runs the actions in the BEGIN{...} block;
 2. The {...} in the middle holds the actions that are run for each line of the file;
 3. The actions in the END{...} block run after awk has finished processing the file. Statements such as printing results are usually placed in the END{ } block.
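
The three stages can be sketched as follows, with three made-up numbers as input:

```shell
# BEGIN runs once before any input, the middle block runs once per line,
# and END runs once after the last line.
summary=$(printf '5\n7\n8\n' | awk '
  BEGIN { print "start"; sum = 0 }
        { sum += $1 }
  END   { print "lines:", NR; print "sum:", sum }')
echo "$summary"
# start
# lines: 3
# sum: 20
```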

Count lines ending in bash.

  • BEGIN holds the actions performed before the file is processed, and END holds the actions performed after it.
  • /bash$/ is the condition a line must meet.
  • First set the variable x=0 in the BEGIN block; then, while processing the file, run x++ every time a line ending with bash is found; finally, run the END block and print the value of x.

 [root@yuji ~]# awk '/bash$/{print}' pass.txt
 root:x:0:0:root:/root:/bin/bash
 yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
 nancy:x:1021:1021::/home/nancy:/bin/bash
 helen:x:1022:1022::/home/helen:/bin/bash
 [root@yuji ~]# awk 'BEGIN{x=0};/bash$/{x++};END{print x}' pass.txt
 4

Output content by field (column)

Two ways to specify the column separator: use -F or use the built-in variable FS.

Example 1: Use -F to specify the delimiter.

 [root@yuji ~]# awk -F ':' '{print $1,$3}' /etc/passwd
 root 0
 bin 1
 daemon 2
 adm 3
 lp 4
 sync 5
 shutdown 6
 halt 7
 mail 8
 …

Reassign the built-in variable FS.

The built-in variable FS holds the column separator, which defaults to spaces or tabs. Assign it a new value (in the BEGIN block, so that it takes effect before the first line is read) to change the separator.

[root@yuji ~]# awk 'BEGIN {FS=":"};{print $1,$3}' /etc/passwd
 root 0
 bin 1
 daemon 2
 adm 3
 lp 4
 sync 5
 shutdown 6
 halt 7
 mail 8
 …

Print users with UID greater than 500, and users with UID less than or equal to 500 (! negates)

Print the username and UID in each case.

! means negation: "not greater than 500" is the same as "less than or equal to 500".

 # Using a colon as the separator, select the lines whose third field is greater than 500, then print the first and third fields
 [root@yuji ~]# awk -F: '$3>500 {print $1,$3}' /etc/passwd

 # Using a colon as the separator, select the lines whose third field is less than or equal to 500, then print the first and third fields
 [root@yuji ~]# awk -F: '!($3>500) {print $1,$3}' /etc/passwd

Use an if statement

In an if statement, the condition goes inside ( ) and the statements to run go inside { }.

The whole if statement sits inside the action's outer { }, so the braces are nested.

[root@yuji ~]# awk -F: '{if($3>500) {print $1,$3}}' /etc/passwd
 polkitd 999
 libstoragemgmt 998
 colord 997
 saslauth 996
 setroubleshoot 995
 chrony 994
 geoclue 993
 sssd 992
 nfsnobody 65534
 gnome-initial-setup 991
 yuji2 1000
 nancy 1021
 helen 1022
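
An if statement can also take an else branch. The following is a sketch on three made-up passwd-style lines; the "regular" and "system" labels are only illustrative.

```shell
# Hypothetical sample in /etc/passwd layout (name:password:UID:GID).
users='root:x:0:0
alice:x:1000:1000
daemon:x:2:2'

# Label each user according to whether the UID (field 3) is greater than 500.
labels=$(printf '%s\n' "$users" | awk -F: '{
  if ($3 > 500) { print $1, "regular" }
  else          { print $1, "system" }
}')
echo "$labels"
# root system
# alice regular
# daemon system
```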

Ternary operator

 In Java: (conditional expression) ? (expression or value A) : (expression or value B)
 - When the conditional expression is true, the value A before the colon is used.
 - When the conditional expression is false, the value B after the colon is used.

 In the shell: [ conditional expression ] && A || B
 - When the conditional expression is true, A in front of || is run.
 - When the conditional expression is false, B after || is run.

 awk uses the same (condition) ? A : B form as Java.

 [root@yuji ~]# awk -F: '{max=($3>=$4)?$3:$4;{print max,$1}}' /etc/passwd
 0 root
 1 bin
 2 daemon
 4 adm
 7 lp
 5 sync
 6 shutdown
 7 halt
 12 mail
 …

$NF, $n~, $n!~, $n==, $n!=

 $NF             // the last field

 ~ means "matches (contains)", !~ means "does not match", == means equal, != means not equal

 $n > < ==       // compare values
 $n~"string"     // true if the nth field contains the string (the string is treated as a regular expression)
 $n!~"string"    // true if the nth field does not contain the string
 $n=="string"    // true if the nth field is exactly the string
 $n!="string"    // true if the nth field is not exactly the string
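
The difference between matching (~) and exact comparison (==) can be sketched on three made-up passwd-style lines:

```shell
# Hypothetical sample in /etc/passwd layout.
users='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync'

# ~ : the seventh field matches the regular expression bash$.
bash_users=$(printf '%s\n' "$users" | awk -F: '$7 ~ /bash$/ {print $1}')
echo "$bash_users"     # root

# !~ : the last field does not contain nologin.
not_nologin=$(printf '%s\n' "$users" | awk -F: '$NF !~ /nologin/ {print $1}')
echo "$not_nologin"    # root, then sync

# == : the first field is exactly the string bin.
exact=$(printf '%s\n' "$users" | awk -F: '$1 == "bin" {print $1, $NF}')
echo "$exact"          # bin /sbin/nologin
```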

For lines whose seventh field is neither "/bin/bash" nor "/sbin/nologin", output the first field, the third field, and the last field

 [root@yuji ~]# awk -F: '($7!="/bin/bash")&&($NF!="/sbin/nologin") {print $1,$3,$NF}' /etc/passwd
 sync 5 /bin/sync
 shutdown 6 /sbin/shutdown
 halt 7 /sbin/halt

Advanced usage of awk

RS specifies the line separator

When awk reads data from a file, it splits the data into records according to RS and processes one record at a time. The default value of the built-in variable RS is "\n".

With a colon as the record separator, print the record number and the content of each record.

 [root@yuji ~]# echo $PATH
 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
 [root@yuji ~]# echo $PATH |awk 'BEGIN{RS=":"};{print NR,$0}'    # use a colon as the record separator; print the record number and the record
 1 /usr/local/sbin
 2 /usr/local/bin
 3 /usr/sbin
 4 /usr/bin
 5 /root/bin
 [root@yuji ~]#
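
One detail worth knowing when RS is changed: echo appends a newline, and once "\n" is no longer the record separator, that newline becomes part of the last record. A minimal sketch with a made-up three-part string, using brackets to make the record boundaries visible:

```shell
# With echo, the last record is "c" followed by a newline.
with_newline=$(echo "a:b:c" | awk 'BEGIN{RS=":"} {print NR ": [" $0 "]"}')
echo "$with_newline"
# 1: [a]
# 2: [b]
# 3: [c
# ]

# printf '%s' sends no trailing newline, so the last record is just "c".
without_newline=$(printf '%s' "a:b:c" | awk 'BEGIN{RS=":"} {print NR ": [" $0 "]"}')
echo "$without_newline"
# 1: [a]
# 2: [b]
# 3: [c]
```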

Process the output of other commands through a pipe

Count lines

Method 1: Use the built-in variable NR to print the last line number

After all the text has been processed, NR holds the number of the last line, which is the total number of lines. (This method only works when every line is counted; it gives the number of lines in the whole input, not the number of lines that match a condition.)

 [root@yuji ~]# echo $PATH |awk 'BEGIN{RS=":"};{print NR,$0}'
 1 /usr/local/sbin
 2 /usr/local/bin
 3 /usr/sbin
 4 /usr/bin
 5 /root/bin
 [root@yuji ~]# echo $PATH |awk 'BEGIN{RS=":"};END{print NR}'

Use “wc -l” to count lines

Use a pipe to pass the command's output to "wc -l". To call "wc -l" inside { }, the command must be enclosed in double quotes.

 [root@yuji ~]# awk -F: '/bash$/{print}' pass.txt
 root:x:0:0:root:/root:/bin/bash
 yuji2:x:1000:1000:yuji2:/home/yuji2:/bin/bash
 nancy:x:1021:1021::/home/nancy:/bin/bash
 helen:x:1022:1022::/home/helen:/bin/bash
 [root@yuji ~]# awk -F: '/bash$/{print}' pass.txt |wc -l
 4
 [root@yuji ~]# awk -F: '/bash$/{print |"wc -l"}' pass.txt
 4

Use “grep -c” to count the number of matching lines.

 [root@yuji ~]# grep -c "bash$" pass.txt
 4

Calculate memory usage

The free command shows memory usage.

Usage = amount of memory used / total amount of memory.

int() truncates a number to an integer.

[root@yuji ~]# free
               total used free shared buff/cache available
 Mem: 999696 310872 319996 7400 368828 491760
 Swap: 2097148 0 2097148
 [root@yuji ~]# free | awk '/Mem:/ {print $3/$2}'
 0.311223
 [root@yuji ~]# free | awk '/Mem:/ {print $3/$2*100}'
 31.1223
 [root@yuji ~]# free | awk '/Mem:/ {print int($3/$2*100)}'
 31
 [root@yuji ~]# free | awk '/Mem:/ {print int($3/$2*100)"%"}'
 31%
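
If one decimal place is wanted instead of a truncated integer, printf is an option. This sketch feeds awk a copy of the free output shown above instead of calling free, so the numbers are fixed:

```shell
# Captured sample of the free output from the transcript above.
free_output='              total        used        free      shared  buff/cache   available
Mem:         999696      310872      319996        7400      368828      491760
Swap:       2097148           0     2097148'

# %.1f keeps one decimal place; %% prints a literal percent sign.
usage=$(printf '%s\n' "$free_output" | awk '/^Mem:/ {printf "%.1f%%\n", $3 / $2 * 100}')
echo "$usage"
# 31.1%
```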

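Extract the CPU idle rate

The top command shows CPU usage.

top -b -n 1 runs top in batch mode for a single iteration, so it prints one snapshot and exits instead of refreshing. Each invocation takes a new sample, so the idle values printed by the commands below differ from run to run.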

[root@yuji ~]# top -b -n 1
 top - 18:45:59 up 4:46, 3 users, load average: 0.04, 0.08, 0.06
 Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie
 %Cpu(s): 0.0 us, 6.2 sy, 0.0 ni, 93.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 s
 KiB Mem : 999696 total, 317456 free, 311324 used, 370916 buff/cache
 KiB Swap: 2097148 total, 2097148 free, 0 used. 491288 avail Mem
 [root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}'
 100.0 id
 [root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}' |awk '{print $1}'
 100.0
 [root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}' |awk '{print int($1)}'
 85
 [root@yuji ~]# top -b -n 1 |awk -F, '/Cpu/{print $4}' |awk '{print int($1)"%"}'
 86%

Specify the column separator when outputting

FS is the column separator for input.

OFS is the column separator for output. (Assigning $1=$1 makes awk rebuild $0 with OFS; without it, print $0 outputs the line unchanged.)

[root@yuji ~]# echo "A B C D"
 A B C D
 [root@yuji ~]# echo "A B C D" | tr " " "|"
 A|B|C|D
 [root@yuji ~]# echo "A B C D" | sed 's/ /|/g'
 A|B|C|D
 [root@yuji ~]# echo "A B C D" |awk 'BEGIN{OFS="|"};{$1=$1;print $0}'
 A|B|C|D
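
The effect of $1=$1 can be seen by comparing three variants on the same input:

```shell
# Without touching any field, $0 is printed exactly as it was read.
unchanged=$(echo "A B C D" | awk 'BEGIN{OFS="|"} {print $0}')
echo "$unchanged"    # A B C D

# Assigning to a field makes awk rebuild $0 using OFS.
rebuilt=$(echo "A B C D" | awk 'BEGIN{OFS="|"} {$1 = $1; print $0}')
echo "$rebuilt"      # A|B|C|D

# A comma in print always inserts OFS, with or without $1=$1.
listed=$(echo "A B C D" | awk 'BEGIN{OFS="|"} {print $1, $2}')
echo "$listed"       # A|B
```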

Use awk to deduplicate (awk array feature)

Arrays can be defined in awk. The subscript of an array element can be a string (in double quotes), and if the element's value is a string, it must be quoted as well.

Define an array.

 [root@yuji ~]# awk 'BEGIN{a[0]=10 ;a[1]=20; print a[0]}'
 10
 [root@yuji ~]# awk 'BEGIN{a[0]=10 ;a[1]=20; print a[1]}'
 20
 [root@yuji ~]# awk 'BEGIN{a["abc"]=10 ;a["xyz"]=20; print a["abc"]}'
 10

Print every subscript in the array with its element value. (for ... in does not guarantee any particular order.)

 [root@yuji ~]# awk 'BEGIN{a[0]=10;a[1]=20;a[2]=30; for(i in a){print i,a[i]} }'
 0 10
 1 20
 2 30

Print the repeated lines in the file and the number of times each one occurs

[root@yuji ~]# cat test.txt
 aaa
 aaa
 bbb
 ccc
 aaa
 bbb
 aaa
 [root@yuji ~]# awk '{a[1]++; print a[1]}' test.txt
 1
 2
 3
 4
 5
 6
 7
 [root@yuji ~]# awk '{a[$1]++};END{for(i in a){print i,a[i]}}' test.txt
 aaa 4
 ccc 1
 bbb 2
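
The same array feature can drop duplicate lines while keeping the first occurrence of each. This is a sketch of a common idiom, not something shown in the transcript above; the array name seen is arbitrary, and the input repeats the contents of test.txt.

```shell
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is
# true only then; a true pattern with no action prints the line.
unique=$(printf 'aaa\naaa\nbbb\nccc\naaa\nbbb\naaa\n' | awk '!seen[$0]++')
echo "$unique"
# aaa
# bbb
# ccc

# Counting occurrences, piped through sort because for-in order is unspecified.
counts=$(printf 'aaa\naaa\nbbb\nccc\naaa\nbbb\naaa\n' |
  awk '{count[$0]++} END {for (k in count) print k, count[k]}' | sort)
echo "$counts"
# aaa 4
# bbb 2
# ccc 1
```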

Case study

Analyze the log /var/log/secure to find out which hosts are brute-forcing the local service. If password verification fails more than three times for an IP (consecutively or not), add that IP to the blacklist /etc/hosts.deny

 # Select the lines containing "Failed password", print the 11th field (the source IP), and sort
 [root@yuji ~]# awk '/Failed password/{print $11}' /var/log/secure |sort -n
 192.168.72.10
 192.168.72.10
 192.168.72.10
 192.168.72.10
 192.168.72.192
 192.168.72.192
 192.168.72.192
 # Count how many times each IP occurs
 [root@yuji ~]# awk '/Failed password/{print $11}' /var/log/secure |sort -n |uniq -c
       4 192.168.72.10
       3 192.168.72.192
 # If the count is greater than 3, put "sshd:" before the IP and append it to the /etc/hosts.deny file
 [root@yuji ~]# awk '/Failed password/{print $11}' /var/log/secure |sort -n |uniq -c |awk '$1>3 {print "sshd:"$2}' >> /etc/hosts.deny
 # View the /etc/hosts.deny file
 [root@yuji ~]# cat /etc/hosts.deny
 #
 # hosts.deny    This file contains access rules which are used to
 # deny connections to network services that either use
 # the tcp_wrappers library or that have been
 # started through a tcp_wrappers-enabled xinetd.
 #
 # The rules in this file can also be set up in
 # /etc/hosts.allow with a 'deny' option instead.
 #
 # See 'man 5 hosts_options' and 'man 5 hosts_access'
 # for information on rule syntax.
 # See 'man tcpd' for information on tcp_wrappers
 #
 sshd:192.168.72.10

 # The same count using an awk array instead of sort and uniq
 [root@yuji ~]# awk '/Failed password/{a[$11]++};END{for(i in a){print i,a[i]}}' /var/log/secure
 192.168.72.10 4
 192.168.72.192 3
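
The counting and the threshold can also be combined in a single awk program. The sketch below runs on a made-up log excerpt instead of /var/log/secure and only prints the rules; appending them to /etc/hosts.deny is left out on purpose. The variable name limit and the log lines are assumptions for illustration. It also assumes the IP is the 11th field, which holds for "Failed password for USER from IP" lines but not for "invalid user" lines, where the fields shift.

```shell
# Hypothetical sample log: 4 failures from one address, 2 from another.
log=$(
  for i in 1 2 3 4; do
    echo "Mar  1 10:00:0$i host sshd[100]: Failed password for root from 192.168.72.10 port 5000$i ssh2"
  done
  for i in 1 2; do
    echo "Mar  1 10:01:0$i host sshd[101]: Failed password for root from 192.168.72.192 port 5100$i ssh2"
  done
)

# Count failures per IP, then print a rule for every IP above the limit.
deny=$(printf '%s\n' "$log" | awk -v limit=3 '
  /Failed password/ { fails[$11]++ }
  END { for (ip in fails) if (fails[ip] > limit) print "sshd:" ip }' | sort)
echo "$deny"
# sshd:192.168.72.10
```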