RHCE: shell script programming with awk

Table of Contents

    Foreword

    1. awk concept

    2. Workflow

    3. awk execution method

    4. awk syntax structure and cases

    Pure command execution script

    awk command calls script execution

    Direct awk pure script execution

    5. Records and Fields

    6. awk variables

    7. Operators in awk

    Summary
Foreword

The previous articles introduced grep and sed, two of the "Three Musketeers" of text processing, in detail. This article introduces the last of the three, awk.

1. awk concept

AWK is a tool for text processing and data extraction. It can be used directly on the command line or run as a script. It extracts data and produces formatted output by processing text line by line.

awk is also a programming language. It provides regular expression matching, flow control, operators, expressions, variables, and functions, and it borrows much of its syntax and several of its ideas from the C language.

2. Workflow

  • Step 1: awk automatically reads one line of text from the specified data file.
  • Step 2: awk automatically updates its built-in variables, such as the field count NF, the line number NR, the whole-line variable $0, and the per-column variables $1, $2, and so on.
  • Step 3: awk executes all the patterns and actions in the program in sequence.
  • Step 4: After all patterns and actions have been executed, if there are still unread lines in the data file, awk returns to step 1 and repeats steps 1 to 4.
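
As a rough illustration of the steps above, the following sketch feeds three made-up lines to awk and prints the built-in variables that are refreshed for each line. The sample text and the shell variable names (sample, workflow_out) are assumptions for the demo only.

```shell
# Hypothetical three-line sample; any text file would behave the same way.
sample=$(printf 'alpha beta\ngamma\ndelta epsilon zeta\n')

# For every line read, awk refreshes NR, NF, $0, $1 ... and then runs the action.
workflow_out=$(printf '%s\n' "$sample" | awk '{ print "NR=" NR, "NF=" NF, "first=" $1 }')
echo "$workflow_out"
# NR=1 NF=2 first=alpha
# NR=2 NF=1 first=gamma
# NR=3 NF=3 first=delta
```

Each output line corresponds to one pass through steps 1 to 4.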

3. awk execution method

An awk program consists of pattern-action statements. Either part may be omitted: a pattern with no action prints the matching line, and an action with no pattern runs on every line.

  • Pattern: a rule used to test whether an input line should trigger the action
  • Action: the statements, functions, and expressions to execute
  • In short, the pattern determines when the action is triggered, and the action determines what processing is performed
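
The relationship between pattern and action can be sketched with three small commands. The fruit data and the shell variable names are invented for the example.

```shell
# Hypothetical input: a name and a count on each line.
lines=$(printf 'apple 3\nbanana 12\ncherry 7\n')

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$lines" | awk '$2 > 5')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$lines" | awk '{ print $1 }')

# Pattern plus action: the action runs only on the lines the pattern selects.
both=$(printf '%s\n' "$lines" | awk '/^b/ { print $2 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```

The first command prints the banana and cherry lines, the second prints the three names, and the third prints 12.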

4. awk syntax structure and cases

Syntax structure:

Pure command execution script

awk 'pattern {action}' file    #file is the file to be processed
#pattern is the text pattern, used to filter lines of input data
#This ordinary pattern is one of three kinds; the other two are the special patterns BEGIN and END
#action is the operation performed on the lines that match the pattern
  • BEGIN mode: all preparation before the text is processed is done here
  • END mode: all work done after text processing and before exiting is done here
  • For example: awk processes the text line by line and, by default, outputs a result for each line. To output a single result after all the text has been processed, put the output statement in the END block.
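
A minimal sketch of the three kinds of block working together, assuming a made-up input of three numbers: BEGIN prepares, the main block accumulates, and END reports once.

```shell
end_out=$(printf '10\n20\n30\n' | awk '
BEGIN { print "start"; total = 0 }   # runs once, before any input is read
      { total += $1 }                # runs for every input line
END   { print "total:", total }      # runs once, after the last line
')
echo "$end_out"
# start
# total: 60
```

If the print were moved from END into the main block, awk would print a running total after every line instead of a single result.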

Format:

awk 'BEGIN{ commands } {print item1, item2,...} END{ commands }' [INPUTFILE...]
#Items in a print statement are separated by commas and are output separated by spaces

Parameters:

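  1. -F: Specify the input field delimiter
  2. -v: Define a variable (-v name=value)
  3. -f: Read the awk script from a file
  4. '{print $2}': The script itself, given in single quotes when -f is not used (this is an argument, not an option)
  5. -i inplace: Modify the file directly (GNU awk 4.1 and later only)

The following are built-in variables, not options; set them with -v or in a BEGIN block:

  1. NR: The number of the current line
  2. NF: The number of fields in the current line
  3. OFS: The output field delimiter
  4. ORS: The output line separator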
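
A short sketch of the -F, -v, and -f options in use. The input lines, the variable name label, and the temporary script file are assumptions made for the demo.

```shell
# -F sets the input field separator; -v assigns an awk variable before the program starts.
opts_out=$(printf 'root:x:0\nbin:x:1\n' | awk -F ':' -v label='user' '{ print label, $1 }')

# -f reads the program from a file (a temporary file is used here for illustration).
prog=$(mktemp)
printf '%s\n' '{ print NF }' > "$prog"
file_out=$(printf 'a b c\n' | awk -f "$prog")
rm -f "$prog"

echo "$opts_out"   # user root / user bin
echo "$file_out"   # 3
```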

Case 1:

[root@timeserver ~]# vim input
#Enter multiple carriage returns to create multiple blank lines

[root@timeserver ~]# awk '/^$/{print "This is a blank line"}' input
This is a blank line
This is a blank line
This is a blank line
This is a blank line
#How the script works: the pattern /^$/ matches blank lines, and for each blank line the action print "This is a blank line" is executed

Case 2:

[root@server ~]# awk 'BEGIN{print "BEGIN ...."} {print $0} END{print "The End"}' /etc/fstab
# $0 represents the entire current line

The above command is executed as follows:

#First the BEGIN block runs and prints the header line "BEGIN ...."
#Then the main block processes the text (print $0): awk reads the first line and prints it in full,
#After the first line is processed, awk moves on to the second line, and so on, in line order.
#Finally the END block runs and performs the finishing work

awk command calls script execution

  • This method is recommended when there are multiple patterns and actions.
[root@server ~]# vim scr.awk
#Edit the following content
/^$/{print "This is a blank line"}
#Use awk command to call script execution
[root@server ~]# awk -f scr.awk input
This is a blank line
This is a blank line
This is a blank line

Direct awk pure script execution

[root@server ~]# vim scr.awk
#First edit the header to declare the script interpreter
#!/bin/awk -f
/^$/{print "This is a blank line"}
#Use the following commands to make the script executable and run it
[root@server ~]# chmod +x scr.awk
[root@server ~]# ./scr.awk input    #The file name being processed

5. Records and Fields

Concept:

Field: In awk, a field is an individual data item in a line of text. By default, awk splits each input line into fields on runs of spaces or tabs.

Record: A record is a complete line of input text processed by awk. By default, records are delimited by newlines. Each record can contain multiple fields, separated by the field separator. $1 represents the first field, and $0 represents the entire record.

In short: Records represent rows, fields represent columns
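
To make the row and column picture concrete, here is a small sketch with two invented records. It uses $1 for the first field and $NF for the last field (NF holds the field count, so $NF is the field at that position).

```shell
# Hypothetical records: name, age, city.
rec_out=$(printf 'tom 18 beijing\namy 20 shanghai\n' | awk '{ print NR ": " $1 " lives in " $NF }')
echo "$rec_out"
# 1: tom lives in beijing
# 2: amy lives in shanghai
```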

Case 1: Process each row and column

[root@timeserver ~]# awk '{print $0}' awk1.txt
#$0 is the whole line; the first column is $1 and the second column is $2

Case 2: The field number after $ is given by variables

[root@server ~]# awk 'BEGIN{one=1 ; two=2} {print $(one + two)}' awk1.txt
#$(one + two) is $3, so the third column is printed

Case 3: Query rows beginning with l and display the third column

[root@server ~]# awk '/^l/{print $3}' awk1.txt

Case 4: Export all accounts

[root@server ~]# awk -F ":" '{print $1}' /etc/passwd
  • awk splits columns on spaces and tabs by default. Some files, such as /etc/passwd, use a different delimiter (here a colon) that awk does not recognize by default.
  • In that case, use the -F parameter to specify the separator

6. awk variables

$0: Record variable, represents the entire record (all columns)

$n: Field variable, represents the nth field (column)

NF: The number of fields in the current record

NR: The number of records read so far, that is, the line number of the current line

FS: Input field delimiter, the default value is a space or tab character; you can also use -F to specify the delimiter

OFS: Output field delimiter; OFS="#" sets the output delimiter to #

RS: Record delimiter, the default value is the newline character \n

ENVIRON: Associative array of the current shell environment variables and their values

FILENAME: The name of the file being processed by awk
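
The following sketch exercises three of these variables on invented input: OFS, RS, and ENVIRON. DEMO_VAR is a made-up environment variable name used only for this demo.

```shell
# OFS is placed between the comma-separated items of print.
ofs_out=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "#" } { print $1, $3 }')

# RS changes what counts as a record; here records are separated by ";".
rs_out=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR, $0 }')

# ENVIRON exposes exported environment variables.
env_out=$(DEMO_VAR=hello awk 'BEGIN { print ENVIRON["DEMO_VAR"] }')

echo "$ofs_out"   # a#c
echo "$rs_out"    # 1 x / 2 y / 3 z
echo "$env_out"   # hello
```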

Case 1: Export all accounts

[root@server ~]# awk 'BEGIN{FS=":"} {print $1}' /etc/passwd
#Set the separator in the BEGIN block

Case 2: Use NR NF to display the number of rows and columns

[root@server ~]# awk '{print NF,NR,$0} END{print FILENAME}' awk1.txt
#FILENAME represents the file name to be processed

[root@server ~]# awk '{print "row",NR,"has",NF,"columns"}' awk1.txt > /root/t1.txt
#The output is redirected to the file /root/t1.txt

Case 3:

[root@server ~]# awk -F ":" 'BEGIN{OFS="\t"} {print $1,$3}' /etc/passwd

Case 4:

[root@server ~]# vim awk2.txt
zhangsan 68 88 92 45 71
lisi 77 69 43 52 84
wangwu 61 99 85 77 56


[root@server ~]# vim test.awk
{
       print
       print "$0:" , $0
       print "$1:" , $1
       print "$2:" , $2
       print "NF:" , NF
       print "NR:" , NR
       print "FILENAME:" , FILENAME
}

[root@server ~]# awk -f test.awk awk2.txt
#The execution process is as follows:
#awk reads the file line by line and, for each line, executes every statement in the block from top to bottom

The processing results are as follows

zhangsan 68 88 92 45 71
$0: zhangsan 68 88 92 45 71
$1: zhangsan
$2: 68
NF: 6
NR: 1
FILENAME: awk2.txt
lisi 77 69 43 52 84
$0: lisi 77 69 43 52 84
$1: lisi
$2: 77
NF: 6
NR: 2
FILENAME: awk2.txt
wangwu 61 99 85 77 56
$0: wangwu 61 99 85 77 56
$1: wangwu
$2: 61
NF: 6
NR: 3
FILENAME: awk2.txt

Interview question: Check the line numbers of blank lines in a file

[root@server ~]# awk '/^$/{print NR}' /etc/sos/sos.conf
#The pattern /^$/ selects the blank lines in the file, and the action prints NR, the line number of each one
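
The same idea can be tried without a real configuration file. In this sketch the four-line input is invented; lines 2 and 4 are blank.

```shell
blank_out=$(printf 'first\n\nthird\n\n' | awk '/^$/ { print NR }')
echo "$blank_out"
# 2
# 4
```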

7. Operators in awk

Arithmetic operators:

+, -, *, /, % (addition, subtraction, multiplication, division, and modulo), plus ^ and ** for exponentiation (** is not part of POSIX awk but is accepted by implementations such as GNU awk)

Case 1: Calculation using arithmetic operators

[root@server ~]# awk 'BEGIN {x=2;y=3;print x + y , x-y , x/y , x%y , x^y , x**y}'

Case 2: Total the size of the files in a directory (in KB)

[root@server ~]# ll /etc | awk 'BEGIN{size=0} {size=size + $5} END{print "size is :", size/1024 , "KB" } '
#$5: the fifth column (the file size in bytes)
#The content in BEGIN is executed once before the text is processed, not while it is being processed.
#The content in END is executed last, after all the text has been processed.
#The above command is processed as follows:
#ll lists the directory contents, and the pipe character | hands that listing to awk for processing
#The BEGIN block assigns 0 to size. While the text is processed, the value of the fifth column is accumulated line by line, starting from the first line.
#Finally, the END block outputs the result once. Without END, the print would sit in the main block and awk would output a result after every line
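
Because the contents of /etc differ from machine to machine, the sketch below runs the same accumulation on a fixed, invented listing so the result can be checked by hand. The NF >= 5 pattern is an addition for the demo: it skips the leading "total" line, which has no fifth column.

```shell
# Hypothetical "ls -l" style listing; the sizes are 2048 and 1024 bytes.
listing='total 8
-rw-r--r-- 1 root root 2048 Jan  1 00:00 a.conf
-rw-r--r-- 1 root root 1024 Jan  1 00:00 b.conf'

size_out=$(printf '%s\n' "$listing" | awk 'NF >= 5 { size += $5 } END { print "size is:", size / 1024, "KB" }')
echo "$size_out"
# size is: 3 KB
```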

Assignment operator:

Assignment operators: = += -= *= /= %= ^=

-=: a shorthand assignment operator that subtracts a number from a variable and assigns the result back to the variable

[root@server ~]# awk 'BEGIN{a=5 ; print a+=5 , a-=5 , a/=5 }'
#The above command is processed as follows: everything is in BEGIN, so no input is read. First 5 is assigned to a; a+=5 then makes a equal to 10; a-=5 brings a back to 5; a/=5 makes a equal to 1. The output is 10 5 1
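
A step-by-step sketch of the shorthand assignment operators, with each one in its own statement so the intermediate values are easy to follow. The starting value and the operands are arbitrary choices for the demo.

```shell
assign_out=$(awk 'BEGIN {
    a = 5
    a += 5; print a    # 5 + 5  = 10
    a -= 4; print a    # 10 - 4 = 6
    a ^= 2; print a    # 6 ^ 2  = 36
    a /= 3; print a    # 36 / 3 = 12
    a %= 5; print a    # 12 % 5 = 2
    a *= 4; print a    # 2 * 4  = 8
}')
echo "$assign_out"
```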

Conditional operator:

Format: condition ? expression1 : expression2

Note: ?: works like the if else conditional statement. If the condition is true the result is expression1, otherwise it is expression2

Case: Output the maximum value

[root@server ~]# vim awk3.txt
3 6
10 9
3 3
7 5
[root@server ~]# awk '{max=$1>$2?$1:$2 ; print NR , "max=", max }' awk3.txt
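
For comparison, the sketch below computes the same maximum twice on the awk3.txt data shown above, once with ?: and once with an explicit if else, and both produce identical output. The shell variable names are assumptions for the demo.

```shell
pairs=$(printf '3 6\n10 9\n3 3\n7 5\n')

# Conditional operator form.
ternary_out=$(printf '%s\n' "$pairs" | awk '{ max = ($1 > $2) ? $1 : $2; print NR, "max=", max }')

# Equivalent if else form.
ifelse_out=$(printf '%s\n' "$pairs" | awk '{ if ($1 > $2) max = $1; else max = $2; print NR, "max=", max }')

echo "$ternary_out"
# 1 max= 6
# 2 max= 10
# 3 max= 3
# 4 max= 7
```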

Logical operators:

&& (and)  || (or)  ! (negation)

Relational operators

> (greater than)  < (less than)  >= (greater than or equal to)  <= (less than or equal to)  == (equal to)  != (not equal to)  ~ (matches)  !~ (does not match)

Case: Display local IP address

[root@server ~]# ifconfig ens160 | awk 'NR==2{print $2}' 

Case: Query the lines of /etc/passwd whose third column (UID) is less than 10, and output only the account and UID.

[root@server ~]# awk 'BEGIN{FS=":"} $3<10{print $1,$3}' /etc/passwd
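
The match operators and the logical operators can be combined in a single pattern. The sketch below uses four invented passwd-style lines rather than the real /etc/passwd, so the result is the same on every machine.

```shell
# Hypothetical account data in name:password:UID:GID form.
accounts='root:x:0:0
daemon:x:2:2
alice:x:1000:1000
nobody:x:65534:65534'

# Name starts with a letter from a to d AND the UID is at least 1000.
match_out=$(printf '%s\n' "$accounts" | awk -F ':' '$1 ~ /^[a-d]/ && $3 >= 1000 { print $1 }')

# Name does not contain the letter o OR the UID is 0.
nomatch_out=$(printf '%s\n' "$accounts" | awk -F ':' '$1 !~ /o/ || $3 == 0 { print $1 }')

echo "$match_out"     # alice
echo "$nomatch_out"   # root / alice
```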

Other operators

++ (increment)  -- (decrement)

Case:

[root@server ~]# awk 'BEGIN{a=3 ; print a++ , ++a}'
#a++: post-increment: the current value of a is output first, then a is increased by 1
#++a: pre-increment: a is increased by 1 first, then the new value is output
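
To see the two forms separately, this sketch stores each result in its own variable before printing. The names b and c are introduced only for the demo.

```shell
inc_out=$(awk 'BEGIN {
    a = 3
    b = a++    # b receives the old value 3, then a becomes 4
    c = ++a    # a becomes 5 first, then c receives 5
    print b, c, a
}')
echo "$inc_out"
# 3 5 5
```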

Case 2: When an operand in awk is a non-numeric string, it is converted to the value 0 when it takes part in arithmetic operations.

[root@server ~]# awk 'BEGIN{a="china" ; print a++ , ++a}'
#The string is assigned to a. When a takes part in arithmetic operations, it is automatically converted to the value 0.
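
A small sketch of how strings are converted in arithmetic, using three invented values. A string with no leading number becomes 0, while a string that starts with digits contributes its leading numeric part.

```shell
conv_out=$(awk 'BEGIN {
    a = "china"    # no leading number: treated as 0
    b = "12abc"    # leading digits: treated as 12
    c = "3.5"      # numeric string: treated as 3.5
    print a + 0, b + 0, c * 2
}')
echo "$conv_out"
# 0 12 7
```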

Summary