Super detailed experiment on Linux text processing tools sed, awk and grep

Article directory

  • 1. sed (good at line fetching and replacement)
    • Brief description
    • Command
      • Command format
      • Common options
      • Address delimitation
      • Edit commands
      • Advanced editing commands
      • Relationship between pattern space and hold space
    • Experiment
  • 2. awk (good at fetching columns)
    • Brief description
    • Grammar
    • Command
    • Built-in variables
    • Custom variables (character case sensitive)
    • Experiment
  • 3. grep (good at searching)
    • Brief description
    • Command
    • Experiment
  • 4. cut
    • Command
  • 5. seq command
    • Command

1. sed (good at line fetching and replacement)

Brief description

sed is a stream editor that processes its input one line at a time. Each line is read into a temporary buffer called the “pattern space”, the sed commands are applied to the content of that buffer, and when processing is complete the content of the buffer is sent to the screen (standard output). sed then reads the next line and repeats the cycle.

Command

Command format

sed [options] '[address]command' file(s)

Common options

-n: Suppress the automatic printing of the pattern space; only lines printed explicitly (for example with the p command) are output.
-e: Multi-point editing; several scripts can be given, and all of them are applied to each line.
-f: Read the script from a file; -f specifies the file path. If there are multiple scripts, write each one on its own line.
-r: Support extended regular expressions (GNU sed; -E is equivalent)
-i: Write the processing results directly back to the file (edit in place)
-i.bak: Save a backup copy of the original file with the suffix .bak before writing the results to the file
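
The -e and -i.bak options can be tried out with a small sketch like the one below. The file name demo.txt and its content are hypothetical, chosen only for illustration, and the file lives in a temporary directory so nothing else is touched.

```shell
# Hypothetical sample file in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'hello world\nhello linux\n' > "$tmpdir/demo.txt"

# -e: both scripts are applied to every line; the file itself is unchanged.
multi=$(sed -e 's/hello/HELLO/' -e 's/linux/LINUX/' "$tmpdir/demo.txt")

# -i.bak: edit demo.txt in place and keep the original as demo.txt.bak.
sed -i.bak 's/hello/hi/' "$tmpdir/demo.txt"
edited=$(cat "$tmpdir/demo.txt")
backup=$(cat "$tmpdir/demo.txt.bak")

printf '%s\n---\n%s\n---\n%s\n' "$multi" "$edited" "$backup"
rm -rf "$tmpdir"
```

After the in-place edit, the .bak file still holds the text as it was before the substitution.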

Address delimitation

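No address: process every line
Single address: #: the line with that number; /pattern/: each line that the pattern matches
Address range:

  • #,#: from line # to line #
  • #,+#: from line # through the following # lines (GNU sed)
  • /pat1/,/pat2/: from the first line matching pat1 to the next line matching pat2
  • #,/pat1/: from line # to the next line matching pat1

~: step (GNU sed)

  • sed -n '1~2p' prints only odd lines (1~2 starts at line 1 and advances 2 lines at a time)
  • sed -n '2~2p' prints only even lines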
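
The address forms above can be compared side by side. This is a minimal sketch assuming GNU sed (the +N and first~step forms are GNU extensions); the five-line input is hypothetical.

```shell
# Hypothetical five-line input.
input=$(printf 'one\ntwo\nthree\nfour\nfive')

second=$(printf '%s\n' "$input" | sed -n '2p')               # single line
range=$(printf '%s\n' "$input" | sed -n '2,4p')              # lines 2 to 4
plus=$(printf '%s\n' "$input" | sed -n '2,+1p')              # line 2 and the next 1 line
pattern=$(printf '%s\n' "$input" | sed -n '/two/,/four/p')   # from "two" to "four"
odd=$(printf '%s\n' "$input" | sed -n '1~2p')                # lines 1, 3, 5

printf '%s\n' "$odd"
```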

Edit commands

d: Delete the pattern space and immediately start the next cycle
p: Print the contents of the current pattern space, in addition to the default output
a: Append text after the specified line. Supports using \n to append multiple lines.
i: Insert text before the specified line. Supports using \n to insert multiple lines.
c: Replace the line with single-line or multi-line text. Supports using \n for multiple lines.
w: Save the lines matching the pattern to the specified file
r: Read the specified file and append its text after the matched line
=: Print the line number of the line in the pattern space
!: Negate the address, so that the command applies to the lines that do not match
s///: Search and replace; other delimiters are supported, such as s@@@ and s###

Advanced editing commands

h: Copy the pattern space to the hold space, overwriting what was there
H: Append the pattern space to the hold space
g: Copy the hold space to the pattern space, overwriting what was there
G: Append the hold space to the pattern space
x: Exchange the contents of the pattern space and the hold space
n: Read the next line of input and overwrite the pattern space with it
N: Read the next line of input and append it to the pattern space
d: Delete the pattern space
D: Delete the pattern space from its beginning up to the first \n (the deleted text is not passed to standard output), skip the remaining commands, and restart the cycle on what is left of the pattern space
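
The difference between n and N is easiest to see on numbered lines. The following is a small sketch with a hypothetical four-line input; it is not the article's original test data.

```shell
# Hypothetical four-line input.
input=$(printf '1\n2\n3\n4')

# n: replace the pattern space with the next line, then print it.
# With -n this prints only the even lines.
even=$(printf '%s\n' "$input" | sed -n 'n;p')

# N: append the next line to the pattern space (joined by a newline),
# then turn that embedded newline into a space, joining lines in pairs.
pairs=$(printf '%s\n' "$input" | sed 'N;s/\n/ /')

printf '%s\n---\n%s\n' "$even" "$pairs"
```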

Relationship between pattern space and hold space

  • The relationship between pattern space and hold space: the hold space is a buffer that temporarily stores data from the pattern space and assists the pattern space in data processing. Input lines are always read into the pattern space first; they reach the hold space only when a command such as h or H copies them there.
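
One common way to see the two buffers cooperate is reversing the order of lines. The sketch below uses a hypothetical three-line input; on every line except the first, G appends the hold space to the pattern space, h then saves the result back to the hold space, and the last line prints the accumulated text.

```shell
# Hypothetical three-line input.
input=$(printf 'a\nb\nc')

# 1!G : on every line but the first, append the hold space to the pattern space
# h   : copy the pattern space to the hold space
# $p  : on the last line, print the pattern space
reversed=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```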

Experiment

Text content:

p prints the content of the current pattern space in addition to the default output. Without options, matching lines are therefore printed twice (once by p and once by the automatic print), and unmatched lines are printed once:

-n option: Suppress the automatic output of the pattern space, so that only the matching lines printed by p appear:
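
The effect of -n on the p command can be checked with a short sketch; the two-line input is hypothetical.

```shell
# Hypothetical two-line input.
input=$(printf 'hello\nworld')

# Without -n: the matching line appears twice, the other line once.
with_autoprint=$(printf '%s\n' "$input" | sed '/hello/p')

# With -n: only the line printed by p appears.
only_match=$(printf '%s\n' "$input" | sed -n '/hello/p')

printf '%s\n---\n%s\n' "$with_autoprint" "$only_match"
```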

-e: Multi-point editing; several scripts are applied to each line (the content of the pattern space).
s///: Search and replace; other delimiters also work, such as s@@@ and s###.

1. sed 's/original string/replacement string/'
2. If no line is specified, the operation is performed on every line.
3. Inside the pattern, as in /$/, the $ symbol means the end of the line; used as an address outside the pattern, it means the last line.
4. The caret ^ indicates the beginning of the line
5. & in the replacement stands for the matched text, which makes it possible to add text around the match
6. Add g to replace every match within the line
7. \l: Convert the next character to lowercase
8. \L: Convert the replacement to lowercase until \U or \E appears
9. \u: Convert the next character to uppercase
10. \U: Convert the replacement to uppercase until \L or \E appears
11. \E: Stop the case conversion started by \L or \U
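
The basic substitution rules in the list above (first match versus g, the & reference, and the ^ and $ anchors) can be sketched as follows. The input strings are hypothetical.

```shell
first=$(echo 'aaa bbb' | sed 's/a/X/')                    # only the first match in the line
all=$(echo 'aaa bbb' | sed 's/a/X/g')                     # g: every match in the line
amp=$(echo 'hello world' | sed 's/world/[&]/')            # &: the matched text itself
anchors=$(echo 'hello' | sed -e 's/^/> /' -e 's/$/ </')   # ^ line start, $ line end
last=$(printf 'a\nb\nc\n' | sed -n '$p')                  # $ as an address: the last line

printf '%s\n' "$first" "$all" "$amp" "$anchors" "$last"
```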

Replace a with A in the first line and b with B in the second line:
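
One possible way to do this is to put a line number in front of each s command and combine the two scripts with -e. The two-line input is hypothetical.

```shell
# Hypothetical input: both lines contain a and b.
input=$(printf 'a b\na b')

# 1s/.../ acts only on line 1, 2s/.../ only on line 2.
result=$(printf '%s\n' "$input" | sed -e '1s/a/A/' -e '2s/b/B/')

printf '%s\n' "$result"
```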

Convert all uppercase letters to lowercase:

Convert the first letter of each word to uppercase:
Convert all lowercase letters to uppercase:

Comparison of \u and \U:
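
The case-conversion escapes are GNU sed extensions, so the sketch below assumes GNU sed. The input strings are hypothetical; \b (word boundary) is also a GNU extension.

```shell
lower=$(echo 'Hello WORLD' | sed 's/[A-Z]/\l&/g')       # every uppercase letter to lowercase
upper=$(echo 'hello world' | sed 's/[a-z]/\u&/g')       # every lowercase letter to uppercase
capital=$(echo 'hello world' | sed 's/\b[a-z]/\u&/g')   # first letter of each word

# \u affects only the next character, \U everything up to \E or the end.
u_next=$(echo 'hello' | sed 's/.*/\u&/')
U_all=$(echo 'hello' | sed 's/.*/\U&/')
U_stop=$(echo 'hello world' | sed -r 's/(hello) (world)/\U\1\E \2/')

printf '%s\n' "$lower" "$upper" "$capital" "$u_next" "$U_all" "$U_stop"
```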

-r: Enables extended regular expressions

-f: Reads the script from a file
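
A short sketch of both options, assuming GNU sed. The script file name script.sed and the inputs are hypothetical.

```shell
tmpdir=$(mktemp -d)

# -f: one sed command per line in a script file.
printf 's/hello/hi/\ns/world/there/\n' > "$tmpdir/script.sed"
from_file=$(echo 'hello world' | sed -f "$tmpdir/script.sed")

# -r: groups and + need no backslashes in extended syntax.
extended=$(echo 'ababab c' | sed -r 's/(ab)+/X/')
# The same substitution in basic syntax (the \+ form is a GNU extension).
basic=$(echo 'ababab c' | sed 's/\(ab\)\+/X/')

printf '%s\n' "$from_file" "$extended" "$basic"
rm -rf "$tmpdir"
```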

Replace b in the second line with g:

Replace a or A in odd lines with RR:

Add, delete, query and modify:
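
The four operations map onto the a/i, d, p and c commands. The sketch below assumes GNU sed, which accepts the text on the same line as a, i and c; the three-line input is hypothetical.

```shell
# Hypothetical three-line input.
input=$(printf 'one\ntwo\nthree')

added=$(printf '%s\n' "$input" | sed '2a new')         # add after line 2
inserted=$(printf '%s\n' "$input" | sed '2i new')      # add before line 2
deleted=$(printf '%s\n' "$input" | sed '2d')           # delete line 2
queried=$(printf '%s\n' "$input" | sed -n '2p')        # show only line 2
changed=$(printf '%s\n' "$input" | sed '2c changed')   # replace line 2

printf '%s\n' "$changed"
```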



2. awk (good at fetching columns)

Brief description

awk is both a text-processing tool and a programming language, used to process text and data under Linux/Unix. Data can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions. It can be used directly from the command line, but is more often used as a script. awk provides arrays, functions and other features familiar from the C language, and flexibility is its biggest advantage.

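  • Record: one line is one record
  • Field: one column of a record, such as hello, world, linux
  • Separator: the field separator, whitespace by default; another character such as : can be specified
  • $n: field number n; hello is called $1, world is called $2
  • $0: the whole line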
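
These terms can be seen on a single record. The line below is hypothetical and simply reuses the words from the list.

```shell
line='hello world linux'

first=$(echo "$line" | awk '{print $1}')    # first field
second=$(echo "$line" | awk '{print $2}')   # second field
whole=$(echo "$line" | awk '{print $0}')    # the whole record
count=$(echo "$line" | awk '{print NF}')    # number of fields

printf '%s\n' "$first" "$second" "$whole" "$count"
```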

Grammar

awk [options] 'program' var=value file…
awk [options] -f programfile var=value file…
awk [options] 'BEGIN{ action;… } pattern{ action;… } END{ action;… }' file …

Command

-F fs: fs specifies the input field separator; fs can be a string or a regular expression, such as -F:
-v var=value: Assign a user-defined variable; used to pass external values into awk
-f scriptfile: Read the awk program from a script file
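
A minimal sketch of -F with a single character and with a regular expression. The passwd-style line is a hypothetical example, not read from a real /etc/passwd.

```shell
# Hypothetical passwd-style record.
line='root:x:0:0:root:/root:/bin/bash'

user=$(echo "$line" | awk -F: '{print $1}')          # first colon-separated field
login_shell=$(echo "$line" | awk -F: '{print $NF}')  # last field

# A regular expression as separator: any run of digits.
letters=$(echo 'a1b22c' | awk -F'[0-9]+' '{print $1, $2, $3}')

printf '%s\n' "$user" "$login_shell" "$letters"
```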

Built-in variables

FS: Input field separator, default is whitespace
OFS: Output field separator, default is a single space
RS: Input record separator, default is a newline. When another separator is specified, newlines in the input still separate fields (with the default FS)
ORS: Output record separator, default is a newline; the specified string replaces the newline on output
NF: Number of fields in the current record; $NF is the last column and $(NF-1) is the second-to-last column
NR: Record (line) number. When several files are given, the count continues from the last line of the first file into the second file.
FNR: Record number within the current file. With a single file it is the same as NR; with multiple files the count starts again from 1 for each file.
FILENAME: Current file name
ARGC: Number of command-line arguments
ARGV: Array that stores the arguments given on the command line
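
Several of these variables can be observed together. The file names f1.txt and f2.txt and their contents are hypothetical and are created in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$tmpdir/f1.txt"
printf 'x y\n' > "$tmpdir/f2.txt"

# NR keeps counting across files, FNR restarts for each file.
counters=$(cd "$tmpdir" && awk '{print FILENAME, NR, FNR, NF}' f1.txt f2.txt)

second_last=$(echo 'a b c' | awk '{print $(NF-1)}')
joined=$(echo 'a b c' | awk 'BEGIN{OFS="-"} {print $1, $2, $3}')
records=$(printf 'a;b;c' | awk 'BEGIN{RS=";"} {print NR ": " $0}')
one_line=$(printf 'a\nb\n' | awk 'BEGIN{ORS=","} {print}')

printf '%s\n' "$counters"
rm -rf "$tmpdir"
```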

Custom variables (character case sensitive)
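
A custom variable can be set with -v on the command line or assigned inside the program. The sketch below also shows case sensitivity: name and Name are different variables, and the unset one is empty. All variable names here are hypothetical.

```shell
# Set from the command line with -v.
from_cli=$(awk -v name=linux 'BEGIN{print "hello", name}')

# name is set, Name is a different (unset, empty) variable.
case_demo=$(awk -v name=linux 'BEGIN{print "[" Name "]", "[" name "]"}')

# Assigned inside the program.
in_program=$(echo '3 4' | awk '{sum = $1 + $2; print sum}')

printf '%s\n' "$from_cli" "$case_demo" "$in_program"
```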

Experiment

Space is the default separator:

Built-in variables:


Pattern + action:

Regular expression: /regex/
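
A pattern selects the records and the action in braces runs only on them. The sketch below shows a regular expression pattern, a comparison, a match against one field, and a pattern without an action; the three-line input is hypothetical.

```shell
# Hypothetical input: a word and a number on each line.
input=$(printf 'linux 1\nhello 2\nlinus 3')

by_regex=$(printf '%s\n' "$input" | awk '/^lin/ {print $1}')       # lines starting with lin
by_compare=$(printf '%s\n' "$input" | awk '$2 > 1 {print $1}')     # second field greater than 1
by_field=$(printf '%s\n' "$input" | awk '$1 ~ /o$/ {print $2}')    # first field ends with o
no_action=$(printf '%s\n' "$input" | awk 'NR == 2')                # default action: print the record

printf '%s\n' "$by_regex"
```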


Built-in functions:
Pattern block:

index: Find the position of a substring. There are two lines starting with lin, so there are two outputs, printing the positions of “l” and “e” respectively:

length: Field length

BEGIN{} and END{}: BEGIN is executed before any input is read and END after all input has been read; each runs only once during the whole awk run. awk first executes the BEGIN block, then the pattern-action blocks for each record, and finally the END block. (BEGIN and END must be written in capital letters.)
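
A sketch of index and length on a hypothetical input (not the article's original file) in which two lines start with lin. index returns the 1-based position of the substring, or 0 when it is absent.

```shell
# Hypothetical input: two of the three lines start with lin.
input=$(printf 'linux\nhello\nline')

# Position of "l" and of "e" on each line starting with lin.
positions=$(printf '%s\n' "$input" | awk '/^lin/ {print index($0, "l"), index($0, "e")}')

# Length of the first field and of the whole record.
lengths=$(echo 'hello linux' | awk '{print length($1), length($0)}')

printf '%s\n---\n%s\n' "$positions" "$lengths"
```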
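
The execution order can be sketched by summing a column: the BEGIN block runs once before the input, the middle block runs for every record, and the END block runs once at the end. The numbers are hypothetical.

```shell
result=$(printf '1\n2\n3\n' | awk '
  BEGIN { print "start"; sum = 0 }   # once, before any input
        { sum += $1 }                # for every record
  END   { print "sum=" sum }         # once, after all input
')

printf '%s\n' "$result"
```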



-f: Read the awk program from a script file

Script file:
Run:
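
As an illustration, a script file can be written and then run with -f. The file name count.awk and its content are hypothetical, not the article's original script.

```shell
tmpdir=$(mktemp -d)

# Hypothetical script file: one awk rule per line.
cat > "$tmpdir/count.awk" <<'EOF'
BEGIN { FS = ":" }
{ print NR ": " $1 }
END { print "total " NR }
EOF

# Run the script on two colon-separated records.
result=$(printf 'a:1\nb:2\n' | awk -f "$tmpdir/count.awk")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```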

3. grep (good at searching)

Brief description

grep (global regular expression print) is a powerful text search tool that uses regular expressions to search text and prints the matching lines.

Command

Experiment

Find hello:
-i: Ignore case when searching

-w: Match whole words only

-n: Show the line number of each matching line

-e: Specify a pattern; several -e options are combined with OR, for example to find both hello and leedia

-v: Invert the match; find lines without hello

-r: Search recursively; find all files under this folder that contain leedia

-l: Print only the names and paths of the files under this folder that contain leedia

-E: Use extended regular expressions, so that alternatives such as hello|leedia need no backslashes
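
The single-file options can be compared on one small input. The four lines below are hypothetical sample text.

```shell
# Hypothetical four-line input.
sample=$(printf 'hello world\nHello there\nhelloworld\nleedia')

plain=$(printf '%s\n' "$sample" | grep hello)                  # lines containing hello
nocase=$(printf '%s\n' "$sample" | grep -i hello)              # also matches Hello
word=$(printf '%s\n' "$sample" | grep -w hello)                # hello as a whole word
numbered=$(printf '%s\n' "$sample" | grep -n leedia)           # prefixed with the line number
either=$(printf '%s\n' "$sample" | grep -e hello -e leedia)    # hello OR leedia
inverted=$(printf '%s\n' "$sample" | grep -v hello)            # lines without hello

printf '%s\n' "$numbered"
```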
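
Recursive search is easiest to try in a throwaway directory. The directory layout and file names below are hypothetical; the output is passed through sort because the order in which grep visits files is not guaranteed.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
echo 'leedia' > "$tmpdir/a.txt"
echo 'hello leedia' > "$tmpdir/sub/b.txt"
echo 'nothing here' > "$tmpdir/c.txt"

# -r: matching lines, each prefixed with the file path.
recursive=$(cd "$tmpdir" && grep -r leedia . | sort)
# -l: only the paths of the files that contain a match.
names=$(cd "$tmpdir" && grep -rl leedia . | sort)
# -E: alternation without backslashes.
extended=$(printf 'hello\nleedia\nother\n' | grep -E 'hello|leedia')

printf '%s\n' "$names"
rm -rf "$tmpdir"
```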

4. cut

Command

-d specifies the delimiter character.
-f specifies which columns to take; use commas for several columns or a hyphen for a range, such as 1-3.
-c Get the content by character.
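
The three options can be tried on one record. The passwd-style line is a hypothetical example.

```shell
# Hypothetical passwd-style record.
line='root:x:0:0:root:/root:/bin/bash'

single=$(echo "$line" | cut -d: -f1)     # one column
listed=$(echo "$line" | cut -d: -f1,7)   # columns 1 and 7
ranged=$(echo "$line" | cut -d: -f1-3)   # columns 1 to 3
chars=$(echo "$line" | cut -c1-4)        # characters 1 to 4

printf '%s\n' "$single" "$listed" "$ranged" "$chars"
```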

5. seq command

Command

-w pads the numbers with leading zeros so that they all have the same width.
-s specifies the separator character.
seq start step end: often used to generate odd or even numbers
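
A short sketch of these forms, assuming GNU seq (the exact output of -s can differ on other implementations).

```shell
odd=$(seq 1 2 9)            # start 1, step 2, end 9
even=$(seq 2 2 10)          # start 2, step 2, end 10
padded=$(seq -w 8 10)       # equal width with leading zeros
joined=$(seq -s ',' 1 5)    # comma instead of newline between numbers

printf '%s\n' "$padded" "$joined"
```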