Suppose we now have two files, a.txt and b.txt.
The content of a.txt is as follows:
a
c
1
3
d
4
The content of b.txt is as follows:
a
b
e
2
1
5
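To follow along, the two files can be recreated as below. This is only a small sketch: it assumes one item per line (which is what the outputs later in this post imply), and the scratch directory created with mktemp is my own choice, not part of the original setup.

```shell
# Scratch directory so nothing in the current directory is overwritten
# (assumption: any writable location works just as well).
workdir=$(mktemp -d)

# printf '%s\n' writes each argument on its own line.
printf '%s\n' a c 1 3 d 4 > "$workdir/a.txt"
printf '%s\n' a b e 2 1 5 > "$workdir/b.txt"

# Quick sanity check: show both files.
cat "$workdir/a.txt" "$workdir/b.txt"
```

The commands in the examples can then be run from inside that directory.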
#Example 01
Compute the union:
[root@VM_81_181_centos ~]# sort -u a.txt b.txt
1
2
3
4
5
a
b
c
d
e
[root@VM_81_181_centos ~]#
#Example 02
Compute the intersection:
[root@VM_81_181_centos ~]# grep -F -f a.txt b.txt | sort | uniq
1
a
[root@VM_81_181_centos ~]#
#Example 03
Compute the difference set (a – b):
[root@VM_81_181_centos ~]# grep -F -v -f b.txt a.txt | sort | uniq
3
4
c
d
[root@VM_81_181_centos ~]#
#Example 04
Calculate the difference set (b – a):
[root@VM_81_181_centos ~]# grep -F -v -f a.txt b.txt | sort | uniq
2
5
b
e
[root@VM_81_181_centos ~]#
---------- Manual dividing line ----------
2018/09/30 Update
The above describes how to use the grep command to compute the intersection and difference of two files, but in practice the results can be wrong.
[root@VM_81_181_centos ~]# grep -F -f a.txt b.txt | sort | uniq | wc -l
4095
[root@VM_81_181_centos ~]# grep -F -f b.txt a.txt | sort | uniq | wc -l
4729
[root@VM_81_181_centos ~]#
I used the command above to find the intersection of two files a and b, but when I swapped the order of the two files, the result was different.
That cannot be right, because an intersection does not depend on the order of its operands.
After thinking it over, the cause is that grep is a search command. For example:
The contents of the c.txt file are as follows:
1122
1133
1144
1155
The contents of the d.txt file are as follows:
11223344
Execute grep command:
[root@VM_81_181_centos ~]# grep -F -f c.txt d.txt | sort | uniq
11223344
[root@VM_81_181_centos ~]# grep -F -f d.txt c.txt | sort | uniq
[root@VM_81_181_centos ~]#
Based on the results, the first command can be interpreted as follows:
grep searches d.txt for lines that contain any line of c.txt as a fixed string. The string 1122 from c.txt occurs at the start of 11223344 in d.txt, so 11223344 is reported as common to both files.
Second command:
grep searches c.txt for lines that contain a line of d.txt. The string 11223344 from d.txt does not occur in any line of c.txt, so the result is empty.
Now, add the line 1122334455 to the c.txt file. The operation and its result are as follows:
c.txt file content:
1122
1133
1144
1155
1122334455
Execute grep command:
[root@VM_81_181_centos ~]# grep -F -f d.txt c.txt | sort | uniq
1122334455
[root@VM_81_181_centos ~]#
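As a side note, the behaviour above can be reproduced in isolation, and it can be compared with grep's -x option, which only accepts a match that covers the whole line. The following is a sketch under my own assumptions: the files are recreated in a scratch directory, and the variable names are made up for illustration.

```shell
# Recreate c.txt and d.txt from this example in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1122 1133 1144 1155 1122334455 > "$workdir/c.txt"
printf '%s\n' 11223344 > "$workdir/d.txt"

# -F alone: a pattern matches if it occurs anywhere inside a line,
# so the result depends on which file supplies the patterns.
substr_cd=$(grep -F -f "$workdir/c.txt" "$workdir/d.txt" | sort -u)
substr_dc=$(grep -F -f "$workdir/d.txt" "$workdir/c.txt" | sort -u)

# -F -x: a pattern must match the entire line, so only identical lines count.
# Both orders now agree (empty here, since the files share no identical line).
exact_cd=$(grep -F -x -f "$workdir/c.txt" "$workdir/d.txt" | sort -u)
exact_dc=$(grep -F -x -f "$workdir/d.txt" "$workdir/c.txt" | sort -u)

printf 'substring: [%s] vs [%s]\n' "$substr_cd" "$substr_dc"
printf 'exact:     [%s] vs [%s]\n' "$exact_cd" "$exact_dc"
```

With -x the two orders give the same answer for these files, because a line is only reported when an identical line exists in the other file.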
In conclusion:
grep -F -f fileA fileB | sort | uniq
With fileA first, grep takes each line of fileA as a pattern and prints the lines of fileB that contain it, whether the match covers the whole line or only part of it. The printed lines always come from fileB.
With the files swapped, fileB supplies the patterns and the printed lines come from fileA.
However, that is not what we want here. As in mathematics, the intersection of two sets should be the same whichever set is written first, and it should contain only the elements common to both.
I tried several methods and finally chose to use the cat command.
The command format is as follows:
cat fileA fileB | sort | uniq -d    # Find the intersection
cat fileA fileB | sort | uniq -u    # Find the difference set (symmetric: lines that appear in only one of the two files)
This command is easier to understand. The cat command first concatenates the two files, and sort then puts identical lines next to each other. uniq -d prints only the lines that are repeated, i.e. the lines found in both files. uniq -u prints only the lines that appear exactly once, i.e. the lines found in one file but not the other.
The result is the same whether fileA or fileB comes first. Note that this only works if no line is repeated within a single file.
An example follows:
[root@VM_81_181_centos ~]# cat c.txt
1122
1133
1144
1155
1122334455
[root@VM_81_181_centos ~]# cat d.txt
11223344
1122
[root@VM_81_181_centos ~]#
The contents of c and d files are as above
Execute the cat command to find the intersection:
[root@VM_81_181_centos ~]# cat c.txt d.txt | sort | uniq -d
1122
[root@VM_81_181_centos ~]# cat d.txt c.txt | sort | uniq -d
1122
[root@VM_81_181_centos ~]#
Execute the cat command to find the difference set:
[root@VM_81_181_centos ~]# cat c.txt d.txt | sort | uniq -u
11223344
1122334455
1133
1144
1155
[root@VM_81_181_centos ~]# cat d.txt c.txt | sort | uniq -u
11223344
1122334455
1133
1144
1155
[root@VM_81_181_centos ~]#
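The uniq -u result above contains lines from both files. If a one-directional difference (c minus d, or d minus c) is needed, one possible approach is to feed the file being subtracted twice, so that none of its lines can survive uniq -u. The sketch below wraps this in shell functions; the function names are hypothetical, and each file is passed through sort -u first so that a line repeated inside one file is not mistaken for a shared line.

```shell
# Hypothetical helper names. Each input is de-duplicated with sort -u first.
set_intersection() { { sort -u "$1"; sort -u "$2"; } | sort | uniq -d; }
set_symdiff()      { { sort -u "$1"; sort -u "$2"; } | sort | uniq -u; }
# A minus B: B is fed twice, so every line of B occurs at least twice and
# uniq -u can only keep lines that occur in A alone.
set_difference()   { { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u; }

# Recreate c.txt and d.txt from the example in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1122 1133 1144 1155 1122334455 > "$workdir/c.txt"
printf '%s\n' 11223344 1122 > "$workdir/d.txt"

set_intersection "$workdir/c.txt" "$workdir/d.txt"   # lines in both files
set_difference   "$workdir/c.txt" "$workdir/d.txt"   # lines only in c.txt
set_difference   "$workdir/d.txt" "$workdir/c.txt"   # lines only in d.txt
```

For these two files the intersection is 1122, d minus c is 11223344, and c minus d is the remaining four lines of c.txt.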
But the cat command also has a shortcoming: when the files are relatively large, an error can occur. In that case the split command can be used to split the files, process the pieces separately (divide and conquer), and then merge the results.
For how to use the split command, you can refer to this article of mine.
Portal: https://www.cnblogs.com/leeyongbard/p/9594439.html
---------- 2019/04/27 ----------
paste command
Merge files by columns.
The paste format is:
paste [-d delimiter] [-s] file1 file2
The options have the following meanings:
-d specifies a delimiter to use instead of the default tab. For example, -d @ uses @ as the delimiter.
-s merges each file into a single line instead of pasting the files line by line.
- in place of a file name reads standard input. For example, ls -l | paste - displays the output in a single column.
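To make the - operand more concrete, here is a small sketch that feeds four lines to paste through a pipe. The input values and variable names are made up for illustration.

```shell
# Each "-" operand reads the next line from standard input, so two of them
# fold a single input column into two tab-separated columns.
two_cols=$(printf '%s\n' 1 2 3 4 | paste - -)

# -s joins all input lines into one row; -d, replaces the default tab with a comma.
one_row=$(printf '%s\n' 1 2 3 4 | paste -s -d, -)

printf '%s\n' "$two_cols" "$one_row"
```

The first command pairs the lines up as 1 with 2 and 3 with 4; the second produces the single row 1,2,3,4.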
Example:
# cat pas1
ID897
ID666
ID982
# cat pas2
P.Jones
S.Round
L.Clip
Paste the two files pas1 and pas2 side by side as two columns with the paste command:
# paste pas1 pas2
ID897	P.Jones
ID666	S.Round
ID982	L.Clip
You can specify which column to paste first by exchanging the file names:
# paste pas2 pas1
P.Jones	ID897
S.Round	ID666
L.Clip	ID982
To use a separator other than the default tab, use the -d option. With a colon as the separator:
# paste -d: pas2 pas1
P.Jones:ID897
S.Round:ID666
L.Clip:ID982
To merge each file into a single row instead of a column, use the -s option, as in the following example:
# paste -s pas1 pas2
ID897	ID666	ID982
P.Jones	S.Round	L.Clip
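The -d and -s options can also be combined. The sketch below recreates pas1 and pas2 in a scratch directory (my own assumption, to keep the example self-contained) and compares column-wise and row-wise pasting with a colon separator.

```shell
# Recreate the two sample files from the example.
workdir=$(mktemp -d)
printf '%s\n' ID897 ID666 ID982 > "$workdir/pas1"
printf '%s\n' P.Jones S.Round L.Clip > "$workdir/pas2"

# Column-wise: line N of pas1 is joined with line N of pas2.
by_column=$(paste -d: "$workdir/pas1" "$workdir/pas2")

# Row-wise (-s): all lines of pas1 form the first row, all lines of pas2 the second.
by_row=$(paste -s -d: "$workdir/pas1" "$workdir/pas2")

printf '%s\n' "$by_column" "$by_row"
```

Row-wise pasting yields ID897:ID666:ID982 on the first line and P.Jones:S.Round:L.Clip on the second.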
If you have different opinions, please share your opinions ^_^