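I. sort
1. Function and principle
sort treats each line of a file as a unit and compares lines with one another character by character, starting from the first character, then prints them in ascending order. The comparison follows the collation order of the current locale; with LC_ALL=C it is plain byte (ASCII) order.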
[jinyuyu@localhost progammer]$ cat file
bbbbbbb
aaaaaaa
ddddddd
eeeeee
aaaaaaa
fffffff
bbbbbbb
[jinyuyu@localhost progammer]$ sort file
aaaaaaa
aaaaaaa
bbbbbbb
bbbbbbb
ddddddd
eeeeee
fffffff
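Whether the order is strictly by ASCII value depends on the locale. The following is a minimal sketch, using made-up sample data, of how forcing LC_ALL=C makes sort compare raw byte values, so that uppercase letters come before lowercase ones:

```shell
# Sample data is hypothetical. LC_ALL=C forces plain byte-value (ASCII) comparison.
sorted=$(printf '%s\n' bbb Aaa aaa Bbb | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In many other locales the same input would place aaa and Aaa next to each other instead.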
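2. Common options
Option | Function |
---|---|
-r | sort in descending order |
-u | print only the first of the lines that compare equal |
-n | compare by numeric value |
-t | field separator |
-k | the field (key) to sort on |
-f | ignore case |
-c | check whether the input is sorted and report the first out-of-order line |
-C | like -c, but print nothing and only set the exit status |
Examples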
[jinyuyu@localhost progammer]$ sort -r file
fffffff
eeeeee
ddddddd
bbbbbbb
bbbbbbb
aaaaaaa
aaaaaaa
[jinyuyu@localhost progammer]$ sort -u file
aaaaaaa
bbbbbbb
ddddddd
eeeeee
fffffff
[jinyuyu@localhost progammer]$ sort -ur file
fffffff
eeeeee
ddddddd
bbbbbbb
aaaaaaa
[jinyuyu@localhost progammer]$ sort -n file
22aaaaaaa
33ddddddd
44eeeeee
55aaaaaaa
66fffffff
77bbbbbbb
111bbbbbbb
[jinyuyu@localhost progammer]$ cat file
111bbbbbbb:12
22aaaaaaa:13
33ddddddd:11
44eeeeee:10
55aaaaaaa:12
66fffffff:11
77bbbbbbb:10
[jinyuyu@localhost progammer]$ sort -t':' -nrk 2 file
22aaaaaaa:13
55aaaaaaa:12
111bbbbbbb:12
66fffffff:11
33ddddddd:11
77bbbbbbb:10
44eeeeee:10
[jinyuyu@localhost progammer]$ sort -t':' -nruk 2 file
22aaaaaaa:13
111bbbbbbb:12
33ddddddd:11
44eeeeee:10
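A key given as -k 2 runs from field 2 to the end of the line. The sketch below, on hypothetical sample data in the same name:score layout, restricts the key to field 2 with -k2,2 and adds a second key to break ties. It assumes a sort that accepts per-key modifiers such as n and r, which both GNU and POSIX sort do.

```shell
# Hypothetical sample data in name:score form.
data='111bbbbbbb:12
22aaaaaaa:13
33ddddddd:11
55aaaaaaa:12'
# -k2,2nr : sort on field 2 only, numerically, descending
# -k1,1   : break ties on field 1, ascending (byte order because of LC_ALL=C)
by_score=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2nr -k1,1)
printf '%s\n' "$by_score"
```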
[jinyuyu@localhost progammer]$ cat file
bbbbbbb:12
Aaaaaaa:13
ddddddd:11
eeeeee:10
aaaaaaa:12
fffffff:11
bbbbbbb:10
[jinyuyu@localhost progammer]$ sort -f file
aaaaaaa:12
Aaaaaaa:13
bbbbbbb:10
bbbbbbb:12
ddddddd:11
eeeeee:10
fffffff:11
[jinyuyu@localhost progammer]$ sort -c file
sort: file:2: disorder: Aaaaaaa:13
[jinyuyu@localhost progammer]$ echo $?
1
[jinyuyu@localhost progammer]$ sort -C file
[jinyuyu@localhost progammer]$ echo $?
1
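Because -C reports only through its exit status, it fits naturally into a shell condition. A small sketch follows; the helper name check_sorted and the temporary file are assumptions made for illustration:

```shell
# check_sorted is a hypothetical helper name used only in this sketch.
check_sorted() {
    # -C exits 0 when the input is already sorted and non-zero otherwise; it prints nothing.
    if LC_ALL=C sort -C "$1"; then
        echo "sorted"
    else
        echo "not sorted"
    fi
}

tmp=$(mktemp)
printf '%s\n' a b c > "$tmp"
first=$(check_sorted "$tmp")
printf '%s\n' b a c > "$tmp"
second=$(check_sorted "$tmp")
rm -f "$tmp"
echo "$first / $second"
```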
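II. uniq
1. Function
uniq removes adjacent duplicate lines from its input. Because only adjacent lines are compared, the input is usually sorted first.
2. Common options
Option | Function |
---|---|
-c | prefix each line with the number of times it occurs |
-d | print only the lines that are repeated |
-u | print only the lines that are not repeated |
(1) Examples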
[jinyuyu@localhost progammer]$ cat file
bbbbbbb
Aaaaaaa
ddddddd
eeeeee
aaaaaaa
fffffff
bbbbbbb
[jinyuyu@localhost progammer]$ sort file | uniq -c
1 aaaaaaa
1 Aaaaaaa
2 bbbbbbb
1 ddddddd
1 eeeeee
1 fffffff
[jinyuyu@localhost progammer]$ sort file | uniq -d
bbbbbbb
[jinyuyu@localhost progammer]$ sort file | uniq -u
aaaaaaa
Aaaaaaa
ddddddd
eeeeee
fffffff
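All of the examples above pipe through sort first. The sketch below, with made-up input, shows why: uniq compares only neighbouring lines, so duplicates that are not adjacent survive.

```shell
# Made-up input: the two "a" lines are separated by "b".
unsorted=$(printf '%s\n' a b a | uniq)          # duplicates are not adjacent, nothing is removed
presorted=$(printf '%s\n' a b a | sort | uniq)  # sorting first makes them adjacent
printf 'without sort:\n%s\nwith sort:\n%s\n' "$unsorted" "$presorted"
```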
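(2) Exercises
Filtering large data sets
Given a very large file containing many IP addresses, find the 3 addresses that occur most often. (The sample file here contains letters rather than IP addresses, but the pipeline is the same.)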
[jinyuyu@localhost progammer]$ sort file | uniq -c | sort -nr | head -3
2 bbbbbbb
1 fffffff
1 eeeeee
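The same pipeline can be tried on actual addresses. This is a sketch on a hypothetical list of IPs; it uses head -n rather than the older head -3 form and adds awk only to strip the padding that uniq -c puts in front of the counts.

```shell
# Hypothetical access list: one IP address per line.
ips='10.0.0.1
10.0.0.2
10.0.0.1
10.0.0.3
10.0.0.2
10.0.0.1'
# Count each address, order by count (descending), keep the top 2,
# then normalise the whitespace that uniq -c adds before the count.
top=$(printf '%s\n' "$ips" | sort | uniq -c | sort -nr | head -n 2 | awk '{print $1, $2}')
printf '%s\n' "$top"
```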
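Union, complement, and intersection of two files (the uniq -d and uniq -u steps assume that neither file contains duplicate lines of its own)
Union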
[jinyuyu@localhost progammer]$ cat file2 file > file3
[jinyuyu@localhost progammer]$ cat file3
aaaaaaa
bbbbbbb
zzzzzz
fffffff
bbbbbbb
Aaaaaaa
ddddddd
eeeeee
aaaaaaa
fffffff
bbbbbbb
[jinyuyu@localhost progammer]$ sort file3 | uniq
aaaaaaa
Aaaaaaa
bbbbbbb
ddddddd
eeeeee
fffffff
zzzzzz
Complement (lines that appear in only one of the two files)
[jinyuyu@localhost progammer]$ sort file3 | uniq -u
Aaaaaaa
ddddddd
eeeeee
zzzzzz
Intersection
[jinyuyu@localhost progammer]$ sort file3 | uniq -d
aaaaaaa
bbbbbbb
fffffff
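The uniq -d and uniq -u steps give wrong answers if one file repeats a line on its own. A possible safeguard, sketched here with hypothetical files f1 and f2 in a temporary directory, is to deduplicate each file with sort -u before combining them:

```shell
dir=$(mktemp -d)
printf '%s\n' a b b c > "$dir/f1"   # note the duplicate "b" inside f1
printf '%s\n' b c d   > "$dir/f2"

# Deduplicate each file first, so a line seen twice afterwards must come from both files.
LC_ALL=C sort -u "$dir/f1" > "$dir/u1"
LC_ALL=C sort -u "$dir/f2" > "$dir/u2"

union=$(LC_ALL=C sort "$dir/u1" "$dir/u2" | uniq)        # in either file
inter=$(LC_ALL=C sort "$dir/u1" "$dir/u2" | uniq -d)     # in both files
only_one=$(LC_ALL=C sort "$dir/u1" "$dir/u2" | uniq -u)  # in exactly one file
rm -rf "$dir"

printf 'union: %s\n' "$(echo $union)"
printf 'intersection: %s\n' "$(echo $inter)"
printf 'only in one: %s\n' "$(echo $only_one)"
```

Without the sort -u step, the duplicated "b" in f1 would still show up in the intersection here, but a line duplicated in only one file would be reported as common to both.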
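III. paste
1. Function
paste means "to paste", but its main job is to merge the contents of several files. It joins the corresponding lines of each file into a single output line; by default the pieces are separated by a Tab character.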
[jinyuyu@localhost progammer]$ vim file
[jinyuyu@localhost progammer]$ vim file2
[jinyuyu@localhost progammer]$ cat file
a
b
c
d
e
f
g
[jinyuyu@localhost progammer]$ cat file2
e
f
g
h
i
j
k
l
[jinyuyu@localhost progammer]$ paste file file2
a e
b f
c g
d h
e i
f j
g k
l
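2. Common options
Option | Function |
---|---|
-d | use the given characters instead of Tab as the delimiter |
-s | serial mode: join all lines of each file into one line, one file at a time |
Examples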
[jinyuyu@localhost progammer]$ paste -d# file file2
a#e
b#f
c#g
d#h
e#i
f#j
g#k
#l
[jinyuyu@localhost progammer]$ paste -d: file file2
a:e
b:f
c:g
d:h
e:i
f:j
g:k
:l
[jinyuyu@localhost progammer]$ paste -s file file2
a b c d e f g
e f g h i j k l
[jinyuyu@localhost progammer]$ paste -s -d# file file2
a#b#c#d#e#f#g
e#f#g#h#i#j#k#l
[jinyuyu@localhost progammer]$ ls /etc | paste - - - - - -
abrt acpi adjtime aliases aliases.db alsa
alternatives anacrontab asound.conf at.deny audisp audit
avahi bash_completion.d bashrc blkid bluetooth bonobo-activation
centos-release chkconfig.d ConsoleKit cron.d cron.daily cron.deny
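Two common uses of these options on standard input are joining lines into a delimited list and pairing consecutive lines. A short sketch with made-up input (the "-" operand stands for standard input):

```shell
# Join five lines into one comma-separated line.
csv=$(printf '%s\n' 1 2 3 4 5 | paste -s -d, -)
# Read two lines at a time from the same input and join each pair with "=".
pairs=$(printf '%s\n' a 1 b 2 | paste -d= - -)
printf '%s\n%s\n' "$csv" "$pairs"
```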
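IV. cut
1. Function
cut extracts bytes, characters, or fields from each line of a file and writes them to standard output.
2. Common options
Option | Function |
---|---|
-b | select by byte position |
-c | select by character position |
-d | field delimiter used with -f (default: Tab) |
-f | select fields, usually together with -d |
Examples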
[jinyuyu@localhost progammer]$ echo hello | cut -b 1
h
[jinyuyu@localhost progammer]$ echo hello |cut -b 1-3
hel
[jinyuyu@localhost progammer]$ echo hello |cut -b 3
l
[jinyuyu@localhost progammer]$ echo hello |cut -b 0
cut: fields and positions are numbered from 1
[jinyuyu@localhost progammer]$ echo hello |cut -b 1,3
hl
[jinyuyu@localhost progammer]$ echo hello | cut -c 1
h
[jinyuyu@localhost progammer]$ echo hello | cut -c 1-3
hel
[jinyuyu@localhost progammer]$ echo hello | cut -c 1-
hello
Note: the examples above might suggest that -c and -b do the same thing. That is not true once the text contains multibyte characters.
[jinyuyu@localhost progammer]$ echo '开始' | cut -b 1
[jinyuyu@localhost progammer]$ echo '开始' | cut -b 1,3
[jinyuyu@localhost progammer]$ echo '开始' | cut -b 1-3
开
[jinyuyu@localhost progammer]$ echo '开始' | cut -c 1
开
[jinyuyu@localhost progammer]$ echo '开始' | cut -c 1-3
开始
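Note: when cutting Chinese text, -c and -b behave very differently, because a Chinese character is a single character but occupies 2 or 3 bytes depending on the encoding (3 in UTF-8, 2 in GBK). Choose between them with care. The output above comes from a multibyte-aware cut; the GNU coreutils manual describes -c as currently the same as -b, so results can differ between systems.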
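The byte counts can be checked directly. This sketch assumes the script itself is saved as UTF-8, where each of the two characters takes 3 bytes; it relies only on -b, which is byte-based everywhere.

```shell
# '开始' is two characters; assumes this script is stored in UTF-8.
nbytes=$(printf '%s' '开始' | wc -c)
first_char=$(printf '%s\n' '开始' | cut -b 1-3)   # the first three bytes form one character
echo "bytes: $((nbytes)), first character: $first_char"
```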
[jinyuyu@localhost progammer]$ echo he llo |cut -d' ' -f 1-
he llo
[jinyuyu@localhost progammer]$ echo he llo |cut -d' ' -f -2
he llo
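Field selection is most useful on delimited records. The sketch below uses a hypothetical passwd-style line and also shows the -s option, which suppresses lines that contain no delimiter (without it, such lines are printed unchanged):

```shell
# A hypothetical passwd-style record; fields are separated by ':'.
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'
name_shell=$(printf '%s\n' "$line" | cut -d: -f1,7)

# A line without the delimiter is passed through whole unless -s is given.
no_delim=$(printf '%s\n' 'no delimiter here' | cut -d: -f2)
no_delim_s=$(printf '%s\n' 'no delimiter here' | cut -s -d: -f2)

printf '%s\n' "$name_shell" "$no_delim" "[$no_delim_s]"
```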
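V. xargs
1. Function
xargs is a very powerful tool with many uses. Its first main function is to turn standard input into command-line arguments. Its second is to reshape single-line or multi-line input, for example turning multiple lines into one line or one line into several.
Examples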
[jinyuyu@localhost progammer]$ cat file
a
b
c
d
e
f
g
[jinyuyu@localhost progammer]$ cat file | xargs
a b c d e f g
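2. Common options
Option | Function |
---|---|
-n | maximum number of arguments per command invocation, so the output spans several lines |
-d | custom input delimiter; the input is split on this character |
-I | replacement string: each occurrence in the command is replaced by the input line |
Note: xargs has only a few commonly used options, but it can be combined with many different commands.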
[jinyuyu@localhost progammer]$ cat file | xargs -n3
a b c
d e f
g
[jinyuyu@localhost progammer]$ echo "a#b#c#d" |xargs -d#
a b c d
[jinyuyu@localhost progammer]$ echo "a#b#c#d" |xargs -d# -n2
a b
c d
[hb@MiWiFi-R1CL-srv test]$ cat file | xargs ./test.sh -a -b
-a -b aaa bbb ccc ddd
[hb@MiWiFi-R1CL-srv test]$ cat file | xargs -I {} ./test.sh -a {} -b
-a aaa -b
-a bbb -b
-a ccc -b
-a ddd -b
[hb@MiWiFi-R1CL-srv test]$ cat test.sh
For each entry in the root directory, create a matching .log file in the current directory, then delete those files.
[hb@MiWiFi-R1CL-srv test]$ ls /
bin boot dev hello lib media opt proc sbin server sys tmp var BIT client etc home lost+found mnt output.tgz root selinux srv tcp_client usr
[hb@MiWiFi-R1CL-srv test]$ ls / | xargs -I {} touch {}.log
[hb@MiWiFi-R1CL-srv test]$ ls
bin.log boot.log dev.log hello.log lib.log media.log opt.log proc.log sbin.log server.log sys.log tmp.log var.log BIT.log client.log etc.log home.log lost+found.log mnt.log output.tgz.log root.log selinux.log srv.log tcp_client.log usr.log
[hb@MiWiFi-R1CL-srv test]$ ls / | xargs -I {} rm {}.log
[hb@MiWiFi-R1CL-srv test]$ ls
Note: with -I, the command is run once for each input line, and the replacement string is substituted on every run.
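The create-then-delete exercise can be repeated safely in a scratch directory. This is a sketch under assumptions: the three names are made up, and mktemp -d supplies a throwaway directory so nothing lands in the current one.

```shell
dir=$(mktemp -d)
# Hypothetical names; each input line triggers one touch invocation.
printf '%s\n' bin boot dev | xargs -I {} touch "$dir/{}.log"
created=$(ls "$dir" | LC_ALL=C sort | xargs)   # xargs with no command joins the lines
# Remove the same files again, one rm invocation per input line.
printf '%s\n' bin boot dev | xargs -I {} rm "$dir/{}.log"
remaining=$(ls "$dir" | wc -l)
rm -rf "$dir"
echo "created: $created; left after rm: $((remaining))"
```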
Passing input as command-line arguments
First, create a shell script named test3.sh:
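#!/bin/bash
echo $0   # name of the script
echo $1   # first argument
echo $2   # second argument
echo $3   # third argument
echo $5   # fifth argument ($4 is not printed)
echo $@   # all arguments
echo $#   # number of arguments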
Create the input file named file:
[jinyuyu@localhost progammer]$ cat file
aaa
bbb
ccc
ddd
Run test3.sh on its own first (the empty lines printed for the unset parameters are not shown); the output is:
[jinyuyu@localhost progammer]$ chmod u+x test3.sh
[jinyuyu@localhost progammer]$ ./test3.sh
./test3.sh
0
Convert the input into command-line arguments:
[jinyuyu@localhost progammer]$ cat file | xargs ./test3.sh
./test3.sh
aaa
bbb
ccc
aaa bbb ccc ddd
4
Other command-line arguments can be added as well:
[jinyuyu@localhost progammer]$ cat file | xargs ./test3.sh -a -b
./test3.sh
-a
-b
aaa
ccc
-a -b aaa bbb ccc ddd
6
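The same behaviour can be observed without a separate script file. In this sketch an inline sh -c command stands in for test3.sh (an assumption made for brevity); the extra "sh" word fills $0, so the fixed options and the words read from standard input become $1 onward.

```shell
# The inline script reports how many arguments it received and what they were.
# xargs appends the input words after the fixed arguments -a and -b.
report=$(printf '%s\n' aaa bbb ccc ddd | xargs sh -c 'echo "$# args: $*"' sh -a -b)
echo "$report"
```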