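Before we start, here is a handy online tool for testing regular expressions: https://regex101.com/
This post briefly covers regular expressions in shell programming. A regular expression is a pattern for matching strings. It can be used to check whether a string contains a given substring, to replace matched substrings, or to extract the substrings that satisfy some condition. Note that regular expression syntax is broadly similar across programming languages and tools, but differs in the details.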
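Before going into detail, it is important to distinguish regular expressions from wildcards:
- Regular expressions are used to match strings inside files, and the match is a containing match: a line is selected whenever it contains a string that matches the expression. The more specific the expression, the narrower the selection. Commands such as grep, awk and sed support regular expressions.
- Wildcards are generally used to match file names, and the match is a full match: the pattern has to cover the entire name. Commands such as ls, cp and find (in its -name test) do not take regular expressions, so wildcard patterns are used with them instead.
- As a rule of thumb, regular expressions match strings, that is, file contents, whereas wildcards match file names. The same symbol means different things in the two.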
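The different meaning of `*` in the two systems can be seen in a small sketch. The scratch directory and the file names `a`, `ab` and `b` are made up for this demonstration and do not come from the test file used later in the post.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/ab" "$demo_dir/b"

# Wildcard: a* has to cover the whole file name, so only "a" and "ab" qualify.
glob_matches=$(cd "$demo_dir" && ls a* | paste -sd ' ' -)

# Regex: a* means "zero or more a", which also matches the empty string,
# so every name is selected, including "b".
regex_matches=$(ls "$demo_dir" | grep "a*" | paste -sd ' ' -)

echo "wildcard a*: $glob_matches"
echo "regex    a*: $regex_matches"

rm -r "$demo_dir"
```

The wildcard selects two names while the regular expression selects all three, which is the full match versus containing match difference described above.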
With that distinction in place, we can turn to regular expressions themselves. Some of the basic rules (not a complete list) are as follows:
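Metacharacter | Meaning |
* | Matches the preceding character zero or more times |
. | Matches any single character except a newline |
^ | Matches the start of a line. For example, ^hello matches lines that begin with hello |
$ | Matches the end of a line. For example, hello$ matches lines that end with hello |
[] | Matches exactly one of the characters listed inside the brackets. For example, [aoeiu] matches any one vowel, [0-9] matches any one digit, and [a-z][0-9] matches a two-character string made of one lowercase letter followed by one digit |
[^] | Matches any one character that is not listed inside the brackets. For example, [^0-9] matches any one non-digit character, and [^a-z] matches any one character that is not a lowercase letter |
\ | Escape character. Removes the special meaning of the character that follows it |
\{n\} | The preceding character appears exactly n times. For example, [0-9]\{4\} matches four digits |
\{n,\} | The preceding character appears at least n times. For example, [0-9]\{2,\} matches two or more digits |
\{n,m\} | The preceding character appears at least n and at most m times. For example, [a-z]\{6,8\} matches 6 to 8 lowercase letters |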
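To illustrate the matching rules more concretely, I wrote a test file, regex_test.txt, with the content below. The file also has a blank line between every two adjacent lines, as the `grep -n "^$"` output further down confirms; those blank lines are not shown in the listings.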
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
said
sold
saaaid
555nice
nic55e
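To follow along, the test file can be recreated with a here-document. This is a sketch under one assumption: the blank lines are placed between adjacent text lines so that they fall on lines 2, 4, ..., 14, which is what the `grep -n "^$"` output later in the post reports.

```shell
# Recreate regex_test.txt in the current directory.
# Assumption: one blank line between every two adjacent text lines.
cat > regex_test.txt <<'EOF'
This is a txt file about regex rule.

For test regex rule,i try to write this file.

Maybe there are some wrong words because of testing.

said

sold

saaaid

555nice

nic55e
EOF

# Count all lines and the blank ones as a quick sanity check.
total_lines=$(grep -c "" regex_test.txt)
blank_lines=$(grep -c "^$" regex_test.txt)
echo "total: $total_lines, blank: $blank_lines"
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the here-document, so the text is written exactly as typed.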
Using this file, the rules are tested as follows:
- "*"
- grep "a*" regex_test.txt
- Matches every line, including blank lines
- grep "aa*" regex_test.txt
- Matches lines that contain at least one a
[root@localhost tmp]# grep "a*" regex_test.txt
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
said
sold
saaaid
555nice
nic55e
[root@localhost tmp]# grep "aa*" regex_test.txt
This is a txt file about regex rule.
Maybe there are some wrong words because of testing.
said
saaaid
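Why `a*` selects everything while `aa*` does not can be checked on a smaller input. The sketch below feeds three words from the test file through a pipe instead of reading regex_test.txt; the variable names are chosen only for illustration.

```shell
# Three sample words taken from regex_test.txt.
words() { printf 'said\nsold\nsaaaid\n'; }

# "a*" also matches the empty string, so every line is selected.
star_count=$(words | grep -c "a*")

# "aa*" needs one literal a before the optional repeats, so "sold" drops out.
one_or_more_count=$(words | grep -c "aa*")

# -o prints only the matched text, showing how many a's each match consumed.
a_runs=$(words | grep -o "aa*" | paste -sd ' ' -)

echo "a*  selects $star_count lines"
echo "aa* selects $one_or_more_count lines, matched runs: $a_runs"
```

Because `*` allows zero repetitions, a pattern consisting only of `x*` never filters anything out; writing the character once before the starred copy is the usual way to require at least one occurrence with these basic rules.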
- "."
- grep "s..d" regex_test.txt
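- Matches strings with exactly two characters between s and d
- grep "s.*d" regex_test.txt
- Matches strings with any number of characters (including none) between s and d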
[root@localhost tmp]# grep "s..d" regex_test.txt
said
sold
[root@localhost tmp]# grep "s.*d" regex_test.txt
Maybe there are some wrong words because of testing.
said
sold
saaaid
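It may not be obvious why the long "Maybe ..." sentence is selected by `s.*d`. A small sketch with `grep -o`, which prints only the matched part, makes the match visible; the variable names are illustrative.

```shell
line="Maybe there are some wrong words because of testing."

# .* is greedy: the match starts at the first s ("some") and runs
# to the last d on the line (the d in "words").
greedy_match=$(printf '%s\n' "$line" | grep -o "s.*d")

# s..d needs exactly two characters between s and d,
# so "saaaid" (four characters in between) is not selected.
two_char_count=$(printf 'said\nsold\nsaaaid\n' | grep -c "s..d")

echo "s.*d matched: $greedy_match"
echo "s..d selects $two_char_count lines"
```

Since grep performs a containing match, the s and the d do not have to belong to the same word; any s followed later on the line by a d is enough for `s.*d`.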
- "^" "$" "\"
- grep "^M" regex_test.txt
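- Matches lines that begin with an uppercase M
- grep "\.$" regex_test.txt
- Matches lines that end with a literal .
- grep -n "^$" regex_test.txt
- Matches blank lines and prints their line numbers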
[root@localhost tmp]# grep "^M" regex_test.txt
Maybe there are some wrong words because of testing.
[root@localhost tmp]# grep "\.$" regex_test.txt
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
[root@localhost tmp]# grep -n "^$" regex_test.txt
2:
4:
6:
8:
10:
12:
14:
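The role of the backslash in `\.$` is easier to see next to the unescaped version. The sketch below uses a short made-up input with two blank lines rather than the full test file.

```shell
# Hypothetical five-line input: three text lines separated by blank lines.
sample() { printf 'Maybe there.\n\nsaid\n\n555nice\n'; }

# Escaped dot: only lines whose last character is a literal "." match.
literal_dot_count=$(sample | grep -c '\.$')

# Unescaped dot: any character before the line end will do,
# so every non-empty line matches.
any_char_count=$(sample | grep -c '.$')

# ^$ leaves no room for any character between line start and line end.
blank_line_numbers=$(sample | grep -n '^$' | cut -d: -f1 | paste -sd ' ' -)

echo "\\.\$ selects $literal_dot_count line(s), .\$ selects $any_char_count"
echo "blank lines at: $blank_line_numbers"
```

Single quotes are used here so that the shell passes the backslash and the dollar sign to grep unchanged; the double-quoted forms in the post behave the same way in this case.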
- "[]"
- grep "s[ao]id" regex_test.txt
- Matches s, then either a or o, then id
- grep "[0-9]" regex_test.txt
- Matches lines that contain any digit
- grep "^[0-9]" regex_test.txt
- Matches lines that begin with a digit
[root@localhost tmp]# grep "s[ao]id" regex_test.txt
said
[root@localhost tmp]# grep "[0-9]" regex_test.txt
555nice
nic55e
[root@localhost tmp]# grep "^[0-9]" regex_test.txt
555nice
- "[^]"
- grep "^[^a-z]" regex_test.txt
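- Matches lines that do not begin with a lowercase letter (blank lines are not selected, because [^a-z] still has to match one character)
- grep "^[^a-zA-Z]" regex_test.txt
- Matches lines that do not begin with a letter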
[root@localhost tmp]# grep "^[^a-z]" regex_test.txt
This is a txt file about regex rule.
For test regex rule,i try to write this file.
Maybe there are some wrong words because of testing.
555nice
[root@localhost tmp]# grep "^[^a-zA-Z]" regex_test.txt
555nice
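The caret has two roles in these patterns: outside the brackets it anchors the match to the start of the line, and as the first character inside the brackets it negates the set. The following sketch shows both on a reduced input. Setting `LC_ALL=C` is an assumption made here so that `[a-z]` covers only the ASCII lowercase letters regardless of the system locale.

```shell
# Assumption: force the C locale so [a-z] means exactly the ASCII lowercase letters.
export LC_ALL=C

# Sample lines taken from regex_test.txt, plus one blank line.
sample() { printf 'This\n\nsaid\n555nice\nnic55e\n'; }

# A digit anywhere on the line.
digit_anywhere=$(sample | grep '[0-9]' | paste -sd ' ' -)

# A digit as the first character of the line.
digit_first=$(sample | grep '^[0-9]' | paste -sd ' ' -)

# First character exists and is not a lowercase letter;
# the blank line has no first character, so it is not selected.
not_lower_first=$(sample | grep '^[^a-z]' | paste -sd ' ' -)

echo "[0-9]   : $digit_anywhere"
echo "^[0-9]  : $digit_first"
echo "^[^a-z] : $not_lower_first"
```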
- "\{n\}" "\{n,\}" "\{n,m\}"
- grep "[0-9]\{3\}" regex_test.txt
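- Matches strings containing three consecutive digits
- grep "[0-9]\{2,\}" regex_test.txt
- Matches strings containing at least two consecutive digits
- grep "sa\{1,3\}id" regex_test.txt
- Matches strings with at least one and at most three a's between s and id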
[root@localhost tmp]# grep "[0-9]\{3\}" regex_test.txt
555nice
[root@localhost tmp]# grep "[0-9]\{2,\}" regex_test.txt
555nice
nic55e
[root@localhost tmp]# grep "sa\{1,3\}id" regex_test.txt
said
saaaid
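The upper bound in `\{1,3\}` does not show up in the test file, since no line has more than three a's. The sketch below adds two made-up words, `sid` and `saaaaid`, to exercise both bounds. It also shows the same interval written for `grep -E` (extended regular expressions), where the braces are not escaped; the post itself only uses the escaped form.

```shell
# "sid" and "saaaaid" are hypothetical additions; the rest comes from regex_test.txt.
sample() { printf 'sid\nsaid\nsaaaid\nsaaaaid\n555nice\nnic55e\n'; }

# Basic syntax: braces are escaped. Zero a's and four a's both fail.
basic_interval=$(sample | grep 'sa\{1,3\}id' | paste -sd ' ' -)

# Extended syntax (-E): the same interval without backslashes.
extended_interval=$(sample | grep -E 'sa{1,3}id' | paste -sd ' ' -)

# -o shows which text satisfied the three-digit requirement.
three_digits=$(sample | grep -o '[0-9]\{3\}')

echo "basic    : $basic_interval"
echo "extended : $extended_interval"
echo "3 digits : $three_digits"
```

Both forms select the same lines here; which one to use depends on whether grep is run with or without `-E`.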
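This post covers only some of the regular expression rules used in shell programming; others, such as "?" and "+", will be covered in later posts.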