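Regular expressions are supported in all popular programming languages today, and they make it easy to search and replace text.
1. Using a backreference to make the opening and closing parts of a match agree:
// JavaScript implementation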
var str ="<div> information"
+"<h1>this is h1 </h1>"
+"information <h2>this is h2</h2>"
+"informationinformation <h3>this is h3</h4>"
+"information </div>";
var reg = /<[hH]([1-6])>.*?<\/[hH]\1>/g;
console.log(str.match(reg));
// Java implementation
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String string = "<div> information" + "<h1>this is h1 </h1>"
+ "information <h2>this is h2</h2>"
+ "informationinformation <h3>this is h3</h4>"
+ "information </div>";
Pattern p = Pattern.compile("<[hH]([1-6])>.*?</[hH]\\1>");
Matcher m = p.matcher(string);
while (m.find()) {
    System.out.println(m.group());
}
// Output
<h1>this is h1 </h1>
<h2>this is h2</h2>
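Only the h1 and h2 elements are matched; the h3 element is skipped because its closing tag is </h4>. In JavaScript, a backreference is written with a backslash (\1) inside the pattern and with a dollar sign ($1) in the replacement string. Backreference numbering usually starts at 1 (\1, \2, and so on). Many implementations also use index 0 for the whole match, for example m.group(0) or $0 in Java; JavaScript instead uses $& in a replacement string, and \0 in a JavaScript pattern matches the NUL character rather than the whole match.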
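To make the difference between \1 in the pattern and $1 / $& in the replacement concrete, here is a minimal sketch. The sample string and the variable names (html, headingRe, labeled, wrapped) are made up for illustration, and a second capture group is added to the pattern so the heading text can be reused.

```javascript
// Hypothetical sample input: the <h3> element is closed with a mismatched </h4>.
const html = "<h1>one</h1><h2>two</h2><h3>three</h4>";

// \1 inside the pattern must repeat whatever ([1-6]) captured,
// so an opening <h3> can only be closed by </h3>.
const headingRe = /<[hH]([1-6])>(.*?)<\/[hH]\1>/g;

// $1 and $2 in the replacement string insert the captured groups.
const labeled = html.replace(headingRe, "[level $1: $2]");

// $& inserts the whole match (JavaScript's counterpart to "group 0").
const wrapped = html.replace(headingRe, "($&)");

console.log(labeled); // [level 1: one][level 2: two]<h3>three</h4>
console.log(wrapped); // (<h1>one</h1>)(<h2>two</h2>)<h3>three</h4>
```

The mismatched h3 element is left untouched in both results, because the backreference prevents the pattern from matching it.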
2. Using backreferences in a replacement, for example turning the email addresses in a plain-text file into clickable links:
// JavaScript implementation
var email = "hello, [email protected] is my email address";
var reg = /(\w+[\w\.]*@[\w\.]+\.\w+)/;
console.log(email.replace(reg,"<a href='mailto:$1'>$1</a>"));
// Output: hello, <a href='mailto:[email protected]'>[email protected]</a> is my email address
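The pattern above has no g flag, so replace() rewrites only the first address it finds. The sketch below adds the g flag so that every address in the text is converted. The two addresses and the variable names (text, emailRe, linked) are invented for illustration.

```javascript
// Hypothetical sample text containing two made-up addresses.
const text = "Contact alice@example.com or bob.smith@mail.example.org today";

// Same pattern as in the example above, plus the g flag so that
// replace() processes every match instead of only the first one.
const emailRe = /(\w+[\w.]*@[\w.]+\.\w+)/g;

// $1 is the captured address; it is used twice in the replacement.
const linked = text.replace(emailRe, "<a href='mailto:$1'>$1</a>");

console.log(linked);
```

This pattern is deliberately simple and will not accept every valid email address; it is only meant to show how a captured group is reused in the replacement.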