Regular Expressions: Python Web Scraping Edition

re.match

      re.match tries to match a pattern starting at the beginning of the string.
      If the pattern does not match at the beginning, match() returns None.
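  • Basic matching
import re
content = 'Hello 123 4567 World_This Demo'
# Raw string, so backslashes such as \s and \d reach the regex engine unchanged
res = re.match(r'^Hello\s\d\d\d\s\d{4}\s\w{10}\sDemo$', content)
print(res)          # the match object
print(res.group())  # the matched text
print(res.span())   # (start, end) indices of the match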
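Because match() returns None when the pattern does not match at the start of the string, calling group() on the result directly can raise AttributeError. Below is a minimal sketch of a guard. The helper name first_match_text is hypothetical and not part of the original post.

```python
import re

def first_match_text(pattern, text):
    """Return the matched text, or None if the pattern does not match at the start.

    Hypothetical helper, for illustration only.
    """
    res = re.match(pattern, text)
    if res is None:
        return None
    return res.group()

content = 'Hello 123 4567 World_This Demo'
matched = first_match_text(r'Hello\s\d+', content)  # pattern matches at position 0
missed = first_match_text(r'World', content)        # 'World' is not at the start
print(matched)
print(missed)
```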
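  • Capturing a target
import re
content = 'Hello 123 4567 World_This Demo'
# Parentheses mark the parts of the match to capture as groups
res = re.match(r'^Hello\s(\d+)\s(\d{4})\s\w{10}\sDemo$', content)
print(res)
print(res.group(1))  # text captured by the first group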
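The pattern above has two capturing groups, although only the first is printed. As a small sketch, the second group and the full tuple of captures can be read the same way. The variable names first, second and both are chosen here for illustration.

```python
import re

content = 'Hello 123 4567 World_This Demo'
res = re.match(r'^Hello\s(\d+)\s(\d{4})\s\w{10}\sDemo$', content)

first = res.group(1)   # text captured by the first pair of parentheses
second = res.group(2)  # text captured by the second pair
both = res.groups()    # all captured groups as a tuple
print(first, second, both)
```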
  • Loose matching
import re
content = 'Hello 123 4567 World_This is a Regex Demo'
res = re.match('^
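The pattern in the snippet above is incomplete. Here is a minimal sketch of loose matching, assuming the intent is to pin only the start and end of the string and let .* stand in for everything in between. The pattern below is that assumption, not the author's original.

```python
import re

content = 'Hello 123 4567 World_This is a Regex Demo'
# Assumed pattern: .* matches any run of characters except newlines
res = re.match(r'^Hello.*Demo$', content)
print(res.group())
print(res.span())
```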
  • Greedy matching
import re
content = 'Hello 123 4567 World_This is a Regex Demo'
# .* is greedy: it consumes as much as it can, leaving (\d+) only the last digit
res = re.match(r'^He.*(\d+).*Demo$', content)
print(res)
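  • Non-greedy matching
import re
content = 'Hello 123 4567 World_This is a Regex Demo'
# .*? is non-greedy: it consumes as little as it can, so (\d+) captures the first number
res = re.match(r'^He.*?(\d+).*Demo$', content)
print(res)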
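Both patterns match the whole string, so printing the match object does not show the difference. It appears in the captured group, as this side-by-side sketch shows. The variable names greedy and lazy are chosen here for illustration.

```python
import re

content = 'Hello 123 4567 World_This is a Regex Demo'

greedy = re.match(r'^He.*(\d+).*Demo$', content)
lazy = re.match(r'^He.*?(\d+).*Demo$', content)

# Greedy .* swallows every digit except the last one that \d+ needs
print(greedy.group(1))
# Non-greedy .*? stops as soon as \d+ can start matching
print(lazy.group(1))
```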
  • Match flags (re.S): lets . match newline characters too
import re
content = '''Hello 123 4567 World_This
is a Regex Demo
'''
# The flag lives on the re module: re.S, not res.S
res = re.match(r'^He.*?(\d+).*Demo$', content, re.S)
print(res)
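To see what the flag changes, the same pattern can be run with and without it. This is a short sketch; re.DOTALL is the long name for re.S, and the variable names are chosen here for illustration.

```python
import re

content = '''Hello 123 4567 World_This
is a Regex Demo
'''
pattern = r'^He.*?(\d+).*Demo$'

without_flag = re.match(pattern, content)           # . stops at the newline
with_flag = re.match(pattern, content, re.DOTALL)   # . also matches the newline

print(without_flag)
print(with_flag.group(1))
```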
  • Escaping
import re
content = 'price is $5.00'
# $ and . are regex metacharacters, so escape them to match them literally
result = re.match(r'price is \$5\.00', content)
print(result)
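When the literal text comes from a variable, writing the backslashes by hand is error-prone. The sketch below uses re.escape from the standard library to escape the price; the variable names are chosen here for illustration.

```python
import re

content = 'price is $5.00'

# Unescaped: $ asserts end-of-string here, so the pattern cannot match
unescaped = re.match('price is $5.00', content)

# re.escape backslash-escapes the metacharacters in the literal part
pattern = r'price is ' + re.escape('$5.00')
escaped = re.match(pattern, content)

print(unescaped)
print(escaped.group())
```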

re.search

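 Scans the whole string and returns the first match it finds.
import re
content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
# search does not require the match to begin at the start of the string
result = re.search(r'Hello.*?(\d+).*?Demo', content)
print(result)
print(result.group(1))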
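The contrast with re.match is easiest to see by running both on the same input. A small sketch, with variable names chosen here for illustration:

```python
import re

content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
pattern = r'Hello.*?(\d+).*?Demo'

at_start = re.match(pattern, content)   # fails: the string starts with 'Extra'
anywhere = re.search(pattern, content)  # succeeds: scans forward to 'Hello'

print(at_start)
print(anywhere.group(1))
```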

Using re.sub

           Replaces every matching substring in the string and returns the resulting string.
>>> import re
>>> content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
>>> content = re.sub(r'\d+', '', content)
>>> print(content)
Extra stings Hello  World_This is a Regex Demo Extra stings
>>> content = 'Extra stings Hello 1234567 World_This is a Regex Demo Extra stings'
>>> content = re.sub(r'(\d+)', r'\1 445566', content)
>>> print(content)
Extra stings Hello 1234567 445566 World_This is a Regex Demo Extra stings
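re.sub also accepts a count argument that limits the number of replacements, and the replacement may be a function that receives each match object. A minimal sketch of both follows; the sample string and variable names are chosen here for illustration.

```python
import re

content = 'Hello 123 4567 World'

# count=1 replaces only the first match
first_only = re.sub(r'\d+', '', content, count=1)

# A function replacement is called once per match and returns the new text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), content)

print(first_only)
print(doubled)
```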

Reposted from blog.csdn.net/ichglauben/article/details/82460876