Learning Python's re module (17): sub(pattern, repl, string, count=0, flags=0)

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:

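In short: sub returns the string produced by replacing the leftmost non-overlapping matches of pattern with repl, and returns string unchanged if nothing matches. repl may be a string or a function. In a string repl, backslash escapes are processed: \n becomes a newline, \r becomes a carriage return, and so on. Unknown escapes such as \& are left untouched, and a backreference such as \6 is replaced by whatever group 6 matched.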

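Let's break down the example below.

Pattern, piece by piece: def is literal text; \s+ is one or more whitespace characters; group 1, ([a-zA-Z_][a-zA-Z_0-9]*), is an identifier (a letter or underscore followed by any number of letters, digits, or underscores); \s* is optional whitespace; \( is a literal left parenthesis; \s* is optional whitespace; \) is a literal right parenthesis; : is a colon.

repl: static PyObject*\npy_\1(void)\n{ (here \1 stands for the text matched by group 1)

Input string: def myfunc():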

 
    >>> import re
    >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
    ...        r'static PyObject*\npy_\1(void)\n{',
    ...        'def myfunc():')
    'static PyObject*\npy_myfunc(void)\n{'
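The escape and backreference rules for a string repl can be checked with a few smaller cases. This is a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the documentation.

```python
import re

# r'\n' in repl is processed by re.sub into a real newline character.
newline_result = re.sub(r',\s*', r'\n', 'a, b, c')

# r'\&' is an unknown escape that is not an ASCII letter, so it is kept as-is.
ampersand_result = re.sub(r'and', r'\&', 'salt and pepper')

# \1 and \2 refer to the first and second capturing groups of the pattern.
swapped = re.sub(r'(\w+)=(\w+)', r'\2=\1', 'key=value')

print(repr(newline_result))    # 'a\nb\nc'
print(repr(ampersand_result))  # 'salt \\& pepper'
print(swapped)                 # value=key
```

Writing repl as a raw string keeps the backslashes intact so that re.sub, not the Python string literal, interprets them.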
 
  

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:

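In short: if repl is a function, it is called once for every non-overlapping match of pattern. It receives the match object as its only argument and returns the replacement string, as shown below.

The first example replaces each run of two hyphens with a single hyphen and each lone hyphen with a space.

The second replaces the standalone word AND (surrounded by whitespace, matched case-insensitively) with an ampersand.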

 
    >>> def dashrepl(matchobj):
    ...     if matchobj.group(0) == '-': return ' '
    ...     else: return '-'
    ...
    >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
    'pro--gram files'
    >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
    'Baked Beans & Spam'
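A function repl is most useful when the replacement has to be computed from the match. The sketch below uses a hypothetical helper, double_number, and made-up input strings to show the idea.

```python
import re

def double_number(match):
    # match.group(0) is the full text of the current match.
    return str(int(match.group(0)) * 2)

doubled = re.sub(r'\d+', double_number, '3 apples and 12 pears')

# A lambda works too; here groups 1 and 2 split each word into
# its first letter and the rest.
capitalized = re.sub(r'\b(\w)(\w*)',
                     lambda m: m.group(1).upper() + m.group(2),
                     'hello world')

print(doubled)      # 6 apples and 24 pears
print(capitalized)  # Hello World
```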
 
  

The pattern may be a string or a pattern object.

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so sub('x*', '-', 'abxd') returns '-a-b--d-'.

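In short: pattern can be either a string or a compiled pattern object.

The optional count argument is the maximum number of matches to replace and must be a non-negative integer. If it is omitted or zero, every match is replaced. An empty match is replaced only if it is not adjacent to a previous empty match, which is why sub('x*', '-', 'abxd') returns '-a-b--d-'.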
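The effect of count and of passing a compiled pattern can be seen in a short sketch. The strings and variable names here are illustrative only.

```python
import re

text = 'banana'

# Without count (or with count=0) every match is replaced.
all_replaced = re.sub('a', 'A', text)

# count=2 stops after the first two matches, scanning left to right.
first_two = re.sub('a', 'A', text, count=2)

# pattern may also be a compiled pattern object.
digits = re.compile(r'\d')
masked = re.sub(digits, '#', 'room 101')

print(all_replaced)  # bAnAnA
print(first_two)     # bAnAna
print(masked)        # room ###
```

Passing count as a keyword argument keeps it from being confused with flags, which comes right after it in the signature.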

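Related code (note that this sub method does literal macro substitution with str.replace; it is not the implementation of re.sub):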

    def sub(self, s):
        for k, v in self.macros.items():
            s = s.replace(k, v)
        return s

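Analysis of sub('x*', '-', 'abxd'). The leading number counts match attempts, not string positions, and the text at the end of each line is the string after that step:

0: match length 0, replaced: -abxd

1: match length 0, not replaced: -abxd

2: match length 0, replaced: -a-bxd

3: match length 0, not replaced: -a-bxd

4: match length 1, replaced: -a-b-d

5: match length 0, replaced: -a-b--d

6: match length 0, not replaced: -a-b--d

7: match length 0, replaced: -a-b--d-

8: past the end of the string, stop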
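The matches that actually get replaced can also be listed with finditer. This sketch assumes Python 3.7 or later, where an empty match directly after a non-empty one is allowed; earlier versions give a different result.

```python
import re

pattern = re.compile('x*')
text = 'abxd'

# Each span is (start, end); start == end means an empty match.
spans = [m.span() for m in pattern.finditer(text)]
result = pattern.sub('-', text)

print(spans)   # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
print(result)  # -a-b--d-
```

The five spans correspond to the five hyphens in the result: one before a, one before b, one for x itself, one empty match right after x, and one at the end of the string.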

In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.

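In short: in a string repl, besides the character escapes and backreferences described earlier, \g<name> inserts the substring matched by the group called name, which the pattern defines with the (?P<name>...) syntax. \g<number> refers to a group by its number, so \g<2> means the same as \2 but stays unambiguous in a replacement such as \g<2>0, whereas \20 would be read as a reference to group 20 rather than group 2 followed by the literal character '0'. The backreference \g<0> inserts the entire substring matched by the RE.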
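The \g<...> forms can be tried out as follows. This is a small sketch with made-up patterns and inputs; the group names first and second are chosen only for illustration.

```python
import re

# \g<name> refers to groups defined with (?P<name>...).
swapped = re.sub(r'(?P<first>\w+) (?P<second>\w+)',
                 r'\g<second> \g<first>',
                 'hello world')

# \g<2>0 means "group 2, then a literal 0".
with_zero = re.sub(r'(\w)(\d)', r'\g<2>0', 'a1')

# \20 is read as group 20, which this pattern does not have.
try:
    re.sub(r'(\w)(\d)', r'\20', 'a1')
    group_20_rejected = False
except re.error:
    group_20_rejected = True

# \g<0> is the whole match.
wrapped = re.sub(r'\w+', r'<\g<0>>', 'hi there')

print(swapped)            # world hello
print(with_zero)          # 10
print(group_20_rejected)  # True
print(wrapped)            # <hi> <there>
```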

Changed in version 3.1: Added the optional flags argument.

Changed in version 3.5: Unmatched groups are replaced with an empty string.

Changed in version 3.6: Unknown escapes in pattern consisting of '\' and an ASCII letter now are errors.

Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors.

Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match.
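The version notes above can be observed directly. The sketch below assumes Python 3.7 or later; the patterns and variable names are illustrative.

```python
import re

# Since 3.5: a group that did not take part in the match is
# replaced with an empty string instead of raising an error.
unmatched = re.sub(r'(a)|(b)', r'[\1\2]', 'ab')

# Since 3.6: an unknown escape made of '\' and an ASCII letter
# in the pattern is an error.
try:
    re.compile(r'\q')
    bad_pattern_rejected = False
except re.error:
    bad_pattern_rejected = True

# Since 3.7: the same holds for repl.
try:
    re.sub('a', r'\q', 'a')
    bad_repl_rejected = False
except re.error:
    bad_repl_rejected = True

print(unmatched)             # [a][b]
print(bad_pattern_rejected)  # True
print(bad_repl_rejected)     # True
```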
