Learning Python's re module, part 17: sub(pattern, repl, string, count=0, flags=0)

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:

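In other words: sub returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with repl. If pattern has no match, the original string is returned unchanged. repl can be a string or a function. If repl is a string, backslash escapes in it are processed: \n becomes a newline, \r becomes a carriage return, and so on. Unknown escapes such as \& are left as they are, and backreferences such as \6 are replaced with the text matched by group 6.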

Let us now walk through the example below.

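Pattern       def (the literal keyword), \s+ (one or more whitespace characters), group 1 ([a-zA-Z_][a-zA-Z_0-9]*) (a letter or underscore followed by any number of letters, digits, or underscores), \s* (optional whitespace), \( (left parenthesis), \s* (optional whitespace), \) (right parenthesis), : (colon)

repl          static PyObject*\npy_\1(void)\n{   (\1 stands for the text matched by group 1)

String        def myfunc():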

>>> import re
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'
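
The piece-by-piece breakdown of the pattern can also be written into the regex itself with re.VERBOSE, which ignores whitespace in the pattern and allows comments. The following is a minimal sketch of the same substitution; the names DEF_PATTERN and converted are illustrative only and do not appear in the original example.

```python
import re

# The same pattern as above, spelled out with re.VERBOSE so each part is commented.
DEF_PATTERN = re.compile(r"""
    def                        # the literal keyword def
    \s+                        # one or more whitespace characters
    ([a-zA-Z_][a-zA-Z_0-9]*)   # group 1: the function name
    \s*                        # optional whitespace
    \( \s* \)                  # an empty parameter list
    :                          # the trailing colon
""", re.VERBOSE)

# \1 in the replacement is filled with whatever group 1 matched.
converted = DEF_PATTERN.sub(r'static PyObject*\npy_\1(void)\n{', 'def myfunc():')
print(converted)
```

Printing the result shows that the \n escapes in repl have become real line breaks, and that \1 has been replaced by myfunc.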

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:

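In other words: if repl is a function, it is called for each non-overlapping match of pattern. It receives a single match object as its argument and returns the replacement string, as shown below.

The first example replaces a single hyphen with a space and a double hyphen with a single hyphen.

The second replaces the standalone word AND (matched case-insensitively) with an ampersand (&).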

>>>
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
'Baked Beans & Spam'
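
A function repl is most useful when the replacement has to be computed from the matched text. The sketch below is a hypothetical example (the function double_number and the sample sentence are made up for illustration): it doubles every integer found in a string.

```python
import re

def double_number(match):
    # match.group(0) is the whole matched text, here a run of digits.
    return str(int(match.group(0)) * 2)

# The function is called once per match; its return value replaces that match.
doubled = re.sub(r'\d+', double_number, '3 apples and 12 pears')
print(doubled)  # 6 apples and 24 pears
```

The function must return a string, which is why the computed number is passed through str() before being returned.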

The pattern may be a string or a pattern object.

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so sub('x*', '-', 'abxd') returns '-a-b--d-'.

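In other words: pattern can be either a string or a compiled pattern object.

The optional argument count is the maximum number of matches to replace, and it must be a non-negative integer. If it is omitted or zero, every match is replaced. An empty match is replaced only when it is not adjacent to a previous empty match; for example, sub('x*', '-', 'abxd') returns '-a-b--d-'.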

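Code:

    def sub(self, s):
        # Apply every macro with plain str.replace (no regex involved):
        # each key in self.macros is replaced by its value.
        for k, v in self.macros.items():
            s = s.replace(k, v)
        return s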

Analysis of sub('x*', '-', 'abxd'), one row per match attempt:

    Step   Match length   '-' added?   Result
    0      0              yes          -abxd
    1      0              no           -abxd
    2      0              yes          -a-bxd
    3      0              no           -a-bxd
    4      1              yes          -a-b-d
    5      0              yes          -a-b--d
    6      0              no           -a-b--d
    7      0              yes          -a-b--d-
    8      past the end of the string: stop
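
The matches that sub actually replaces can be inspected with re.finditer. This is a minimal sketch that assumes Python 3.7 or later, where an empty match directly after a non-empty match is also reported; the variable names are illustrative.

```python
import re

text = 'abxd'

# Each span is (start, end); start == end means an empty match.
spans = [m.span() for m in re.finditer('x*', text)]
result = re.sub('x*', '-', text)

print(spans)   # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
print(result)  # -a-b--d-
```

There are five matches, which accounts for the five hyphens in the result: four empty matches plus the one match that consumes the x.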

In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.

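In other words: in a string-type repl, besides the character escapes and backreferences described above, \g<name> uses the substring matched by the group named name, as defined with the (?P<name>...) syntax. \g<number> refers to a group by its number, so \g<2> is equivalent to \2 but avoids ambiguity in a replacement such as \g<2>0: \20 would be read as a reference to group 20, not as group 2 followed by the literal character '0'. The backreference \g<0> substitutes the entire substring matched by the RE.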
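
The three forms of \g<...> can be tried out as follows. This is a small sketch with made-up patterns and inputs; the variable names are illustrative only.

```python
import re

# \g<name>: refer to groups by name to swap two words.
swapped = re.sub(r'(?P<first>\w+) (?P<second>\w+)',
                 r'\g<second> \g<first>', 'hello world')

# \g<2>0: group 2 followed by a literal '0'.
with_zero = re.sub(r'(a)(b)', r'\g<2>0', 'ab')

# \20 is read as group 20, which does not exist in this pattern.
try:
    re.sub(r'(a)(b)', r'\20', 'ab')
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = exc

# \g<0>: the entire match.
wrapped = re.sub(r'\d+', r'[\g<0>]', 'a1b22')

print(swapped)    # world hello
print(with_zero)  # b0
print(wrapped)    # a[1]b[22]
```

Because the pattern in the third call has only two groups, the reference to group 20 is rejected with re.error instead of being silently reinterpreted.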

Changed in version 3.1: Added the optional flags argument.

Changed in version 3.5: Unmatched groups are replaced with an empty string.

Changed in version 3.6: Unknown escapes in pattern consisting of '\' and an ASCII letter now are errors.

Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors.

Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match.
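
Some of the version notes above can be checked directly. This is a minimal sketch that assumes Python 3.7 or later; the patterns and variable names are made up for illustration.

```python
import re

# 3.5+: a group that did not take part in the match is replaced with ''.
unmatched = re.sub(r'(a)|(b)', r'[\1|\2]', 'ab')

# \& is not a known escape and & is not an ASCII letter, so it is left alone:
# the result contains a backslash followed by an ampersand.
kept = re.sub('x', r'\&', 'axb')

# 3.7+: a backslash followed by an ASCII letter that is not a known escape
# is an error in repl.
try:
    re.sub('x', r'\q', 'axb')
    bad_escape = None
except re.error as exc:
    bad_escape = exc

print(unmatched)  # [a|][|b]
print(kept)       # a\&b
```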




Reposted from blog.csdn.net/rubikchen/article/details/80497949