re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:
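Summary: re.sub returns the string produced by replacing the non-overlapping matches of pattern with repl. If pattern has no match, the original string is returned. repl can be a string or a function. If repl is a string, the backslash escapes in it are processed: \n becomes a newline, \r becomes a carriage return, and so on. Unknown escapes such as \& are left as they are, and \6 is replaced by the text matched by group 6.
Let us walk through the example that follows.
Pattern: def (literal keyword), \s+ (one or more whitespace characters), group 1 ([a-zA-Z_][a-zA-Z_0-9]*) (a letter or underscore followed by any number of letters, digits, or underscores), \s* (optional whitespace), \( (left parenthesis), \s* (optional whitespace), \) (right parenthesis), : (colon)
repl: static PyObject*\npy_\1(void)\n{ (\1 stands for the text matched by group 1)
Input string: def myfunc():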
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
... r'static PyObject*\npy_\1(void)\n{',
... 'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'
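The pattern breakdown can also be written into the regex itself. The following is a minimal sketch, assuming re.VERBOSE is acceptable for this pattern; DEF_PATTERN and C_TEMPLATE are illustrative names and not part of the re module.

```python
import re

# Hypothetical names, used for illustration only.
DEF_PATTERN = re.compile(r"""
    def \s+                     # the keyword, then one or more whitespace characters
    ([a-zA-Z_][a-zA-Z_0-9]*)    # group 1: the function name
    \s* \( \s* \)               # empty parentheses, optional whitespace around them
    :                           # trailing colon
""", re.VERBOSE)

# \n becomes a real newline and \1 is replaced by group 1.
C_TEMPLATE = r'static PyObject*\npy_\1(void)\n{'

result = DEF_PATTERN.sub(C_TEMPLATE, 'def myfunc():')
print(result)
```

The compiled pattern behaves like the one-line pattern in the example; the verbose form only adds room for comments.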
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:
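Summary: if repl is a function, it is called once for each non-overlapping match of pattern. It receives the match object as its only argument and returns the replacement string, as the examples that follow show.
The first example replaces a single hyphen with a space and a double hyphen with a single hyphen.
The second replaces the standalone word AND (matched case-insensitively) with an ampersand.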
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
'Baked Beans & Spam'
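A function repl can also compute its replacement from the matched text. This is a small sketch under the assumption that every match is a run of decimal digits; the helper name double is made up for this example.

```python
import re

def double(matchobj):
    # matchobj.group(0) is the full text of the current match.
    return str(int(matchobj.group(0)) * 2)

doubled = re.sub(r'\d+', double, 'a1 b22 c333')
print(doubled)  # a2 b44 c666
```

The function must return a string, which is why the doubled number is passed through str().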
The pattern may be a string or a pattern object.
The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so sub('x*', '-', 'abxd') returns '-a-b--d-'.
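Summary: pattern can be either a string or a compiled pattern object.
The optional argument count is the maximum number of occurrences of pattern to replace, and it must be a non-negative integer. If it is omitted or zero, every match is replaced. An empty match is replaced only when it is not adjacent to a previous empty match. For example, sub('x*', '-', 'abxd') returns '-a-b--d-'.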
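The effect of count, and of passing a compiled pattern instead of a string, can be sketched as follows. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = 'a-b-c-d'

# Replace only the first two hyphens.
first_two = re.sub(r'-', '+', text, count=2)
print(first_two)   # a+b+c-d

# count omitted (or 0): replace every hyphen.
everything = re.sub(r'-', '+', text)
print(everything)  # a+b+c+d

# A compiled pattern object is accepted in place of the pattern string.
compiled = re.compile(r'-')
same_as_first_two = re.sub(compiled, '+', text, count=2)
print(same_as_first_two)  # a+b+c-d
```

count is passed by keyword here, which keeps the call unambiguous next to flags.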
Code:
def sub(self, s):
    # Replace every macro key found in s with its value (plain str.replace, no regex).
    for k, v in self.macros.items():
        s = s.replace(k, v)
    return s
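Analysis (step, match length, whether '-' is inserted, result so far):
0  match length 0  insert  -abxd
1  match length 0  skip    -abxd
2  match length 0  insert  -a-bxd
3  match length 0  skip    -a-bxd
4  match length 1  insert  -a-b-d
5  match length 0  insert  -a-b--d
6  match length 0  skip    -a-b--d
7  match length 0  insert  -a-b--d-
8  past the end of the string, break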
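One way to inspect where the matches fall is to list their spans with re.finditer. This is only a sketch, and it assumes Python 3.7 or later, where an empty match directly after a non-empty match is also replaced.

```python
import re

# Each span is (start, end); start == end means an empty match.
spans = [m.span() for m in re.finditer('x*', 'abxd')]
print(spans)     # [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]

replaced = re.sub('x*', '-', 'abxd')
print(replaced)  # -a-b--d-
```

The five spans correspond to the five hyphens in the result: the 'x' at (2, 3) and the empty match at (3, 3) produce the two adjacent hyphens.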
In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.
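Summary: when repl is a string, it supports more than the character escapes and backreferences mentioned earlier. \g<name> inserts the substring matched by the group called name, which is defined in the pattern with the (?P<name>...) syntax. \g<number> refers to a group by its number, so \g<2> is equivalent to \2, but it avoids the ambiguity that arises in a replacement such as \g<2>0. \20 would be read as a reference to group 20, not as group 2 followed by the character '0'. The backreference \g<0> inserts the entire substring matched by the RE.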
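The \g<...> forms can be tried out with a few short calls. This is a minimal sketch; the patterns, sample strings, and variable names are assumptions made for illustration.

```python
import re

# \g<name> refers to groups defined with (?P<name>...).
swapped = re.sub(r'(?P<first>\w+) (?P<second>\w+)',
                 r'\g<second> \g<first>', 'hello world')
print(swapped)          # world hello

# \g<2>0 means group 2 followed by a literal '0'.
group_then_zero = re.sub(r'(a)(b)', r'\g<2>0', 'ab')
print(group_then_zero)  # b0

# \g<0> inserts the whole match.
wrapped = re.sub(r'\w+', r'<\g<0>>', 'hi there')
print(wrapped)          # <hi> <there>

# \20 is read as group 20, which this pattern does not have.
try:
    re.sub(r'(a)(b)', r'\20', 'ab')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```

The last call raises re.error because the pattern defines only two groups.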
Changed in version 3.1: Added the optional flags argument.
Changed in version 3.5: Unmatched groups are replaced with an empty string.
Changed in version 3.6: Unknown escapes in pattern consisting of '\' and an ASCII letter now are errors.
Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors. Empty matches for the pattern are replaced when adjacent to a previous non-empty match.
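The behaviors listed in these version notes can be checked with a short sketch. It assumes Python 3.7 or later, and the variable names are illustrative.

```python
import re

# 3.5+: a group that did not take part in the match is replaced with ''.
unmatched = re.sub(r'(a)|(b)', r'[\1\2]', 'ab')
print(unmatched)   # [a][b]

# An unknown escape that is not an ASCII letter, such as \&, is left alone.
kept = re.sub('a', r'\&', 'a')
print(kept)        # \&

# 3.7+: an unknown escape made of '\' and an ASCII letter in repl is an error.
try:
    re.sub('a', r'\q', 'a')
    bad_escape = None
except re.error as exc:
    bad_escape = str(exc)
print(bad_escape)
```

Writing repl as a raw string keeps the backslashes intact until re.sub processes them.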