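Problem description:
English sentences are normally split into words on spaces, but there is no space between a punctuation mark and the word next to it. In particular, two words joined by an apostrophe (') are mistaken for a single word.
The goal of this program is to rewrite an English sentence so that every word and every punctuation mark is a separate unit, with the units separated by spaces.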
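To make the problem concrete, the short sketch below splits the sample sentence on whitespace and lists the tokens that still carry punctuation or an apostrophe. The variable names `tokens` and `glued` are illustrative only and are not part of the original program.

```python
sentence = "The Sony A7 III's write speed, is faster than my previous camera (the Canon EOS've Rebel T6) allowing me."

# Plain whitespace splitting leaves punctuation attached to the neighbouring word.
tokens = sentence.split()

# Keep only the tokens that contain something other than letters and digits.
glued = [t for t in tokens if not t.isalnum()]
print(glued)
# ["III's", 'speed,', '(the', "EOS've", 'T6)', 'me.']
```

Each of these six tokens should ideally become two separate units.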
Solution:
Given the following sentence:
sentence = "The Sony A7 III's write speed, is faster than my previous camera (the Canon EOS've Rebel T6) allowing me."
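import re

sentence = "The Sony A7 III's write speed, is !faster than my previous camera (the Canon EOS've Rebel T6) allowing me."
# Special characters that must be separated from the adjacent word by a space
special_chars = [',', '.', '\'', '’', '“', '”', '(', ')', '[', ']', '{', '}', ':', ';', '?', '!', '-', '--']
# Insert a space before each special character
for char in special_chars:
    # re.escape makes characters such as '[', ']' and '-' safe to use in the pattern
    pattern = rf'({re.escape(char)})'
    if char == '(':  # special case: for a left parenthesis the space goes after it
        sentence = re.sub(pattern, r'\1 ', sentence)
    else:
        sentence = re.sub(pattern, r' \1', sentence)
print(sentence)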
Output:
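Because the loop above adds a space on only one side of each character, a mark such as `!` in `!faster` stays attached to the word that follows it, and a character that already has a space in front of it gets a second one. One possible alternative is sketched below. It extracts the tokens in a single pass with `re.findall` and joins them with single spaces. The function name `tokenize` and the pattern `TOKEN_PATTERN` are assumptions made for illustration and are not part of the original program.

```python
import re

# Hypothetical pattern (not from the original program). It matches, in order:
#   1. an apostrophe followed by word characters, e.g. 's or 've
#   2. a run of word characters (letters, digits, underscore)
#   3. any single character that is neither a word character nor whitespace
TOKEN_PATTERN = re.compile(r"['’]\w+|\w+|[^\w\s]")


def tokenize(sentence: str) -> str:
    """Return the sentence with every token separated by exactly one space."""
    return " ".join(TOKEN_PATTERN.findall(sentence))


sentence = "The Sony A7 III's write speed, is !faster than my previous camera (the Canon EOS've Rebel T6) allowing me."
print(tokenize(sentence))
# The Sony A7 III 's write speed , is ! faster than my previous camera ( the Canon EOS 've Rebel T6 ) allowing me .
```

This sketch has limits of its own. An opening single quote is glued to the quoted word (`'hello'` becomes `'hello '`), hyphenated words are split at the hyphen, and `--` becomes two separate tokens. It is meant as a starting point, not a full tokenizer.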