While scraping data with a Python 2 crawler, saving the results to the database failed with this error:
OperationalError: (1366, "Incorrect string value)
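The cause turned out to be emoji in the scraped text. Error 1366 is what MySQL typically raises when a 4-byte UTF-8 character such as an emoji is written to a column whose character set stores at most 3 bytes per character (the legacy utf8).
Filtering the emoji out of the string with the re module before saving avoids the error: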
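import re

# Strip emoji (more precisely, every character above U+FFFF) from text.
# Written as a method of the crawler class, hence the self parameter.
def filter(self, text):
    try:
        # Decode byte strings; unicode input raises TypeError and is kept as-is.
        text = unicode(text, "utf-8")
    except TypeError:
        pass
    try:
        # Wide Python 2 builds can match code points above U+FFFF directly.
        highpoints = re.compile(u'[\U00010000-\U0010ffff]')
    except re.error:
        # Narrow builds store those characters as surrogate pairs.
        highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
    return highpoints.sub(u'', text)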
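
For reference, here is a minimal standalone sketch of the same idea, assuming Python 3 (where strings are always full Unicode, so the surrogate-pair fallback is not needed). The names NON_BMP and strip_non_bmp are hypothetical and do not come from the original crawler.

```python
import re

# Hypothetical name: matches every code point above U+FFFF. That covers
# most emoji, but also other supplementary-plane characters (for example
# rare CJK ideographs), so this filter is broader than "emoji only".
NON_BMP = re.compile(u'[\U00010000-\U0010ffff]')


def strip_non_bmp(text):
    """Return text with all characters above U+FFFF removed."""
    # Accept UTF-8 encoded bytes as well as already-decoded strings.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return NON_BMP.sub(u'', text)


if __name__ == "__main__":
    # The emoji U+1F600 is removed; the surrounding text is kept.
    print(strip_non_bmp(u"hello \U0001F600 world"))
```

Symbols inside the Basic Multilingual Plane, such as U+2764 (heavy black heart), are left in place by this pattern. They take at most 3 bytes in UTF-8, so they should not trigger the error above.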