Passing a Python string/unicode object to SQLite

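I am trying to 1) crawl a site with scrapy; 2) extract the main text content with lxml; and 3) store that text in a SQLite database.

The first two steps are simple, using the following code: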

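from scrapy.spider import BaseSpider  # import path used by older Scrapy releases
from lxml import html

class OpEdSpider(BaseSpider):
  name = "opeds"
  allowed_domains = ["scrapy.org"]
  start_urls = ["http://doc.scrapy.org/en/latest/intro/tutorial.html"]

  def parse(self, response):
    # response.body holds the raw HTML of the fetched page
    data = response.body
    into_lxml = html.fromstring(data)
    # text_content() returns the document's text with the markup stripped
    raw_content = unicode(into_lxml.text_content())
    print raw_content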

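The first part uses Python's scrapy library to crawl the site; the parse function then extracts reasonably clean text through lxml's text_content() method (I am not interested in the exact html/xml structure; this function gives me text that is clean enough for later analysis). Printing raw_content, with or without unicode(), shows the content I want, formatted the way I want. type(raw_content) is as expected: lxml.etree._* without unicode(), or unicode with it.

Things fall apart when I try to add raw_content to a SQLite database, replacing the print statement with this function: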

^{pr2}$

The output of raw_content, which used to be nicely cleaned up, now looks awful (here is a small sample):

\n\'Example title\'\n\n\nSpiders are expected to return their scraped data inside\nItem objects.

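I am lost here. raw_content is a unicode text object, and the SQLite column (webcontent) should accept unicode. Yet somewhere along the way raw_content gets encoded/decoded into the mess above.

I have researched this and I think I understand the problem, but not the solution (please correct me if I am wrong). raw_content is passed to sqlite* as a tuple, which may break the unstructured text in the raw_content variable into separate tuple elements (lines, perhaps?), which are then delimited in the database by \n and other strings. How can I avoid this? Is it possible to pass raw_content into SQLite as is; that is, to store in the database exactly what print raw_content displays?

Apologies for the long question. I tried to strike a balance between brevity and enough detail to keep others from repeating the failed solutions I have already tried.

[*If I do not pass raw_content as a tuple (that is, if I remove the trailing ,), I get the following error: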

sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 25741 supplied.

]
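The error in the footnote can be reproduced in isolation. The sketch below is only an illustration, written for Python 3 (where every str is Unicode) against an in-memory database; the table name pages and the sample text are assumptions, and only the column name webcontent comes from the question. It shows that a bare string is treated as a sequence of one-character bindings, while the one-element tuple (raw_content,) supplies the whole text as a single binding and does not split it up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the column name webcontent comes from the question.
conn.execute("CREATE TABLE pages (webcontent TEXT)")

raw_content = "'Example title'\n\n\nSpiders are expected to return their scraped data inside\nItem objects."

# A bare string is a sequence of characters, so every character counts
# as one binding and the single "?" placeholder cannot take them all.
try:
    conn.execute("INSERT INTO pages (webcontent) VALUES (?)", raw_content)
    bare_string_failed = False
except sqlite3.ProgrammingError as exc:
    bare_string_failed = True
    print(exc)

# A one-element tuple supplies exactly one binding: the whole text.
conn.execute("INSERT INTO pages (webcontent) VALUES (?)", (raw_content,))
stored_rows = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
print(stored_rows)
```

Only the second execute call stores anything, and it stores the text as one value, so the trailing comma is the right thing to keep.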

Try print row[0]

I think it is actually fine... row is a list, so what you are seeing is the repr of the string, not the string itself

>>> x = ["hello\nworld"] #essentially equivalent to your row variable
>>> print x
['hello\nworld'] #this is likely similar to what you are seeing
>>> print x[0] #when you want to see this
hello
world

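I think... (that is, I don't think it is encoding/decoding anything, it just stores the raw bytes... but the select statement returns a list... and printing a list prints the repr of the items in the list, rather than the items themselves)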
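A round trip through an in-memory database makes the same point end to end. This is a minimal sketch under assumptions, not the asker's code: it is written for Python 3, and the table name pages and the sample text are made up for illustration. It stores a string containing newlines, reads it back, and compares the repr of the returned row with the item inside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (webcontent TEXT)")  # hypothetical table name

text = "'Example title'\n\nSpiders are expected to return their scraped data inside\nItem objects."
conn.execute("INSERT INTO pages (webcontent) VALUES (?)", (text,))

# fetchone() returns the row as a container holding the stored value.
row = conn.execute("SELECT webcontent FROM pages").fetchone()

as_container = repr(row)  # what printing the whole row shows: escapes such as \n
as_text = row[0]          # the stored string itself, with real line breaks

print(as_container)
print(as_text)
```

The value that comes back compares equal to the value that went in; the backslash sequences appear only in the repr of the surrounding container.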
