The problem
While writing a web crawler, I needed to install the third-party library BeautifulSoup.
Investigation
I first tried: pip install BeautifulSoup
Problem 1:
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Unit tests have failed!")?
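Method 1:
Download the source package from PyPI: https://files.pythonhosted.org/packages/1e/ee/295988deca1a5a7accd783d0dfe14524867e31abb05b6c0eeceee49c759d/BeautifulSoup-3.2.1.tar.gz
After extracting it, run: python setup.py install
P.S. To install from a wheel file instead: pip install **.whl
The installation still failed.
Reading the source file setup.py shows why: it uses print as a statement rather than as a function.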
from distutils.core import setup
import unittest
import warnings
warnings.filterwarnings("ignore", "Unknown distribution option")
import sys
# patch distutils if it can't cope with the "classifiers" keyword
if sys.version < '2.2.3':
    from distutils.dist import DistributionMetadata
    DistributionMetadata.classifiers = None
    DistributionMetadata.download_url = None
from BeautifulSoup import __version__
#Make sure all the tests complete.
import BeautifulSoupTests
loader = unittest.TestLoader()
result = unittest.TestResult()
suite = loader.loadTestsFromModule(BeautifulSoupTests)
suite.run(result)
if not result.wasSuccessful():
    print "Unit tests have failed!"
    for l in result.errors, result.failures:
        for case, error in l:
            print "-" * 80
            desc = case.shortDescription()
            if desc:
                print desc
            print error
    print '''If you see an error like: "'ascii' codec can't encode character...", see\nthe Beautiful Soup documentation:\n http://www.crummy.com/software/BeautifulSoup/documentation.html#Why%20can't%20Beautiful%20Soup%20print%20out%20the%20non-ASCII%20characters%20I%20gave%20it?'''
    print "This might or might not be a problem depending on what you plan to do with\nBeautiful Soup."
    if sys.argv[1] == 'sdist':
        print
        print "I'm not going to make a source distribution since the tests don't pass."
        sys.exit(1)
setup(name="BeautifulSoup",
version=__version__,
py_modules=['BeautifulSoup', 'BeautifulSoupTests'],
description="HTML/XML parser for quick-turnaround applications like screen-scraping.",
author="Leonard Richardson",
author_email = "[email protected]",
long_description="""Beautiful Soup parses arbitrarily invalid SGML and provides a variety of methods and Pythonic idioms for iterating and searching the parse tree.""",
classifiers=["Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"License :: OSI Approved :: Python Software Foundation License",
"Programming Language :: Python",
"Topic :: Text Processing :: Markup :: HTML",
"Topic :: Text Processing :: Markup :: XML",
"Topic :: Text Processing :: Markup :: SGML",
"Topic :: Software Development :: Libraries :: Python Modules",
],
url="http://www.crummy.com/software/BeautifulSoup/",
license="BSD",
download_url="http://www.crummy.com/software/BeautifulSoup/download/"
)
# Send announce to:
# [email protected]
# [email protected]
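Solution
The root cause is a language change between Python 2 and Python 3:
print is no longer a statement; it is now the print() function, so the Python 2 style print statements in setup.py are a syntax error under Python 3.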
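A quick way to see the difference for yourself is to ask the interpreter to compile both spellings. The snippet below is only a small sketch; the compiles helper and the two source strings are my own illustration, and it assumes you run it under Python 3.

```python
import sys

PY2_SOURCE = 'print "Unit tests have failed!"'   # Python 2 print statement
PY3_SOURCE = 'print("Unit tests have failed!")'  # Python 3 print() function


def compiles(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print("Running on Python", sys.version_info[:2])
print("statement form compiles:", compiles(PY2_SOURCE))
print("function form compiles: ", compiles(PY3_SOURCE))
```

Under Python 3 the statement form is rejected at compile time, which is why pip fails before any of BeautifulSoup 3's code actually runs.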
Tested personally: the BS4 library works on Python 3.7.
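Also
Note that in IDLE the BS4 library cannot be imported under the name BeautifulSoup4. I did not know the cause at first; it turns out the package is installed as beautifulsoup4, but the module it provides is named bs4.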
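The mismatch between the name you install and the name you import can be checked from Python itself. The following is a minimal sketch using importlib from the standard library; is_importable is a helper name I chose, and the result for bs4 depends on whether the package is installed in your environment.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `import module_name` would find a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "beautifulsoup4" is the name given to pip; "bs4" is the module that gets imported.
for name in ("bs4", "BeautifulSoup4", "beautifulsoup4"):
    print(name, "->", is_importable(name))
```

Only bs4 can ever report True here, and only when the package is installed, because the other two are distribution names rather than module names.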
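Resolution
Beautiful Soup 3 is no longer being developed, and Beautiful Soup 4 is recommended for current projects. It has been moved into the bs4 package, which means the import is: import bs4. The version referred to here is Beautiful Soup 4.3.2 (BS4 for short), used with Python 2.7.7. It is said that BS4 supports Python 3 poorly and that Python 3 users should consider downloading BS3, but that is the wrong way round: BS3 is the package that failed to install on Python 3 above, while BS4 worked for me on Python 3.7, so Python 3 users should install BS4.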
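Once BS4 is installed (pip install beautifulsoup4), basic usage looks roughly like the sketch below. The extract_title function and the sample HTML are made up for illustration, and the import is guarded so the snippet still runs, returning None, when bs4 is missing.

```python
try:
    from bs4 import BeautifulSoup  # provided by: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # bs4 is not installed in this environment


def extract_title(html):
    """Return the text of the <title> element, or None if it cannot be read."""
    if BeautifulSoup is None:
        return None
    # "html.parser" is the parser bundled with the Python standard library.
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text() if soup.title else None


page = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"
print(extract_title(page))
```

I pass the parser name explicitly so that the example does not depend on optional parsers such as lxml being installed.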