Scraping All Articles on the CSDN Homepage with a Python Crawler


Other 2018-12-09 20:44:49

Copyright notice: do not copy or repost my content without permission. Anyone who infringes my rights bears the consequences. https://blog.csdn.net/weixin_41605937/article/details/84332233

# -*- encoding: utf-8 -*-

import re
import urllib.request

def function():
    """Download every article linked from the CSDN homepage."""
    html="https://blog.csdn.net/nav/engineering"
    # Pretend to be a browser
    headers=("User-Agent","Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/63.0")  # Firefox User-Agent header; change it if you use a different browser
    opener=urllib.request.build_opener()
    opener.addheaders=[headers]  # the attribute is addheaders; assigning to addheader has no effect
    urllib.request.install_opener(opener)

    data = urllib.request.urlopen(html).read()
    data = data.decode("utf-8", "ignore")
    print(data)
    pattern='<h3 class="company_name"><a href="(.*?)"'
    mydata=re.compile(pattern).findall(data)
    print(mydata)
    for i, url in enumerate(mydata):
        # Save each linked article as a local HTML file named by its index
        file="E:/数据挖掘练习/网页/"+str(i)+".html"
        urllib.request.urlretrieve(url,filename=file)
        print("Download %d succeeded"%i)

    print("CSDN crawl finished")

if __name__ == '__main__':
    function()
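
The link-extraction step can be checked offline before hitting the site. The following is a minimal sketch, not the author's method: `extract_links`, `build_request`, `SAMPLE_HTML`, and the placeholder `USER_AGENT` string are all hypothetical names introduced here for illustration. The regular expression is copied from the script above, and whether it still matches the live CSDN markup is not verified. `build_request` shows one alternative to installing a global opener: attaching the User-Agent to each request.

```python
import re
import urllib.request

# Pattern copied from the script above; not verified against the live page.
LINK_PATTERN = re.compile(r'<h3 class="company_name"><a href="(.*?)"')

# Placeholder User-Agent for illustration only; substitute your browser's real string.
USER_AGENT = "Mozilla/5.0 (example placeholder)"


def extract_links(page_html):
    """Return every href captured by LINK_PATTERN, in document order."""
    return LINK_PATTERN.findall(page_html)


def build_request(url):
    """Attach the User-Agent to a single request instead of a global opener."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Hypothetical sample markup, shaped to fit the pattern.
SAMPLE_HTML = (
    '<h3 class="company_name"><a href="https://example.com/a">First</a></h3>\n'
    '<h3 class="company_name"><a href="https://example.com/b">Second</a></h3>\n'
    '<h3 class="other"><a href="https://example.com/c">Ignored</a></h3>\n'
)

if __name__ == "__main__":
    links = extract_links(SAMPLE_HTML)
    print(links)
    # urllib stores header names capitalized as "User-agent"
    print(build_request(links[0]).get_header("User-agent"))
```

A request built this way can be passed to `urllib.request.urlopen` in place of the bare URL. If `extract_links` returns an empty list for a real page, the pattern most likely no longer matches the site's markup.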


Reposted from blog.csdn.net/weixin_41605937/article/details/84332233
