1. Basic usage
(1) Create a Scrapy project
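Choose a directory (D:\pySpider\) and run the command `scrapy startproject python123demo`.
This generates the project directory python123demo.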
(2) Generate a Scrapy spider inside the project
Enter the project directory (D:\pySpider\python123demo) and run the command `scrapy genspider demo python123.io`.
The generated file spiders/demo.py:
```python
# -*- coding: utf-8 -*-
import scrapy


class DemoSpider(scrapy.Spider):
    name = "demo"
    allowed_domains = ["python123.io"]
    start_urls = ['http://python123.io/']

    def parse(self, response):
        pass
```
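`allowed_domains` tells Scrapy to skip requests whose host falls outside the listed domains. The sketch below is a minimal, stdlib-only illustration of that kind of check. It assumes a simple rule: a host matches if it equals an allowed domain or is a subdomain of one. `is_allowed` is a hypothetical helper and not Scrapy's own code.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Hypothetical helper: True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # Exact match, or a subdomain such as "www.python123.io"
        if host == domain or host.endswith("." + domain):
            return True
    return False


allowed = ["python123.io"]
print(is_allowed("http://python123.io/ws/demo.html", allowed))  # True
print(is_allowed("http://www.python123.io/", allowed))          # True
print(is_allowed("http://example.com/", allowed))               # False
```

The `"." + domain` suffix test keeps a look-alike host such as `evilpython123.io` from matching.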
(3) Configure the generated spider
Edit demo.py to set two things:
a. the initial URL;
b. how the page is parsed once it has been fetched.
```python
# -*- coding: utf-8 -*-
import scrapy


class DemoSpider(scrapy.Spider):
    name = "demo"
    allowed_domains = ["python123.io"]
    # a. Initial URL
    start_urls = ['http://python123.io/ws/demo.html']

    # b. Parsing: save the raw response body to a local file
    def parse(self, response):
        fname = response.url.split('/')[-1]  # last URL segment, e.g. "demo.html"
        with open(fname, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s.' % fname)
```
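`response.url.split('/')[-1]` uses the last path segment as the file name. That segment is empty when the URL ends in `/`, as the generated default `http://python123.io/` does. The sketch below is a slightly more defensive variant. It assumes a fallback name of `index.html`, and `filename_from_url` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, default="index.html"):
    """Hypothetical helper: last path segment of the URL, or `default` if there is none."""
    path = urlparse(url).path        # drops the query string and fragment
    name = posixpath.basename(path)  # "" when the path ends with "/"
    return name or default


print(filename_from_url("http://python123.io/ws/demo.html"))  # demo.html
print(filename_from_url("http://python123.io/"))              # index.html
```

Inside `parse`, `fname = filename_from_url(response.url)` could stand in for the `split` call.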
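(4) Run the spider to fetch the page
In the project directory, run `scrapy crawl demo`. When it finishes, the page is saved as demo.html in that folder.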
Refining demo.py
The line `start_urls = ['http://python123.io/ws/demo.html']` can be expanded into a `start_requests()` method:
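```python
    # Method of DemoSpider, used in place of the start_urls attribute
    def start_requests(self):
        urls = [
            'http://python123.io/ws/demo.html'
        ]
        for url in urls:
            # Hand each request to Scrapy; its response is passed to self.parse
            yield scrapy.Request(url=url, callback=self.parse)
```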
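Because `start_requests` uses `yield`, it is a generator. Each request is built only when Scrapy asks for the next one, so the full list never has to exist up front. The sketch below shows the same pattern without Scrapy installed. `FakeRequest` is a hypothetical stand-in for `scrapy.Request`, and `DemoSpiderSketch` is a hypothetical class name. The second URL is the generated default from demo.py, added only to make the laziness visible.

```python
from collections import namedtuple

# Hypothetical stand-in for scrapy.Request so the sketch runs without Scrapy
FakeRequest = namedtuple("FakeRequest", ["url", "callback"])


class DemoSpiderSketch:
    urls = [
        "http://python123.io/ws/demo.html",
        "http://python123.io/",
    ]

    def start_requests(self):
        # Generator: each request is created only when the caller asks for the next one
        for url in self.urls:
            yield FakeRequest(url=url, callback=self.parse)

    def parse(self, response):
        pass


spider = DemoSpiderSketch()
gen = spider.start_requests()
first = next(gen)             # only the first request has been built at this point
print(first.url)              # http://python123.io/ws/demo.html
print([r.url for r in gen])   # ['http://python123.io/']
```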
2. Example: a Scrapy crawler for stock data
1. Create the project and the Spider template
2. Write the Spider
Configure the stocks.py file.
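A stock spider's `parse` method typically has to pick stock identifiers out of the links on a listing page. The sketch below illustrates that step. It assumes codes are written as `sh` or `sz` followed by six digits, and that format is an assumption. `extract_stock_codes` and the sample links are hypothetical. In a real spider, the hrefs could come from something like `response.css('a::attr(href)').getall()`.

```python
import re

# Assumed code format: "sh" or "sz" followed by six digits, e.g. "sh600000"
STOCK_CODE = re.compile(r"s[hz]\d{6}")


def extract_stock_codes(hrefs):
    """Hypothetical helper: unique stock codes found in link URLs, in order of first appearance."""
    seen = []
    for href in hrefs:
        for code in STOCK_CODE.findall(href):
            if code not in seen:
                seen.append(code)
    return seen


links = [
    "http://example.com/stock/sh600000.html",
    "http://example.com/stock/sz000001.html",
    "http://example.com/about.html",
    "http://example.com/stock/sh600000.html",
]
print(extract_stock_codes(links))  # ['sh600000', 'sz000001']
```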
3. Write the Pipelines
Configure the pipelines.py file.
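A Scrapy item pipeline is a plain class with hook methods. Scrapy calls `open_spider(spider)` and `close_spider(spider)` once per crawl, and it calls `process_item(item, spider)` for every item the spider yields. The sketch below writes each item as one JSON line. The class name `JsonLinesPipeline` and the output file `stocks.jl` are hypothetical.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes each scraped item as one JSON line."""

    def __init__(self, path="stocks.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the crawl starts
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the crawl ends
        self.file.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text readable in the output file
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # returning the item passes it on to the next pipeline, if any
```

The class subclasses nothing, so it can be exercised by hand with `spider=None`, because the sketch never uses that argument.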
Enable the pipeline through the ITEM_PIPELINES setting in settings.py.
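`ITEM_PIPELINES` maps each pipeline class's import path to an integer. Scrapy runs the enabled pipelines in ascending order of these numbers, which are customarily kept in the 0-1000 range. In this fragment, the package name `stocksproject` and the class `StocksPipeline` are hypothetical placeholders for your own project's names.

```python
# settings.py (fragment)
# "stocksproject" and "StocksPipeline" are hypothetical; use your project's package and class
ITEM_PIPELINES = {
    "stocksproject.pipelines.StocksPipeline": 300,  # lower numbers run earlier
}
```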
Run the program.