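Scrapy is a web crawling framework written in Python.
Scrapyd is an application for deploying and running Scrapy spiders; it exposes a JSON API for deploying and controlling them.
The steps in this article were verified to install successfully on Fedora and CentOS.
The following software is required: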
- Python 2.7
- pip, setuptools
- lxml
- OpenSSL (pyOpenSSL)
Note that scrapyd uses port 6800.
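Since scrapyd needs port 6800, it can help to confirm that nothing else is already listening there before installing. The following is only a minimal sketch: the helper name port_is_free and the choice of checking 127.0.0.1 are assumptions for illustration, not part of Scrapy or Scrapyd.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(1.0)
    try:
        # connect_ex returns 0 when the connection succeeds,
        # which means some process is already using the port.
        return sock.connect_ex((host, port)) != 0
    finally:
        sock.close()


if __name__ == "__main__":
    print("port 6800 free: %s" % port_is_free(6800))
```

The check only probes the local loopback address; a firewall rule for remote access to port 6800 would have to be verified separately.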
Install the dependencies:
yum -y install readline-devel
yum -y install openssl-devel
yum -y install sqlite sqlite-devel
Install Python 2.7
wget https://www.python.org/ftp/python/2.7.9/Python-2.7.9.tgz
tar zxvf Python-2.7.9.tgz
cd Python-2.7.9
./configure --with-zlib-dir=/usr/local/lib
make && make install
Install pip:
wget https://bootstrap.pypa.io/get-pip.py
python2.7 get-pip.py
This also installs setuptools.
Install lxml and its dependencies
yum install libffi-devel
yum install libxml2
yum install libxslt
yum install libxml2-devel
yum install libxslt-devel
pip install lxml
Install pyOpenSSL
wget https://pypi.python.org/packages/source/p/pyOpenSSL/pyOpenSSL-0.15.1.tar.gz#md5=f447644afcbd5f0a1f47350fec63a4c6 --no-check-certificate
tar zxvf pyOpenSSL-0.15.1.tar.gz
cd pyOpenSSL-0.15.1
python2.7 setup.py install
Install Twisted
yum -y install bzip2-devel
wget https://pypi.python.org/packages/source/T/Twisted/Twisted-14.0.0.tar.bz2
tar xf Twisted-14.0.0.tar.bz2
cd Twisted-14.0.0
python2.7 setup.py install
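Before moving on to Scrapy itself, it may be worth confirming that the prerequisites installed so far can actually be imported by the interpreter. This is a rough sketch; the helper name missing_modules is made up for illustration, while the import names (setuptools, lxml, OpenSSL, twisted) are the ones those packages install under.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names of the prerequisites listed at the top of the article.
REQUIRED = ["setuptools", "lxml", "OpenSSL", "twisted"]

if __name__ == "__main__":
    print("missing: %s" % missing_modules(REQUIRED))
```

Run it with the same python2.7 binary used for the installs, otherwise the result reflects a different interpreter's site-packages.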
Install Scrapy:
pip install Scrapy
Test the installation:
Run scrapy startproject testProject. If a Scrapy project directory is generated under the current directory, the installation succeeded.
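The check for a generated project directory can be automated. The sketch below assumes the usual layout produced by scrapy startproject (a scrapy.cfg at the top level, plus an inner package containing settings.py and a spiders directory); the function name looks_like_scrapy_project is hypothetical.

```python
import os


def looks_like_scrapy_project(root, name):
    """Check for the files scrapy startproject is expected to create."""
    expected = [
        os.path.join(root, "scrapy.cfg"),        # project config file
        os.path.join(root, name, "settings.py"),  # project settings module
        os.path.join(root, name, "spiders"),      # package holding spiders
    ]
    return all(os.path.exists(path) for path in expected)


if __name__ == "__main__":
    # After running: scrapy startproject testProject
    print(looks_like_scrapy_project("testProject", "testProject"))
```

Only three well-known paths are checked, since the exact set of generated files can differ between Scrapy versions.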
Install scrapyd
pip install scrapyd
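Once scrapyd is running, spiders are controlled through its JSON API on port 6800. As an illustration, the sketch below builds a POST request for the schedule.json endpoint, which takes project and spider parameters. It assumes scrapyd is reachable at http://localhost:6800 and that a project has already been deployed; the spider name "demo" and the helper names are hypothetical.

```python
import json

try:
    from urllib.parse import urlencode            # Python 3
    from urllib.request import Request, urlopen
except ImportError:
    from urllib import urlencode                  # Python 2.7
    from urllib2 import Request, urlopen

# Assumed address of a locally running scrapyd instance.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build (but do not send) a POST request for schedule.json."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Supplying a data payload makes this a POST request.
    return Request(base_url + "/schedule.json", data=data)


def schedule(project, spider, base_url=SCRAPYD_URL):
    """Send the request and return the decoded JSON reply."""
    response = urlopen(build_schedule_request(project, spider, base_url))
    return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_schedule_request("testProject", "demo")
    print(request.get_full_url())
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload without a running server; calling schedule("testProject", "demo") performs the actual call.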
Install scrapyd-client
wget https://github.com/scrapy/scrapyd-client/archive/master.zip
unzip master.zip
cd scrapyd-client-master
python2.7 setup.py install
scrapyd-client mainly exists to make it easier to deploy Scrapy projects to the server from a script: scrapyd-deploy.
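scrapyd-deploy takes its target from the [deploy] section of the project's scrapy.cfg. The sketch below shows what such a section might look like and how to read it back to double-check the target before deploying. The sample values (a local scrapyd on port 6800 and the testProject name from the test step) are assumptions, and read_deploy_target is a hypothetical helper.

```python
try:
    from configparser import ConfigParser   # Python 3
except ImportError:
    from ConfigParser import ConfigParser   # Python 2.7

# Example scrapy.cfg content; the url and project values are assumptions.
SAMPLE_CFG = """\
[settings]
default = testProject.settings

[deploy]
url = http://localhost:6800/
project = testProject
"""


def read_deploy_target(cfg_path):
    """Return (url, project) from the [deploy] section of scrapy.cfg."""
    parser = ConfigParser()
    parser.read(cfg_path)
    return parser.get("deploy", "url"), parser.get("deploy", "project")


if __name__ == "__main__":
    with open("scrapy.cfg.example", "w") as handle:
        handle.write(SAMPLE_CFG)
    print(read_deploy_target("scrapy.cfg.example"))
```

With a section like this in place, running scrapyd-deploy from the project directory should upload the project to the scrapyd instance at that URL.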