References
scrapyd
Installation
```
pip install scrapyd
vi /etc/scrapyd/scrapyd.conf
```

```
[scrapyd]
eggs_dir = /data/scrapyd/eggs
logs_dir = /data/scrapyd/logs
items_dir = /data/scrapyd/items
jobs_to_keep = 100
dbs_dir = /data/scrapyd/dbs
max_proc = 0
max_proc_per_cpu = 10
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root

[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
```

Start scrapyd in the background, check the web UI, and install the project's dependencies:

```
nohup scrapyd > scrapyd.log 2>&1 &
open http://10.211.55.101:6800
pip install -r requirements.txt
```
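Note that `max_proc = 0` tells scrapyd to derive the process cap from `max_proc_per_cpu` and the number of CPU cores. A quick sketch of the effective limit (values taken from the config above; the arithmetic is illustrative, not scrapyd's actual code):

```python
import multiprocessing

MAX_PROC = 0          # from scrapyd.conf: 0 means "no explicit cap"
MAX_PROC_PER_CPU = 10  # from scrapyd.conf

# scrapyd's rule: with max_proc = 0, the cap is max_proc_per_cpu per core
effective = MAX_PROC or MAX_PROC_PER_CPU * multiprocessing.cpu_count()
print(effective)  # 10 x core count, e.g. 40 on a 4-core box
```

So on the config above, each node can run up to ten concurrent spider processes per core.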
HTTP API
```
# daemon status
curl http://10.211.55.101:6800/daemonstatus.json

# upload a project egg as a new version
curl http://10.211.55.101:6800/addversion.json -F project=hydrabot -F version=1.0.0 -F egg=@hydrabot.egg

# schedule a spider run (extra -d params become spider arguments)
curl http://10.211.55.101:6800/schedule.json -d project=hydrabot -d spider=teacher -d task_id=1 -d entry_id=3070

# cancel a running job
curl http://10.211.55.101:6800/cancel.json -d project=hydrabot -d job=6487ec79947edab326d6db28a2d86S11e8247444

# listings
curl http://10.211.55.101:6800/listprojects.json
curl "http://10.211.55.101:6800/listversions.json?project=hydrabot"
curl "http://10.211.55.101:6800/listspiders.json?project=hydrabot"
curl "http://10.211.55.101:6800/listjobs.json?project=hydrabot"

# deletion
curl http://10.211.55.101:6800/delversion.json -d project=hydrabot -d version=1.0.0
curl http://10.211.55.101:6800/delproject.json -d project=hydrabot
```
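The same endpoints are easy to call from Python when scripting against a node. A minimal stdlib-only client sketch, assuming the scrapyd instance from the examples above (`schedule` needs a reachable scrapyd, so it is left as a commented call):

```python
import json
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://10.211.55.101:6800"  # node from the curl examples

def api_url(endpoint, **params):
    """Build the URL for a GET-style endpoint such as listjobs.json."""
    url = f"{SCRAPYD_URL}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url

def schedule(project, spider, **spider_args):
    """POST to schedule.json; extra kwargs are passed through as spider arguments."""
    data = urllib.parse.urlencode(
        {"project": project, "spider": spider, **spider_args}
    ).encode()
    with urllib.request.urlopen(f"{SCRAPYD_URL}/schedule.json", data=data) as resp:
        return json.load(resp)  # e.g. {"status": "ok", "jobid": "..."}

# requires a running scrapyd:
# print(schedule("hydrabot", "teacher", task_id=1, entry_id=3070))
```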
scrapyd-client
Install locally

```
# scrapyd-client provides the scrapyd-deploy command
pip3 install scrapyd-client
scrapyd-deploy -h
```
Configure scrapy.cfg
```
[settings]
default = settings.default

[deploy:node5]
url = http://10.194.99.5:6800/
project = hydrabot

[deploy:node6]
url = http://10.194.99.6:6800/
project = hydrabot

[deploy:node7]
url = http://10.194.99.7:6800/
project = hydrabot

[deploy:node8]
url = http://10.194.99.8:6800/
project = hydrabot
```
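The `[deploy:*]` targets can also be read programmatically, e.g. to loop over every node in a health-check or batch-deploy script. A sketch using the stdlib `configparser` (the inline config is a trimmed copy of the file above):

```python
import configparser

# in practice: cfg.read("scrapy.cfg"); inlined here so the sketch is self-contained
SCRAPY_CFG = """\
[settings]
default = settings.default

[deploy:node5]
url = http://10.194.99.5:6800/
project = hydrabot
"""

cfg = configparser.ConfigParser()
cfg.read_string(SCRAPY_CFG)

# map target name -> scrapyd URL for every [deploy:<name>] section
targets = {
    name.split(":", 1)[1]: cfg[name]["url"]
    for name in cfg.sections()
    if name.startswith("deploy:")
}
print(targets)  # {'node5': 'http://10.194.99.5:6800/'}
```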
CLI
```
# deploy the project to one target
scrapyd-deploy node8

# deploy to all targets defined in scrapy.cfg
scrapyd-deploy -a

# list projects on a node
scrapyd-client -t http://10.194.99.8:6800 projects

# schedule spiders matching a pattern
scrapyd-client -t http://10.194.99.8:6800 schedule -p hydrabot \istic

# list spiders in a project
scrapyd-client -t http://10.194.99.8:6800 spiders -p hydrabot
```