I wrote a Scrapy spider by following an online tutorial, but running it with `scrapy crawl sean` produced the following error:

E:\工作\python\scrapy\lagou\lagou>scrapy crawl sean
2018-12-21 12:04:51 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: lagou)
2018-12-21 12:04:51 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Windows-10-10.0.10240-SP0
Traceback (most recent call last):
  File "e:\python\python37\lib\site-packages\scrapy\spiderloader.py", line 69, in load
    return self._spiders[spider_name]
KeyError: 'sean'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "e:\python\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\Python\Python37\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "e:\python\python37\lib\site-packages\scrapy\cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "e:\python\python37\lib\site-packages\scrapy\cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "e:\python\python37\lib\site-packages\scrapy\cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "e:\python\python37\lib\site-packages\scrapy\commands\crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "e:\python\python37\lib\site-packages\scrapy\crawler.py", line 170, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "e:\python\python37\lib\site-packages\scrapy\crawler.py", line 198, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "e:\python\python37\lib\site-packages\scrapy\crawler.py", line 202, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "e:\python\python37\lib\site-packages\scrapy\spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: sean'

The name passed to `scrapy crawl` must match the `name` attribute defined in the spider file. As the code below shows, that name is `spider_lagou`, so the command should be `scrapy crawl spider_lagou` instead of `scrapy crawl sean`; after this change the spider runs successfully.

import scrapy

class SpiderLagouSpider(scrapy.Spider):
    name = 'spider_lagou'  # this is the name that `scrapy crawl` looks up
    allowed_domains = ['lagou.com']
    start_urls = ['http://www.lagou.com/']
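The traceback shows why the mismatch fails: Scrapy's spider loader builds a dictionary keyed by each spider class's `name` attribute, and an unknown key raises the `KeyError: 'Spider not found: ...'` seen above. A simplified stand-in (not the real `scrapy.spiderloader.SpiderLoader`, just a sketch of the lookup logic) illustrates this:

```python
class Spider:
    """Minimal stand-in for scrapy.Spider."""
    name = None

class SpiderLagouSpider(Spider):
    name = 'spider_lagou'

class SpiderLoader:
    """Simplified sketch of how Scrapy resolves spider names."""
    def __init__(self, spider_classes):
        # Index spider classes by their `name` attribute, as Scrapy does.
        self._spiders = {cls.name: cls for cls in spider_classes}

    def load(self, spider_name):
        try:
            return self._spiders[spider_name]
        except KeyError:
            raise KeyError("Spider not found: {}".format(spider_name))

loader = SpiderLoader([SpiderLagouSpider])
loader.load('spider_lagou')   # resolves to SpiderLagouSpider
# loader.load('sean')         # raises KeyError: 'Spider not found: sean'
```

To see which names are actually registered in a project, run `scrapy list` in the project directory; it prints every spider's `name`.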