I've set up a rule to follow the next page from the start URL, but it isn't working: the spider only scrapes the start URL page and the links found on that page (via parseLinks). It never goes on to the next page defined in the rule. Any help?

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy import log
from urlparse import urlparse
from urlparse import urljoin
from scrapy.http import Request
class MySpider(CrawlSpider):
    name = 'testes2'
    allowed_domains = ['example']
    start_urls = [
        'http://www.example/pesquisa/filtro/?tipo=0&local=0'
    ]

    rules = (Rule(SgmlLinkExtractor(restrict_xpaths=('//a[@id="seguinte"]/@href')), follow=True),)

    def parse(self, response):
        sel = Selector(response)
        urls = sel.xpath('//div[@id="btReserve"]/../@href').extract()
        for url in urls:
            url = urljoin(response.url, url)
            self.log('URLS: %s' % url)
            yield Request(url, callback=self.parseLinks)

    def parseLinks(self, response):
        sel = Selector(response)
        titulo = sel.xpath('h1/text()').extract()
        morada = sel.xpath('//div[@class="MORADA"]/text()').extract()
        email = sel.xpath('//a[@class="sendMail"][1]/text()')[0].extract()
        url = sel.xpath('//div[@class="contentContacto sendUrl"]/a/text()').extract()
        telefone = sel.xpath('//div[@class="telefone"]/div[@class="contentContacto"]/text()').extract()
        fax = sel.xpath('//div[@class="fax"]/div[@class="contentContacto"]/text()').extract()
        descricao = sel.xpath('//div[@id="tbDescricao"]/p/text()').extract()
        gps = sel.xpath('//td[@class="sendGps"]/@style').extract()
        print titulo, email, morada
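For context, a likely cause is that CrawlSpider reserves the parse() method for its own rule-processing logic, so overriding parse here prevents the Rules from ever being applied; restrict_xpaths also normally points at the <a> element rather than its @href attribute. Below is a minimal sketch of the same spider with those two points changed, keeping the old scrapy.contrib imports from the question for consistency; parse_page is an assumed callback name, not something from the original code.

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.http import Request
from urlparse import urljoin

class MySpider(CrawlSpider):
    name = 'testes2'
    allowed_domains = ['example']
    start_urls = ['http://www.example/pesquisa/filtro/?tipo=0&local=0']

    rules = (
        # Point restrict_xpaths at the <a> element (not its @href) and give
        # the rule a callback so each followed page gets processed.
        Rule(SgmlLinkExtractor(restrict_xpaths=('//a[@id="seguinte"]',)),
             callback='parse_page',   # assumed name; must not be 'parse'
             follow=True),
    )

    def parse_page(self, response):
        # Same link-collecting logic as the original parse(), renamed so that
        # CrawlSpider's built-in parse() remains free to apply the rules.
        sel = Selector(response)
        urls = sel.xpath('//div[@id="btReserve"]/../@href').extract()
        for url in urls:
            yield Request(urljoin(response.url, url), callback=self.parseLinks)

Note that with CrawlSpider the response for the start URL itself is routed through parse_start_url(), so that method can be overridden to reuse the same logic if the first page also needs to be scraped.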