”;
Description
Item objects are the regular dicts of Python. We can use the following syntax to access the attributes of the class −
>>> item = DmozItem() >>> item[''title''] = ''sample title'' >>> item[''title''] ''sample title''
Add the above code to the following example −
import scrapy from tutorial.items import DmozItem class MyprojectSpider(scrapy.Spider): name = "project" allowed_domains = ["dmoz.org"] start_urls = [ "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/", "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/" ] def parse(self, response): for sel in response.xpath(''//ul/li''): item = DmozItem() item[''title''] = sel.xpath(''a/text()'').extract() item[''link''] = sel.xpath(''a/@href'').extract() item[''desc''] = sel.xpath(''text()'').extract() yield item
The output of the above spider will be −
[scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> {''desc'': [u'' - By David Mertz; Addison Wesley. Book in progress, full text, ASCII format. Asks for feedback. [author website, Gnosis Software, Inc.n], ''link'': [u''http://gnosis.cx/TPiP/''], ''title'': [u''Text Processing in Python'']} [scrapy] DEBUG: Scraped from <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> {''desc'': [u'' - By Sean McGrath; Prentice Hall PTR, 2000, ISBN 0130211192, has CD-ROM. Methods to build XML applications fast, Python tutorial, DOM and SAX, new Pyxie open source XML processing library. [Prentice Hall PTR]n''], ''link'': [u''http://www.informit.com/store/product.aspx?isbn=0130211192''], ''title'': [u''XML Processing with Python'']}
Advertisements
”;