Scraping data from this site using Scrapy
Question
I am new to data scraping and still learning the ropes.
I want to scrape data values from this website, https://www.twhouse.co.uk/index.php?route=product/catalog, and I am using the Scrapy shell to interrogate the page and assemble my crawler.
When I run
response.css('div.caption span.stat-1').get()
I get this:
'<span class="stat-1"><span class="stats-label">SKU:</span> <span>8644</span></span>'
I want only the SKU value from this. When I change the selector to
response.css('div.caption span.stats-label').get()
I get
'<span class="stats-label">SKU:</span>'
and when I append '::text',
response.css('div.caption span.stats-label::text').get()
I get 'SKU:', not the SKU value. How do I get the value?
Thank you all for your support.
Answer 1
Score: 2
The HTML looks like this:
...
...
<span class="stat-1">
    <span class="stats-label">SKU:</span>
    <span>8811</span>
</span>
...
...
So you want the second (last) span tag inside the outer span tag ("stat-1").
scrapy shell "https://www.twhouse.co.uk/index.php?route=product/catalog"
>>> response.css('div.caption span.stat-1 span:last-child::text').get()
'8811'
To get every SKU on the page rather than just the first, use getall() instead; it returns all matches as a list.
scrapy shell "https://www.twhouse.co.uk/index.php?route=product/catalog"
>>> response.css('div.caption span.stat-1 span:last-child::text').getall()
['8811', '8943', '8939', '8730', '8853', '8748', '8901', '8756', '8855', '8951', '8838', '8857', '8934', '8856', '8924', '9050', '8862', '8863', '8764', '9047', '9045', '9055', '8746', '8814', '8714', '8760', '8944', '8958', '8959', '8722', '8743', '8785', '8946', '8860', '8877', '8715', '9011', '8945', '9023', '8947', '9015', '8777', '8753', '8797', '8899', '8734', '8705', '9042', '8936', '8787', '8950', '8888', '8723', '9018', '9019', '8948', '8942', '8890', '8969', '8906', '8907', '8960', '9021', '8713', '9009', '9014', '9022', '8831', '8707', '8724', '9033', '9024', '9038', '8829', '9034', '9027', '9025', '9031', '9026', '9029', '9030', '9032', '9041', '9039', '9051', '9028', '8994', '8765', '8977', '8808', '8978', '8809', '8876', '9008', '8883', '8768', '8823', '8740', '8873', '9013']
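The values returned by getall() are strings, and nothing guarantees that they are unique or free of stray whitespace. Below is a minimal sketch of tidying such a list in plain Python. The helper name clean_skus and the sample input are made up for illustration and are not part of Scrapy.

```python
def clean_skus(raw_values):
    """Convert scraped SKU strings to ints, dropping blanks and duplicates.

    Hypothetical helper for illustration; order of first appearance is kept.
    """
    seen = set()
    cleaned = []
    for value in raw_values:
        text = value.strip()
        if not text.isdecimal():
            # Skip empty strings and anything that is not a plain number.
            continue
        sku = int(text)
        if sku not in seen:
            seen.add(sku)
            cleaned.append(sku)
    return cleaned


# Made-up sample resembling what getall() might return.
sample = ['8811', ' 8943 ', '', '8811', 'n/a', '8939']
print(clean_skus(sample))
```

In a spider, the same cleaning could be applied to the list before yielding items, though whether SKUs should be stored as integers or kept as strings depends on the data.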
Answer 2
Score: 0
Try selecting the second span child of stat-1:
response.css('div.caption span.stat-1 span:nth-child(2)::text').get()