Scraping data from this site using Scrapy

Question

I'm new to data scraping and still learning the ropes.
I want to scrape data values from this website, https://www.twhouse.co.uk/index.php?route=product/catalog, and I'm using the Scrapy shell to explore the page and assemble my crawler.

In the Scrapy shell, running
response.css('div.caption span.stat-1').get()
gave me this:

'<span class="stat-1"><span class="stats-label">SKU:</span> <span>8644</span></span>'
I only want the SKU value from this. When I changed the query to response.css('div.caption span.stats-label').get(), I got '<span class="stats-label">SKU:</span>', and when I added '::text', i.e. response.css('div.caption span.stats-label::text').get(), I got 'SKU:' instead of the SKU value. How do I get the value?

Thank you all for your support.

Answer 1

Score: 2

The HTML looks like this:

...
...
<span class="stat-1">
    <span class="stats-label">SKU:</span>
    <span>8811</span>
</span>
...
...

So you want to get the second (last) span tag inside the outer span tag ("stat-1").

The Python code is as follows:

scrapy shell https://www.twhouse.co.uk/index.php?route=product/catalog

>>> response.css('div.caption span.stat-1 span:last-child::text').get()
'8811'

If you want to get all the text, you can use getall() and you'll get them as a list.

scrapy shell https://www.twhouse.co.uk/index.php?route=product/catalog

>>> response.css('div.caption span.stat-1 span:last-child::text').getall()
['8811', '8943', '8939', '8730', '8853', '8748', '8901', '8756', '8855', '8951', '8838', '8857', '8934', '8856', '8924', '9050', '8862', '8863', '8764', '9047', '9045', '9055', '8746', '8814', '8714', '8760', '8944', '8958', '8959', '8722', '8743', '8785', '8946', '8860', '8877', '8715', '9011', '8945', '9023', '8947', '9015', '8777', '8753', '8797', '8899', '8734', '8705', '9042', '8936', '8787', '8950', '8888', '8723', '9018', '9019', '8948', '8942', '8890', '8969', '8906', '8907', '8960', '9021', '8713', '9009', '9014', '9022', '8831', '8707', '8724', '9033', '9024', '9038', '8829', '9034', '9027', '9025', '9031', '9026', '9029', '9030', '9032', '9041', '9039', '9051', '9028', '8994', '8765', '8977', '8808', '8978', '8809', '8876', '9008', '8883', '8768', '8823', '8740', '8873', '9013']

Answer 2

Score: 0

Try accessing the second span child inside stat-1:

response.css('div.caption span.stat-1 span:nth-child(2)::text').get()

huangapple
  • Posted on 13 May 2023 at 20:52:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76242837.html