July 10, 2023 16:05:21

Get specific text from 2 paragraphs in the same class using Scrapy

Question

I'm very new to Scrapy, and I want to use the Scrapy shell to extract the text of two paragraphs: "Fintech, Cybersecurity" and "Série C".

If I run

response.css('div.card-body p.card-text strong::text').get()

I get 'Secteur', but I'm looking for 'Fintech, Cybersecurity'.

For

response.css('div.card-body p.card-text::text').get()

I get '\n'.

I've noticed that if I use

response.css('div.card-body p.card-text:nth-child(3)').get()

I get

<p class="card-text">
  <strong>Round</strong> : Série C
</p>

and for

response.css('div.card-body p.card-text:nth-child(2)').get()

I get

<p class="card-text">
  <strong>Secteur</strong> : Fintech, Cybersecurity
</p>

How do I get "Série C" and "Fintech, Cybersecurity"?

Thank you

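Answer 1

Score: 0

This should work: 'div.card-body p.card-text::text'. You just need to use the getall() method (or the older extract()) instead of get(). get() returns only the first matching text node, which here is the whitespace before <strong>; that is where the '\n' comes from.

Here is an example I ran in IPython: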

In [2]: import parsel
In [3]: html = '''<div class="card-body">
   ...:     <h3 class="card-title mb-1">L</h3>
   ...:     <p class="card-text">
   ...:         <strong>Secteur</strong>
   ...:         " : Fintech, Cybersecurity "
   ...:     </p>
   ...:     <p class="card-text">
   ...:         <strong>Round</strong>
   ...:         " : Serie C "
   ...:     </p>
   ...:     <p class="card-text">
   ...:         <small class="text-muted"> 2820 votes enregistres </small>
   ...:     </p>
   ...: </div>'''
In [4]: response = parsel.Selector(html)
In [5]: for p in response.css('div.card-body p.card-text::text').getall():
   ...:     text = p.strip()  # each item is already a string, so just strip it
   ...:     if text:  # skip whitespace-only text nodes, e.g. the one before <strong>
   ...:         print(text)
   ...:
" : Fintech, Cybersecurity "
" : Serie C "
