Getting text after a specific word using Beautiful Soup

Question

I want to extract the text that follows a specific label using Beautiful Soup. The page https://www.capterra.ca/reviews/203084/voip-ms lists customer reviews, and I want to extract only the "Pros:" section of each review and write it to a text file named "pros", rather than writing all of the text.

import requests
import bs4

# Download the reviews page
res = requests.get('https://www.capterra.ca/reviews/203084/voip-ms')

parseSoup = bs4.BeautifulSoup(res.text, 'html.parser')

# Collect every <p> element on the page
paragraphs = parseSoup.find_all('p')

# Write the text of each paragraph to web.txt, one per line
with open("web.txt", 'w') as file:
    for paragraph in paragraphs:
        file.write(paragraph.get_text() + "\n")

This is my code so far; it just copies all of the paragraph text into a text file.

Answer 1

Score: 3

You can use the CSS selector p:-soup-contains("Pros:") + p (this selects every <p> element that immediately follows a sibling <p> containing the string "Pros:"):

import requests
from bs4 import BeautifulSoup

# Download and parse the reviews page
res = requests.get("https://www.capterra.ca/reviews/203084/voip-ms")
soup = BeautifulSoup(res.text, "html.parser")

# Print the paragraph that directly follows each "Pros:" label
for t in soup.select('p:-soup-contains("Pros:") + p'):
    print(t.get_text(strip=True))
    print('-' * 80)

Prints:

It's a cut and dry VoIP service that provides exactly what I need, how I need without burdensome interfaces.
--------------------------------------------------------------------------------
If you know what you are doing this offers great value for money. An initial challenge is working out the per minute pricing model for voip services. However, it is invariably way cheaper than any other phone model.
--------------------------------------------------------------------------------
Many features but simply it just works. A great value.
--------------------------------------------------------------------------------
It is a powerful software that gives lots of options and functionality with VOIP. It is lean and does not have a lots of bulk so it can be used over phone lines without using lots of bandwidth. The wiki is helpful and the customer service has always been willing to help. It is great that you don't need to purchase customized equipment to use VOIP.ms. Now with the self isolation it has been great to have the option to forward the phone numbers.
--------------------------------------------------------------------------------
Like me, you'll be amazed at the range of features you get for the vastly reduced price when comparing the service to that of a traditional telephone service. Once you learn the admin interface of voip.ms, you'll find it simple to set up an auto attendant for routing calls within your business. You'll love the ability to forward a voice mail message to your e-mail address, with the message attached as an mp3 file. Many more options for call routing are offered, such as hunting.
--------------------------------------------------------------------------------

...
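
To get from printed output to the "pros" text file the question asks for, the same selector can feed a small writer function. The following is only a minimal sketch under stated assumptions: SAMPLE_HTML is made-up markup that mimics the layout the selector relies on (a <p> label containing "Pros:" directly followed by a <p> with the review text), and the helper names extract_pros and save_pros are hypothetical, not part of Beautiful Soup. The :-soup-contains pseudo-class comes from the Soup Sieve package that Beautiful Soup uses for select(), so a reasonably recent version of it is assumed to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not copied from the real page):
# each "Pros:" label is a <p> immediately followed by a <p> with the text.
SAMPLE_HTML = """
<div class="review">
  <p>Pros:</p>
  <p>Many features but simply it just works.</p>
  <p>Cons:</p>
  <p>Example drawback text.</p>
</div>
<div class="review">
  <p>Pros:</p>
  <p>Great value for money.</p>
</div>
"""


def extract_pros(html):
    """Return the text of every <p> that directly follows a <p> containing "Pros:"."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.select('p:-soup-contains("Pros:") + p')
    return [p.get_text(strip=True) for p in matches]


def save_pros(pros, path="pros.txt"):
    """Write one extracted entry per line to the given file."""
    with open(path, "w", encoding="utf-8") as file:
        for text in pros:
            file.write(text + "\n")


pros = extract_pros(SAMPLE_HTML)
save_pros(pros)
```

For the live page, passing res.text from the requests.get call above to extract_pros in place of SAMPLE_HTML should behave the same way, provided the site still renders each "Pros:" label and its text as adjacent <p> elements.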

huangapple
  • Published on May 28, 2023 at 05:13:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76349074.html