June 6, 2023 04:31:58

How to pull the text after a specific span tag when the span text contains a sup tag, in HTML with Python

Question

I used to pull the text after a specific span tag, based on this post. With the old HTML, the code retrieved $167.00, which is the Price Target and the value I want. Since the website changed to the new HTML below, the same code returns nothing. The only difference is the span text, which changed from "Price Target (6-12 Months):" to "Price Target (6-12 Months)<sup>(2)</sup>:".

Old HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months):</span>
> $167.00
> </p>

New HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
> $167.00
> </p>

What should be changed in the code below, in order to retrieve $167.00?

from bs4 import BeautifulSoup
import re

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
# Parse the inline HTML directly; requests.get() expects a URL, not markup
soup = BeautifulSoup(HTML, 'lxml')
# With the new HTML, find() returns None here, so .parent raises AttributeError
value = soup.find('span', string=re.compile("Price Target")).parent.contents[1]
print(value)

Answer 1

Score: 0

You can search for the <span> tag containing the text "Price Target" and then take its next text sibling:

from bs4 import BeautifulSoup

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
soup = BeautifulSoup(HTML, 'lxml')
# :-soup-contains matches on the span's full text, including the nested <sup>
value = soup.select_one('span:-soup-contains("Price Target")').find_next_sibling(string=True).strip()
print(value)

Prints:

$167.00
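
If the `:-soup-contains` selector is not available (it needs a reasonably recent soupsieve), a function filter is one possible alternative. The sketch below is not part of the answer above: the helper name `is_price_target_span` is made up for illustration, and `html.parser` is used only to avoid the lxml dependency. It also prints `span.string` to show why the original `string=re.compile(...)` filter stopped matching once the `<sup>` was added.

```python
from bs4 import BeautifulSoup

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''

# Assumption: html.parser is chosen here only so the sketch runs without lxml.
soup = BeautifulSoup(HTML, 'html.parser')


def is_price_target_span(tag):
    """Hypothetical helper: match a <span> whose combined text,
    including text inside nested tags, mentions the label."""
    return tag.name == 'span' and 'Price Target' in tag.get_text()


span = soup.find(is_price_target_span)

# The span has several children (text, <sup>, text), so .string is None,
# and a string=... filter compares against .string.
print(span.string)

# The price is the text node that follows the span inside the same <p>.
value = span.find_next_sibling(string=True).strip()
print(value)
```

Matching on `get_text()` rather than `.string` keeps the lookup working whether or not the label contains nested markup, so the same code should also handle the old HTML.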
