How to pull the text after a specific span tag that contains a sup tag in HTML with Python


Question

from bs4 import BeautifulSoup
import re

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''

soup = BeautifulSoup(HTML, 'lxml')
value = soup.find('span', string=re.compile("Price Target")).next_sibling.strip()
print(value)

I used to pull the text after a specific span tag, based on this post. In that example, the code retrieved $167.00 from the old HTML, which is the Price Target and the value I want. Since the website format changed to the new HTML below, the same code returns nothing. The span tag changed from <span>Price Target (6-12 Months):</span> to <span>Price Target (6-12 Months)<sup>(2)</sup>:</span>.

Old HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months):</span>
> $167.00
> </p>

New HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
> $167.00
> </p>

What should be changed in the code below, in order to retrieve $167.00?

from bs4 import BeautifulSoup
import re

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''

# HTML is already a string, so parse it directly (requests.get() expects a URL).
soup = BeautifulSoup(HTML, 'lxml')
# With the new HTML the span's .string is None once it contains <sup>,
# so find() returns None here and the .parent access fails.
value = soup.find('span', string=re.compile("Price Target")).parent.contents[1]
print(value)

Answer 1

Score: 0


You can search for the <span> tag containing the text "Price Target" and then take its next text sibling:

from bs4 import BeautifulSoup

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''

soup = BeautifulSoup(HTML, 'lxml')

value = soup.select_one('span:-soup-contains("Price Target")').find_next_sibling(string=True).strip()
print(value)

Prints:

$167.00
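
As a rough alternative sketch (not part of the original answer), the same lookup can be done with a filter function instead of a CSS selector. It also shows why the string= search stopped matching: once the span holds a <sup> child, its .string is None, while get_text() still joins all nested text. The helper name is_price_target_span is hypothetical, and html.parser is assumed here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''

# Assumption: the built-in parser is enough for this small fragment.
soup = BeautifulSoup(HTML, 'html.parser')


def is_price_target_span(tag):
    """Hypothetical helper: match a <span> whose combined text mentions the label."""
    # get_text() concatenates every descendant string, including the <sup> text.
    return tag.name == 'span' and 'Price Target' in tag.get_text()


span = soup.find(is_price_target_span)

# The span now has three children (text, <sup>, text), so .string is None.
# That is what a string=... filter compares against, hence no match.
print(span.string)

# The price is the text node that directly follows the span inside the <p>.
value = span.next_sibling.strip()
print(value)
```

This relies on the price being the node immediately after the span. If the markup might place other elements in between, find_next_sibling(string=True) from the answer above is the more tolerant choice.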
