How to pull the text after a specific span tag that contains a sup tag in HTML with Python

Question

```python
from bs4 import BeautifulSoup
import re

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
soup = BeautifulSoup(HTML, 'lxml')
value = soup.find('span', string=re.compile("Price Target")).next_sibling.strip()
print(value)
```

I used to pull the text after a specific span tag, based on this post. In the example, the code retrieved $167.00 from the old HTML, which is the Price Target I want. But since the website format changed to the new HTML below, the code returns nothing. The span tag `<span>Price Target (6-12 Months):</span>` has changed to `<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>`.

Old HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months):</span>
> $167.00
> </p>

New HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
> $167.00
> </p>
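A short sketch of why the old lookup stops matching. When `find()` receives both a tag name and `string=`, BeautifulSoup compares the pattern against the tag's `.string`, and `.string` is `None` as soon as a tag has more than one child. The `<sup>` splits the span's contents into three children (text, `<sup>`, `":"`), so the search comes back empty. The snippets below are trimmed copies of the relevant `<p>` from the old and new HTML, `lookup_span` is a hypothetical helper name, and the built-in `html.parser` is used instead of `lxml` only so the sketch needs nothing beyond bs4 (it should behave the same here).

```python
import re
from bs4 import BeautifulSoup

# Trimmed copies of the relevant <p> from the "Old HTML" and "New HTML" snippets.
OLD_P = "<p>\n<span>Price Target (6-12 Months):</span>\n$167.00\n</p>"
NEW_P = "<p>\n<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>\n$167.00\n</p>"


def lookup_span(markup):
    """Hypothetical helper: return the span's .string and the string= lookup result."""
    soup = BeautifulSoup(markup, "html.parser")
    span = soup.find("span")
    # A tag name combined with string= is matched against tag.string.
    found = soup.find("span", string=re.compile("Price Target"))
    return span.string, found


for name, markup in (("old", OLD_P), ("new", NEW_P)):
    string, found = lookup_span(markup)
    # Old: .string is the label text, so the span is found.
    # New: .string is None (three children), so find() returns None.
    print(name, "| span.string:", string, "| find():", found)
```

Calling `.next_sibling` or `.parent` on that `None` is what breaks both versions of the question's code.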

What should be changed in the code below in order to retrieve $167.00?

```python
from bs4 import BeautifulSoup
import re

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
# HTML holds markup, not a URL, so parse it directly instead of calling requests.get(HTML).
soup = BeautifulSoup(HTML, 'lxml')
value = soup.find('span', string=re.compile("Price Target")).parent.contents[1]
print(value)
```

Answer 1

Score: 0

You can search for the `<span>` tag containing the text "Price Target" and then take its next text sibling:

```python
from bs4 import BeautifulSoup

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
soup = BeautifulSoup(HTML, 'lxml')
# :-soup-contains matches against all of the span's text, including the <sup> child;
# find_next_sibling(string=True) then returns the text node right after </span>.
value = soup.select_one('span:-soup-contains("Price Target")').find_next_sibling(string=True).strip()
print(value)
```

Prints:

```
$167.00
```
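If you'd rather keep the `find()` style from the question, or avoid the non-standard `:-soup-contains` pseudo-class (provided by soupsieve, which bs4 uses for CSS selectors), one alternative is to search for the label's text node instead of the span. "Price Target (6-12 Months)" is still a single string in both layouts, so `find(string=...)` matches it; `find_parent("span")` climbs back to the label and `find_next_sibling(string=True)` picks up the price as in the answer. This is a sketch: `price_target` is a hypothetical helper name, and `html.parser` replaces `lxml` only so that bs4 is the sole requirement.

```python
import re
from bs4 import BeautifulSoup

OLD_HTML = """
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months):</span>
$167.00
</p>
"""
# The new layout only adds <sup>(2)</sup> inside the label span.
NEW_HTML = OLD_HTML.replace("(6-12 Months):", "(6-12 Months)<sup>(2)</sup>:")


def price_target(markup):
    """Hypothetical helper: return the text after the 'Price Target' span, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    # Match the label's text node, which stays a single string in both layouts.
    label = soup.find(string=re.compile("Price Target"))
    if label is None:
        return None
    span = label.find_parent("span")
    if span is None:
        return None
    # The price is the first text node after the closing </span>.
    value = span.find_next_sibling(string=True)
    return value.strip() if value is not None else None


print(price_target(OLD_HTML))  # $167.00
print(price_target(NEW_HTML))  # $167.00
```

Matching the text node keeps the lookup independent of extra markup the site adds after the label; the trade-off is that it assumes the words "Price Target" themselves are never split by a tag.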

huangapple
  • Published on June 6, 2023 at 04:31:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76409810.html