How to pull the text after a specific span tag that contains a sup tag in HTML with Python
Question
from bs4 import BeautifulSoup
import re
HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
soup = BeautifulSoup(HTML, 'lxml')
value = soup.find('span', string=re.compile("Price Target")).next_sibling.strip()
print(value)
I used to pull the text after a specific span tag, based on this post. With the Old HTML, the code retrieved $167.00, which is the Price Target and the value I want. Since the website format changed to the New HTML below, the code no longer returns anything. The span tag <span>Price Target (6-12 Months):</span>
was changed to <span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
Old HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months):</span>
> $167.00
> </p>
New HTML:
> <h1> Apple Inc. (AAPL) </h1>
> <p>$148.19
> <span>(As of 08/20/21)</span>
> </p>
> <p>
> <span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
> $167.00
> </p>
What should be changed in the code below, in order to retrieve $167.00?
from bs4 import BeautifulSoup
import requests
import re
HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
# Parse the inline HTML directly; requests.get() expects a URL, not markup
soup = BeautifulSoup(HTML, 'lxml')
value = soup.find('span', string=re.compile("Price Target")).parent.contents[1]
print(value)
Answer 1
Score: 0
You can search for the <span> tag containing the text "Price Target" and then take its next text sibling:
from bs4 import BeautifulSoup
HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''
soup = BeautifulSoup(HTML, 'lxml')
value = soup.select_one('span:-soup-contains("Price Target")').find_next_sibling(string=True).strip()
print(value)
Prints:
$167.00
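The original lookup most likely stopped matching because find('span', string=...) compares against the tag's .string, and .string is None once the span holds more than one child (here the label text, the <sup>, and the colon). As an alternative to the CSS selector, here is a minimal sketch that matches on get_text() instead. The helper name price_target is hypothetical, and 'html.parser' is used only to avoid the lxml dependency; 'lxml' should work the same way.

```python
from bs4 import BeautifulSoup

HTML = '''
<h1> Apple Inc. (AAPL) </h1>
<p>$148.19
<span>(As of 08/20/21)</span>
</p>
<p>
<span>Price Target (6-12 Months)<sup>(2)</sup>:</span>
$167.00
</p>
'''


def price_target(html):
    """Return the text node following the 'Price Target' span, or None."""
    # Hypothetical helper, not part of the original answer
    soup = BeautifulSoup(html, 'html.parser')
    # get_text() joins the text of all descendants, so the nested <sup>
    # does not hide the label the way .string does
    span = soup.find(
        lambda tag: tag.name == 'span' and 'Price Target' in tag.get_text()
    )
    if span is None:
        return None
    # The price is the first text node that follows the span inside the <p>
    text = span.find_next_sibling(string=True)
    return text.strip() if text else None


value = price_target(HTML)
print(value)
```

Returning None when the span is missing keeps the caller from hitting an AttributeError if the site changes its markup again.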