How do I use bs4 to parse the text description of an anchor tag, especially when the href link is broken?
Question
I'm practicing using BS4 to parse HTML files. I've run into an issue and can't find a solution anywhere. How would I parse the inside of an anchor tag? I've tried reading the "href" attribute, but the link has some added characters that break it.
For instance, I am trying to parse this link to one of my older questions:
<a href = "https://stackoverflow.com/questions/61925957/using-an-api-to-create-data-in-a-react-table" style=
=3D"color: #FFFFFF;font-size: 15px;"> >
But instead it contains some characters that break the tag:
<a href = "https://stackoverflow.com/&amp=3D"questions/61925957"=3D"/using-an-api-to-create-data-in-a-react-table" style=
=3D"color: #FFFFFF;font-size: 15px;" >
How would I get the inside of this tag using bs4 so that I can trim it and get my final link? I also want to ignore the style, color, and font-size descriptors.
Answer 1
Score: 1
I can't reproduce the issue; this works just fine:
from bs4 import BeautifulSoup
html_sample = """<a href = "https://stackoverflow.com/questions/61925957/using-an-api-to-create-data-in-a-react-table" style=
=3D"color: #FFFFFF;font-size: 15px;"> >"""
# Parse the snippet and read the href attribute of the first <a> tag.
href = BeautifulSoup(html_sample, "lxml").select_one("a")["href"]
print(href)
Output:
https://stackoverflow.com/questions/61925957/using-an-api-to-create-data-in-a-react-table
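The `=3D` sequences and the lone `=` at the end of a line in the question look like quoted-printable encoding, as used in email bodies, where `=3D` stands for a literal `=` and a trailing `=` marks a soft line break. That is an assumption about where the HTML came from, not something the question confirms. If it holds, decoding the text before handing it to bs4 may remove most of the stray characters. A minimal sketch using only the standard library (`raw_html` and `decode_quoted_printable` are illustrative names, not from the original post):

```python
import quopri

# Illustrative sample: the anchor from the question, kept as raw text.
raw_html = (
    '<a href = "https://stackoverflow.com/questions/61925957/'
    'using-an-api-to-create-data-in-a-react-table" style=\n'
    '=3D"color: #FFFFFF;font-size: 15px;"> >'
)


def decode_quoted_printable(text: str, encoding: str = "utf-8") -> str:
    """Undo quoted-printable escapes, e.g. '=3D' becomes '=' and a
    trailing '=' at the end of a line joins it with the next line."""
    return quopri.decodestring(text.encode(encoding)).decode(encoding)


decoded_html = decode_quoted_printable(raw_html)
print(decoded_html)
```

The decoded string can then be passed to `BeautifulSoup(decoded_html, "lxml")` as in the answer above, and only the `href` attribute read, which leaves the style, color, and font-size values out of the result. This will not repair characters that were damaged by something other than quoted-printable encoding.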