Python: extract the text that follows a span tag nested inside another span tag
Question
You can extract "Part-time, Full-time" and "On Campus" from the span tag using the following code:
# Assumes the HTML is parsed in Python with BeautifulSoup
from bs4 import BeautifulSoup
html = '''
<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>
'''
soup = BeautifulSoup(html, 'html.parser')
# Find the outer span tag that contains the text we need
span_tag = soup.find('span', class_='SecondaryFacts DesktopOnlyBlock')
# Extract all text inside the span tag, including the "/" dividers
text = span_tag.get_text()
# Split the text on '/' and pick the parts we need
parts = text.split('/')
result = parts[1].strip()  # "Part-time, Full-time"
result2 = parts[2].strip()  # "On Campus"
print(result)
print(result2)
This code extracts the required text from the given HTML and assigns it to the variables result and result2, which contain "Part-time, Full-time" and "On Campus" respectively.
Is there a way to extract "Part-time, Full-time" and "On Campus" from the span tag?
<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>
I can locate the inner span tag through class="Divider", but that only gives me the text "/". Is there a way to get the text that comes after the inner span closes?
Answer 1
Score: 0
The XML in your example has a root element, a span element, which contains the following child nodes:
- the text node Master
- a span element
- the text node Part-time, Full-time
- another span element
- the text node On Campus
You say you want to extract the text nodes Part-time, Full-time and On Campus? Presumably you want an XPath that you can apply to other similar XML data, and there are different criteria that could return those same two text nodes. So I'm going to guess that your criterion is that you want to extract any text node which is preceded by a sibling span element whose class attribute is Divider. The appropriate XPath would be:
/span/text()[preceding-sibling::span/@class='Divider']
That said, I suspect the ElementTree XPath interface may not work for you, because it doesn't support XPath queries that return text nodes, only elements (that's what I understand, anyway; I'm not a Python programmer). However, I know that the XPath API of lxml.etree will return text nodes, e.g. https://lxml.de/tutorial.html#using-xpath-to-find-text
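If you would rather stay with the standard library, here is a minimal sketch of a workaround. It does not use the XPath above. Instead it relies on the fact that ElementTree stores the text following an element in that element's tail attribute, so you can select the Divider elements and read their tails. The variable names are only for illustration, and it assumes the snippet is well-formed XML, as in your example.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question; it must be well-formed XML for ET to parse it.
html = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

root = ET.fromstring(html)

# ElementTree's limited XPath can select elements by attribute value.
dividers = root.findall("span[@class='Divider']")

# The text after each divider's closing tag is stored in its .tail attribute.
values = [(d.tail or "").strip() for d in dividers]
print(values)
```

For the sample markup this prints ['Part-time, Full-time', 'On Campus'].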
Answer 2
Score: 0
Here is code that returns what you need. The values you are looking for do not belong to the inner 'span' tags, so you can search with find('body'), or call find_all() and take the first element found.
from bs4 import BeautifulSoup
html = '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
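soup = BeautifulSoup(html, "lxml")  # the lxml parser wraps the fragment in <html><body>
# Alternative: body = soup.find('body')
body = soup.find_all()[0]  # first element in the parsed tree
# Split the combined text on "/" and drop the first piece ("Master ")
print(body.text.split("/")[1:])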
We will get a list that you can process as you need:
[' Part-time, Full-time ', ' On Campus']
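Since the question mentions that the inner spans can already be located through class="Divider", another option is to read the node that follows each one, rather than splitting the whole text on "/". This is only a sketch: it assumes each value sits directly after a Divider span as plain text, and it uses html.parser, which differs from the lxml parser used above.

```python
from bs4 import BeautifulSoup, NavigableString

html = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

soup = BeautifulSoup(html, "html.parser")

values = []
for divider in soup.find_all("span", class_="Divider"):
    # next_sibling is the node right after the divider's closing tag.
    sibling = divider.next_sibling
    # Keep it only if it is plain text rather than another tag.
    if isinstance(sibling, NavigableString):
        values.append(str(sibling).strip())

print(values)
```

For the sample markup this prints ['Part-time, Full-time', 'On Campus'], already stripped of surrounding whitespace. Unlike splitting on "/", it is not thrown off by a value that itself contains a slash.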