Python: extract the text that follows a span tag nested inside another span tag

Question

You can extract "Part-time, Full-time" and "On Campus" from the span tag using the following code:

# Assumes the HTML is parsed with BeautifulSoup
from bs4 import BeautifulSoup

html = '''
<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>
'''

soup = BeautifulSoup(html, 'html.parser')

# Find the span tag that contains the desired text
span_tag = soup.find('span', class_='SecondaryFacts DesktopOnlyBlock')

# Extract all text inside the span tag, including nested tags
text = span_tag.get_text()

# Split the text on '/' and take the parts we need
parts = text.split('/')
result = parts[1].strip()  # "Part-time, Full-time"
result2 = parts[2].strip()  # "On Campus"

print(result)
print(result2)

This code extracts the desired text from the given HTML and assigns it to the variables result and result2, which hold "Part-time, Full-time" and "On Campus" respectively.

Original question:

Is there a way to extract "Part-time, Full-time" and "On Campus" from the span tag?

<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>

I am able to locate the inner span tag through class="Divider", but then I only get the text "/". Is there a way to get the text that comes after the inner span closes?

Answer 1

Score: 0

The XML in your example has a root element which is a span element containing the following child nodes:

  • the text node Master
  • a span element
  • the text node Part-time, Full-time
  • another span element
  • the text node On Campus

You say you want to extract the text nodes Part-time, Full-time and On Campus? Presumably you want an XPath that you can apply to other similar XML data, and there are different criteria that could return those same two text nodes. So I'm going to guess that your criterion is that you want to extract any text node which is preceded by a sibling span element whose class attribute is Divider. The appropriate XPath would be:

/span/text()[preceding-sibling::span/@class='Divider']

That said, I suspect the ElementTree XPath interface may not work for you, because it doesn't support XPath queries that return text nodes, only elements (that's what I understand, anyway; I'm not a Python programmer). However, I know that the XPath API of lxml.etree will return text nodes, e.g. https://lxml.de/tutorial.html#using-xpath-to-find-text
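As a minimal sketch of how the XPath above could be run from Python, assuming lxml is installed and the snippet is well-formed XML (the variable names xml, root, nodes and values are illustrative, not from the original post):

```python
from lxml import etree

# The snippet from the question, treated as a small XML document
xml = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

root = etree.fromstring(xml)

# Select every text node of the outer span that has a preceding
# sibling span whose class attribute is "Divider"
nodes = root.xpath("/span/text()[preceding-sibling::span/@class='Divider']")

# lxml returns the text nodes as string-like objects; strip the padding
values = [str(node).strip() for node in nodes]
print(values)
```

With the standard-library ElementTree, the same text should be reachable without XPath text nodes, because the text that follows an element is stored in that element's tail attribute.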

Answer 2

Score: 0

Here is code that will return what you need. The values you are looking for do not belong to the inner Divider 'span' tags; they are text of the enclosing element. You can search with find('body'), or call find_all() and take the first element found.

from bs4 import BeautifulSoup


html = '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
soup = BeautifulSoup(html, "lxml")
# body = soup.find('body')
body = soup.find_all()[0]
print(body.text.split("/")[1:])

We will get a list that you can process as you need:

[' Part-time, Full-time ', ' On Campus']
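Splitting on "/" would also split any value that itself contains a slash. As an alternative sketch that answers the question more literally (the text right after each inner span closes), one could read next_sibling on every Divider span. This assumes each divider is immediately followed by a text node; the names values, divider and sibling are illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

html = '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
soup = BeautifulSoup(html, "html.parser")

values = []
for divider in soup.find_all("span", class_="Divider"):
    # next_sibling is the node right after the closing </span> of the divider
    sibling = divider.next_sibling
    # Keep it only if it is a text node (not another tag, and not None)
    if isinstance(sibling, NavigableString):
        values.append(sibling.strip())

print(values)
```

This avoids depending on the "/" character at all, at the cost of depending on the Divider class name.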

huangapple
  • This article was published on March 7, 2023, 14:10:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75658536.html