March 7, 2023, 14:10:12

Python: extracting the text that follows a span tag nested inside another span tag

Question

You can extract "Part-time, Full-time" and "On Campus" from the span tag using the following code:

# Assumes you are parsing the HTML with Python and BeautifulSoup
from bs4 import BeautifulSoup

html = '''
<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="/">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="/">/</span> On Campus</span>
'''

soup = BeautifulSoup(html, 'html.parser')

# Find the outer span tag that contains the text we need
span_tag = soup.find('span', class_='SecondaryFacts DesktopOnlyBlock')

# Extract all text inside the span tag
text = span_tag.get_text()

# Split the text on '/' and pick out the parts we need
parts = text.split('/')
result = parts[1].strip()  # "Part-time, Full-time"
result2 = parts[2].strip()  # "On Campus"

print(result)
print(result2)

This code extracts the required text from the given HTML and assigns it to result and result2, which hold "Part-time, Full-time" and "On Campus" respectively.
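
Splitting on '/' would give wrong results if any of the values themselves contained a slash. As a minimal sketch of an alternative, assuming the same markup and assuming each wanted value is the text node immediately after a Divider span, you could read each divider's next_sibling instead (the names outer and values are illustrative):

```python
from bs4 import BeautifulSoup, NavigableString

# Same markup as in the question.
html = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

soup = BeautifulSoup(html, "html.parser")

# Matching on a single class name is enough to find the outer span.
outer = soup.find("span", class_="SecondaryFacts")

values = []
for divider in outer.find_all("span", class_="Divider"):
    node = divider.next_sibling
    # Keep only plain text nodes; skip a divider followed by another tag or nothing.
    if isinstance(node, NavigableString):
        values.append(node.strip())

print(values)  # ['Part-time, Full-time', 'On Campus']
```

This keeps the '/' characters out of the logic entirely, so it depends only on the Divider class staying the same.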

Original question:

Is there a way to extract "Part-time, Full-time" and "On Campus" from the span tag?

<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>

I can locate the inner span tags through class="Divider", but that only gives me the text "/". Is there a way to get the text that comes after each inner span closes?

Answer 1

Score: 0

The XML in your example has a root element which is a span element containing the following child nodes:

the text node Master
a span element
the text node Part-time, Full-time
another span element
the text node On Campus

You say you want to extract the text nodes Part-time, Full-time and On Campus? Presumably you want an XPath that you can apply to other similar XML data, and there are different criteria that could return those same two text nodes. So I'm going to guess that your criterion is that you want to extract any text node that is preceded by a sibling span element whose class attribute is Divider. The appropriate XPath would be:

/span/text()[preceding-sibling::span/@class='Divider']

That said, I suspect the ElementTree XPath interface may not work for you, because it doesn't support XPath queries that return text nodes, only elements (that's what I understand, anyway; I'm not a Python programmer). However, I know that the XPath API of lxml.etree will return text nodes, e.g. https://lxml.de/tutorial.html#using-xpath-to-find-text
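
Even without an XPath that returns text nodes, the standard-library ElementTree can reach the same text, because it stores whatever follows an element's closing tag in that element's tail attribute. A minimal sketch, assuming the snippet is well-formed XML as in the question (the variable names are illustrative, and this swaps the XPath text() query for an element lookup plus tail):

```python
import xml.etree.ElementTree as ET

# Same markup as in the question, treated as XML.
xml = (
    '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master '
    '<span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time '
    '<span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
)

root = ET.fromstring(xml)

# ElementTree's limited XPath can select the Divider child elements;
# the text after each one is exposed as its .tail (None if there is none).
tails = [
    (divider.tail or "").strip()
    for divider in root.findall("span[@class='Divider']")
]

print(tails)  # ['Part-time, Full-time', 'On Campus']
```

With lxml.etree, the XPath expression above can instead be passed to the xpath() method of the parsed root element, which does return text nodes.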

Answer 2

Score: 0

Here is code that returns what you need. The values you are looking for do not belong to the inner 'span' tags, so you can search with find('body'), or call find_all() and take the first element found.

from bs4 import BeautifulSoup


html = '<span class="SecondaryFacts DesktopOnlyBlock" data-v-3c87c7ca="">Master <span class="Divider" data-v-3c87c7ca="">/</span> Part-time, Full-time <span class="Divider" data-v-3c87c7ca="">/</span> On Campus</span>'
soup = BeautifulSoup(html, "lxml")
# body = soup.find('body')
body = soup.find_all()[0]
print(body.text.split("/")[1:])

We will get a list that you can process as you need:

[' Part-time, Full-time ', ' On Campus']
