Question

Some items have a title and some don't. Sample HTML:

<div id="content">
    <h5>Title1</h5>
    <div class="text">text 1</div>

    <h5>Title2</h5>
    <div class="text">text 2</div>

    <div class="text">text 3</div>

    <div class="text">text 4</div>
</div>

I am trying to get every `div` with class `text`, together with its `h5` title (if it has one).

`find_previous_sibling` finds the title, but for the last two `text` elements it also returns a title that does not belong to them.

I also tried `previous_sibling`, checking whether it is an `h5` or a `div` and treating an `h5` as the title, but it returns nothing.

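from bs4 import BeautifulSoup

# `response` is assumed to come from an earlier HTTP request (not shown)
html = BeautifulSoup(response.text, 'lxml')
content = html.find('div', {'id': 'content'})
paras = content.find_all('div', {'class': 'text'})

for para in paras:
    # Attempt 1: nearest preceding <h5> sibling
    title = para.find_previous_sibling('h5')
    if title:
        print(title.get_text())

    # Attempt 2: the node immediately before this div
    pr = para.previous_sibling
    if pr:
        print(pr)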

Answer 1

Score: 1

You can call `find_previous()` with no arguments to get the element that comes just before each `div`, then check `.name` to see whether it is an `<h5>`:

```python3
from bs4 import BeautifulSoup

html = """
<div id="content">
    <h5>Title1</h5>
    <div class="text">text 1</div>

    <h5>Title2</h5>
    <div class="text">text 2</div>

    <div class="text">text 3</div>
    <div class="text">text 4</div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
content = soup.find('div', {'id': 'content'})
paras = content.find_all('div', {'class': 'text'})

for para in paras:
    print(para.text)
    # The element just before this div; it is the title only if it is an <h5>
    prev = para.find_previous()
    if prev and prev.name == 'h5':
        print(prev.text)
```

Output:

text 1
Title1
text 2
Title2
text 3
text 4
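
A note on why `previous_sibling` seemed to return nothing in the question: the node directly before each `div` is the whitespace between the tags, which Beautiful Soup keeps as a `NavigableString`, so printing it shows only blank space. The sketch below illustrates this and shows one possible variation on the approach above; it is not part of the original answer. It uses `find_previous_sibling(True)`, where `True` matches any tag and skips text nodes, and the helper name `title_and_text` is made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<div id="content">
    <h5>Title1</h5>
    <div class="text">text 1</div>

    <h5>Title2</h5>
    <div class="text">text 2</div>

    <div class="text">text 3</div>
    <div class="text">text 4</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
paras = soup.find("div", id="content").find_all("div", class_="text")

# The node right before the first div is whitespace text, not a tag,
# which is why printing .previous_sibling looks empty.
first_prev = paras[0].previous_sibling
print(repr(first_prev), isinstance(first_prev, NavigableString))


def title_and_text(para):
    """Hypothetical helper: return (title, text); title is None
    when the nearest preceding sibling tag is not an <h5>."""
    prev = para.find_previous_sibling(True)  # True matches any tag, skips strings
    title = prev.get_text() if prev is not None and prev.name == "h5" else None
    return title, para.get_text()


pairs = [title_and_text(p) for p in paras]
for title, text in pairs:
    print(title, "|", text)
```

Looking only at siblings means a tag nested inside a preceding element cannot be mistaken for the title. With `find_previous()`, an `<h5>` that wraps its text in a `<span>` would yield the `<span>` instead, so this variant may be the safer choice if the titles can contain markup.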


