BeautifulSoup previous_sibling not working
Question
Some items have a title and some don't. Sample HTML:

    <div id="content">
      <h5>Title1</h5>
      <div class="text">text 1</div>
      <h5>Title2</h5>
      <div class="text">text 2</div>
      <div class="text">text 3</div>
      <div class="text">text 4</div>
    </div>

I want to get every `div` with class `text`, along with its `h5` title if it has one.
`find_previous_sibling('h5')` finds the title, but the last two `text` elements also report `Title2`, which does not belong to them.
I also tried `previous_sibling`, planning to check whether it is an `h5` (a title) or a `div`, but it prints nothing.

    html = BeautifulSoup(response.text, 'lxml')
    content = html.find('div', {'id': 'content'})
    paras = content.find_all('div', {'class': 'text'})
    for para in paras:
        title = para.find_previous_sibling('h5')
        if title:
            print(title.get_text())
        pr = para.previous_sibling
        if pr:
            print(pr)

Answer 1
Score: 1
You could use `find_previous()` without any arguments to get the element that precedes the `div`, then check `.name` to see whether it is an `<h5>`:
```python3
from bs4 import BeautifulSoup

html = """
<div id="content">
<h5>Title1</h5>
<div class="text">text 1</div>
<h5>Title2</h5>
<div class="text">text 2</div>
<div class="text">text 3</div>
<div class="text">text 4</div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
content = soup.find('div', {'id': 'content'})
paras = content.find_all('div', {'class': 'text'})

for para in paras:
    print(para.text)
    # With no arguments, find_previous() returns the closest preceding tag
    prev = para.find_previous()
    # Only treat that tag as this paragraph's title when it is an <h5>
    if prev and prev.name == 'h5':
        print(prev.text)
```

Gives:

    text 1
    Title1
    text 2
    Title2
    text 3
    text 4
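
As for why `previous_sibling` appears to print nothing: in this markup the node right before each `div` is most likely the newline text between the tags, not the `h5`, so `print(pr)` shows a blank line. A minimal sketch of a sibling-based variant follows, assuming the same sample HTML. The helper name `titles_for_paragraphs` and the constant `SAMPLE` are made up for illustration. It passes `True` to `find_previous_sibling()` so that only tags are considered, which also keeps the check tied to direct siblings if a title ever contains nested tags.

```python
from bs4 import BeautifulSoup

SAMPLE = """
<div id="content">
<h5>Title1</h5>
<div class="text">text 1</div>
<h5>Title2</h5>
<div class="text">text 2</div>
<div class="text">text 3</div>
<div class="text">text 4</div>
</div>
"""


def titles_for_paragraphs(markup):
    """Return (title or None, text) pairs for each div.text inside #content."""
    soup = BeautifulSoup(markup, "html.parser")
    content = soup.find("div", id="content")
    pairs = []
    for para in content.find_all("div", class_="text"):
        # name=True matches any tag but no text node, so the whitespace
        # between elements is skipped and we get the nearest sibling tag.
        prev = para.find_previous_sibling(True)
        is_title = prev is not None and prev.name == "h5"
        pairs.append((prev.get_text() if is_title else None, para.get_text()))
    return pairs


first_para = BeautifulSoup(SAMPLE, "html.parser").find("div", class_="text")
# The raw previous sibling is a text node holding only whitespace.
print(repr(first_para.previous_sibling))

for title, text in titles_for_paragraphs(SAMPLE):
    print(title, "|", text)
```

Paragraphs without a title of their own come back paired with `None`, so the caller can decide how to display them.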