Scrape text after div using BeautifulSoup - .next_sibling doesn't work
Question
I am trying to scrape the text between the divs here:
I tried to use .next_sibling, as suggested in this post: https://stackoverflow.com/questions/38754940/get-text-after-specific-tag-with-beautiful-soup
But it didn't work.
My current code:
import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.findAll("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        # Look for the divider div, then take the node right after it
        content = container.find("div", {"class": "info"}).find("div", {"class": "clear:both"})
        desc = content.next_sibling
        print(desc)
Could you help me figure out how to access the text between the divs using BeautifulSoup4?
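For reference, here is a minimal offline sketch of what .next_sibling returns. The markup is a simplified, hypothetical stand-in, since the real page structure is not shown above. When bare text sits directly after a tag, that text node (a NavigableString, not a Tag) is the tag's next_sibling; in indented HTML it usually carries surrounding whitespace, so it is stripped here. Note that find() returns None when nothing matches, and None has no next_sibling attribute.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: a bare text node sits between two sibling divs.
html = """
<div class="info">
  <div style="clear:both"></div>
  Text between divs
  <div class="tags"></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Keyword arguments to find() filter on attributes, here style="clear:both".
marker = soup.find("div", style="clear:both")
# The node right after the divider is a text node, including its whitespace.
desc = marker.next_sibling
print(type(desc).__name__)  # NavigableString
print(desc.strip())         # Text between divs
```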
Answer 1
Score: 1
The class attribute is not present on the second div you are searching for; that div has a style attribute instead.
You also need an extra check to verify that the element is present before you take its next_sibling.
Try this:
import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        # The divider div carries a style attribute, not a class
        content = container.find("div", {"class": "info"}).find("div", {"style": "clear:both"})
        # find() returns None when nothing matches, so check before using it
        if content:
            desc = content.next_sibling
            print(desc)
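The fix can be checked offline with a small sketch. The HTML below is a hypothetical stand-in shaped after the answer's description (a div.info holding an h2, a div with style="clear:both", then the text), and extract_description is an illustrative helper name, not part of BeautifulSoup. Unlike the answer's loop, it also guards against a missing div.info, which would otherwise raise AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the divider div has a style attribute, not a class.
HTML = """
<div class="question">
  <div class="info">
    <h2>Sample title</h2>
    <div style="clear:both"></div>
    Sample question text
  </div>
</div>
"""

def extract_description(container):
    """Hypothetical helper: return the stripped text after the divider, or None."""
    info = container.find("div", {"class": "info"})
    if info is None:
        return None
    divider = info.find("div", {"style": "clear:both"})
    if divider is None:
        return None
    sibling = divider.next_sibling
    # NavigableString subclasses str; a Tag or None means there is no bare text.
    return sibling.strip() if isinstance(sibling, str) else None

soup = BeautifulSoup(HTML, "html.parser")
container = soup.find("div", {"class": "question"})
print(container.find("div", {"class": "clear:both"}))  # None: no div has that class
print(extract_description(container))                  # Sample question text
```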
Alternatively, here is the same approach using a simple CSS selector:
import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        # CSS attribute selector; select_one() returns None if nothing matches
        content = container.select_one("div[style='clear:both']")
        if content:
            desc = content.next_sibling
            print(desc)
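The selector approach can also collect every description in one pass instead of looping over containers. This sketch assumes hypothetical markup with two made-up question blocks and uses select() with a child combinator; select() relies on the soupsieve package, which recent beautifulsoup4 releases install as a dependency. Note that [style='clear:both'] is an exact attribute match, so a divider written as style="clear: both;" would not match; a substring selector such as [style*='clear'] is more forgiving.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two question blocks.
HTML = """
<div class="question"><div class="info"><h2>First</h2><div style="clear:both"></div>First text</div></div>
<div class="question"><div class="info"><h2>Second</h2><div style="clear:both"></div>Second text</div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
descriptions = []
# Divider divs that are direct children of div.info inside div.question.
for divider in soup.select("div.question div.info > div[style='clear:both']"):
    sibling = divider.next_sibling
    # Keep only text nodes; a following tag or None is skipped.
    if isinstance(sibling, str):
        descriptions.append(sibling.strip())

print(descriptions)  # ['First text', 'Second text']
```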
Answer 2
Score: 0
Okay, I found another solution:
import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        info = container.find("div", {"class": "info"})
        # Text nodes that are direct children of div.info (nested tags are skipped)
        print(info.find_all(string=True, recursive=False))
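Calling a tag, as in info(text=True, recursive=False), is shorthand for info.find_all(...). With recursive=False only the direct children of div.info are searched, so text inside the h2 or other nested tags is excluded, but the result also contains whitespace-only strings. A minimal sketch, assuming hypothetical markup, that filters those out and joins what remains:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the bare text is a direct child of div.info.
HTML = """
<div class="info">
  <h2>Sample title</h2>
  <div style="clear:both"></div>
  Sample question text
  <span>nested text is skipped</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
info = soup.find("div", {"class": "info"})
# Direct-child text nodes only, including whitespace between tags.
direct_strings = info.find_all(string=True, recursive=False)
# Drop whitespace-only nodes and join the rest.
text = " ".join(s.strip() for s in direct_strings if s.strip())
print(text)  # Sample question text
```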