Scrape text after a div using BeautifulSoup - .next_sibling doesn't work

Question

I am trying to scrape the text between the divs shown here:

[Image of the page's HTML, not preserved]

I tried using .next_sibling, as suggested in this post: https://stackoverflow.com/questions/38754940/get-text-after-specific-tag-with-beautiful-soup

But it didn't work.

My current code:

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.findAll("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        content = container.find("div", {"class": "info"}).find("div", {"class": "clear:both"})
        desc = content.next_sibling
        print(desc)

Could you help me figure out how to access the text between divs using BeautifulSoup4?

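Answer 1

Score: 1

The second div you are searching for has no class attribute. "clear:both" is the value of its style attribute.

You also need to check that the element was found before accessing next_sibling, because find() returns None when nothing matches.

Try this: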

import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        info = container.find("div", {"class": "info"})
        h2 = info.find("h2")
        # The clearing div is identified by its style attribute, not a class.
        content = info.find("div", {"style": "clear:both"})
        # find() returns None when nothing matches, so check before using it.
        if content:
            desc = content.next_sibling
            print(desc)
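
To see why the class lookup fails without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML markup is an assumption about what one question block looks like (the original screenshot is not available), so treat it as an illustration rather than the page's real structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one question block on the page.
SAMPLE_HTML = """
<div class="question">
  <div class="info">
    <h2>Sample title</h2>
    <div style="clear:both"></div>Sample description text
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
info = soup.find("div", {"class": "info"})

# "clear:both" is a style value, so searching by class finds nothing.
by_class = info.find("div", {"class": "clear:both"})
# Searching by the style attribute finds the empty clearing div.
by_style = info.find("div", {"style": "clear:both"})

desc = None
if by_style is not None:
    # next_sibling is the text node that directly follows the div.
    desc = by_style.next_sibling.strip()

print(by_class)
print(desc)
```

With the class lookup, the original code ends up calling .next_sibling on None, which raises an AttributeError.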

Here is a simpler option using a CSS selector.

import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        # Attribute selector: a div whose style is exactly "clear:both".
        content = container.select_one("div[style='clear:both']")
        if content:
            desc = content.next_sibling
            print(desc)
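
One caveat with the exact-match selector: it only matches when the style string is literally "clear:both". The sketch below, again on assumed sample markup, uses the substring selector [style*='clear'] and a small hypothetical helper, text_after, that skips whitespace-only siblings. Neither is part of the original answer.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: the style value has extra spacing and a semicolon,
# and the text starts on a new line after the clearing div.
SAMPLE_HTML = """
<div class="info">
  <h2>Sample title</h2>
  <div style="clear: both;"></div>
  Sample description text
</div>
"""


def text_after(tag):
    """Return the first non-blank text node following tag, or None."""
    for sibling in tag.next_siblings:
        if isinstance(sibling, NavigableString) and sibling.strip():
            return sibling.strip()
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The exact-match selector misses this variant of the style value.
exact = soup.select_one("div[style='clear:both']")
# The substring selector matches any div whose style contains "clear".
loose = soup.select_one("div[style*='clear']")

desc = text_after(loose) if loose is not None else None
print(exact)
print(desc)
```

Whether the looser selector is appropriate depends on the real page. If other divs also carry a style containing "clear", the exact match is the safer choice.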

Answer 2

Score: 0

Okay, I found another solution:

import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        info = container.find("div", {"class": "info"})
        # Only the text nodes that are direct children of the info div.
        print(info.find_all(string=True, recursive=False))
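
To show what this approach returns, here is a small offline sketch on assumed sample markup (not the real page). The list of direct text nodes also includes the whitespace between tags, so filtering out blank entries is usually worthwhile.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the info div on the page.
SAMPLE_HTML = """
<div class="info">
  <h2>Sample title</h2>
  <div style="clear:both"></div>Sample description text
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
info = soup.find("div", {"class": "info"})

# string=True with recursive=False yields only the text nodes that are
# direct children of info, so the text inside <h2> is excluded.
direct_text = info.find_all(string=True, recursive=False)

# Whitespace between tags also counts as text, so drop blank entries.
cleaned = [s.strip() for s in direct_text if s.strip()]
print(cleaned)
```

This avoids depending on the clearing div at all, at the cost of collecting every loose text node inside the info div.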

huangapple
  • Posted on January 3, 2020, 20:20:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59578541.html