find_all not finding all classes

Question

I wrote this code to find all the firms' links, but it finds only the first two and then stops. Any idea why, and how can I change it?

    import requests
    from bs4 import BeautifulSoup

    url = "https://www.gelbeseiten.de/branchen/rechtsanwalt/mannheim"
    req = requests.get(url)
    src = req.text
    soup = BeautifulSoup(src, "lxml")
    all_firmas = soup.find_all("article", class_="mod mod-Treffer")
    for i in all_firmas:
        i_2 = i.next_element.next_element
        print(i_2.get("href"))
    print("Category done!")


# Answer 1
**Score**: 0

Following your link, only two articles have exactly the class `mod mod-Treffer`. The other articles have the class `mod mod-Treffer mod-Treffer--kurz`, so an exact-string search on `mod mod-Treffer` skips them.

The following code uses a regular expression (this needs `import re`) to match both variants:

```python
all_firmas = soup.find_all("article", class_=re.compile("mod mod-Treffer.*"))
```
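To make the difference concrete, here is a minimal sketch that compares the exact-string search with a regex search. The HTML snippet and the `/firma-…` links are made up for illustration and only mimic the two class variants mentioned in this answer; the parser is `html.parser` so the sketch does not depend on `lxml`.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real results page.
html = """
<div id="gs_treffer">
  <article class="mod mod-Treffer"><a href="/firma-1">Firma 1</a></article>
  <article class="mod mod-Treffer mod-Treffer--kurz"><a href="/firma-2">Firma 2</a></article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A string containing a space is compared against the whole class attribute,
# so only the article whose class is exactly "mod mod-Treffer" is found.
exact = soup.find_all("article", class_="mod mod-Treffer")

# A regex is also tried against the whole class attribute, so a pattern
# anchored at the start finds both variants.
by_regex = soup.find_all("article", class_=re.compile(r"^mod mod-Treffer"))

print(len(exact), len(by_regex))
```

The regex approach depends on the order of the class names in the attribute, so it would presumably stop matching if the site ever reordered them.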



# Answer 2
**Score**: 0

Searching on a single class works. Every result article has the class `mod-Treffer`, whereas `mod` is also applied to other elements, so you can search on `mod-Treffer` alone:

    all_firmas = soup.find_all("article", class_="mod-Treffer")

To be more specific, you can restrict the search to the results container:

    all_firmas = soup.find("div", id="gs_treffer").find_all("article", class_="mod-Treffer")
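As a rough sketch of the scoped search from this answer, the snippet below runs it against made-up markup (the HTML, the `mod-Header` element, and the `/firma-…` links are assumptions, not the real page). It also checks for `None` before chaining, since `find` returns `None` when the container is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is far larger.
html = """
<div class="mod mod-Header">Header</div>
<div id="gs_treffer">
  <article class="mod mod-Treffer"><a href="/firma-1">Firma 1</a></article>
  <article class="mod mod-Treffer mod-Treffer--kurz"><a href="/firma-2">Firma 2</a></article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so check before chaining.
container = soup.find("div", id="gs_treffer")
if container is not None:
    all_firmas = container.find_all("article", class_="mod-Treffer")
else:
    all_firmas = []

# Collect the first link of each article, skipping articles without one.
hrefs = [article.a["href"] for article in all_firmas if article.a is not None]
print(hrefs)
```

Matching on the single class `mod-Treffer` does not depend on which other classes an article carries or in what order they appear.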



# Answer 3
**Score**: 0

You can also use [`select`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class) with a CSS selector. It works much like `find_all`, and the selector `article.mod.mod-Treffer` matches every `article` that has both classes, regardless of any additional classes.
```python
all_firmas = soup.select("article.mod.mod-Treffer")
for i in all_firmas:
    print(i.a["href"])
```
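A slightly more defensive variant of the `select` approach might look like the sketch below. The markup is invented for illustration, including a third article without a link, and `select_one("a[href]")` is used so that such an article is skipped instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the third article has no link on purpose.
html = """
<div id="gs_treffer">
  <article class="mod mod-Treffer"><a href="/firma-1">Firma 1</a></article>
  <article class="mod mod-Treffer mod-Treffer--kurz"><a href="/firma-2">Firma 2</a></article>
  <article class="mod mod-Treffer mod-Treffer--kurz">No link here</article>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

all_firmas = soup.select("article.mod.mod-Treffer")

hrefs = []
for article in all_firmas:
    # select_one() returns None when the article contains no <a href=...>.
    link = article.select_one("a[href]")
    if link is not None:
        hrefs.append(link["href"])

print(hrefs)
```

With the original `i.a["href"]`, an article that has no `<a>` element would raise a `TypeError`, because `i.a` is `None` in that case.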



Posted by huangapple on July 6, 2023 at 21:19:10. Source: https://go.coder-hub.com/76629286.html