How to find HTML elements by multiple tags with Selenium

Question

I need to scrape data from a webpage with Selenium. I need to find these elements:

```html
<div class="content-left">
  <ul></ul>
  <ul></ul>
  <p></p>
  <ul></ul>
  <p></p>
  <ul></ul>
  <p></p>
  <ul>
    <li></li>
    <li></li>
  </ul>
  <p></p>
</div>
```

As you can see, the `<p>` and `<ul>` tags have no classes, and I don't know how to get them in document order.

I used Beautiful Soup before:

```python
allP = bs.find('div', attrs={"class":"content-left"})
txt = ""
for p in allP.find_all(['p', 'li']):
    ...  # loop body not shown in the original post
```

But that no longer works (requests now gets a 403 error), so I need to find these elements with Selenium.

HTML:

![This image](https://i.stack.imgur.com/lqfYm.png)
# Answer 1

**Score**: 0

To extract the text from only the `<p>` and `<li>` tags, you can use [**Beautiful Soup**](https://stackoverflow.com/a/47871704/7429447) as follows:

```python
from bs4 import BeautifulSoup

html_text = '''
<div class="content-left">
<ul>1</ul>
<ul>2</ul>
<p>3</p>
<ul>4</ul>
<p>5</p>
<ul>6</ul>
<p>7</p>
<ul>
<li>8</li>
<li>9</li>
</ul>
<p>10</p>
</div>
'''
soup = BeautifulSoup(html_text, 'html.parser')
# Restrict the search to the container div
parent_element = soup.find("div", {"class": "content-left"})
# Passing a list of tag names returns all matches in document order
for element in parent_element.find_all(['p', 'li']):
    print(element.text)
```

Console output:

```
3
5
7
8
9
10
```

Using _Selenium_

With Selenium, you can use a list comprehension as follows (this assumes `driver` is an already-initialized WebDriver that has loaded the page):

- Using _CSS_SELECTOR_:

```python
from selenium.webdriver.common.by import By
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.content-left p, div.content-left li")])
```
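
The CSS selector approach above can be wrapped in a small helper so the list of tags is not hard-coded. This is only a minimal sketch: the function names `build_group_selector` and `texts_in_document_order` are hypothetical and not part of Selenium, and `driver` is assumed to be a WebDriver that has already loaded the page. A comma-separated selector group is resolved by the browser in document order, so the `<p>` and `<li>` texts should come back in the order they appear on the page.

```python
from typing import Iterable, List


def build_group_selector(parent_css: str, tags: Iterable[str]) -> str:
    """Build a CSS selector group such as 'div.x p, div.x li'.

    Hypothetical helper: each tag is scoped to the parent selector and the
    pieces are joined with commas to form one selector group.
    """
    return ", ".join(f"{parent_css} {tag}" for tag in tags)


def texts_in_document_order(driver, parent_css: str, tags: Iterable[str]) -> List[str]:
    """Return the text of every matching descendant of the parent.

    Assumes `driver` is an initialized Selenium WebDriver on the target page.
    """
    # Imported here so the selector helper above works without Selenium installed.
    from selenium.webdriver.common.by import By

    selector = build_group_selector(parent_css, tags)
    return [element.text for element in driver.find_elements(By.CSS_SELECTOR, selector)]
```

Assuming a Chrome driver is available, usage might look like `driver = webdriver.Chrome()`, then `driver.get(url)` with your own URL, then `texts_in_document_order(driver, "div.content-left", ["p", "li"])`.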