Extract info using pagination - selenium bs4 python

Question
I am web scraping Sales Navigator. I was able to navigate to the first page, scroll 8 times, and extract all the names and titles using Selenium and BeautifulSoup. Below is the code.
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
import time

driver.get(dm)
time.sleep(5)

section = driver.find_element(By.XPATH, "//*[@id='search-results-container']")
time.sleep(5)

counter = 0
while counter < 8:  # this will scroll 8 times
    driver.execute_script(
        'arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;',
        section)
    counter += 1
    # wait after each scroll so the lazily loaded results can render
    time.sleep(7)  # time is in the standard library, nothing to install

src2 = driver.page_source

# Now parse the page source with BeautifulSoup
soup = BeautifulSoup(src2, 'lxml')
name_soup = soup.find_all('span', {'data-anonymize': 'person-name'})
names = []
for name in name_soup:
    names.append(name.text.strip())
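As an aside, a fixed count of 8 scrolls is fragile if the number of results changes. A common alternative is to keep scrolling until the container's scroll height stops growing. The sketch below factors the browser interaction into caller-supplied callables (`get_height` and `do_scroll` are placeholder names, not Selenium APIs) so the termination logic stands on its own:

```python
import time

def scroll_until_stable(get_height, do_scroll, wait=0.0, max_rounds=50):
    """Scroll repeatedly until the measured height stops changing.

    get_height and do_scroll are caller-supplied callables (e.g. thin
    wrappers around driver.execute_script); max_rounds is a safety cap.
    """
    last_height = get_height()
    for _ in range(max_rounds):
        do_scroll()
        if wait:
            time.sleep(wait)  # give lazily loaded rows time to render
        new_height = get_height()
        if new_height == last_height:  # nothing new appeared: stop
            break
        last_height = new_height
    return last_height
```

With Selenium this could be wired up roughly as `get_height=lambda: driver.execute_script('return arguments[0].scrollHeight;', section)` and a `do_scroll` that runs the same scrolling script as the code above.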
However, there are 8 more pages, and I need to extract all the names from them and append them to the names list as well.
Please help.
Answer 1
Score: 0
Generally, the logic I use for pagination is:

while True:
    ## PAGE SCRAPING CODE [i.e., your current code]
    ## SEARCH FOR THE NEXT PAGE [button/link]
    ### IF there is a next page --> click the button or go to the link
    ### IF there is no next page --> break out of the loop

If you included the link you're trying to scrape, I might be able to give you a more specific answer. For example, this is a function I often use to scrape paginated data, although it isn't meant for scrollable pages.
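The pseudocode above can be sketched concretely. Since the actual next-page control depends on the site, this example factors the two site-specific steps into callables (`scrape_page` and `goto_next_page` are placeholder names); with Selenium, `goto_next_page` would typically try to find and click the "next" button and return False when it is missing or disabled:

```python
def scrape_all_pages(scrape_page, goto_next_page, max_pages=100):
    """Generic pagination loop matching the pseudocode above.

    scrape_page() returns a list of items from the current page;
    goto_next_page() advances to the next page and returns True,
    or returns False when there is no next page.
    """
    items = []
    for _ in range(max_pages):    # safety cap instead of a bare while True
        items.extend(scrape_page())  # PAGE SCRAPING CODE
        if not goto_next_page():     # NO NEXT PAGE --> break
            break
    return items
```

With Selenium, `goto_next_page` might wrap a `driver.find_element(By.XPATH, ...)` call for the next button in a try/except for `NoSuchElementException` and return False on failure.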