Unable to scrape a website when the URL does not change after selecting dropdown values

Question

I have a website (the URL is in the code below) that lists donors' names along with their donation ranges.

The data is spread across multiple pages, and I am trying to scrape all of it.

The data section on the page is filtered through a dropdown, and the URL does not change after I select a value.

Where am I going wrong?

import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time

driver = webdriver.Chrome("chromedriver.exe")
driver.get("https://www.clintonfoundation.org/about-the-clinton-foundation#reports-financials/")

drpdown = driver.find_element_by_id("filtered-list-taxonomy")
drpdown.click()
time.sleep(5)

element = driver.find_element_by_link_text("ALL")
element.click()

current_url = driver.current_url

response = requests.get(current_url)
soup = BeautifulSoup(response.content, 'html.parser')
article = donor_list.find_all('article', class_='col-12 items-list-block-item up-fade-40')

Answer 1

Score: 0

The donor list is loaded by a POST request, which is why the URL never changes when you pick a dropdown value. You can send that request yourself and parse the response.

This should push you in the right direction:

import requests
from bs4 import BeautifulSoup

# WordPress AJAX endpoint that the page calls to load the donor list
endpoint = "https://www.clintonfoundation.org/wp/wp-admin/admin-ajax.php"

payload = {
    "action": "getFilteredListBlockContent",
    # Value copied from a browser session; WordPress nonces expire, so it may need refreshing
    "nonce": "cd463523f4",
    "post_type": "supporter",
    "posts_per_page": 100,
    "taxonomy": "supporter_category",
    "paged": 1,  # page number; presumably incremented to reach later pages
    "keyword": "",
    "term": "",
    "isCGI": "false",
}

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.68",
    "X-Requested-With": "XMLHttpRequest",
}

response = requests.post(endpoint, data=payload, headers=headers)
response.raise_for_status()  # fail early on an HTTP error
html = response.json()["data"]  # the "data" field holds an HTML fragment

# Each donor name sits in an <h3> element of the returned fragment
donors = [
    donor.get_text(strip=True)
    for donor in BeautifulSoup(html, "html.parser").find_all("h3")
]

for number, donor in enumerate(donors, 1):
    print(f"{number}. {donor}")

This should print the first 100 donors:

1. Bill & Melinda Gates Foundation
2. Elevate Social Businesses [Clinton Giustra Enterprise Partnership (Canada)]
3. Fred Eychaner and Alphawood Foundation
4. Frank Giustra, The Radcliffe Foundation
5. Postcode Lottery Group [Nationale Postcode Loterij] *
6. Cheryl and Haim Saban & The Saban Family Foundation *
7. The Children's Investment Fund Foundation
8. UNITAID
9. AUSAID **
10. Stephen L. Bing
11. Commonwealth of Australia **
12. COPRESIDA
13. Tom Golisano ^
14. J.B. and M.K. Pritzker Family Foundation
15. Kingdom of Norway [Government of Norway] **
16. Kingdom of Saudi Arabia
17. Norwegian Agency for Development Cooperation (NORAD) **
18. Denis J. O'Brien and Digicel
19. Susie Tompkins Buell Fund of the Marin Community Foundation
20. Swedish Postcode Lottery [The Swedish Postcode Lottery]
21. The Elma Foundation
22. The Hunter Foundation *
23. The Rockefeller Foundation
24. The Victor Pinchuk Foundation
25. The Wasserman Foundation *
26. Tracfone Wireless, Inc. *
27. Theodore W. Waitt
28. S. Daniel Abraham
29. Sheikh Mohammed H. Al-Amoudi
30. C40 Cities Climate Leadership Group, Inc.
31. Elton John Aids Foundation
32. Fidelity Charitable Gift Fund
33. Government of the Netherlands **
34. Irish Aid **
35. Jonathan and Jeannie Lavine, Trustees of the Crimson Lion Foundation *
36. John D. Mackay
37. OCP Corporation
38. Michael Schumacher
39. Bernard L. Schwartz *
40. State of Kuwait
41. The Clinton Family Foundation *
42. The Coca-Cola Company *
43. The Sherwood Foundation *
44. The Walton Family Foundation and the Alice L. Walton Foundation [Walton Family Foundation, Inc.] *
45. 100 Women in Hedgefunds
46. Absolute Return for Kids (ARK)
47. Jay Alix
48. Alliance for a Green Revolution in Africa (AGRA)
49. Nasser Al-Rashid
50. Altman/Kazickas Foundation
51. American Federation of Teachers
52. Angelopoulos Foundation ^
53. Gianna Angelopoulos *
54. Anheuser-Busch Foundation
55. Smith and Elizabeth Bagley *
56. Banc of California ^
57. Barclays Capital ^
58. Barclays plc
59. Laurie and Bill Benenson
60. Mary Bing and Doug Ellis
61. Bloomberg Philanthropies
62. Blue Cross and Blue Shield of North Carolina ^
63. Richard Blum and Blum Family Foundation
64. BMU - Federal Ministry for the Environment **
65. Booz Allen Hamilton ^
66. Bill Brandt, Patrice Bugelas-Brandt, and Development Specialists, Inc. [Development Specialists, Inc.]
67. Carlos Bremer *
68. Richard Caring
69. Centene Charitable Foundation *
70. Gilbert R. Chagoury
71. Cheniere Energy, Inc.
72. Christy and John Mack Foundation
73. Cisco ^
74. Gustavo Cisneros & Venevision
75. Citi Foundation ^
76. Clinton-Bush Haiti Fund
77. Stephen J. Cloobeck
78. Roy E. Cockrum
79. Victor P. Dahdaleh & The Victor Phillip Dahdaleh Charitable Foundation
80. Delos Living ^
81. Desert Classic Charities Inc
82. Robert Disbrow
83. Dubai Foundation
84. Duke Energy Corporation ^
85. EKTA Foundation *
86. Entergy *
87. Exxonmobil ^
88. Issam M. Fares
89. Raj Fernando
90. Ferraro Family Foundation
91. Fisher Brothers Foundation, Inc.
92. Joseph T. Ford
93. Wallace W. Fowler
94. Friends Of Saudi Arabia
95. Fundacion Telmex
96. Mala Gaonkar Haarman
97. GEMS Education
98. General Electric
99. Aileen Getty and the Aileen Getty Foundation *
100. Ariadne Getty *
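
Since the question mentions multiple pages of data, the same request can presumably be repeated with an increasing `paged` value. Below is a minimal sketch of that loop, not a tested scraper for this site. It assumes that a page past the end returns no donors, which the original answer does not confirm. `collect_all_pages`, `fetch_page`, `max_pages`, and the fake data are hypothetical names introduced here for illustration; the fake fetcher stands in for the real request so the sketch runs offline.

```python
from typing import Callable, List


def collect_all_pages(
    fetch_page: Callable[[int], List[str]],
    max_pages: int = 50,
) -> List[str]:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    donors: List[str] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # assumption: an empty page means there is no more data
            break
        donors.extend(batch)
    return donors


# Stand-in for the real request so the sketch runs offline: two fake pages.
FAKE_PAGES = {1: ["Donor A", "Donor B"], 2: ["Donor C"]}


def fake_fetch_page(page: int) -> List[str]:
    """Return the fake donors for a page, or an empty list past the end."""
    return FAKE_PAGES.get(page, [])


if __name__ == "__main__":
    print(collect_all_pages(fake_fetch_page))
```

In real use, `fetch_page` would wrap the `requests.post` call from the answer, set `payload["paged"]` to the page number, and return the list of `<h3>` texts. The `max_pages` cap is a safety limit in case the server never returns an empty page.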

  • Posted by huangapple on May 6, 2023 at 22:05:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76189322.html