Unable to scrape a website when the URL does not change after selecting a dropdown value

Question


I have a website that lists donors' names along with their donation ranges. (link)

The site spreads its data over multiple pages, and I am trying to scrape all of it.

Look at the data section on the page.

Where am I going wrong?

    import requests
    from bs4 import BeautifulSoup
    import re
    import pandas as pd
    import selenium
    from selenium import webdriver
    from selenium.webdriver.support.ui import Select
    import time

    driver = webdriver.Chrome("chromedriver.exe")
    driver.get("https://www.clintonfoundation.org/about-the-clinton-foundation#reports-financials/")

    # Open the category dropdown and choose the "ALL" entry
    drpdown = driver.find_element_by_id("filtered-list-taxonomy")
    drpdown.click()
    time.sleep(5)
    element = driver.find_element_by_link_text("ALL")
    element.click()

    # Fetch the current URL again with requests and parse the result
    current_url = driver.current_url
    response = requests.get(current_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    article = soup.find_all('article', class_='col-12 items-list-block-item up-fade-40')

Answer 1

Score: 0


The donor list is loaded through a POST request, which you can send yourself to get the data directly.

This should push you in the right direction:

    import requests
    from bs4 import BeautifulSoup

    # Endpoint that receives the POST request for the donor list
    endpoint = "https://www.clintonfoundation.org/wp/wp-admin/admin-ajax.php"

    payload = {
        "action": "getFilteredListBlockContent",
        "nonce": "cd463523f4",
        "post_type": "supporter",
        "posts_per_page": 100,  # number of donors per response
        "taxonomy": "supporter_category",
        "paged": 1,  # page number
        "keyword": "",
        "term": "",
        "isCGI": "false",
    }
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.68",
        "X-Requested-With": "XMLHttpRequest",
    }

    # The JSON response holds the list as an HTML string under "data"
    response = requests.post(endpoint, data=payload, headers=headers).json()["data"]

    # Collect the text of every <h3> element in that HTML
    donors = [
        donor.get_text(strip=True) for donor
        in BeautifulSoup(response, "html.parser").find_all("h3")
    ]
    numbered_donors = [[number, donor] for number, donor in enumerate(donors, 1)]
    print("\n".join([f"{number}. {donor}" for number, donor in numbered_donors]))

This should print the first 100 donors:

    1. Bill & Melinda Gates Foundation
    2. Elevate Social Businesses [Clinton Giustra Enterprise Partnership (Canada)]
    3. Fred Eychaner and Alphawood Foundation
    4. Frank Giustra, The Radcliffe Foundation
    5. Postcode Lottery Group [Nationale Postcode Loterij] *
    6. Cheryl and Haim Saban & The Saban Family Foundation *
    7. The Children's Investment Fund Foundation
    8. UNITAID
    9. AUSAID **
    10. Stephen L. Bing
    11. Commonwealth of Australia **
    12. COPRESIDA
    13. Tom Golisano ^
    14. J.B. and M.K. Pritzker Family Foundation
    15. Kingdom of Norway [Government of Norway] **
    16. Kingdom of Saudi Arabia
    17. Norwegian Agency for Development Cooperation (NORAD) **
    18. Denis J. O'Brien and Digicel
    19. Susie Tompkins Buell Fund of the Marin Community Foundation
    20. Swedish Postcode Lottery [The Swedish Postcode Lottery]
    21. The Elma Foundation
    22. The Hunter Foundation *
    23. The Rockefeller Foundation
    24. The Victor Pinchuk Foundation
    25. The Wasserman Foundation *
    26. Tracfone Wireless, Inc. *
    27. Theodore W. Waitt
    28. S. Daniel Abraham
    29. Sheikh Mohammed H. Al-Amoudi
    30. C40 Cities Climate Leadership Group, Inc.
    31. Elton John Aids Foundation
    32. Fidelity Charitable Gift Fund
    33. Government of the Netherlands **
    34. Irish Aid **
    35. Jonathan and Jeannie Lavine, Trustees of the Crimson Lion Foundation *
    36. John D. Mackay
    37. OCP Corporation
    38. Michael Schumacher
    39. Bernard L. Schwartz *
    40. State of Kuwait
    41. The Clinton Family Foundation *
    42. The Coca-Cola Company *
    43. The Sherwood Foundation *
    44. The Walton Family Foundation and the Alice L. Walton Foundation [Walton Family Foundation, Inc.] *
    45. 100 Women in Hedgefunds
    46. Absolute Return for Kids (ARK)
    47. Jay Alix
    48. Alliance for a Green Revolution in Africa (AGRA)
    49. Nasser Al-Rashid
    50. Altman/Kazickas Foundation
    51. American Federation of Teachers
    52. Angelopoulos Foundation ^
    53. Gianna Angelopoulos *
    54. Anheuser-Busch Foundation
    55. Smith and Elizabeth Bagley *
    56. Banc of California ^
    57. Barclays Capital ^
    58. Barclays plc
    59. Laurie and Bill Benenson
    60. Mary Bing and Doug Ellis
    61. Bloomberg Philanthropies
    62. Blue Cross and Blue Shield of North Carolina ^
    63. Richard Blum and Blum Family Foundation
    64. BMU - Federal Ministry for the Environment **
    65. Booz Allen Hamilton ^
    66. Bill Brandt, Patrice Bugelas-Brandt, and Development Specialists, Inc. [Development Specialists, Inc.]
    67. Carlos Bremer *
    68. Richard Caring
    69. Centene Charitable Foundation *
    70. Gilbert R. Chagoury
    71. Cheniere Energy, Inc.
    72. Christy and John Mack Foundation
    73. Cisco ^
    74. Gustavo Cisneros & Venevision
    75. Citi Foundation ^
    76. Clinton-Bush Haiti Fund
    77. Stephen J. Cloobeck
    78. Roy E. Cockrum
    79. Victor P. Dahdaleh & The Victor Phillip Dahdaleh Charitable Foundation
    80. Delos Living ^
    81. Desert Classic Charities Inc
    82. Robert Disbrow
    83. Dubai Foundation
    84. Duke Energy Corporation ^
    85. EKTA Foundation *
    86. Entergy *
    87. Exxonmobil ^
    88. Issam M. Fares
    89. Raj Fernando
    90. Ferraro Family Foundation
    91. Fisher Brothers Foundation, Inc.
    92. Joseph T. Ford
    93. Wallace W. Fowler
    94. Friends Of Saudi Arabia
    95. Fundacion Telmex
    96. Mala Gaonkar Haarman
    97. GEMS Education
    98. General Electric
    99. Aileen Getty and the Aileen Getty Foundation *
    100. Ariadne Getty *
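
Since the question mentions multiple pages of data, here is a rough sketch of how the same request could be repeated for later pages. It rests on assumptions that are not confirmed above: that increasing "paged" returns the next batch, that a page past the end comes back with no <h3> elements, and that the hard-coded nonce is still accepted (it may well expire). The names parse_donors, fetch_page and collect_all_donors are made up for this illustration, and the User-Agent is shortened.

```python
from typing import Callable, List

import requests
from bs4 import BeautifulSoup

ENDPOINT = "https://www.clintonfoundation.org/wp/wp-admin/admin-ajax.php"


def parse_donors(html: str) -> List[str]:
    """Return the text of every <h3> element in an HTML fragment."""
    soup = BeautifulSoup(html, "html.parser")
    return [h3.get_text(strip=True) for h3 in soup.find_all("h3")]


def fetch_page(page: int, nonce: str = "cd463523f4") -> str:
    """POST the payload from the answer, changing only "paged".

    Assumption: the nonce copied from the answer is still valid.
    """
    payload = {
        "action": "getFilteredListBlockContent",
        "nonce": nonce,
        "post_type": "supporter",
        "posts_per_page": 100,
        "taxonomy": "supporter_category",
        "paged": page,
        "keyword": "",
        "term": "",
        "isCGI": "false",
    }
    headers = {
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
    }
    response = requests.post(ENDPOINT, data=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()["data"]


def collect_all_donors(
    fetch: Callable[[int], str] = fetch_page, max_pages: int = 50
) -> List[str]:
    """Request page 1, 2, 3, ... until a page has no donors or max_pages is hit."""
    donors: List[str] = []
    for page in range(1, max_pages + 1):
        names = parse_donors(fetch(page))
        if not names:  # assumed end-of-data signal
            break
        donors.extend(names)
    return donors
```

Calling collect_all_donors() would then return one flat list of names, which can be passed to pandas if a table is needed. The max_pages cap is only a safety net in case the server keeps returning results instead of an empty page.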

huangapple
  • Posted on May 6, 2023, 22:05:02
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/76189322.html