I can't find the correct tags to scrape the class name, code, and description (description is via link)
Question
I'm brand new to scraping. I'm trying to scrape the class code, name, and description from this website:
No matter what I pass to soup.find_all(), nothing prints (eventually I want to write all the data to a CSV file).
Here is what I have:
import requests
from bs4 import BeautifulSoup
# Define the URL to scrape
url = 'https://catalog.registrar.ucla.edu/search?parentAcademicOrg=7e561ea0db6fa0107f1572f5f39619b1&ct=subject'
# Send a GET request to the URL and get the HTML response
response = requests.get(url)
html = response.content
# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Find all the div tags with class 'courseblock'
divs = soup.find_all('div', {'class': 'courseblock'})
# Loop through the divs and extract the course name from each div
for div in divs:
    # Extract the course name from the first span tag within the div
    course_name = div.find('span', {'class': 'courseblocktitle'}).text.strip()
    # Print the course name
    print(course_name)
Instead of this line:
divs = soup.find_all('div', {'class': 'courseblock'})
I've also tried:
1. divs = soup.find_all('div', {'class': 'css-15y68hq-Box--Box-Box-Flex--Flex-Flex-results-styles--ResultItemContainer e1ecnqs53'})
2. divs = soup.find_all('span', {'class': 'result-item-title'})
I haven't even attempted the description because I'm stuck on this. Any help would be great.
Answer 1
Score: 0
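The page most likely builds its result list in the browser with JavaScript, so the HTML that requests downloads doesn't contain the elements you see in the inspector. The data itself comes from an API endpoint that you can query directly.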
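A quick way to check this for yourself is to look for the class names from the question in the raw response text. The sketch below is only an illustration: missing_markers and check_live_page are hypothetical helper names, and a plain substring test is a rough heuristic rather than a real HTML parse.

```python
def missing_markers(html: str, markers: list[str]) -> list[str]:
    """Return the markers that do not occur anywhere in the raw HTML text."""
    return [marker for marker in markers if marker not in html]


def check_live_page(url: str, markers: list[str]) -> list[str]:
    """Download the page the same way the question does and report missing markers."""
    import requests  # third-party; imported here so the helper above has no dependencies

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return missing_markers(response.text, markers)
```

Calling check_live_page(url, ['courseblock', 'result-item-title']) with the URL from the question returns the class names that are absent from the downloaded HTML. Any name that comes back is one that find_all() can't match, no matter how the selector is written.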
Try this:
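import requests

# A browser-like User-Agent; some servers reject the default one sent by requests.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

# The API endpoint that serves the search results.
api_endpoint = "https://api-us-west-1.prod.courseloop.com/publisher/search-academic-items?"

# parentAcademicOrg is the same ID that appears in the catalog URL from the question.
payload = {
    "siteId": "ucla-prod",
    "query": "",
    "contenttype": "subject",
    "searchFilters": [
        {
            "filterField": "implementationYear",
            "filterValue": ["2022"],
            "isExactMatch": False,
        },
        {
            "filterField": "parentAcademicOrg",
            "filterValue": ["7e561ea0db6fa0107f1572f5f39619b1"],
            "isExactMatch": False,
        },
    ],
    "from": 0,
    "size": 20,
}

# Send the payload as JSON and fail loudly on an HTTP error.
response = requests.post(api_endpoint, headers=headers, json=payload)
response.raise_for_status()
data = response.json()

# Each result carries the course title and a relative link to its catalog page.
for item in data["data"]["results"]:
    print(item["title"])
    print(f"https://catalog.registrar.ucla.edu{item['uri']}")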
Output:
A&O SCI 1 Climate Change: From Puzzles to Policy
https://catalog.registrar.ucla.edu/course/2022/AOSCI1
A&O SCI 1L Climate Change: From Puzzles to Policy—Laboratory
https://catalog.registrar.ucla.edu/course/2022/AOSCI1L
A&O SCI 2 Air Pollution
https://catalog.registrar.ucla.edu/course/2022/AOSCI2
A&O SCI 2L Air Pollution Laboratory
https://catalog.registrar.ucla.edu/course/2022/AOSCI2L
A&O SCI 3 Meteorology and Extreme Weather
https://catalog.registrar.ucla.edu/course/2022/AOSCI3
A&O SCI 3L Meteorology and Extreme Weather Laboratory
https://catalog.registrar.ucla.edu/course/2022/AOSCI3L
A&O SCI 5 Climates of Other Worlds
https://catalog.registrar.ucla.edu/course/2022/AOSCI5
A&O SCI M7 Perils of Space: Introduction to Space Weather
https://catalog.registrar.ucla.edu/course/2022/AOSCIM7
A&O SCI 19 Fiat Lux Freshman Seminars
https://catalog.registrar.ucla.edu/course/2022/AOSCI19
A&O SCI 51 Fundamentals of Climate Science
https://catalog.registrar.ucla.edu/course/2022/AOSCI51
A&O SCI M71 Introduction to Computing for Geoscientists
https://catalog.registrar.ucla.edu/course/2022/AOSCIM71
A&O SCI 88 Lower-Division Seminar
https://catalog.registrar.ucla.edu/course/2022/AOSCI88
A&O SCI 89 Honors Seminars
https://catalog.registrar.ucla.edu/course/2022/AOSCI89
A&O SCI 89HC Honors Contracts
https://catalog.registrar.ucla.edu/course/2022/AOSCI89HC
A&O SCI 90 Introduction to Undergraduate Research in Atmospheric and Oceanic Sciences
https://catalog.registrar.ucla.edu/course/2022/AOSCI90
A&O SCI 99 Student Research Program
https://catalog.registrar.ucla.edu/course/2022/AOSCI99
A&O SCI M100 Earth and Its Environment
https://catalog.registrar.ucla.edu/course/2022/AOSCIM100
A&O SCI 101 Fundamentals of Atmospheric Dynamics and Thermodynamics
https://catalog.registrar.ucla.edu/course/2022/AOSCI101
A&O SCI 102 Climate Change and Climate Modeling
https://catalog.registrar.ucla.edu/course/2022/AOSCI102
A&O SCI 103 Physical Oceanography
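The request above returns only the first 20 results, and the title still holds the course code and name together. Below is a rough sketch of how you might page through everything and write a CSV. It rests on assumptions the output above suggests but doesn't prove: that "from" is a result offset and "size" a page size, that a page shorter than "size" means the end, and that the last part of each uri is the course code with spaces and punctuation removed. iter_results, split_title, write_csv and fetch_page are hypothetical names, not part of any library.

```python
import csv
from typing import Callable, Iterable, Iterator

BASE_URL = "https://catalog.registrar.ucla.edu"


def iter_results(fetch_page: Callable[[int, int], list],
                 page_size: int = 20, max_pages: int = 1000) -> Iterator[dict]:
    """Yield results page by page until a short page comes back.

    fetch_page(offset, size) must return the list of result dicts for that page.
    max_pages is a safety stop in case the server ignores the offset.
    """
    offset = 0
    for _ in range(max_pages):
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


def split_title(title: str, uri: str) -> tuple[str, str]:
    """Split 'A&O SCI 1L Some Name' into code and name, using the uri as a guide.

    Assumes the last uri segment (e.g. AOSCI1L) is the code without spaces or
    punctuation. Returns ('', title) when the title does not start with it.
    """
    slug = uri.rstrip("/").rsplit("/", 1)[-1].upper()
    seen = ""
    for index, char in enumerate(title):
        if char.isalnum():
            seen += char.upper()
        if seen == slug:
            return title[:index + 1].strip(), title[index + 1:].strip()
    return "", title


def write_csv(path: str, items: Iterable[dict]) -> None:
    """Write one row per course: code, name, and the full catalog URL."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["code", "name", "url"])
        for item in items:
            code, name = split_title(item["title"], item["uri"])
            writer.writerow([code, name, BASE_URL + item["uri"]])
```

To use it with the code above, fetch_page would be a small function that copies payload, sets "from" and "size" to the values it receives, posts it, and returns data["data"]["results"]. Then write_csv("courses.csv", iter_results(fetch_page)) writes the file. The descriptions aren't in the output shown here, so getting them would still mean finding where each course page loads its text from.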