I can't find the correct tags to scrape the class name, code, and description (the description is on a linked page)

Question

I'm brand new to scraping. I'm trying to scrape the class code, name, and description from this website:

URL = https://catalog.registrar.ucla.edu/search?parentAcademicOrg=7e561ea0db6fa0107f1572f5f39619b1&ct=subject

No matter what I pass to divs = soup.find_all(), nothing prints (eventually I want to write all of the data to a CSV file).

Here is what I have:

import requests
from bs4 import BeautifulSoup

# Define the URL to scrape
url = 'https://catalog.registrar.ucla.edu/search?parentAcademicOrg=7e561ea0db6fa0107f1572f5f39619b1&ct=subject'

# Send a GET request to the URL and get the HTML response
response = requests.get(url)
html = response.content

# Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Find all the div tags with class 'courseblock'
divs = soup.find_all('div', {'class': 'courseblock'})

# Loop through the divs and extract the course name from each div
for div in divs:
    # Extract the course name from the first span with class 'courseblocktitle' inside the div
    course_name = div.find('span', {'class': 'courseblocktitle'}).text.strip()
    # Print the course name
    print(course_name)

Instead of this line:

divs = soup.find_all('div', {'class': 'courseblock'})

I've also tried:

1. divs = soup.find_all('div', {'class': 'css-15y68hq-Box--Box-Box-Flex--Flex-Flex-results-styles--ResultItemContainer e1ecnqs53'})

2. divs = soup.find_all('span', {'class': 'result-item-title'})

I haven't even attempted the description because I'm stuck on this. Any help would be great.

Answer 1

Score: 0

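The course list isn't in the HTML that requests downloads. The page fills it in with JavaScript after loading, which is why every soup.find_all() call comes back empty. The data itself comes from an API endpoint that you can query directly.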
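
One quick way to see why the BeautifulSoup approach in the question finds nothing is to check whether the class names copied from the browser's inspector occur in the raw HTML at all. If they don't, the content is rendered by JavaScript and has to be fetched another way, such as from the API. The helper name missing_markers and the stand-in HTML below are hypothetical, chosen only to illustrate the check offline.

```python
def missing_markers(html: str, markers: list[str]) -> list[str]:
    """Return the markers (e.g. CSS class names) that never occur in html.

    A class name that is visible in the browser's inspector but absent from the
    downloaded HTML was most likely added by JavaScript after the page loaded,
    so BeautifulSoup, which only parses the downloaded HTML, cannot find it.
    """
    return [marker for marker in markers if marker not in html]


# Offline illustration with a stand-in for a JavaScript-rendered page shell
# (hypothetical HTML, not the real catalog page).
shell_html = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(missing_markers(shell_html, ["courseblock", "result-item-title"]))
```

Against the live page you would pass requests.get(url).text as html. This is a plain substring check, so a class name that happens to appear inside an inline script would be reported as present even if no element carries it.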

Try this:

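import requests

# Browser-like User-Agent header
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

# Search endpoint that serves the catalog's course results
api_endpoint = "https://api-us-west-1.prod.courseloop.com/publisher/search-academic-items?"
payload = {
    "siteId": "ucla-prod",
    "query": "",
    "contenttype": "subject",
    "searchFilters": [
        {
            # Catalog year (the course URLs in the output also contain 2022)
            "filterField": "implementationYear",
            "filterValue": ["2022"],
            "isExactMatch": False
        },
        {
            # Same ID as the parentAcademicOrg parameter in the page URL
            "filterField": "parentAcademicOrg",
            "filterValue": ["7e561ea0db6fa0107f1572f5f39619b1"],
            "isExactMatch": False
        }
    ],
    # Paging: presumably the offset of the first result and the number of results returned
    "from": 0,
    "size": 20
}

response = requests.post(api_endpoint, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of failing later in .json()
data = response.json()

# Each result's title holds the course code and name; uri is the relative link to the course page
for item in data["data"]["results"]:
    print(item["title"])
    print(f"https://catalog.registrar.ucla.edu{item['uri']}")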

Output:

A&O SCI 1 Climate Change: From Puzzles to Policy
https://catalog.registrar.ucla.edu/course/2022/AOSCI1
A&O SCI 1L Climate Change: From Puzzles to Policy—Laboratory
https://catalog.registrar.ucla.edu/course/2022/AOSCI1L
A&O SCI 2 Air Pollution
https://catalog.registrar.ucla.edu/course/2022/AOSCI2
A&O SCI 2L Air Pollution Laboratory
https://catalog.registrar.ucla.edu/course/2022/AOSCI2L
A&O SCI 3 Meteorology and Extreme Weather
https://catalog.registrar.ucla.edu/course/2022/AOSCI3
A&O SCI 3L Meteorology and Extreme Weather Laboratory
https://catalog.registrar.ucla.edu/course/2022/AOSCI3L
A&O SCI 5 Climates of Other Worlds
https://catalog.registrar.ucla.edu/course/2022/AOSCI5
A&O SCI M7 Perils of Space: Introduction to Space Weather
https://catalog.registrar.ucla.edu/course/2022/AOSCIM7
A&O SCI 19 Fiat Lux Freshman Seminars
https://catalog.registrar.ucla.edu/course/2022/AOSCI19
A&O SCI 51 Fundamentals of Climate Science
https://catalog.registrar.ucla.edu/course/2022/AOSCI51
A&O SCI M71 Introduction to Computing for Geoscientists
https://catalog.registrar.ucla.edu/course/2022/AOSCIM71
A&O SCI 88 Lower-Division Seminar
https://catalog.registrar.ucla.edu/course/2022/AOSCI88
A&O SCI 89 Honors Seminars
https://catalog.registrar.ucla.edu/course/2022/AOSCI89
A&O SCI 89HC Honors Contracts
https://catalog.registrar.ucla.edu/course/2022/AOSCI89HC
A&O SCI 90 Introduction to Undergraduate Research in Atmospheric and Oceanic Sciences
https://catalog.registrar.ucla.edu/course/2022/AOSCI90
A&O SCI 99 Student Research Program
https://catalog.registrar.ucla.edu/course/2022/AOSCI99
A&O SCI M100 Earth and Its Environment
https://catalog.registrar.ucla.edu/course/2022/AOSCIM100
A&O SCI 101 Fundamentals of Atmospheric Dynamics and Thermodynamics
https://catalog.registrar.ucla.edu/course/2022/AOSCI101
A&O SCI 102 Climate Change and Climate Modeling
https://catalog.registrar.ucla.edu/course/2022/AOSCI102
A&O SCI 103 Physical Oceanography
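
The payload in the answer requests "from": 0 and "size": 20, and the output stops after 20 courses, so it covers only the first page of results. Below is a minimal sketch for collecting every page. It assumes, without confirmation from the API, that "from" is a zero-based offset, that "size" is the page size, and that a page shorter than "size" is the last one. The names make_fetch_page, fetch_all_results and the fake endpoint used for the offline demo are hypothetical.

```python
from typing import Callable

# fetch_page takes a request payload and returns that page's list of result dicts.
FetchPage = Callable[[dict], list]


def make_fetch_page(api_endpoint: str, headers: dict) -> FetchPage:
    """Wrap the POST request from the answer above in a fetch_page callable."""
    import requests  # imported here so the offline demo below runs without it

    def fetch_page(payload: dict) -> list:
        response = requests.post(api_endpoint, headers=headers, json=payload, timeout=30)
        response.raise_for_status()
        return response.json()["data"]["results"]

    return fetch_page


def fetch_all_results(fetch_page: FetchPage, base_payload: dict,
                      page_size: int = 20, max_pages: int = 100) -> list:
    """Collect every result by advancing the "from" offset one page at a time.

    Assumes (unconfirmed for this API) that "from" is a zero-based result offset,
    "size" is the page size, and a page shorter than page_size is the last one.
    """
    results = []
    for page_number in range(max_pages):
        payload = {**base_payload, "from": page_number * page_size, "size": page_size}
        page = fetch_page(payload)
        results.extend(page)
        if len(page) < page_size:
            return results
    # Guard against an endpoint that ignores "from" and keeps returning full pages
    raise RuntimeError(f"still receiving full pages after {max_pages} requests")


# Offline demo with a fake endpoint holding 45 items (stand-in data, not real courses)
fake_items = [{"title": f"COURSE {n}", "uri": f"/course/{n}"} for n in range(45)]


def fake_fetch_page(payload: dict) -> list:
    start = payload["from"]
    return fake_items[start:start + payload["size"]]


all_items = fetch_all_results(fake_fetch_page, {"siteId": "ucla-prod"})
print(len(all_items))  # 45
```

Against the live API this would be fetch_all_results(make_fetch_page(api_endpoint, headers), payload), reusing api_endpoint, headers, and payload from the answer. The "from" and "size" values in payload are overridden for each page. If the API caps "size" below page_size, the loop would stop after the first short page, so keep page_size at a value the API actually honors.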

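The question asks for the course code and name as separate columns, written to a CSV file. The answer's loop only uses each result's title and uri, so the sketch below splits the title heuristically. It assumes that the course number is the first word containing a digit. That holds for every title in the output above, but the API does not guarantee it. The helper names split_title and write_courses_csv are hypothetical. The course-page URL is written as a third column so the description can be fetched from it later.

```python
import csv
import io

CATALOG_BASE = "https://catalog.registrar.ucla.edu"


def split_title(title: str) -> tuple[str, str]:
    """Split "A&O SCI M7 Perils of Space" into ("A&O SCI M7", "Perils of Space").

    Heuristic (an assumption, not an API guarantee): the course number is the
    first whitespace-separated token containing a digit, and everything up to
    and including it is the course code.
    """
    tokens = title.split()
    for i, token in enumerate(tokens):
        if any(ch.isdigit() for ch in token):
            return " ".join(tokens[: i + 1]), " ".join(tokens[i + 1 :])
    return "", title  # no number found: keep the whole title as the name


def write_courses_csv(results: list, out) -> None:
    """Write code, name, and course-page URL for each API result to a CSV stream."""
    writer = csv.writer(out)
    writer.writerow(["code", "name", "url"])
    for item in results:
        code, name = split_title(item["title"])
        writer.writerow([code, name, CATALOG_BASE + item["uri"]])


# Two results shaped like the answer's output (title plus relative uri)
sample = [
    {"title": "A&O SCI 1 Climate Change: From Puzzles to Policy", "uri": "/course/2022/AOSCI1"},
    {"title": "A&O SCI 89HC Honors Contracts", "uri": "/course/2022/AOSCI89HC"},
]
buffer = io.StringIO()
write_courses_csv(sample, buffer)
print(buffer.getvalue())
```

For a real file, open it with open("courses.csv", "w", newline="", encoding="utf-8"). The csv module documentation recommends newline="", and some titles contain non-ASCII punctuation. The description is not among the fields the answer prints. It would have to come from the linked course page, which this sketch does not attempt.
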
huangapple
  • This article was published on April 4, 2023 03:59:55
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/75923342.html