Python web-scraping issue: unable to retrieve body section data from a specific URL

Question

Unable to retrieve data from the body section while attempting web scraping using Python.

I am facing an issue where I am unable to retrieve data from the body section while performing web scraping using Python. I would appreciate some assistance with this problem.

Python Code:


import requests
from bs4 import BeautifulSoup

url_kakao = "https://www.kakaopay.com/news/pr"
# The user-agent information may be considered confidential, so it has been replaced with "*"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/**********"}

# Download the page and stop on HTTP errors
res_kakao = requests.get(url_kakao, headers=headers)
res_kakao.raise_for_status()

# Parse the returned HTML and look for the news container by its CSS classes
soup_kakao = BeautifulSoup(res_kakao.text, "lxml")
kakao = soup_kakao.find_all("div", attrs={"class": "css-1mqcdgs e2lpi48"})
print(kakao)

→ Result: NONE

The output is 'NONE', most likely because the data in the body section is not being scraped successfully.

Is web scraping not possible on the url_kakao website?

Answer 1

Score: 0

The content is loaded dynamically via JavaScript from an API, so it is not part of the HTML that requests receives. You can query that API directly:

import requests

requests.get('https://www.kakaopay.com/brand-api/news?page=0&size=10&locale=ko').json()

Simply adapt the request and adjust the value of the page parameter to get further pages. The response looks like this:

{'list': [{'id': 278,
   'news_contents_category': 'COMMON',
   'title': '5월, 카카오페이로 편의점 결제하면 무제한 혜택이!',
   'present_dttm': '2023. 5. 17.'},
  {'id': 277,
   'news_contents_category': 'COMMON',
   'title': '카카오페이, "3년 내 연 100억 건의 금융 니즈 해결 목표"',
   'present_dttm': '2023. 5. 15.'},
  {'id': 276,
   'news_contents_category': 'COMMON',
   'title': '카카오페이증권, ‘매일 이자 받기’ 서비스 시작',
   'present_dttm': '2023. 5. 4.'},...]}
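
Adjusting the page parameter can be wrapped in a small loop. The following is only a minimal sketch, not a documented interface of the site: the helper names `fetch_page` and `iter_news`, the `max_pages` cap, the timeout, and the assumption that a page past the end returns an empty `list` are illustrative and unverified.

```python
from typing import Callable, Iterator

API_URL = "https://www.kakaopay.com/brand-api/news"  # endpoint taken from the request above


def fetch_page(page: int, size: int = 10) -> dict:
    """Request one page of news items and return the parsed JSON."""
    import requests  # imported lazily so iter_news can be used with any fetcher

    params = {"page": page, "size": size, "locale": "ko"}
    response = requests.get(API_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def iter_news(fetch: Callable[[int], dict] = fetch_page, max_pages: int = 5) -> Iterator[dict]:
    """Yield news items page by page, starting at page 0.

    Assumption: a page past the end returns an empty (or missing) 'list'.
    """
    for page in range(max_pages):
        items = fetch(page).get("list", [])
        if not items:
            break  # nothing more to read
        yield from items
```

Calling `list(iter_news(max_pages=2))` would request pages 0 and 1 and return their items as one list. Passing the fetcher as an argument keeps the paging logic easy to try out with canned data before hitting the real site.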

With the id from the JSON you can then open the individual articles, for example https://www.kakaopay.com/news/pr_detail?id=278
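
Turning the ids into article links could look like the sketch below. The function name `article_urls` is made up for illustration, and the sample items are a shortened copy of the response shown earlier. Whether the detail page itself delivers its text in plain HTML or again via JavaScript is not verified here.

```python
from typing import Iterable, List

DETAIL_URL = "https://www.kakaopay.com/news/pr_detail"  # URL pattern from the answer


def article_urls(items: Iterable[dict]) -> List[str]:
    """Build one detail-page URL per news item, using its 'id' field."""
    return [f"{DETAIL_URL}?id={item['id']}" for item in items]


# Sample items copied from the response above (titles omitted for brevity)
sample = [
    {"id": 278, "present_dttm": "2023. 5. 17."},
    {"id": 277, "present_dttm": "2023. 5. 15."},
]

for url in article_urls(sample):
    print(url)
# prints:
# https://www.kakaopay.com/news/pr_detail?id=278
# https://www.kakaopay.com/news/pr_detail?id=277
```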

huangapple
  • Posted on May 22, 2023, 15:56:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76304069.html