May 22, 2023 15:56:00

Python web-scraping issue: unable to retrieve body section data from a specific URL

Question

Unable to retrieve data from the body section while attempting web scraping using Python.

I can't retrieve any data from the body section when scraping this page with Python, and I would appreciate some help with this problem.

Python Code:

```python
import requests
from bs4 import BeautifulSoup

url_kakao = "https://www.kakaopay.com/news/pr"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/**********"}
# The user-agent information may be considered confidential, so it has been replaced with "*"

res_kakao = requests.get(url_kakao, headers=headers)
res_kakao.raise_for_status()

soup_kakao = BeautifulSoup(res_kakao.text, 'lxml')
kakao = soup_kakao.find_all("div", attrs={"class": "css-1mqcdgs e2lpi48"})
print(kakao)
```

→ Result: nothing is found (find_all returns an empty list)

The result is empty, most likely because the data in the body section is not being retrieved.

Is web scraping not possible on the url_kakao website?
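A quick way to test this diagnosis, before looking at the parser at all, is to search the raw HTML that requests received for the class names. The helper below is a minimal sketch; the name diagnose_static_html is hypothetical and not part of requests or BeautifulSoup. It does a plain substring search, so it is only a rough heuristic (a class name could also appear inside a style block or a script). If neither token shows up, the elements are most likely added later by JavaScript running in the browser, which requests does not execute.

```python
def diagnose_static_html(html: str, class_names: str) -> dict:
    """Report whether each space-separated CSS class token occurs in the raw HTML.

    Hypothetical helper, plain substring search only: False for every token
    suggests the matching elements are not in the server's response at all.
    """
    return {name: name in html for name in class_names.split()}
```

With the code above, it could be called as diagnose_static_html(res_kakao.text, "css-1mqcdgs e2lpi48") right after res_kakao.raise_for_status().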

Answer 1

Score: 0

The content is loaded dynamically by JavaScript from an API:

```python
import requests

# The news list comes from this JSON endpoint, which the page's JavaScript calls
requests.get('https://www.kakaopay.com/brand-api/news?page=0&size=10&locale=ko').json()
```

Simply adapt the request and adjust the value of the page parameter to get further pages.

```python
{'list': [{'id': 278,
   'news_contents_category': 'COMMON',
   'title': '5월, 카카오페이로 편의점 결제하면 무제한 혜택이!',
   'present_dttm': '2023. 5. 17.'},
  {'id': 277,
   'news_contents_category': 'COMMON',
   'title': '카카오페이, "3년 내 연 100억 건의 금융 니즈 해결 목표"',
   'present_dttm': '2023. 5. 15.'},
  {'id': 276,
   'news_contents_category': 'COMMON',
   'title': '카카오페이증권, ‘매일 이자 받기’ 서비스 시작',
   'present_dttm': '2023. 5. 4.'},...]}
```
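To collect more than one page, the request can be repeated with page=0, 1, 2 and so on. The sketch below is hypothetical: the function names are made up, it uses only the standard library (urllib.request and json) instead of requests so it runs without extra packages, and it assumes that a page past the end of the archive comes back with an empty 'list', which has not been checked against this API. The max_pages cap keeps the loop bounded if that assumption is wrong.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters taken from the request in this answer.
API_URL = "https://www.kakaopay.com/brand-api/news"


def build_page_url(page: int, size: int = 10, locale: str = "ko") -> str:
    """Build the API URL for one page (hypothetical helper)."""
    query = urllib.parse.urlencode({"page": page, "size": size, "locale": locale})
    return f"{API_URL}?{query}"


def fetch_page(page: int) -> dict:
    """Download one page and decode its JSON body (needs network access)."""
    with urllib.request.urlopen(build_page_url(page), timeout=10) as res:
        return json.load(res)


def collect_news(get_page=fetch_page, max_pages: int = 5) -> list:
    """Request page=0, 1, 2, ... and gather the items under 'list'.

    ASSUMPTION (unverified): a page past the end returns an empty 'list',
    so the loop stops there. max_pages bounds the requests either way.
    """
    items = []
    for page in range(max_pages):
        batch = get_page(page).get("list", [])
        if not batch:
            break
        items.extend(batch)
    return items
```

Because the download step is passed in as get_page, it can be swapped (for example, for a function that uses requests with a User-Agent header) without changing the paging loop.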

With the id from the JSON, you can open the individual articles, e.g. https://www.kakaopay.com/news/pr_detail?id=278
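Turning the JSON into article links is then a small mapping step. This is a minimal sketch: the helper name article_links is hypothetical, and it assumes every article follows the pr_detail?id=... pattern of the example link. It only builds the URLs; whether the article pages themselves can be scraped with requests has not been checked, and they may run into the same JavaScript issue as the list page.

```python
from urllib.parse import urlencode

# Detail-page pattern taken from the example link (assumed to hold for every id).
DETAIL_URL = "https://www.kakaopay.com/news/pr_detail"


def article_links(news_json: dict) -> list:
    """Return (title, detail URL) pairs for each item in the API's 'list' (hypothetical helper)."""
    return [
        (item["title"], f"{DETAIL_URL}?{urlencode({'id': item['id']})}")
        for item in news_json.get("list", [])
    ]
```

Applied to the first page of the API response, the first pair would be the title '5월, 카카오페이로 편의점 결제하면 무제한 혜택이!' together with https://www.kakaopay.com/news/pr_detail?id=278.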
