May 30, 2023, 05:30:55

How can I scrape and record both price and date of a product with Python and BeautifulSoup?

Question

The code goes like this:

from bs4 import BeautifulSoup
import requests
import time
import datetime
import pandas as pd
import csv
URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'
HEADERS = ({'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36','Accept-Language': 'en-US, en:q =0.5'
})
page = requests.get(URL, headers = HEADERS)
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(),"html.parser")
title = soup2.find(id = 'product Title').get_text()
price = soup2.find("span",  attrs ={"class": 'a-size-base a-color-secondary'}).text
print(title)
print(price)
price = price.strip()[1:]
title = title.strip()

print(price)
print(title)

today = datetime.date.today()
print(today)
# inserting data into excel
header = ['title', 'price', 'Date']
data = [title, price, today]
with open('AmazonWebScraper.csv', 'w', newline='',encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

df = pd.read_csv(r'C:\Users\User\Desktop\AmazonWebScraper.csv')
print(df)

Can anybody help me fix this? I don't see any error, but the code doesn't seem to run to completion.
I'm a Python beginner trying to get this script working from start to finish.
Thanks in advance.

Answer 1

Score: 0

You should try it this way:

import datetime
import csv
import pandas as pd
from bs4 import BeautifulSoup
import requests

URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'
# Browser-like request headers; the quality value is written as "en;q=0.5".
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
    'Accept-Language': 'en-US, en;q=0.5',
}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")
# print(soup.prettify())

# The title element's id is 'productTitle' (no space).
title = soup.find(id='productTitle').get_text()
price = soup.find("span", attrs={"class": 'a-size-base a-color-secondary'}).text
price = price.strip()
title = title.strip()
today = datetime.date.today()

# Write the header row and one data row to the CSV file.
header = ['title', 'price', 'Date']
data = [title, price, today]
with open('AmazonWebScraper.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

# Read the file back from the same relative path it was written to.
df = pd.read_csv('AmazonWebScraper.csv')
print(df)
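
To illustrate the key change from the question's code, here is a minimal sketch that runs the same lookups against a small inline HTML snippet instead of the live page. The `SAMPLE_HTML` string is a made-up stand-in (an assumption, not Amazon's actual markup). It only shows that `find` returns `None` for an id that matches nothing, which is why chaining `.get_text()` onto `find(id='product Title')` fails with an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the product page; the real markup is far larger.
SAMPLE_HTML = """
<html><body>
  <span id="productTitle">  Atomic Habits  </span>
  <span class="a-size-base a-color-secondary">  $12.99  </span>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# An id that matches no element yields None instead of raising right away.
missing = soup.find(id="product Title")

# The id without the space matches the first <span> above.
found = soup.find(id="productTitle")
title = found.get_text().strip() if found is not None else None

# Same class lookup as in the answer: the string is compared against the
# full value of the class attribute.
price_tag = soup.find("span", attrs={"class": "a-size-base a-color-secondary"})
price = price_tag.get_text().strip() if price_tag is not None else None

print(missing, title, price)
```

Checking for `None` before calling `.get_text()` makes a missing element, or a changed page layout, show up as an explicit condition rather than a traceback.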

Output of the scraper (AmazonWebScraper.csv, as printed by pandas):

   title                                              price   Date
0  Atomic Habits: An Easy & Proven Way to Build G...  $12.99  2023-05-30
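
The script opens the CSV in `'w'` mode, so each run replaces the previous row. If the goal is to record price and date over time, one option is to append instead. The sketch below is an assumption about that use case, not part of the original answer, and `append_price_record` is a hypothetical helper name.

```python
import csv
import datetime
from pathlib import Path


def append_price_record(csv_path, title, price, when=None):
    """Append one (title, price, date) row; write the header only for a new file."""
    path = Path(csv_path)
    when = when or datetime.date.today()
    is_new = not path.exists() or path.stat().st_size == 0
    # Mode 'a' keeps earlier rows, whereas mode 'w' truncates the file.
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["title", "price", "Date"])
        writer.writerow([title, price, when.isoformat()])
```

Calling `append_price_record('AmazonWebScraper.csv', title, price)` once per run, in place of the `with open(..., 'w')` block, would then add one dated row each time the scraper is executed.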
