How can I scrape and record both price and date of a product with Python and BeautifulSoup?
Question
The code goes like this:
from bs4 import BeautifulSoup
import requests
import time
import datetime
import pandas as pd
import csv
URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'
HEADERS = ({'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36','Accept-Language': 'en-US, en:q =0.5'
})
page = requests.get(URL, headers = HEADERS)
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(),"html.parser")
title = soup2.find(id = 'product Title').get_text()
price = soup2.find("span", attrs ={"class": 'a-size-base a-color-secondary'}).text
print(title)
print(price)
price = price.strip()[1:]
title = title.strip()
print(price)
print(title)
today = datetime.date.today()
print(today)
# inserting data into excel
header = ['title', 'price', 'Date']
data = [title, price, today]
with open('AmazonWebScraper.csv', 'w', newline='',encoding='UTF8') as f:
writer = csv.writer(f)
writer.writerow(header)
writer.writerow(data)
df = pd.read_csv(r'C:\Users\User\Desktop\AmazonWebScraper.csv')
print(df)
Can anybody help me fix this? I don't see any error, but the code isn't being executed.
I'm trying to see it through to the end as a beginner in Python.
Thanks in advance.
Answer 1
Score: 0
You should try this way. The main fix is the id passed to find(): Amazon's title element has the id 'productTitle' (no space), so soup2.find(id='product Title') in your code returns None and the call to .get_text() fails with an AttributeError. The second prettify-and-reparse pass is also unnecessary:
import datetime
import csv
import pandas as pd
from bs4 import BeautifulSoup
import requests

URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'

# A browser-like User-Agent so Amazon serves the normal product page
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
           'Accept-Language': 'en-US,en;q=0.5'}

page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")
# print(soup.prettify())

# The id is 'productTitle' - no space, unlike 'product Title' in the question
title = soup.find(id='productTitle').get_text()
price = soup.find("span", attrs={"class": 'a-size-base a-color-secondary'}).text
price = price.strip()
title = title.strip()
today = datetime.date.today()

# Write one header row and one data row to the CSV
header = ['title', 'price', 'Date']
data = [title, price, today]
with open('AmazonWebScraper.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

# Read the file back to confirm it was written
df = pd.read_csv('AmazonWebScraper.csv')
print(df)
Output (AmazonWebScraper.csv):
                                               title   price        Date
0  Atomic Habits: An Easy & Proven Way to Build G...  $12.99  2023-05-30