How can I scrape and record both price and date of a product with Python and BeautifulSoup?

Question


The code goes like this:

from bs4 import BeautifulSoup
import requests
import time
import datetime
import pandas as pd
import csv

URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'

HEADERS = ({'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36','Accept-Language': 'en-US, en:q =0.5'
})

page = requests.get(URL, headers = HEADERS)  
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(),"html.parser")
title = soup2.find(id = 'product Title').get_text()
price = soup2.find("span",  attrs ={"class": 'a-size-base a-color-secondary'}).text

print(title)
print(price)

price = price.strip()[1:]
title = title.strip()
    
print(price)
print(title)
 
today = datetime.date.today()
print(today)

# inserting data into excel

header = ['title', 'price', 'Date']
data = [title, price, today]

with open('AmazonWebScraper.csv', 'w', newline='',encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)   
    
df = pd.read_csv(r'C:\Users\User\Desktop\AmazonWebScraper.csv')
print(df)

Can anybody help me fix this? I don't see any error, but the code isn't being executed.
I'm a beginner in Python and would like to see this through to the end.
Thanks in advance.

Answer 1

Score: 0

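Try it this way. The element id is 'productTitle' (no space), and the CSV is read back from the same relative path it was written to: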

import datetime
import csv
import pandas as pd
from bs4 import BeautifulSoup
import requests

URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'

# Browser-like headers; the quality value is written "en;q=0.5"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
    'Accept-Language': 'en-US, en;q=0.5',
}

page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")
# print(soup.prettify())
# The id must match exactly: 'productTitle', with no space
title = soup.find(id='productTitle').get_text()
price = soup.find("span", attrs={"class": 'a-size-base a-color-secondary'}).text

price = price.strip()
title = title.strip()
today = datetime.date.today()

header = ['title', 'price', 'Date']
data = [title, price, today]

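# Write the header and one data row to a CSV in the current working directory
with open('AmazonWebScraper.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

# Read the same file back, using the same relative path it was written to
df = pd.read_csv('AmazonWebScraper.csv')
print(df)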

output: AmazonWebScraper.csv

   title                                              price   Date
0  Atomic Habits: An Easy & Proven Way to Build G...  $12.99  2023-05-30
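
A likely reason the original script stopped early is that find() returns None when nothing matches, so calling .get_text() on the result of find(id='product Title') would fail. The sketch below illustrates this on a small inline HTML string rather than the live page. SAMPLE_HTML and the helper text_or_none are hypothetical names made up for this example, and the real Amazon markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <span id="productTitle">  Atomic Habits  </span>
  <span class="a-size-base a-color-secondary">  $12.99  </span>
</body></html>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if find() matched nothing."""
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# An id that is not in the document yields None instead of a tag.
missing = soup.find(id="product Title")
found = soup.find(id="productTitle")
price_tag = soup.find("span", attrs={"class": "a-size-base a-color-secondary"})

print(text_or_none(missing))    # None
print(text_or_none(found))      # Atomic Habits
print(text_or_none(price_tag))  # $12.99
```

Checking for None before reading the text makes a changed or blocked page show up as a missing value instead of an AttributeError.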

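Opening the file with 'w' overwrites it on every run, so only the latest price is kept. If the goal is to record the price over time, one option is to open the file in append mode and write the header only when the file is new. This is a minimal sketch under that assumption: append_price_row, the file name price_history.csv, and the sample values are hypothetical, and it writes into a temporary directory so it leaves nothing behind.

```python
import csv
import datetime
import os
import tempfile


def append_price_row(path, title, price, day=None):
    """Append one (title, price, date) row, writing the header only once."""
    day = day or datetime.date.today()
    # The header is needed only when the file is missing or empty.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(["title", "price", "Date"])
        writer.writerow([title, price, day.isoformat()])


# Demonstration with made-up sample values in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "price_history.csv")
    append_price_row(csv_path, "Atomic Habits", "$12.99", datetime.date(2023, 5, 30))
    append_price_row(csv_path, "Atomic Habits", "$12.99", datetime.date(2023, 5, 31))
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(len(rows))  # 3: one header row plus two data rows
```

In the answer's script, the same idea would replace the with open(..., 'w', ...) block, with the scraped title and price passed in on each run.
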
huangapple
  • Published on May 30, 2023, 05:30:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76360420.html