How can I scrape and record both the price and date of a product with Python and BeautifulSoup?

Question

The code goes like this:

    from bs4 import BeautifulSoup
    import requests
    import time
    import datetime
    import pandas as pd
    import csv

    URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'
    HEADERS = ({'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36','Accept-Language': 'en-US, en:q =0.5'
    })
    page = requests.get(URL, headers = HEADERS)
    soup1 = BeautifulSoup(page.content, "html.parser")
    soup2 = BeautifulSoup(soup1.prettify(),"html.parser")
    title = soup2.find(id = 'product Title').get_text()
    price = soup2.find("span", attrs ={"class": 'a-size-base a-color-secondary'}).text
    print(title)
    print(price)
    price = price.strip()[1:]
    title = title.strip()
    print(price)
    print(title)
    today = datetime.date.today()
    print(today)
    # inserting data into excel
    header = ['title', 'price', 'Date']
    data = [title, price, today]
    with open('AmazonWebScraper.csv', 'w', newline='',encoding='UTF8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerow(data)
    df = pd.read_csv(r'C:\Users\User\Desktop\AmazonWebScraper.csv')
    print(df)

Can anybody help me fix this? I don't see any error, but the code isn't being executed.
I'm a beginner in Python and I'm trying to get this working from start to finish.
Thanks in advance.

Answer 1

Score: 0

You should try it this way:

    import datetime
    import csv
    import pandas as pd
    from bs4 import BeautifulSoup
    import requests

    URL = 'https://www.amazon.com/Atomic-Habits-James-Clear-audiobook/dp/B07RFSSYBH/ref=sr_1_1?keywords=atomic+habits&qid=1685192621&s=books&sprefix=atomi%2Cstripbooks-intl-ship%2C315&sr=1-1'
    # Browser-like headers; note the ';' before q in Accept-Language.
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36', 'Accept-Language': 'en-US, en;q=0.5'}

    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.text, "html.parser")
    # print(soup.prettify())

    # The id is 'productTitle' (no space).
    title = soup.find(id='productTitle').get_text()
    price = soup.find("span", attrs={"class": 'a-size-base a-color-secondary'}).text
    price = price.strip()
    title = title.strip()
    today = datetime.date.today()

    header = ['title', 'price', 'Date']
    data = [title, price, today]

    # Write the CSV to the current working directory.
    with open('AmazonWebScraper.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerow(data)

    # Read back the same relative path that was just written.
    df = pd.read_csv('AmazonWebScraper.csv')
    print(df)

Output (AmazonWebScraper.csv):

                                                   title   price        Date
    0  Atomic Habits: An Easy & Proven Way to Build G...  $12.99  2023-05-30
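
Note that opening the file with mode 'w' replaces it on every run, so only the latest price is kept. If the goal is to record the price over time, one option is to append a dated row per run instead. Below is a minimal sketch of that idea using only the standard library. The helper names `parse_price` and `append_price_row` are hypothetical (they are not part of the code above), and it assumes the scraped price looks like `$12.99`.

```python
import csv
import datetime
import os
from decimal import Decimal


def parse_price(text):
    """Turn a scraped string such as ' $12.99 ' into Decimal('12.99').

    Assumption: the price uses a leading '$' and ',' as thousands separator.
    """
    cleaned = text.strip().lstrip('$').replace(',', '')
    return Decimal(cleaned)


def append_price_row(path, title, price_text, day=None):
    """Append one (title, price, Date) row; write the header only once."""
    day = day or datetime.date.today()
    # The header is needed only when the file is missing or still empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(['title', 'price', 'Date'])
        writer.writerow([title.strip(), parse_price(price_text), day.isoformat()])
```

With the variables from the code above, it could be called as `append_price_row('AmazonWebScraper.csv', title, price)` once per run, and `pd.read_csv('AmazonWebScraper.csv')` would then show one row per recorded date.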

huangapple
  • Posted on May 30, 2023, 05:30:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76360420.html