Formatting web scraped data BS4

Question

I'm using the following code to scrape some data but at the moment it outputs like this:

Output:

```
Fulford Road
Water Lane
York
York
YO10 4PA
YO30 6PQ
```

The desired output is this:

```
line1         city  postcode
Fulford Road  York  YO10 4PA
Water Lane    York  YO30 6PQ
```

Code

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

list1 = []
response3 = requests.get("https://stores.aldi.co.uk/yorkshire-amp-humber/york")
soup3 = BeautifulSoup(response3.text, "html.parser")
try:
    for a1 in soup3.find_all('span', attrs={'class':'Address-field Address-line1'}):
        line1 = a1.get_text()
        print(line1)
    for a2 in soup3.find_all('span', attrs={'class':'Address-field Address-city'}):
        line2 = a2.get_text()
        print(line2)
    for a3 in soup3.find_all('span', attrs={'class':'Address-field Address-postalCode'}):
        line3 = a3.get_text()
        print(line3)
except:
    pass
data = pd.DataFrame(list1)
```

I'd really appreciate any support you can give me to solve this.

Thanks,
S

Answer 1

Score: 1

Following your approach, I would do it this way (with a single for loop):

```python
import pandas as pd

# soup3 is the BeautifulSoup object created in the question's code.
list1 = []
# Loop over each address block so the three fields of one store stay together.
for addr in soup3.find_all("div", class_="Address"):
    line1 = addr.find("span", class_="Address-line1").get_text()
    city = addr.find("span", class_="Address-city").get_text()
    postcode = addr.find("span", class_="Address-postalCode").get_text()
    list1.append([line1, city, postcode])

df = pd.DataFrame(list1, columns=["line1", "city", "postcode"])
```
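To try the single-loop idea without requesting the live page, here is a minimal sketch that runs the same kind of loop against a small hand-written HTML snippet. The markup in `SAMPLE_HTML` (one `div` with class `Address` wrapping the three `span`s) is an assumption modelled on the class names used above, not copied from the Aldi site. Note that `class_="Address-line1"` still matches a `span` whose class attribute is `"Address-field Address-line1"`, because BeautifulSoup treats `class` as a multi-valued attribute.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: assumed to mirror the structure the loop relies on,
# i.e. one <div class="Address"> per store wrapping the three address spans.
SAMPLE_HTML = """
<div class="Address">
  <span class="Address-field Address-line1">Fulford Road</span>
  <span class="Address-field Address-city">York</span>
  <span class="Address-field Address-postalCode">YO10 4PA</span>
</div>
<div class="Address">
  <span class="Address-field Address-line1">Water Lane</span>
  <span class="Address-field Address-city">York</span>
  <span class="Address-field Address-postalCode">YO30 6PQ</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
# One iteration per address block, so each row holds one store's fields.
for addr in soup.find_all("div", class_="Address"):
    rows.append([
        addr.find("span", class_="Address-line1").get_text(strip=True),
        addr.find("span", class_="Address-city").get_text(strip=True),
        addr.find("span", class_="Address-postalCode").get_text(strip=True),
    ])

df = pd.DataFrame(rows, columns=["line1", "city", "postcode"])
print(df)
```

`get_text(strip=True)` is used here to trim any surrounding whitespace; whether the real page needs it is untested.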

Another variant:

```python
from collections import defaultdict

import pandas as pd

# One list per column; defaultdict(list) creates each list on first access.
data = defaultdict(list)
for addr in soup3.find_all("div", class_="Address"):
    data["line1"].append(addr.find("span", class_="Address-line1").get_text())
    data["city"].append(addr.find("span", class_="Address-city").get_text())
    data["postcode"].append(addr.find("span", class_="Address-postalCode").get_text())

df = pd.DataFrame(data)
```
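If you would rather keep the question's three separate `find_all` calls, one more option (not part of the answer above) is to `zip` the three lists into rows. This is a minimal sketch under a strong assumption: every store has all three spans, in the same order. If one field is missing, `zip` silently pairs fields from different stores, which is why grouping by the parent `div`, as the answer's loops do, is the safer choice. `SAMPLE_HTML` and the helper `texts` are hypothetical.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for the live page; only the span classes
# from the question's code are assumed.
SAMPLE_HTML = """
<span class="Address-field Address-line1">Fulford Road</span>
<span class="Address-field Address-city">York</span>
<span class="Address-field Address-postalCode">YO10 4PA</span>
<span class="Address-field Address-line1">Water Lane</span>
<span class="Address-field Address-city">York</span>
<span class="Address-field Address-postalCode">YO30 6PQ</span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def texts(css_class):
    """Hypothetical helper: stripped text of every <span> with css_class, in page order."""
    return [s.get_text(strip=True) for s in soup.find_all("span", class_=css_class)]


# zip pairs the i-th line1 with the i-th city and the i-th postcode.
rows = list(zip(texts("Address-line1"), texts("Address-city"), texts("Address-postalCode")))
df = pd.DataFrame(rows, columns=["line1", "city", "postcode"])
print(df)
```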

Output:

```
print(df)
          line1  city  postcode
0  Fulford Road  York  YO10 4PA
1    Water Lane  York  YO30 6PQ
```
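One caveat with the loops above: `find()` returns `None` when a span is absent, so calling `.get_text()` on it raises `AttributeError` for a store with, say, no postcode. A minimal sketch of a guard, assuming a missing field should simply become an empty cell in the DataFrame; the helper `field_text` and the `SAMPLE_HTML` markup (whose second address deliberately lacks a postcode) are hypothetical.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: the second address deliberately has no postcode span.
SAMPLE_HTML = """
<div class="Address">
  <span class="Address-field Address-line1">Fulford Road</span>
  <span class="Address-field Address-city">York</span>
  <span class="Address-field Address-postalCode">YO10 4PA</span>
</div>
<div class="Address">
  <span class="Address-field Address-line1">Water Lane</span>
  <span class="Address-field Address-city">York</span>
</div>
"""


def field_text(parent, css_class):
    """Hypothetical helper: text of the first matching <span>, or None if it is missing."""
    tag = parent.find("span", class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [
    {
        "line1": field_text(addr, "Address-line1"),
        "city": field_text(addr, "Address-city"),
        "postcode": field_text(addr, "Address-postalCode"),
    }
    for addr in soup.find_all("div", class_="Address")
]
df = pd.DataFrame(rows)
print(df)
```

Building rows as dicts keeps each value tied to its column name, so a missing field cannot shift the other values into the wrong column.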

huangapple
  • Posted on 23 May 2023 at 00:40:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76308305.html