Formatting web scraped data BS4
Question
I'm using the following code to scrape some data, but at the moment the output looks like this:
output |
---|
Fulford Road |
Water Lane |
York |
York |
YO10 4PA |
YO30 6PQ |
The desired output is this:
line1 | city | postcode |
---|---|---|
Fulford Road | York | YO10 4PA |
Water Lane | York | YO30 6PQ |
Code
import requests
from bs4 import BeautifulSoup
import pandas as pd

list1 = []
response3 = requests.get("https://stores.aldi.co.uk/yorkshire-amp-humber/york")
soup3 = BeautifulSoup(response3.text, "html.parser")
try:
    for a1 in soup3.find_all('span', attrs={'class':'Address-field Address-line1'}):
        line1 = a1.get_text()
        print(line1)
    for a2 in soup3.find_all('span', attrs={'class':'Address-field Address-city'}):
        line2 = a2.get_text()
        print(line2)
    for a3 in soup3.find_all('span', attrs={'class':'Address-field Address-postalCode'}):
        line3 = a3.get_text()
        print(line3)
except:
    pass
data = pd.DataFrame(list1)
I'd really appreciate any support you can give me to solve this.
Thanks,
S
Answer 1
Score: 1
Following your approach, I would do it this way, with a single for-loop over each store's address block:

list1 = []
# Assumes each div with class "Address" wraps one store's fields,
# so the three values found inside it belong to the same row.
for addr in soup3.find_all("div", class_="Address"):
    line1 = addr.find("span", class_="Address-line1").get_text()
    city = addr.find("span", class_="Address-city").get_text()
    postcode = addr.find("span", class_="Address-postalCode").get_text()
    list1.append([line1, city, postcode])

df = pd.DataFrame(list1, columns=["line1", "city", "postcode"])

Another variant:

from collections import defaultdict

# One list per column; pandas turns the dict keys into column names.
data = defaultdict(list)
for addr in soup3.find_all("div", class_="Address"):
    data["line1"].append(addr.find("span", class_="Address-line1").get_text())
    data["city"].append(addr.find("span", class_="Address-city").get_text())
    data["postcode"].append(addr.find("span", class_="Address-postalCode").get_text())

df = pd.DataFrame(data)

Output:

print(df)
          line1  city  postcode
0  Fulford Road  York  YO10 4PA
1    Water Lane  York  YO30 6PQ
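
If you would rather keep the three separate `find_all` loops from the question, one possible alternative is to collect each field into its own list and pair the entries up with `zip`. The sketch below is pure Python and uses the values from the question's output as stand-ins for the scraped text; the helper name `to_rows` is made up for illustration, and it assumes the n-th entry of every list belongs to the same store, which only holds if no store is missing a field.

```python
# Stand-in data: the values shown in the question's current output,
# one list per field (in real code these would come from the three loops).
line1s = ["Fulford Road", "Water Lane"]
cities = ["York", "York"]
postcodes = ["YO10 4PA", "YO30 6PQ"]


def to_rows(line1s, cities, postcodes):
    """Pair the n-th entry of each list into one dict per store.

    Hypothetical helper, not part of bs4 or pandas.
    """
    # zip() silently stops at the shortest list, which would misalign
    # rows if one field were missing, so check the lengths first.
    if not (len(line1s) == len(cities) == len(postcodes)):
        raise ValueError("field lists differ in length; rows would be misaligned")
    return [
        {"line1": line1, "city": city, "postcode": postcode}
        for line1, city, postcode in zip(line1s, cities, postcodes)
    ]


rows = to_rows(line1s, cities, postcodes)
for row in rows:
    print(row["line1"], row["city"], row["postcode"], sep=" | ")
```

Passing `rows` to `pd.DataFrame(rows)` should give the same three columns. The single loop over each address block is probably still the safer choice, since it keeps a store's fields together even when one of them is absent.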