Formatting web scraped data BS4

Question

I'm using the following code to scrape some data but at the moment it outputs like this:

output
Fulford Road
Water Lane
York
York
YO10 4PA
YO30 6PQ

The desired output is this:

line1 city postcode
Fulford Road York YO10 4PA
Water Lane York YO30 6PQ

Code

import requests
from bs4 import BeautifulSoup
import pandas as pd

list1 = []

response3 = requests.get("https://stores.aldi.co.uk/yorkshire-amp-humber/york")
soup3 = BeautifulSoup(response3.text, "html.parser")

try:
    for a1 in soup3.find_all('span', attrs={'class':'Address-field Address-line1'}):
        line1 = a1.get_text()
        print(line1)

    for a2 in soup3.find_all('span', attrs={'class':'Address-field Address-city'}):
        line2 = a2.get_text()
        print(line2)

    for a3 in soup3.find_all('span', attrs={'class':'Address-field Address-postalCode'}):
        line3 = a3.get_text()
        print(line3)

except:
    pass

data = pd.DataFrame(list1)

I'd really appreciate any support you can give me to solve this.

Thanks,
S

Answer 1

Score: 1

Following your approach, I would do it this way (with a single for loop over each address block, so the three fields of one store stay together):

list1 = []
for addr in soup3.find_all("div", class_="Address"):
    line1 = addr.find("span", class_="Address-line1").get_text()
    city = addr.find("span", class_="Address-city").get_text()
    postcode = addr.find("span", class_="Address-postalCode").get_text()
    list1.append([line1, city, postcode])

df = pd.DataFrame(list1, columns=["line1", "city", "postcode"])

Another variant:

from collections import defaultdict

data = defaultdict(list)

for addr in soup3.find_all("div", class_="Address"):
    data["line1"].append(addr.find("span", class_="Address-line1").get_text())
    data["city"].append(addr.find("span", class_="Address-city").get_text())
    data["postcode"].append(addr.find("span", class_="Address-postalCode").get_text())

df = pd.DataFrame(data)

Output:

print(df)

          line1  city  postcode
0  Fulford Road  York  YO10 4PA
1    Water Lane  York  YO30 6PQ
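
If you would rather keep three separate find_all calls as in the question, one possible alternative is to collect each field into its own list and combine them with zip. The sketch below is only an illustration: the three lists are hard-coded stand-ins for the get_text() results (the names line1_values, city_values and postcode_values are made up here), so it runs without fetching the page.

```python
# Hard-coded stand-ins for the text of each find_all() result, in page order.
# In the real script each list would be built from the matching spans,
# e.g. by calling get_text() on every span with class "Address-line1".
line1_values = ["Fulford Road", "Water Lane"]
city_values = ["York", "York"]
postcode_values = ["YO10 4PA", "YO30 6PQ"]

# zip() pairs the i-th item of every list, turning three columns into rows.
rows = list(zip(line1_values, city_values, postcode_values))

# Print in the layout requested in the question.
print("line1 city postcode")
for line1, city, postcode in rows:
    print(line1, city, postcode)
```

Note that zip stops at the shortest list, so if one store were missing a field the rows would silently shift out of alignment. Looping over each "Address" block, as in the answer above, avoids that because every field is looked up inside its own store.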

Posted by huangapple on 2023-05-23 00:40:45
Original link: https://go.coder-hub.com/76308305.html