Formatting html_text in BeautifulSoup

Question

I've created the following to scrape the address details from a website:

#import packages
import requests
from bs4 import BeautifulSoup

postcodelist = []

l = "https://www.whsmith.co.uk/stores/details/?StoreID=" + "1513"
            
html_text = requests.get(l).text

soup = BeautifulSoup(html_text,'html.parser')

info = soup.find('div', attrs={'class':'t-store-details__store-address'})

name = info.text.strip()

print(name)

How do I format the address so that I can split its elements (e.g. address, city, postcode) into separate columns, as in the following?

Moor Street Station Queensway | West Midlands | Birmingham | B4 7UL | United Kingdom

Answer 1

Score: 0

Since you need columns, you probably want a DataFrame. If so, you can call get_text with a separator so that each piece of the address is delimited, split the result on that separator, and then build the DataFrame with pandas:

# pip install pandas
import pandas as pd

# Join the text nodes inside the address <div> with "\n", then split on it
infos = (
    soup.find("div", {"class": "t-store-details__store-address"})
    .get_text("\n")
    .split("\n")
)

# Use the `columns` parameter to set a custom header
df = pd.DataFrame([infos])


Output:

print(df)

                0                    1          2              3           4       5               6
0  Store Address:  Moor Street Station  Queensway  West Midlands  Birmingham  B4 7UL  United Kingdom
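
The frame above still has numeric headers and a leading "Store Address:" label, and the two street lines sit in separate columns. A minimal sketch of one way to reach the layout asked for in the question follows. It makes several assumptions that are not confirmed by the original post: the inline SAMPLE_HTML is a guess at the page's markup (the real page may differ), the last four pieces are always county, city, postcode and country, and the column names are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# ASSUMPTION: a stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<div class="t-store-details__store-address">
  <strong>Store Address:</strong><br>
  Moor Street Station<br>
  Queensway<br>
  West Midlands<br>
  Birmingham<br>
  B4 7UL<br>
  United Kingdom
</div>
"""

# Hypothetical column names, chosen for illustration only.
COLUMNS = ["address", "county", "city", "postcode", "country"]


def address_parts(html):
    """Return the non-empty text pieces of the address block, without the label."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="t-store-details__store-address")
    if div is None:
        return []
    # stripped_strings yields each text node with surrounding whitespace removed
    # and skips whitespace-only nodes.
    parts = list(div.stripped_strings)
    if parts and parts[0].rstrip(":").lower() == "store address":
        parts = parts[1:]
    return parts


def to_row(parts):
    """Map address pieces to named fields.

    ASSUMPTION: the last four pieces are county, city, postcode and country;
    everything before them is treated as the street address.
    """
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 address parts, got {len(parts)}")
    *street, county, city, postcode, country = parts
    return {
        "address": " ".join(street),
        "county": county,
        "city": city,
        "postcode": postcode,
        "country": country,
    }


df = pd.DataFrame([to_row(address_parts(SAMPLE_HTML))], columns=COLUMNS)
print(" | ".join(df.iloc[0].tolist()))
```

To use this against the live page, pass html_text from the question in place of SAMPLE_HTML. Stores whose addresses have a different number of trailing fields would need a different mapping in to_row.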

huangapple
  • Posted on 17 April 2023 at 04:04:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76030069.html