Beautiful Soup Web Scraping - href
Question
I have the following code:
import requests
from bs4 import BeautifulSoup
url = "https://storelocator.homebargains.co.uk/all-stores"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
info = soup.find("td")
print(info)
I want to extract the "href" part from the HTML (e.g. the link https://storelocator.homebargains.co.uk/store/A779/Quedgeley+Retail+Park,+Gloucester in this example). Any idea how I'd grab that?
Answer 1
Score: 0
Something like this should do it:
from bs4 import BeautifulSoup
import requests
BASE_URL = "https://storelocator.homebargains.co.uk"
STORES = f"{BASE_URL}/all-stores"
soup = BeautifulSoup(requests.get(STORES).text, "html.parser")
# href=True skips anchors that have no href attribute at all
for a in soup.find_all("a", href=True):
    # Store pages use site-relative links that begin with /store
    if a["href"].startswith("/store"):
        print(f"Text: {a.text} - URL: {BASE_URL}{a['href']}")
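As a variation on the loop above, here is a minimal sketch that joins the relative href onto the base URL with urllib.parse.urljoin instead of string concatenation. The SAMPLE_HTML markup and the store_links helper are made up for illustration so the snippet runs without a network request; the real page's structure may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<table>
  <tr><td class="store"><a href="/store/A779/Quedgeley+Retail+Park,+Gloucester">Gloucester</a></td></tr>
  <tr><td class="store"><a href="/store/A816/Blairgowrie">Blairgowrie</a></td></tr>
  <tr><td><a href="/about">About</a></td></tr>
</table>
"""

BASE_URL = "https://storelocator.homebargains.co.uk"


def store_links(html, base_url=BASE_URL):
    """Return absolute URLs for every anchor whose href starts with /store."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, a["href"])  # resolves the relative path against the base
        for a in soup.find_all("a", href=True)
        if a["href"].startswith("/store")
    ]


links = store_links(SAMPLE_HTML)
print(links)
```

urljoin also copes with hrefs that are already absolute, which plain concatenation would mangle.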
Answer 2
Score: 0
You could use CSS selectors to get all the links to the stores, avoiding duplicates by selecting them more specifically:
['https://storelocator.homebargains.co.uk'+a.get('href') for a in soup.select('tr td:first-of-type.store a')]
or deduplicate by passing a generator expression to set():
set('https://storelocator.homebargains.co.uk'+a.get('href') for a in soup.select('tr td.store a'))
To extract the href you could use get('href').
Example
import requests
from bs4 import BeautifulSoup
url = "https://storelocator.homebargains.co.uk/all-stores"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
['https://storelocator.homebargains.co.uk'+a.get('href') for a in soup.select('tr td:first-of-type.store a')]
Output
['https://storelocator.homebargains.co.uk/store/A779/Quedgeley+Retail+Park,+Gloucester',
'https://storelocator.homebargains.co.uk/store/A794/Wren+Retail+Park,+Torquay;+Torquay',
'https://storelocator.homebargains.co.uk/store/A816/Blairgowrie',
'https://storelocator.homebargains.co.uk/store/A270/Boulevard+Retail+Park,+Aberdeen',
'https://storelocator.homebargains.co.uk/store/A277/Inverurie+Retail+Park,+Oldeldrum+Road',
'https://storelocator.homebargains.co.uk/store/A708/Berryden+Retail+Park,+Aberdeen',
'https://storelocator.homebargains.co.uk/store/A616/Bridge+of+Don+Retail+Park,+Denmore+Road,+Bridge+of+Don',
'https://storelocator.homebargains.co.uk/store/A433/Westhill+Shopping+Centre,+Aberdeen',
'https://storelocator.homebargains.co.uk/store/A131/Eastgate+Retail+Park,+Accrington',
'https://storelocator.homebargains.co.uk/store/A349/Graham+Street,+Airdrie',
'https://storelocator.homebargains.co.uk/store/A128/Rookery+Parade,+Aldridge,+West+Midlands',
'https://storelocator.homebargains.co.uk/store/A136/Institute+Lane,+Alfreton',...]
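One caveat with the set() variant is that a set does not preserve the page order. The sketch below shows one possible way to drop duplicates while keeping first-seen order, using dict.fromkeys. The SAMPLE_HTML markup is an assumption made up for illustration (including the anchor without an href), not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a repeated link and an anchor lacking href (assumption).
SAMPLE_HTML = """
<table>
  <tr>
    <td class="store"><a href="/store/A816/Blairgowrie">Blairgowrie</a></td>
    <td class="store"><a href="/store/A816/Blairgowrie">View store</a></td>
  </tr>
  <tr>
    <td class="store"><a href="/store/A131/Eastgate+Retail+Park,+Accrington">Accrington</a></td>
    <td class="store"><a>No link here</a></td>
  </tr>
</table>
"""

BASE_URL = "https://storelocator.homebargains.co.uk"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# a.get("href") returns None when the attribute is missing,
# whereas a["href"] would raise KeyError, so drop the None values.
hrefs = [a.get("href") for a in soup.select("tr td.store a")]
hrefs = [h for h in hrefs if h is not None]

# dict.fromkeys keeps the first occurrence of each key in insertion order.
unique_links = [BASE_URL + h for h in dict.fromkeys(hrefs)]
print(unique_links)
```

If order does not matter, the set() version from the answer is shorter and does the same job.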