Beautiful Soup Web Scraping - href

Question

I want to extract the "href" part of the HTML, for example the link https://storelocator.homebargains.co.uk/store/A779/Quedgeley+Retail+Park,+Gloucester. Any idea how I'd grab that?

I have the following code:

import requests
from bs4 import BeautifulSoup

url = "https://storelocator.homebargains.co.uk/all-stores"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

info = soup.find("td")

print(info)

Answer 1

Score: 0

Something like this should do it:

from bs4 import BeautifulSoup
import requests

BASE_URL = "https://storelocator.homebargains.co.uk"
STORES = f"{BASE_URL}/all-stores"
soup = BeautifulSoup(requests.get(STORES).text, "html.parser")

# href=True skips <a> tags that have no href attribute
for a in soup.find_all("a", href=True):
    # keep only hrefs that are relative paths beginning with /store
    if a["href"].startswith("/store"):
        print(f"Text: {a.text} - URL: {BASE_URL}{a['href']}")
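The loop above builds each full URL by concatenating BASE_URL with the relative href. As an alternative, here is a minimal sketch that uses urljoin from the standard library for that step. The helper name store_urls and the sample hrefs are hypothetical and only there for illustration; the "/store" prefix is taken from the code above.

```python
from urllib.parse import urljoin

BASE_URL = "https://storelocator.homebargains.co.uk"


def store_urls(hrefs, base_url=BASE_URL, prefix="/store"):
    """Return absolute URLs for the hrefs that start with `prefix`."""
    return [urljoin(base_url, href) for href in hrefs if href.startswith(prefix)]


# Hypothetical hrefs, standing in for the a["href"] values read in the loop above.
sample = [
    "/store/A816/Blairgowrie",
    "/all-stores",
    "/store/A349/Graham+Street,+Airdrie",
]
print(store_urls(sample))
```

urljoin resolves the relative path against the base, so the result should not depend on whether BASE_URL ends with a slash.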

Answer 2

Score: 0

You could use CSS selectors to get all the store links, avoiding duplicates by selecting them more specifically:

['https://storelocator.homebargains.co.uk'+a.get('href') for a in soup.select('tr td:first-of-type.store a')]

or wrap a generator expression in set() to drop the duplicates:

set('https://storelocator.homebargains.co.uk'+a.get('href') for a in soup.select('tr td.store a'))
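A set removes duplicates but does not keep the order in which the links appear on the page. If that order matters, one possible sketch is dict.fromkeys, which keeps the first occurrence of each key in insertion order (guaranteed for dicts since Python 3.7). The helper name unique_in_order and the sample list are illustrative and not part of the answer.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


# Illustrative links with one repeat, standing in for the scraped URLs.
links = [
    "https://storelocator.homebargains.co.uk/store/A816/Blairgowrie",
    "https://storelocator.homebargains.co.uk/store/A349/Graham+Street,+Airdrie",
    "https://storelocator.homebargains.co.uk/store/A816/Blairgowrie",
]
print(unique_in_order(links))
```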

To extract the href you could use get('href').
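To make the behaviour of get('href') concrete, here is a small offline sketch. In Beautiful Soup, a.get("href") returns None when the attribute is missing, whereas a["href"] raises KeyError. The HTML fragment below is made up for illustration and only assumes cells with class "store"; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment; the real page's markup may differ.
html = """
<table>
  <tr>
    <td class="store"><a href="/store/A816/Blairgowrie">Blairgowrie</a></td>
    <td class="store"><a>No link here</a></td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# .get() returns None for the anchor that has no href attribute.
hrefs = [a.get("href") for a in soup.select("td.store a")]
print(hrefs)

# The a[href] selector matches only anchors that carry an href,
# so indexing with a["href"] is safe here.
base = "https://storelocator.homebargains.co.uk"
urls = [base + a["href"] for a in soup.select("td.store a[href]")]
print(urls)
```

Filtering with a[href] in the selector avoids concatenating the base URL with None, which would raise a TypeError.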

Example
import requests
from bs4 import BeautifulSoup

url = "https://storelocator.homebargains.co.uk/all-stores"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

['https://storelocator.homebargains.co.uk'+a.get('href') for a in soup.select('tr td:first-of-type.store a')]
Output
['https://storelocator.homebargains.co.uk/store/A779/Quedgeley+Retail+Park,+Gloucester',
 'https://storelocator.homebargains.co.uk/store/A794/Wren+Retail+Park,+Torquay;+Torquay',
 'https://storelocator.homebargains.co.uk/store/A816/Blairgowrie',
 'https://storelocator.homebargains.co.uk/store/A270/Boulevard+Retail+Park,+Aberdeen',
 'https://storelocator.homebargains.co.uk/store/A277/Inverurie+Retail+Park,+Oldeldrum+Road',
 'https://storelocator.homebargains.co.uk/store/A708/Berryden+Retail+Park,+Aberdeen',
 'https://storelocator.homebargains.co.uk/store/A616/Bridge+of+Don+Retail+Park,+Denmore+Road,+Bridge+of+Don',
 'https://storelocator.homebargains.co.uk/store/A433/Westhill+Shopping+Centre,+Aberdeen',
 'https://storelocator.homebargains.co.uk/store/A131/Eastgate+Retail+Park,+Accrington',
 'https://storelocator.homebargains.co.uk/store/A349/Graham+Street,+Airdrie',
 'https://storelocator.homebargains.co.uk/store/A128/Rookery+Parade,+Aldridge,+West+Midlands',
 'https://storelocator.homebargains.co.uk/store/A136/Institute+Lane,+Alfreton',...]

huangapple
  • Published on April 17, 2023 at 20:44:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76035288.html