Trying to paginate scraped data
Question
I'm trying to get a list of product links from each page of a site I'm scraping. I can get the required data from page 1, but I'm having a hard time extending this to the other pages. Can anyone point me in the right direction, please?
    from requests_html import HTMLSession

    s = HTMLSession()

    def get_product_links(page):
        url = "https://lakesshoweringspaces.com/catalogue-product-filter/page/{page}"
        r = s.get(url)
        products = r.html.find("article.contentwrapper section.collection-wrapper-item")
        for item in products:
            res = links.append(item.find("a", first=True))
            if res:
                print(res.attrs["href"])
            else:
                print("no match")
        return links

    page1 = get_product_links(1)
    print(page1)
Answer 1
Score: 0
I seem to have got this working:
    from requests_html import HTMLSession

    s = HTMLSession()

    def get_product_links(page):
        # The f prefix is what substitutes {page} into the URL.
        url = f'https://lakesshoweringspaces.com/catalogue-product-filter/page/{page}'
        links = []
        r = s.get(url)
        products = r.html.find('article.contentwrapper section')
        for item in products:
            # list.append() returns None, so find the anchor first and append afterwards.
            q = item.find("a", first=True)
            if q and "href" in q.attrs:
                links.append(q.attrs["href"])
        return links

    # Fetch a single product page and print its heading.
    test_link = 'https://lakesshoweringspaces.com/catalogue_product/alassio/'
    r = s.get(test_link)
    print(r.html.find('div.product-sidecontent h3', first=True).text.strip())
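To go from one page to all of them, one option is to call the per-page function in a loop until a page yields no links. The sketch below is only an illustration: `collect_all_links` and its parameters are hypothetical names, not part of requests_html. It assumes the per-page function returns a list of href strings, and that a page number past the end of the catalogue returns an empty list. That stopping condition has not been checked against this site.

```python
from typing import Callable, Iterable, List


def collect_all_links(
    get_links: Callable[[int], Iterable[str]],
    first_page: int = 1,
    max_pages: int = 50,
) -> List[str]:
    """Call get_links(page) for successive pages until one returns nothing.

    max_pages is a safety cap (an assumed value) so the loop always ends,
    even if the site keeps serving content for out-of-range page numbers.
    """
    all_links: List[str] = []
    seen = set()
    for page in range(first_page, first_page + max_pages):
        page_links = list(get_links(page))
        if not page_links:
            # An empty page is treated as the end of the catalogue.
            break
        for href in page_links:
            # Skip duplicates while preserving first-seen order.
            if href not in seen:
                seen.add(href)
                all_links.append(href)
    return all_links
```

With a per-page function such as get_product_links that returns href strings, usage would be along the lines of all_links = collect_all_links(get_product_links). Passing the function in as an argument keeps the loop independent of how each page is fetched.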