Trying to paginate scraped data

Question

I'm trying to get a list of product links from each page of a catalogue I'm scraping. I can get the required data from page 1, but I'm having a hard time extending this to the other pages. Can anyone point me in the right direction, please?

from requests_html import HTMLSession

s = HTMLSession()
def get_product_links(page):
  url = "https://lakesshoweringspaces.com/catalogue-product-filter/page/{page}"

  r = s.get(url)
  products = r.html.find("article.contentwrapper section.collection-wrapper-item")
  for item in products:
      res = links.append(item.find("a", first=True))
      if res:
        print(res.attrs["href"])
      else:
        print("no match")
  return links

  page1 = get_product_links(1)
  print(page1)

Answer 1

Score: 0

I seem to have got this working:

from requests_html import HTMLSession

s = HTMLSession()

def get_product_links(page):
  # The f prefix is what substitutes the page number into the URL
  url = f'https://lakesshoweringspaces.com/catalogue-product-filter/page/{page}'
  links = []
  r = s.get(url)
  products = r.html.find('article.contentwrapper section')
  for item in products:
    # find(..., first=True) returns a single Element, or None if nothing matches
    q = item.find("a", first=True)
    if q and "href" in q.attrs:
      # Store the href itself; list.append returns None, so don't assign its result
      links.append(q.attrs["href"])
      print(q.attrs["href"])
  return links


test_link = 'https://lakesshoweringspaces.com/catalogue_product/alassio/'

r = s.get(test_link)

# Print the product heading from a single product page
print(r.html.find('div.product-sidecontent h3', first=True).text.strip())
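
For anyone landing here later, two things look like the likely culprits in the question's snippet: the URL string has no `f` prefix, so `{page}` is never replaced, and `list.append` returns `None`, so `res` is always falsy. To actually walk through several pages, one option is to call the per-page function in a loop and stop when a page comes back empty. Below is a minimal sketch of that idea. It assumes any function that takes a page number and returns a list of links for that page; `collect_all_links`, `max_pages`, `fake_fetch` and the stop-on-empty rule are hypothetical choices made for illustration, not something taken from the site or from requests_html.

```python
from typing import Callable, List


def collect_all_links(fetch_links: Callable[[int], List[str]],
                      start_page: int = 1,
                      max_pages: int = 50) -> List[str]:
    """Call fetch_links(page) for consecutive pages and merge the results.

    Stops at the first page that yields no links, or after max_pages pages.
    Both stopping rules are assumptions made for this sketch.
    """
    all_links: List[str] = []
    for page in range(start_page, start_page + max_pages):
        page_links = fetch_links(page)
        if not page_links:
            break  # assume an empty page means we ran past the last one
        for link in page_links:
            if link not in all_links:  # skip duplicates, keep first-seen order
                all_links.append(link)
    return all_links


# Pitfall 1: without the f prefix, {page} stays in the string literally.
page = 2
plain = "https://lakesshoweringspaces.com/catalogue-product-filter/page/{page}"
formatted = f"https://lakesshoweringspaces.com/catalogue-product-filter/page/{page}"

# Pitfall 2: list.append mutates the list in place and returns None.
result = [].append("x")


def fake_fetch(page: int) -> List[str]:
    """Stand-in for a real scraper so the sketch runs offline (made-up data)."""
    pages = {1: ["/a/", "/b/"], 2: ["/b/", "/c/"]}
    return pages.get(page, [])


if __name__ == "__main__":
    print(plain)                          # still contains the literal {page}
    print(formatted)                      # ends with /page/2
    print(result is None)                 # append gave back None
    print(collect_all_links(fake_fetch))  # merged links from pages 1 and 2
```

With a per-page scraper such as `get_product_links`, the call would be `collect_all_links(get_product_links)`. How the site behaves past its last page (empty listing, redirect, or error) has not been checked here, so the `max_pages` cap is there as a safety net.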

huangapple
  • Posted on February 10, 2023, 07:00:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75405336.html