How to scrape data that appears when clicking a button?

Question

I am trying to scrape phone numbers from a website, but the numbers appear only after I click on the first number. In other words, the phone number is hidden in the HTML and is revealed only on click. Can you help, please?
I used the following code:

import requests
from bs4 import BeautifulSoup

url = "https://hipages.com.au/connect/makermanservices"

req = requests.get(url).text

soup = BeautifulSoup(req, "html.parser")

phone = soup.find('a', class_='PhoneNumber__MobileOnly-sc-4ewwun-1 izNnbI phone-number__mobile')
print(phone)

Answer 1

Score: 1

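With a little bit of hacking, you can get the phone number with the help of bs4 and pandas. The page embeds its initial state as JSON in a script tag (assigned to window.__INITIAL_PROPS__), and the phone number is part of that data, so no click is needed.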

For example:

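import json
import re
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://hipages.com.au/connect/makermanservices"
script_text = "window.__INITIAL_PROPS__="

# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48",
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "lxml")
# Find the <script> tag that assigns the page's initial state.
script = soup.find("script", string=lambda t: t and script_text in t)
# json.loads() is expected to return a str here (the state appears to be
# double-encoded JSON); pd.read_json() then parses that string.
data = json.loads(re.search(re.escape(script_text) + r"(.+)", script.string).group(1))
phone = (
    # StringIO avoids the deprecation warning newer pandas raises for raw JSON strings.
    pd.read_json(StringIO(data))
    ["fetchKey-7-0-0_/connect/makermanservices"]
    ["site"]
    ["primary_location"]
    ["phone"]
)
print(phone)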

This should print:

1800 801 828
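
If you would rather not pull in pandas just to walk a nested dictionary, the same idea can be sketched with the standard library alone. This is only an illustration, not the site's documented format: the helper name extract_initial_props and the sample HTML are made up for the example, and the assumption that the payload may be either plain JSON or a JSON-encoded string is inferred from the code above. It takes the page source as a string, so you would pass it requests.get(url, headers=headers).text.

```python
import json
import re

# Marker taken from the answer above; the helper below is hypothetical.
SCRIPT_MARKER = "window.__INITIAL_PROPS__="


def extract_initial_props(html: str) -> dict:
    """Return the JSON payload assigned to window.__INITIAL_PROPS__ in html."""
    pattern = re.escape(SCRIPT_MARKER) + r"\s*(.+?)\s*;?\s*</script>"
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        raise ValueError("marker not found in page source")
    payload = json.loads(match.group(1))
    # Assumption: the state may be double-encoded (a JSON string that itself
    # contains JSON), so decode once more when the first pass yields a str.
    if isinstance(payload, str):
        payload = json.loads(payload)
    return payload


# Sample page that mirrors the key path used in the answer (made-up HTML).
sample_state = {
    "fetchKey-7-0-0_/connect/makermanservices": {
        "site": {"primary_location": {"phone": "1800 801 828"}}
    }
}
sample_html = (
    "<html><body><script>"
    + SCRIPT_MARKER
    + json.dumps(json.dumps(sample_state))
    + ";</script></body></html>"
)

props = extract_initial_props(sample_html)
phone = props["fetchKey-7-0-0_/connect/makermanservices"]["site"]["primary_location"]["phone"]
print(phone)  # prints: 1800 801 828
```

The "fetchKey-..." key looks generated, so it may change between pages or over time; printing props.keys() on the real page is a quick way to check before hard-coding it.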

Posted by huangapple on April 19, 2023, 21:50:00
Link: https://go.coder-hub.com/76055331.html