Read data from the Chrome console into Python
Question
I have Python code that reads an XPath query from a website (https://www.op.gg/summoners/kr/Hide%20on%20bush):
import requests
import lxml.html as html
import pandas as pd

url_padre = "https://www.op.gg/summoners/br/tercermundista"
link_farm = '//div[@class="stats"]//div[@class="cs"]'

r = requests.get(url_padre)
home = r.content.decode("utf-8")
parser = html.fromstring(home)
farm = parser.xpath(link_farm)
print(farm)
This code prints "[]". But when I run the same XPath in the Chrome console: $x('//div[@class="stats"]//div[@class="cs"]').map(x => x.innerText), it prints the numbers I want; my Python code does not.
What is the mistake? I would like code that solves it.
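A quick diagnostic (a minimal sketch; it just searches the downloaded HTML for the class name used in the XPath) shows whether those nodes are present in what requests receives at all:

import requests

home = requests.get("https://www.op.gg/summoners/br/tercermundista").content.decode("utf-8")

# If this prints False, the div[@class="cs"] nodes are not in the
# server-rendered HTML, so no XPath query can match them there.
print('class="cs"' in home)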
--------------------------edit---------------------------
Error Traceback (most recent call last)
c:\Users\GCO\Desktop\Analisis de datos\borradores\fsdfs.ipynb Cell 2 in 3
1 from playwright.sync_api import sync_playwright
----> 3 with sync_playwright() as p, p.chromium.launch() as browser:
4 page = browser.new_page()
5 page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
File c:\Users\GCO\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\_context_manager.py:47, in PlaywrightContextManager.__enter__(self)
45 self._own_loop = True
46 if self._loop.is_running():
---> 47 raise Error(
48 """It looks like you are using Playwright Sync API inside the asyncio loop.
49 Please use the Async API instead."""
50 )
52 # In Python 3.7, asyncio.Process.wait() hangs because it does not use ThreadedChildWatcher
53 # which is used in Python 3.8+. This is unix specific and also takes care about
54 # cleaning up zombie processes. See https://bugs.python.org/issue35621
55 if (
56 sys.version_info[0] == 3
57 and sys.version_info[1] == 7
58 and sys.platform != "win32"
59 and isinstance(asyncio.get_child_watcher(), asyncio.SafeChildWatcher)
60 ):
Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
Answer 1
Score: 1
As I understand it, you cannot get dynamically generated content using requests: the stats on this page are rendered by JavaScript after the page loads, so the HTML the server returns does not contain the target nodes.
Here is a solution using playwright, which can load the whole page before parsing:
- Install playwright with pip install playwright.
- Install the browser and its dependencies with playwright install chromium --with-deps.
- Run the following code:
from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch() as browser:
    page = browser.new_page()
    page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
    selector = "//div[@class='stats']//div[@class='cs']/div"
    cs_stats = page.query_selector_all(selector)
    print(len(cs_stats), [cs.inner_text() for cs in cs_stats])
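Note: the traceback in the question's edit ("It looks like you are using Playwright Sync API inside the asyncio loop") is what this sync code raises inside a Jupyter notebook, because the notebook already runs an asyncio event loop. A minimal sketch of the same scrape with Playwright's async API (the calls mirror the sync ones above):

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
        selector = "//div[@class='stats']//div[@class='cs']/div"
        cs_stats = await page.query_selector_all(selector)
        print(len(cs_stats), [await cs.inner_text() for cs in cs_stats])
        await browser.close()

# In a notebook: await main()
# In a plain script: asyncio.run(main())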
If you want to stick with lxml as the parsing tool, you can use the following code:
from lxml import html
from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch() as browser:
    page = browser.new_page()
    page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
    selector = "//div[@class='stats']//div[@class='cs']/div"
    c = page.content()
    parser = html.fromstring(c)
    farm = parser.xpath(selector)
    print(len(farm), [cs.text for cs in farm])
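If the list is still empty, the stats may not be in the DOM yet when goto returns; a hedged variant is to wait for the selector explicitly before reading the page content (wait_for_selector accepts the same XPath string; the 10-second timeout is an arbitrary choice):

from lxml import html
from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch() as browser:
    page = browser.new_page()
    page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
    selector = "//div[@class='stats']//div[@class='cs']/div"
    # Block until at least one matching node is attached to the DOM.
    page.wait_for_selector(selector, timeout=10000)
    farm = html.fromstring(page.content()).xpath(selector)
    print(len(farm), [cs.text for cs in farm])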
P.S.
I also noticed that op.gg uses fairly simple HTTP requests that do not need authorization. You can find the desired info with this code:
import json
from urllib.request import urlopen
url = "https://op.gg/api/v1.0/internal/bypass/games/kr/summoners/4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg?&limit=20"
r = urlopen(url)
games = json.load(r).get("data", [])
print(games)
games is a list of dicts that stores all the info you need; the CS stats of a game live under keys such as games[0]["myData"]["stats"]["minion_kill"].
The only difficult part is finding the summoner_id for the desired user (4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg in your example).
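As a minimal sketch of walking that structure (assuming the response keeps the shape shown above; the key names are observed from the API response, not documented):

import json
from urllib.request import urlopen

url = "https://op.gg/api/v1.0/internal/bypass/games/kr/summoners/4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg?&limit=20"
games = json.load(urlopen(url)).get("data", [])

# Print the lane-minion kill count for each of the fetched games.
for game in games:
    print(game["myData"]["stats"]["minion_kill"])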
Answer 2
Score: 1
You can use this example to load the data from the external URL and compute the CS value (creep kills per minute):
import re
import requests

url = "https://www.op.gg/summoners/kr/Hide%20on%20bush"
api_url = "https://op.gg/api/v1.0/internal/bypass/games/kr/summoners/{summoner_id}?=&limit=20&hl=en_US&game_type=total"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0"
}

# The summoner_id is embedded in the profile page's HTML.
html_doc = requests.get(url, headers=headers).text
summoner_id = re.search(r'"summoner_id":"(.*?)"', html_doc).group(1)

data = requests.get(api_url.format(summoner_id=summoner_id), headers=headers).json()

for d in data["data"]:
    stats = d["myData"]["stats"]
    kills = (
        stats["minion_kill"]
        + stats["neutral_minion_kill_team_jungle"]
        + stats["neutral_minion_kill_enemy_jungle"]
        + stats["neutral_minion_kill"]
    )
    # CS per minute: total creep kills divided by game length in minutes.
    cs = kills / (d["game_length_second"] / 60)
    print(f"{cs=:.1f}")
Prints:
cs=6.7
cs=8.5
cs=8.2
cs=1.4
cs=7.3
cs=8.5
cs=6.8
cs=7.7
cs=8.7
cs=8.8
cs=5.6
cs=9.9
cs=7.0
cs=9.6
cs=9.7
cs=5.0
cs=7.5
cs=9.2
cs=9.0
cs=7.9