Why do parts of the HTML that appear in Inspect Element not appear after parsing the HTML with BeautifulSoup?
Question
I am attempting to scrape MLB batter player prop lines from DraftKings using Python and Beautiful Soup, but the data that appears when using SelectorGadget or Inspect Element is not in the parsed HTML. The website I am trying to scrape is: https://sportsbook.draftkings.com/leagues/baseball/mlb?category=batter-props&subcategory=total-bases
I have successfully written the following code to create a data frame containing each player's home run odds. However, when I change the URL to a different subcategory (e.g. "total-bases"), the CSS selectors tied to each variable no longer appear in the parsed HTML. I am unsure how to build new data frames if the CSS selectors and the data do not appear in the parsed HTML. Currently, I am altering the URL variable to change subcategories, but I am open to other suggestions. How should I proceed? Do I need to use a different package than BeautifulSoup?
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://sportsbook.draftkings.com/leagues/baseball/mlb?category=batter-props&subcategory=home-runs'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Player names, one per table row
player_names = [row.get_text() for row in soup.find_all(class_="sportsbook-row-name")]
# Prop line for each player (skipping empty cells)
lines = [float(element.text.replace(",", "")) for element in soup.select("th+ .sportsbook-table__column-row .sportsbook-outcome-cell__line") if element.text.strip()]
# Over/Under odds; the site uses the Unicode minus sign, so normalize it before int()
over = [int(element.text.strip().replace('−', '-')) for element in soup.select("th+ .sportsbook-table__column-row .default-color")]
under = [int(element.text.strip().replace('−', '-')) for element in soup.select("td+ .sportsbook-table__column-row .default-color") if element.text.strip().replace('−', '-')]

df = pd.DataFrame({"Player Name": player_names, "Line": lines, "Over": over, "Under": under})
Answer 1
Score: 0
When you use the Inspect Element feature in a web browser, you are examining the current state of the DOM: the result after the browser has parsed the HTML and run any JavaScript that modifies the page.
In contrast, when you parse HTML with BeautifulSoup, you are working directly with the original, unaltered HTML source as received from the server. BeautifulSoup parses that HTML and builds a parse tree, but it does not execute JavaScript or make the asynchronous requests a browser would, so the results of those actions are not reflected in the parsed HTML. The selectors that disappear when you change the subcategory are therefore most likely attached to elements the page creates with JavaScript after it loads.
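The difference can be reproduced without touching the live site. The following is a minimal sketch, not DraftKings' actual markup: `raw_html` is a made-up page whose only player element would be created by a script at runtime, and the class name is borrowed from the question purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page source (an assumption for illustration): the player element
# is only created by the script at runtime, so it is not in the served HTML.
raw_html = """
<html><body>
  <div id="root"></div>
  <script>
    document.getElementById("root").innerHTML =
      '<span class="sportsbook-row-name">Example Player</span>';
  </script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# BeautifulSoup never runs the script, so the element it would create is missing.
rendered_names = soup.find_all(class_="sportsbook-row-name")

# The script body is still present, but only as inert text inside <script>.
script_text = soup.find("script").string

print(len(rendered_names))                    # 0
print("sportsbook-row-name" in script_text)   # True
```

A browser loading the same markup would show the span in Inspect Element, because it runs the script first. To scrape content like this, the usual options are a browser-automation tool such as Selenium or Playwright, which hands you the rendered DOM, or requesting the data endpoint that the page itself calls, if one can be found in the browser's network tab.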