How to find data with specific tags in Python using the requests module?


Question

This is the part of the current HTML that I need to extract a value from:

<table class="table table-hover sortable-theme-minimal table-heatmap" data-sortable="">
 <thead>
  <tr>
   <th>
   </th>
   <th>
    Price
   </th>
   <th>
   </th>
   <th>
   </th>
   <th data-heatmap="1" data-heatmap-limit="5" style="text-align: center;cursor:pointer;">
    Day
   </th>
   <th data-heatmap="1" data-heatmap-limit="20" style="text-align: center;cursor:pointer">
    Month
   </th>
   <th data-heatmap="1" data-heatmap-limit="100" style="text-align: center;cursor:pointer">
    Year
   </th>
   <th class="hidden-xs" style="text-align: center;">
    Date
   </th>
  </tr>
 </thead>
 <tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM">
  <td>
   <a href="/commodity/crude-oil">
    Crude Oil
   </a>
  </td>
  <td id="p">
   82.86

I need to get that 82.86 number, but I can't seem to target the <tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM"> row.

Here is my current code:

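import requests
from bs4 import BeautifulSoup

# headers is not shown in the question; a browser-like User-Agent is assumed here
headers = {"User-Agent": "Mozilla/5.0"}

URL = "https://tradingeconomics.com/commodity/crude-oil"
page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, "html.parser")
results = soup.find("table", class_="table table-hover sortable-theme-minimal table-heatmap")
# text= matches a tag's text content, not its attributes, so this returns None
results1 = results.find("tr", text="data-symbol=\"CL1:COM\"")
print(results.prettify())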
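As a rough illustration of why results1 comes back empty, the sketch below parses a trimmed copy of the row from the question instead of fetching the live page. The closing tags were added by hand because the excerpt above is cut off. It uses string=, which is the newer name Beautiful Soup gives the text= argument. That argument is compared against a tag's text, not its attributes, so the <tr> is never matched. An attrs= dictionary is what filters on data-symbol.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (closing tags added by hand)
html = """
<table class="table table-hover sortable-theme-minimal table-heatmap">
  <tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM">
    <td><a href="/commodity/crude-oil">Crude Oil</a></td>
    <td id="p">82.86</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# string= (formerly text=) is compared against the tag's text content,
# not its attributes, so no <tr> matches and find() returns None
by_text = soup.find("tr", string='data-symbol="CL1:COM"')

# Attribute filters go in attrs=, which also handles hyphenated names
by_attr = soup.find("tr", attrs={"data-symbol": "CL1:COM"})

print(by_text)                                          # None
print(by_attr.find("td", id="p").get_text(strip=True))  # 82.86
```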

Is there a way to select the row by its "data-symbol" attribute and get the data from the id="p" td under it?


Answer 1

Score: 1

You can use Beautiful Soup; try this:

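from bs4 import BeautifulSoup
import requests

URL = "https://tradingeconomics.com/commodity/crude-oil"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Match the row by its data-symbol attribute (attrs= handles hyphenated names)
target_tr = soup.find("tr", attrs={"data-symbol": "CL1:COM"})
# Then look for the price cell inside that row
price_td = target_tr.find("td", id="p")

# strip=True removes the surrounding whitespace and newlines
price = price_td.get_text(strip=True)
print("Price:", price)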
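The same lookup can also be written as a single CSS selector with select_one, which Beautiful Soup supports through its soupsieve dependency. The sketch below runs against a trimmed copy of the HTML from the question rather than the live page. The get_price helper is a hypothetical name. It returns None instead of raising AttributeError when the row or cell is missing, which can happen if the live page serves different markup.

```python
from bs4 import BeautifulSoup


def get_price(html, symbol):
    """Return the text of td#p in the row whose data-symbol equals symbol, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Attribute selector for the row, then the id selector for the price cell
    cell = soup.select_one(f'tr[data-symbol="{symbol}"] td#p')
    return cell.get_text(strip=True) if cell is not None else None


# Trimmed copy of the row from the question
sample = """
<table>
  <tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM">
    <td><a href="/commodity/crude-oil">Crude Oil</a></td>
    <td id="p">82.86</td>
  </tr>
</table>
"""

print(get_price(sample, "CL1:COM"))  # 82.86
print(get_price(sample, "XX1:COM"))  # None
```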

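Selenium could also prove useful, for example if the price is rendered or updated by JavaScript after the page loads. Good luck!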


huangapple
  • Posted on August 9, 2023 at 04:29:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76863012.html