How to find data with specific tags in Python using the requests module?

Question


This is the part of the HTML that I need to extract a value from:

```html
<table class="table table-hover sortable-theme-minimal table-heatmap" data-sortable="">
  <thead>
    <tr>
      <th>
      </th>
      <th>
        Price
      </th>
      <th>
      </th>
      <th>
      </th>
      <th data-heatmap="1" data-heatmap-limit="5" style="text-align: center;cursor:pointer;">
        Day
      </th>
      <th data-heatmap="1" data-heatmap-limit="20" style="text-align: center;cursor:pointer">
        Month
      </th>
      <th data-heatmap="1" data-heatmap-limit="100" style="text-align: center;cursor:pointer">
        Year
      </th>
      <th class="hidden-xs" style="text-align: center;">
        Date
      </th>
    </tr>
  </thead>
  <tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM">
    <td>
      <a href="/commodity/crude-oil">
        Crude Oil
      </a>
    </td>
    <td id="p">
      82.86
```

I need to get that 82.86 value, but I can't work out how to target the `<tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM">` row.

Here is my current code:

```python
URL = "https://tradingeconomics.com/commodity/crude-oil"
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find("table", class_="table table-hover sortable-theme-minimal table-heatmap")
results1 = results.find("tr", text="data-symbol=\"CL1:COM\"")
print(results.prettify())
```

Is there a way to select the row by its `data-symbol` attribute and get the text of the `td` with `id="p"` inside it?

Answer 1

Score: 1


You can do this with Beautiful Soup by matching the row on its attribute. Try this:

```python
from bs4 import BeautifulSoup
import requests

URL = "https://tradingeconomics.com/commodity/crude-oil"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

# Match the <tr> by its data-symbol attribute, not by its text.
target_tr = soup.find("tr", attrs={"data-symbol": "CL1:COM"})

# Inside that row, take the <td id="p"> cell that holds the price.
price_td = target_tr.find("td", id="p")
price = price_td.get_text(strip=True)
print("Price:", price)
```
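For comparison with the question's attempt: `find("tr", text=...)` would return `None`, because `text=` (called `string=` in current Beautiful Soup) matches a tag's text content, not its attributes, whereas `attrs=` matches attribute values. The sketch below runs both lookups against a trimmed copy of the question's markup; the closing tags are assumed, since the snippet in the question is cut off. Separately, the question's last line prints `results`, the whole table, rather than `results1`.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the closing tags are assumed.
SAMPLE_HTML = """
<table class="table table-hover sortable-theme-minimal table-heatmap" data-sortable="">
  <tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM">
    <td><a href="/commodity/crude-oil">Crude Oil</a></td>
    <td id="p">82.86</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# string= (formerly text=) compares against a tag's text content, and no <tr>
# has the literal text 'data-symbol="CL1:COM"', so this finds nothing.
by_text = soup.find("tr", string='data-symbol="CL1:COM"')

# attrs= compares against the tag's attributes, which is what is needed here.
by_attr = soup.find("tr", attrs={"data-symbol": "CL1:COM"})

print(by_text)  # None
print(by_attr.find("td", id="p").get_text(strip=True))  # 82.86
```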

Selenium could also prove useful if the page fills in the value with JavaScript. Good luck!
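A CSS selector can express the row-plus-cell lookup in a single call, and checking for `None` shows when the value simply isn't in the HTML that `requests` received, which is the case where a browser-driven tool such as Selenium helps. This is a minimal sketch, assuming the same trimmed markup as above; `extract_price` and the symbol `MISSING:COM` are hypothetical names used only for illustration.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the closing tags are assumed.
SAMPLE_HTML = """
<table class="table table-hover sortable-theme-minimal table-heatmap" data-sortable="">
  <tr data-decimals="3" data-subscribe="CL1:COM" data-symbol="CL1:COM">
    <td><a href="/commodity/crude-oil">Crude Oil</a></td>
    <td id="p">82.86</td>
  </tr>
</table>
"""


def extract_price(html: str, symbol: str) -> Optional[float]:
    """Hypothetical helper: return the price in the row whose data-symbol
    equals `symbol`, or None if that row or its id="p" cell is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selector; the value is quoted because it contains a colon.
    cell = soup.select_one(f'tr[data-symbol="{symbol}"] td#p')
    if cell is None:
        # Not in the static HTML (for example, rendered later by JavaScript).
        return None
    return float(cell.get_text(strip=True))


print(extract_price(SAMPLE_HTML, "CL1:COM"))  # 82.86
print(extract_price(SAMPLE_HTML, "MISSING:COM"))  # None
```

For the live page, the `html` argument could be `requests.get(URL, headers=headers, timeout=10).text`; the `timeout` value is an assumption added here, not part of the original code.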

huangapple
  • Posted on August 9, 2023 at 04:29:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76863012.html