将网页上的HTML表格中的数据提取到Excel中。

huangapple go评论106阅读模式
英文:

Extract data from an HTML table on a Web page into Excel

问题

我需要一些关于使用VBA从网站获取数据的帮助。我在Excel中有一张包含ETF代码、链接和价格的表格,我试图使用VBA从每个链接中获取昨天的收盘价,但问题是我不确定在这个位置应该写什么"HTMLDoc.getElementById(???)"。我无法在该网站的HTML代码中找到任何ID,希望你可以帮助我。

网站示例:https://www.boerse-frankfurt.de/en/etf/amundi-prime-global-ucits-etf-dr-c/price-history/historical-prices-and-volumes

我的代码:

  1. Sub GetETFPrices()
  2. Dim IE As New InternetExplorer
  3. Dim HTMLDoc As HTMLDocument
  4. Dim ETFLink As String
  5. Dim ETFPrice As String
  6. Dim i As Long
  7. ' 遍历表格中的每一行
  8. For i = 2 To ThisWorkbook.Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Row
  9. ETFLink = ThisWorkbook.Worksheets("Sheet1").Cells(i, 2).Value ' 从第2列获取链接
  10. ' 在Internet Explorer中打开链接
  11. IE.Navigate ETFLink
  12. Do While IE.Busy
  13. DoEvents
  14. Loop
  15. ' HTML文档中获取ETF价格元素
  16. Set HTMLDoc = IE.document
  17. Set ETFRow = HTMLDoc.getElementById("???").Rows(HTMLDoc.getElementById("???").Rows.Length - 1)
  18. ETFPrice = ETFRow.Cells(3).innerText
  19. ' 更新Excel表格中的价格列
  20. ThisWorkbook.Worksheets("Sheet1").Cells(i, 3).Value = ETFPrice
  21. Next i
  22. IE.Quit
  23. End Sub
英文:

I need some help with taking data from website using vba. I have table in excel with etf ticker, link and price and using vba I try to take close price of yesterday from each link but the problem is that I am not sure what to write at this place "HTMLDoc.getElementById("???")". I can't find any ID from that website html code, hope you can help me with that.

website example: https://www.boerse-frankfurt.de/en/etf/amundi-prime-global-ucits-etf-dr-c/price-history/historical-prices-and-volumes

my code:

  1. Sub GetETFPrices()
  2. Dim IE As New InternetExplorer
  3. Dim HTMLDoc As HTMLDocument
  4. Dim ETFLink As String
  5. Dim ETFPrice As String
  6. Dim i As Long
  7. ' Loop through each row in the table
  8. For i = 2 To ThisWorkbook.Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).row
  9. ETFLink = ThisWorkbook.Worksheets("Sheet1").Cells(i, 2).Value ' Get link from column 2
  10. ' Opening the link in Internet Explorer
  11. IE.Navigate ETFLink
  12. Do While IE.Busy
  13. DoEvents
  14. Loop
  15. ' Get the ETF price element from the HTML document
  16. Set HTMLDoc = IE.document
  17. Set ETFRow = HTMLDoc.getElementById("???").Rows(HTMLDoc.getElementById("???").Rows.Length - 1)
  18. ETFPrice = ETFRow.Cells(3).innerText
  19. ' Updating the price column in Excel sheet
  20. ThisWorkbook.Worksheets("Sheet1").Cells(i, 3).Value = ETFPrice
  21. Next i
  22. IE.Quit
  23. End Sub

答案1

得分: 1

如果您不必使用Internet Explorer,我建议使用Selenium,它与Chrome(和FireFox)兼容。

现在,看一下您提供的链接页面,这是一个情况,您的数据位于表格中,并且没有id属性可用于定位特定单元格。因此,我建议将整个表格从页面导入Excel中,然后您可以更容易地导航以查找您感兴趣的数据点。

您可以使用以下方法:

  1. Sub ImportHtmlTable()
  2. '初始化Selenium
  3. Dim bot As WebDriver
  4. Set bot = New WebDriver
  5. bot.Start "chrome", "YourUrl"
  6. bot.Get "/"
  7. DoEvents
  8. Dim Tables As WebElements
  9. Set Tables = bot.FindElementsByTag("table")
  10. Dim wb As Workbook
  11. Set wb = Workbooks("YourWorkbookName")
  12. HtmlTablesToRange Tables, wb.Sheets(1).Range("A1")
  13. bot.Close
  14. Set bot = Nothing
  15. End Sub
  16. '受以下代码启发:https://www.vba-market.com/
  17. Sub HtmlTablesToRange(Tables As WebElements, Destination As Range)
  18. Destination.CurrentRegion.ClearContents
  19. Dim tb As WebElement
  20. Dim ths As WebElements '表头 (th)
  21. Dim trs As WebElements ' (tr)
  22. Dim tds As WebElements '数据单元格 (td)
  23. For Each tb In Tables
  24. Dim theads As WebElements
  25. Set theads = tb.FindElementsByTag("thead")
  26. Dim thead As WebElement
  27. For Each thead In theads
  28. Set trs = thead.FindElementsByTag("tr")
  29. Dim tr As WebElement
  30. For Each tr In trs
  31. Set ths = tr.FindElementsByTag("th")
  32. Dim y As Long, z As Long
  33. y = 0 ' 重置到第一列
  34. Dim th As WebElement
  35. For Each th In ths
  36. Destination.Offset(z, y).Value = th.text
  37. y = y + 1
  38. Next th
  39. z = z + 1
  40. Next tr
  41. Next thead
  42. Dim tbodys As WebElements
  43. Set tbodys = tb.FindElementsByTag("tbody")
  44. Dim tbody As WebElement
  45. For Each tbody In tbodys
  46. Set trs = tbody.FindElementsByTag("tr")
  47. For Each tr In trs
  48. Set tds = tr.FindElementsByTag("td")
  49. y = 0 ' 重置到第一列
  50. Dim td As WebElement
  51. For Each td In tds
  52. Destination.Offset(z, y).Value = td.text
  53. y = y + 1
  54. Next td
  55. z = z + 1
  56. Next tr
  57. Next tbody
  58. z = z + 1
  59. Next tb
  60. End Sub

如果需要提高性能,您还可以在代码执行期间关闭Application.ScreenUpdating

请注意,您可能需要更新Chrome驱动程序,通常位于C:\Users\YourUserName\AppData\Local\SeleniumBasic

免责声明:请确保您被允许从您感兴趣的网站收集数据。

英文:

If you don't have to use Internet Explorer, I would suggest to use Selenium which is compatible with Chrome (and FireFox).

Now, looking at the page you linked, this is the case where your data is inside a table and there is no id attribute that will allow you to target a specific cell. Because of that, I would suggest to import the whole table from the page in Excel and then you can more easily navigate to find the data point you are interested in.

You could use this approach for instance:

  1. Sub ImportHtmlTable()
  2. 'Initialize Selenium
  3. Dim bot As WebDriver
  4. Set bot = New WebDriver
  5. bot.Start "chrome", "YourUrl"
  6. bot.Get "/"
  7. DoEvents
  8. Dim Tables As WebElements
  9. Set Tables = bot.FindElementsByTag("table")
  10. Dim wb As Workbook
  11. Set wb = Workbooks("YourWorkbookName")
  12. HtmlTablesToRange Tables, wb.Sheets(1).Range("A1")
  13. bot.Close
  14. Set bot = Nothing
  15. End Sub
  16. 'Inspired by code from: https://www.vba-market.com/
  17. Sub HtmlTablesToRange(Tables As WebElements, Destination As Range)
  18. Destination.CurrentRegion.ClearContents
  19. Dim tb As WebElement
  20. Dim ths As WebElements 'Headers (th)
  21. Dim trs As WebElements 'Rows (tr)
  22. Dim tds As WebElements 'Data cells (td)
  23. For Each tb In Tables
  24. Dim theads As WebElements
  25. Set theads = tb.FindElementsByTag("thead")
  26. Dim thead As WebElement
  27. For Each thead In theads
  28. Set trs = thead.FindElementsByTag("tr")
  29. Dim tr As WebElement
  30. For Each tr In trs
  31. Set ths = tr.FindElementsByTag("th")
  32. Dim y As Long, z As Long
  33. y = 0 ' Resets back to first column
  34. Dim th As WebElement
  35. For Each th In ths
  36. Destination.Offset(z, y).Value = th.text
  37. y = y + 1
  38. Next th
  39. z = z + 1
  40. Next tr
  41. Next thead
  42. Dim tbodys As WebElements
  43. Set tbodys = tb.FindElementsByTag("tbody")
  44. Dim tbody As WebElement
  45. For Each tbody In tbodys
  46. Set trs = tbody.FindElementsByTag("tr")
  47. For Each tr In trs
  48. Set tds = tr.FindElementsByTag("td")
  49. y = 0 ' Resets back to first column
  50. Dim td As WebElement
  51. For Each td In tds
  52. Destination.Offset(z, y).Value = td.text
  53. y = y + 1
  54. Next td
  55. z = z + 1
  56. Next tr
  57. Next tbody
  58. z = z + 1
  59. Next tb
  60. End Sub

If you need to improve performance, you can also turn off Application.ScreenUpdating during the execution of the code.

Note that you might need to update the Chrome driver usually located in C:\Users\YourUserName\AppData\Local\SeleniumBasic

Disclaimer: Always make sure that you are allowed to gather data from the website you are interested in.

huangapple
  • 本文由 发表于 2023年4月19日 18:22:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/76053378.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定