Extract data from an HTML table on a Web page into Excel

Question

I need some help with extracting data from a website using VBA. I have a table in Excel with ETF tickers, links and prices, and I am trying to use VBA to get yesterday's closing price from each link. The problem is that I am not sure what to write in place of the "???" in HTMLDoc.getElementById("???"): I can't find any ID in that website's HTML code. I hope you can help me with that.

Example website: https://www.boerse-frankfurt.de/en/etf/amundi-prime-global-ucits-etf-dr-c/price-history/historical-prices-and-volumes

My code:


Sub GetETFPrices()
    Dim IE As New InternetExplorer
    Dim HTMLDoc As HTMLDocument
    Dim ETFRow As Object
    Dim ETFLink As String
    Dim ETFPrice As String
    Dim i As Long

    ' Loop through each row in the table
    For i = 2 To ThisWorkbook.Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Row
        ETFLink = ThisWorkbook.Worksheets("Sheet1").Cells(i, 2).Value ' Get link from column 2

        ' Open the link in Internet Explorer
        IE.Navigate ETFLink

        ' Wait until the page has finished loading
        Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        ' Get the ETF price element from the HTML document
        Set HTMLDoc = IE.document
        Set ETFRow = HTMLDoc.getElementById("???").Rows(HTMLDoc.getElementById("???").Rows.Length - 1)
        ETFPrice = ETFRow.Cells(3).innerText

        ' Update the price column in the Excel sheet
        ThisWorkbook.Worksheets("Sheet1").Cells(i, 3).Value = ETFPrice
    Next i

    IE.Quit
End Sub

Answer 1

Score: 1

If you don't have to use Internet Explorer, I would suggest using Selenium (for VBA, the SeleniumBasic library), which works with Chrome (and Firefox).

Looking at the page you linked, your data sits inside a table, and there is no id attribute that would let you target a specific cell. Because of that, I would suggest importing the whole table from the page into Excel; from there you can navigate to the data point you are interested in much more easily.

For instance, you could use this approach:

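Sub ImportHtmlTable()

    'Initialize Selenium (requires a reference to the Selenium Type Library installed by SeleniumBasic)
    Dim bot As WebDriver
    Set bot = New WebDriver
    
    'Replace "YourUrl" with the address of the page that holds the table
    bot.Start "chrome", "YourUrl"
    bot.Get "/"
    DoEvents

    'Collect every <table> element on the page
    Dim Tables As WebElements
    Set Tables = bot.FindElementsByTag("table")

    'Replace "YourWorkbookName" with the name of the workbook that receives the data
    Dim wb As Workbook
    Set wb = Workbooks("YourWorkbookName")
    HtmlTablesToRange Tables, wb.Sheets(1).Range("A1")
    
    'Close the browser window and release the driver
    bot.Close
    Set bot = Nothing

End Sub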

'Inspired by code from: https://www.vba-market.com/
Sub HtmlTablesToRange(Tables As WebElements, Destination As Range)

    Destination.CurrentRegion.ClearContents

    Dim tb As WebElement
    Dim ths As WebElements 'Headers    (th)
    Dim trs As WebElements 'Rows       (tr)
    Dim tds As WebElements 'Data cells (td)
    
    For Each tb In Tables
    
        Dim theads As WebElements
        Set theads = tb.FindElementsByTag("thead")
        Dim thead As WebElement
        For Each thead In theads
            Set trs = thead.FindElementsByTag("tr")
            Dim tr As WebElement
            For Each tr In trs
                Set ths = tr.FindElementsByTag("th")
                Dim y As Long, z As Long
                y = 0 ' Resets back to first column
                Dim th As WebElement
                For Each th In ths
                    Destination.Offset(z, y).Value = th.text
                    y = y + 1
                Next th
                z = z + 1
            Next tr
        Next thead
 
        Dim tbodys As WebElements
        Set tbodys = tb.FindElementsByTag("tbody")
        Dim tbody As WebElement
        For Each tbody In tbodys
            Set trs = tbody.FindElementsByTag("tr")
            For Each tr In trs
                Set tds = tr.FindElementsByTag("td")
                y = 0 ' Resets back to first column
                Dim td As WebElement
                For Each td In tds
                    Destination.Offset(z, y).Value = td.text
                    y = y + 1
                Next td
                z = z + 1
            Next tr
        Next tbody
        z = z + 1
    Next tb

End Sub
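Once the table is in the sheet, picking out yesterday's close is ordinary Excel work. The helper below is a minimal sketch, not part of the original answer: ValueBelowHeader is a hypothetical name, and it assumes that the price column header reads exactly "Close" and that the most recent trading day is the first data row under the header. Check both assumptions against the table you actually import.

```vba
' Hypothetical helper (not from the original answer).
' Assumes: the header text matches headerText exactly (case-insensitive), and
' the most recent trading day is the first row directly under the header.
Function ValueBelowHeader(Destination As Range, ByVal headerText As String) As Variant
    Dim hit As Range

    ' Search only the contiguous block around Destination, i.e. the first
    ' table HtmlTablesToRange wrote (it leaves a blank row between tables)
    Set hit = Destination.CurrentRegion.Find(What:=headerText, LookIn:=xlValues, _
                                             LookAt:=xlWhole, MatchCase:=False)

    If hit Is Nothing Then
        ValueBelowHeader = CVErr(xlErrNA)      ' header not found: return #N/A
    Else
        ValueBelowHeader = hit.Offset(1, 0).Value
    End If
End Function
```

For example, after HtmlTablesToRange Tables, wb.Sheets(1).Range("A1") has run, ValueBelowHeader(wb.Sheets(1).Range("A1"), "Close") would return the value you could write into column 3 of the ETF list. Returning #N/A rather than raising an error keeps a loop over many ETFs running when one page has a different layout.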

If you need better performance, you can also turn off Application.ScreenUpdating while the code runs.
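A minimal sketch of that idea, assuming you start the import through a small wrapper (ImportHtmlTableQuietly is a hypothetical name). It remembers the current setting and restores it even if ImportHtmlTable fails part-way, so Excel is not left with screen updating switched off.

```vba
' Hypothetical wrapper around the ImportHtmlTable macro from the answer.
Sub ImportHtmlTableQuietly()
    Dim previous As Boolean
    previous = Application.ScreenUpdating   ' remember the current setting
    Application.ScreenUpdating = False      ' stop repainting while cells are written

    On Error GoTo Restore                   ' errors raised in ImportHtmlTable land here
    ImportHtmlTable

Restore:
    Application.ScreenUpdating = previous    ' always put the setting back
    If Err.Number <> 0 Then Debug.Print "Import failed: " & Err.Description
End Sub
```

The error is printed to the Immediate window instead of shown in a message box so the wrapper never blocks; swap in MsgBox if you prefer a visible prompt.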

Note that you might need to update the Chrome driver (chromedriver) so that it matches your installed version of Chrome; it is usually located in C:\Users\YourUserName\AppData\Local\SeleniumBasic.
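If you are unsure where that folder is on your machine, a small check like the one below can help. It is a sketch that assumes the default per-user SeleniumBasic install under %LOCALAPPDATA% (the C:\Users\YourUserName\AppData\Local folder mentioned above) and a driver file named chromedriver.exe; both function names are hypothetical.

```vba
' Hypothetical helpers; assume the default per-user SeleniumBasic install.
Function SeleniumBasicDriverPath() As String
    ' %LOCALAPPDATA% expands to C:\Users\<user>\AppData\Local
    SeleniumBasicDriverPath = Environ$("LOCALAPPDATA") & "\SeleniumBasic\chromedriver.exe"
End Function

Function ChromeDriverPresent() As Boolean
    ' Dir returns an empty string when the file does not exist
    ChromeDriverPresent = (Len(Dir(SeleniumBasicDriverPath())) > 0)
End Function
```

Typing ? SeleniumBasicDriverPath(), ChromeDriverPresent() in the Immediate window shows where a replacement driver would go and whether one is already there.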

Disclaimer: Always make sure that you are allowed to gather data from the website you are interested in.

huangapple
  • Published on 19 April 2023 at 18:22:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76053378.html