VB.NET method to load 28 million records into BigQuery
Question
Good day. I'm having trouble finding a way to load 28 million records into a BigQuery table. I can do it with INSERT commands, but that is not efficient: it takes around a week to finish. Does anyone know of a better way to load this data through VB.NET? Thanks a lot.
Answer 1
Score: 0
Processing a large amount of data at once is generally called "bulk" or "batch" processing.
In your case, it is called batch loading data.
Several approaches are available, and you will have to check which one makes the most sense for you. There may be an API you can call directly from VB, or you can write the data to a file and load it by hand, or automate that process.
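As a rough illustration of the "write it to a file" route, here is a minimal sketch that splits one large CSV export into smaller files that can then be batch loaded one at a time. The module name, function name, output file naming and paths are all hypothetical, and the sketch assumes one record per line with no header row and no line breaks inside quoted fields.

```vbnet
Imports System.IO

Module CsvSplitter
    ' Hypothetical helper: copies sourcePath into numbered chunk files holding
    ' at most rowsPerFile lines each and returns the number of files written.
    ' Assumes one record per line (no header row, no quoted line breaks).
    Function SplitCsv(sourcePath As String, outputDir As String, rowsPerFile As Integer) As Integer
        Directory.CreateDirectory(outputDir)
        Dim fileCount As Integer = 0
        Dim rowsInChunk As Integer = 0
        Dim writer As StreamWriter = Nothing
        Try
            Using reader As New StreamReader(sourcePath)
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    ' Start a new chunk file for the first row or when the current one is full.
                    If writer Is Nothing OrElse rowsInChunk >= rowsPerFile Then
                        If writer IsNot Nothing Then writer.Dispose()
                        fileCount += 1
                        writer = New StreamWriter(Path.Combine(outputDir, $"chunk_{fileCount:D4}.csv"))
                        rowsInChunk = 0
                    End If
                    writer.WriteLine(line)
                    rowsInChunk += 1
                    line = reader.ReadLine()
                End While
            End Using
        Finally
            ' Flush and close the last chunk even if reading failed part-way.
            If writer IsNot Nothing Then writer.Dispose()
        End Try
        Return fileCount
    End Function
End Module
```

A call such as SplitCsv("C:\export\records.csv", "C:\export\Results", 1000000) (paths made up for the example) would leave about 28 files for a 28-million-row export, each small enough to retry on its own if a load fails.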
Answer 2
Score: 0
I installed Google.Cloud.BigQuery.V2 from NuGet and used the code below. You need .NET 6 or above for it to work, and the files must be in .csv format. I downloaded the data from Oracle and split it into several CSV files, which I iterate over to load them into BigQuery. To authenticate, you'll need to download GoogleCloudSDKInstaller.exe and run the command "gcloud auth application-default login" from the shell. This generates the credential files that the client picks up automatically. You can read more about it here: https://cloud.google.com/docs/authentication/provide-credentials-adc?hl=es-419#how-to
' These two Imports lines belong at the top of the source file.
Imports System.IO
Imports Google.Cloud.BigQuery.V2

' Folder that holds the CSV files exported from Oracle.
Dim resultsDir As New DirectoryInfo(Path.Combine(Application.StartupPath, "Results"))
Dim projectId As String = "edw-sandbox"
Dim datasetId As String = "AD_HOC"
Dim tableId As String = "TABLE"
Dim client As BigQueryClient = BigQueryClient.Create(projectId)
Dim uploadCsvOptions As New UploadCsvOptions()

For Each csvFile As FileInfo In resultsDir.GetFiles("*.csv")
    ' Using closes the file handle once the load job for this file has finished.
    Using stream As FileStream = File.OpenRead(csvFile.FullName)
        ' No schema is supplied (Nothing), so the destination table should already exist.
        Dim job As BigQueryJob = client.UploadCsv(projectId, datasetId, tableId, Nothing, stream, uploadCsvOptions)
        job.PollUntilCompleted().ThrowOnAnyError()
    End Using
    ' Show the running row count; the dialog blocks until it is dismissed.
    Dim table As BigQueryTable = client.GetTable(datasetId, tableId)
    MsgBox(table.Resource.NumRows)
Next
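The loop above passes a default UploadCsvOptions object. If the CSV files carry a header row, or the rows should be appended to a table that already holds data, the options can be set before uploading. The following is only a sketch: the module and function names are hypothetical, the chosen values are assumptions about the files rather than something stated in the answer, and it needs the Google.Cloud.BigQuery.V2 NuGet package to compile.

```vbnet
Imports Google.Cloud.BigQuery.V2

Module UploadOptionsSketch
    ' Hypothetical helper that builds the options passed to client.UploadCsv.
    Function BuildCsvOptions() As UploadCsvOptions
        Return New UploadCsvOptions() With {
            .SkipLeadingRows = 1,
            .WriteDisposition = WriteDisposition.WriteAppend
        }
        ' SkipLeadingRows = 1 assumes each file starts with one header line.
        ' WriteAppend adds the rows to whatever the table already contains.
    End Function
End Module
```

The returned object would take the place of uploadCsvOptions in the UploadCsv call. If the files have no header row, SkipLeadingRows should be left unset so that the first record is not dropped.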