英文:
How can I convert Apache Beam Rows to BigQuery TableFlows in a Java DataFlow pipeline?
问题
我需要使用Java在DataFlow管道中将数据写入BigQuery,因此需要获取一个TableRow
的PCollection。
TableSchema tableSchema = // 省略
BigQueryIO.Write<TableRow> bigQuerySink = BigQueryIO.<TableRow>write()
.to(new TableReference()
.setProjectId(options.getProject())
.setDatasetId(options.getBigQueryDataset())
.setTableId(options.getBigQueryTable()))
.withSchema(tableSchema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND);
到目前为止,在我的管道中,我有来自Apache Beam org.apache.beam.sdk.values.Row
的通用Row
对象的PCollection。我还有Beam模式Schema
。
因此,我需要实现一个DoFn<Row, TableRow>
,我可能会想出一些东西,但是支持多值类型如ARRAY(INT64)
到INT64
以及REPLICATED
等并不容易。
我觉得我正在重新发明轮子,这个类已经存在了吗?我是否需要这样的映射?
我查看了Dataflow模板和Google云库中的这个映射,但没有找到它。
英文:
I need to write data into a BigQuery within a DataFlow pipeline using Java. As such i need to get a PCollection of TableRow
TableSchema tableSchema = // omitted
BigQueryIO.Write<TableRow> bigQuerySink = BigQueryIO.<TableRow>write()
.to(new TableReference()
.setProjectId(options.getProject())
.setDatasetId(options.getBigQueryDataset())
.setTableId(options.getBigQueryTable()))
.withSchema(tableSchema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND);
As of now, in my pipeline, I have a PCollection of generic Row
objects from Apache Beam org.apache.beam.sdk.values.Row
. I also have the beam schema Schema
.
So i need to implement a DoFn<Row, TableRow>
and I will probably come up with something but it is not easy supporting multi valued types ARRAY(INT64)
to INT64
with REPLICATED etc.
My feeling is I am reinventing the wheel, does this class already exist ? Do I need such a mapping ?
I look at this mapper in the Dataflow templates and the google cloud libraries without finding it.
答案1
得分: 0
"I found the Mapper I was looking in BigQueryUtils
class here
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L508-L516
Required dependency is:
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
</dependency>
```"
<details>
<summary>英文:</summary>
Answering my question (seems like posting out your issue makes you twice shaper)
I found the Mapper I was looking in `BigQueryUtils` class here
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L508-L516
Required dependency is:
```xml
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
</dependency>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论