GCP Cloud Spanner CDC pushes to Pub/Sub via Dataflow

Question

I am trying to build a Dataflow job that streams change data from Cloud Spanner to a Pub/Sub topic. After I provide the necessary information and click Create Job, the job fails immediately with the following error:
"Failed to start the VM, launcher-202305251132348291864736406387748, used for launching because of status code: INVALID_ARGUMENT, reason: Invalid Error: Message: Invalid value for field 'resource.networkInterfaces[0].network': 'global/networks/default'. The referenced network resource cannot be found. HTTP Code: 400."

It is unclear whether I need to create a VPC, since the GCP documentation does not list one as a requirement or explain what it would be used for. If a VPC is really that important, why doesn't the documentation mention it?

A second question, if you know: when creating the change stream in Cloud Spanner, I did not create a separate instance or database to store the change stream metadata. My table and its changes will not reach thousands of rows; it will be quite small, holding only text values in 8 columns. Is it fine to keep the metadata and the actual data in the same database?

Answer 1

Score: 1

I have figured out the answer: I only need to provide the subnetwork value, and the remaining network information is derived from it.

Regarding question 2, I can use the same database. GCP will create a separate metadata table, but when configuring the Dataflow job I should set both the database name and the metadata database name to that same database, rather than pointing to the metadata table GCP created.
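For readers who launch the job from a script instead of the console, here is a minimal Python sketch of the same idea: it builds a `gcloud dataflow flex-template run` command that passes an explicit subnetwork and reuses the source database as the metadata database. The template path (`Spanner_Change_Streams_to_PubSub`) and the parameter names (`spannerInstanceId`, `spannerDatabase`, `spannerMetadataInstanceId`, `spannerMetadataDatabase`, `spannerChangeStreamName`, `pubsubTopic`) are assumptions taken from memory of the Google-provided template and should be checked against the current template documentation. All project, subnet, instance, and topic names are hypothetical placeholders.

```python
import shlex


def subnetwork_url(project: str, region: str, subnetwork: str) -> str:
    """Full URL form of a Compute Engine subnetwork reference."""
    return (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{project}/regions/{region}/subnetworks/{subnetwork}"
    )


def build_launch_command(job_name, project, region, subnetwork,
                         instance, database, change_stream, topic,
                         metadata_database=None):
    """Return the gcloud argument list; nothing is executed here."""
    # Assumption: reuse the source database for metadata unless told otherwise.
    metadata_database = metadata_database or database
    # Assumption: parameter names as recalled from the template docs.
    params = {
        "spannerInstanceId": instance,
        "spannerDatabase": database,
        "spannerMetadataInstanceId": instance,
        "spannerMetadataDatabase": metadata_database,
        "spannerChangeStreamName": change_stream,
        "pubsubTopic": topic,
    }
    # Assumption: location of the Google-provided flex template.
    template = (
        f"gs://dataflow-templates-{region}/latest/flex/"
        "Spanner_Change_Streams_to_PubSub"
    )
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        f"--project={project}",
        f"--region={region}",
        f"--template-file-gcs-location={template}",
        # An explicit subnetwork avoids the lookup of the missing
        # 'default' network reported in the error message.
        f"--subnetwork={subnetwork_url(project, region, subnetwork)}",
        "--parameters=" + ",".join(f"{k}={v}" for k, v in params.items()),
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    cmd = build_launch_command(
        "spanner-cdc-to-pubsub", "my-project", "us-central1", "my-subnet",
        "my-instance", "my-db", "my_change_stream", "my-topic",
    )
    print(shlex.join(cmd))
```

The script only prints the command so it can be reviewed before running. The subnetwork must be in the same region as the job.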

Answer 2

Score: 0

It is recommended to use a separate database for the metadata table in the change stream pipeline (https://cloud.google.com/spanner/docs/change-streams/use-dataflow#metadata). If you use the same database, make sure the change stream does not track the metadata tables created for the pipeline, since that would generate unnecessary change records.

In addition, you should not pass in the metadata table name. The Dataflow pipeline will create the metadata table for you automatically.
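One way to keep the change stream from tracking the pipeline's metadata tables in a shared database is to have it watch an explicit list of tables instead of `FOR ALL`. The Python sketch below only generates the Spanner DDL statement; the stream and table names are hypothetical, and you would run the resulting statement with whichever tool you normally use for schema changes.

```python
from typing import Sequence


def change_stream_ddl(stream_name: str, tables: Sequence[str]) -> str:
    """Build a CREATE CHANGE STREAM statement limited to the given tables.

    Listing tables explicitly (rather than using FOR ALL) means tables
    created later, such as the connector's metadata table, are not tracked.
    """
    if not tables:
        raise ValueError("List at least one table to track")
    return f"CREATE CHANGE STREAM {stream_name} FOR {', '.join(tables)}"


if __name__ == "__main__":
    # Hypothetical stream and table names, for illustration only.
    print(change_stream_ddl("OrdersStream", ["Orders", "Customers"]))
```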

huangapple
  • Posted on May 26, 2023 at 16:19:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76338958.html