Question

We have a product where our customers' data is saved in a GCP Bigtable instance. Each customer has its own table for its own data, but all the tables are located in the same Bigtable instance.

Now our new requirement is to have a superuser who is able to see the data from all customers and to query it in a kind of "union" way... query all tables at once. So I'm trying to find a way to solve this without changing how the customers' data is saved and without duplicating the data.
The amount of data stored for each customer is about 6 TB, and there might be 10-20 customers on the same Bigtable instance.

Thank you!

Answer 1

Score: 1

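Bigtable read requests return the contents of the requested rows as a stream. Since your use case involves reading many rows, it is recommended that you use the Cloud Bigtable client libraries instead of direct API calls; the client libraries use Application Default Credentials (ADC) for authentication. The superuser then only needs an IAM role on the tables that allows reading data, such as Bigtable Reader (roles/bigtable.reader). Note that Bigtable Viewer (roles/bigtable.viewer) does not by itself grant access to table data.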
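If the rows are read through a client library, the "union" can also be done on the client side. The following is a minimal Python sketch under stated assumptions: each per-table stream is already sorted by row key (a single Bigtable table scan returns rows in row-key order), and rows are modeled as hypothetical (row_key, cells) pairs rather than the real client's row objects. Wiring each stream to the client library's row-reading call for one customer table is omitted, and the names union_rows and _tag are illustrative only.

```python
import heapq
from typing import Dict, Iterable, Iterator, Mapping, Tuple

# Hypothetical row model: (row_key, cells). Real client rows carry more structure.
Row = Tuple[bytes, Dict[str, object]]


def _tag(name: str, rows: Iterable[Row]) -> Iterator[Tuple[bytes, str, Dict[str, object]]]:
    # A helper function (not a nested generator expression) binds `name` per table.
    for row_key, cells in rows:
        yield row_key, name, cells


def union_rows(streams: Mapping[str, Iterable[Row]]) -> Iterator[Tuple[str, bytes, Dict[str, object]]]:
    """Merge per-table row streams into one stream ordered by row key.

    Every input stream must already be sorted by row key. Each output row is
    tagged with the name of the table it came from.
    """
    tagged = [_tag(name, rows) for name, rows in streams.items()]
    # key= compares row keys only; ties keep the order of `streams`.
    for row_key, name, cells in heapq.merge(*tagged, key=lambda t: t[0]):
        yield name, row_key, cells


if __name__ == "__main__":
    demo = {
        "customer_a": [(b"a#001", {"v": 1}), (b"c#001", {"v": 3})],
        "customer_b": [(b"b#001", {"v": 2})],
    }
    for name, row_key, cells in union_rows(demo):
        print(name, row_key, cells)
    # customer_a b'a#001' {'v': 1}
    # customer_b b'b#001' {'v': 2}
    # customer_a b'c#001' {'v': 3}
```

heapq.merge pulls rows lazily and holds only one pending row per table, so this superuser view copies no data; the trade-off is that an unfiltered merge still reads every table end to end.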

You may try using the SQL UNION ALL operator to combine the SELECT results from multiple tables, for example from BigQuery after defining each Bigtable table as an external table. Filter on the row keys to fetch only the data you need; because the query reads the existing tables directly, the data is not duplicated. Depending on the columns and structure you have set up in your schema, you can also use joins. See the example below:

SELECT * FROM `table1`
 UNION ALL 
SELECT * FROM `table2`
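With 10-20 customers, writing such a query by hand gets repetitive. Below is a small Python sketch that generates it from a list of table names and adds a literal column recording the source table, so the superuser can tell which customer each row belongs to. The table names, the source_table column, and the validation rule are assumptions for illustration; a real query would use your actual (possibly fully qualified) table names, and UNION ALL with SELECT * only works if every table exposes the same columns in the same order.

```python
import re
from typing import Sequence

# Hypothetical naming rule: only plain identifiers (letters, digits, _, ., -) are
# accepted, so a table name cannot break out of the backtick quoting.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def build_union_all_query(tables: Sequence[str], label_column: str = "source_table") -> str:
    """Build one SELECT per table, joined with UNION ALL.

    Each SELECT adds a string literal naming its table, so the combined result
    still shows which customer every row belongs to.
    """
    if not tables:
        raise ValueError("at least one table is required")
    for name in [*tables, label_column]:
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    branches = [f"SELECT *, '{name}' AS {label_column} FROM `{name}`" for name in tables]
    return "\nUNION ALL\n".join(branches)


print(build_union_all_query(["table1", "table2"]))
# SELECT *, 'table1' AS source_table FROM `table1`
# UNION ALL
# SELECT *, 'table2' AS source_table FROM `table2`
```

Generating the query only saves typing; each branch still scans its whole table unless a row-key filter is added.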

Note that reading tables of the size you currently have will affect query performance and may not be feasible in the long term. Instead, you can create views and read the latest data, or inspect the data, there. For more on optimizing your use of Cloud Bigtable, see the official documentation.
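Using the figures from the question (about 6 TB per customer, 10-20 customers on the instance), an unfiltered UNION ALL over every table would scan roughly:

```latex
V_{\text{total}} = n \times 6\,\text{TB}, \qquad 10 \le n \le 20 \;\Rightarrow\; 60\,\text{TB} \le V_{\text{total}} \le 120\,\text{TB}
```

This is why restricting reads to the row-key ranges you actually need is likely to matter more than how the tables are combined.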
