英文:
What's the cost and performance implications for BigQuery Native JSON data type?
问题
我计划在BigQuery中使用原生 JSON 数据类型。在此之前,我计划了解 JSON 数据类型的影响以及最佳实践。
例如,BigQuery 是一个列式数据存储,所以如果我为每一行存储一个大的 JSON(例如总共 100 个属性的 5 MB),并且如果我的表对于单个分区包含 1000 万行,那么如果我在选择子句中使用某些特定的 JSON 字段(例如 key1.key2,key2.key2)作为聚合字段,会对成本和性能有什么影响?我查看了文档,但没有找到任何缺点或限制。这是否意味着在任何用例中使用它都是安全的,而对性能和分析成本的影响不大?
英文:
I am planning to use the Native JSON data type in BigQuery. Before that, I am planning to understand the implications of JSON data types and best practices for JSON data type.
For example, the BigQuery is a columnar data store, so if I store a large JSON say 5 MB (in total of 100 properties) for each row and if my table contains 10 million rows for a single partition what are the side effects in terms of cost and performance if I use some specific JSON field (say key1.key2, key2.key2) in a select clause or as an aggregated field? I tried the documentation and I don't find any drawbacks or limitations. Does that mean is it safe to use for any use cases without impacting much on performance and Analysis cost?
答案1
得分: 1
使用BigQuery的按需定价,您只支付您所读取的内容。
此外,正如您所提到的,BigQuery使用列式存储模式。因此,如果一列使用JSON类型,将会读取整个列。
如果您只获取JSON中的一个属性,这不会有任何变化。BigQuery需要读取整个列,然后应用JSON路径过滤器并获取您的值。
就性能而言,您可以推断出需要读取和处理整个JSON(以获取值)。因此,您的JSON越大和/或行数越多,处理时间就越长。
英文:
With BigQuery, if you use on-demand pricing, you pay for what you read.
In addition, as you mentioned, BigQuery use a columnar storage mode. Therefore, if a column use the JSON type, the whole column is read.
If you get only a property in the JSON, that change nothing. BigQuery need to read the whole column, then to apply the JSON path filter and get your value(s).
In term of performance, you can deduce that the whole JSON need to be read and process (to get the value). So, larger is your JSON and/or higher is the number of row -> longer is the processing time.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论