How does DataSQRL handle structured and unstructured data?
Question
I am curious about how DataSQRL handles structured and unstructured data. What are the differences between structured and unstructured data, and how does DataSQRL process each type? How do I configure what type of data I'm ingesting?
I've read through the "What is SQRL?" section of the docs, in particular the part about Nested Tables, but it's not quite clear whether it's really as straightforward as it seems. For example, is there a limit to how deeply tables can be nested, either practically or by design?
Answer 1
Score: 1
DataSQRL can ingest both structured data (like SQL tables) and semi-structured data (like JSON documents). The data format documentation page lists the input formats that are supported.
Semi-structured data like JSON is represented as nested tables, like you said. The mapping to nested tables is controlled by the schema configuration file that is provided by the data source. To customize the mapping, you can run the data discovery command and then go into the directory where the data source was created to find the schema configuration file, whose name ends in .schema.yml. You can change this file to update the mapping.
There is no logical limit to how deeply data can be nested. We have tested up to 4 levels of nesting, so I would consider that the practical limit for now and pre-process any data that is nested more deeply than that.
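To decide whether a JSON source needs pre-processing, you could check its nesting depth before ingesting it. The following is only a rough sketch in plain Python, not part of DataSQRL: the function names are made up for illustration, and the way levels are counted (a flat record has 0 levels of nesting, each object nested inside it adds one, arrays only contribute through the objects they contain) is my assumption about what "4 levels" means.

```python
import json

# Practical limit mentioned above; how levels are counted is an assumption.
MAX_TESTED_NESTING = 4


def object_depth(value):
    """Count the deepest chain of JSON objects at or below `value`."""
    if isinstance(value, dict):
        return 1 + max((object_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        # Arrays do not add a level themselves; only their elements count.
        return max((object_depth(v) for v in value), default=0)
    return 0  # scalars (strings, numbers, booleans, null)


def nesting_levels(record):
    """Levels of nesting below the top-level record (flat record -> 0)."""
    return max(object_depth(record) - 1, 0)


def needs_preprocessing(record, limit=MAX_TESTED_NESTING):
    """True if the record is nested more deeply than the tested limit."""
    return nesting_levels(record) > limit


# Hypothetical sample record: an order with a nested customer and line items.
order = json.loads("""
{
  "id": 1,
  "customer": {"name": "Ada", "address": {"city": "Berlin"}},
  "items": [{"sku": "A-1", "discounts": [{"code": "SPRING"}]}]
}
""")

# Artificially deep record: six objects nested inside the top-level one.
deep = {"value": 0}
for _ in range(6):
    deep = {"child": deep}

print(nesting_levels(order), needs_preprocessing(order))
print(nesting_levels(deep), needs_preprocessing(deep))
```

Records that are flagged could then be flattened or split into separate files before running data discovery on them.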