Is there a way to have files in the Kedro catalog that are missing?
Question
I have a Kedro pipeline that generates a file which is used again in the next run of the same pipeline. On the first run, however, that file does not exist yet, and that case is handled in a node in the pipeline. Kedro throws a missing-file error at this point.
Is there a way to handle this through Kedro? Maybe by adding a catalog parameter such as missing=True or optional=True, so that Kedro can safely ignore the file?
My current solution is to create an empty file and to check in my node whether the loaded file is an empty dataframe.
Answer 1
Score: 1
I don't think this is possible.
I tried to propose a workaround using hooks to inject a custom MissingDataSet on the fly, but this workflow didn't work: https://github.com/kedro-org/kedro/issues/2690#issuecomment-1607746840
Apparently DataCatalog is not a singleton, so this is not straightforward.
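As an illustration only, the load and save behavior that a "missing-tolerant" dataset would need can be sketched in plain Python. This is not the MissingDataSet from the linked issue and it is not a Kedro class: OptionalCSVStore is a hypothetical name, the CSV format is an assumption, and wiring it into the catalog (presumably by subclassing Kedro's abstract dataset base class) is not shown or tested here.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


class OptionalCSVStore:
    """Hypothetical helper (not a Kedro class): load() returns None
    instead of raising when the backing file does not exist."""

    def __init__(self, filepath: str, fieldnames: List[str]) -> None:
        self._path = Path(filepath)
        self._fieldnames = fieldnames

    def exists(self) -> bool:
        return self._path.is_file()

    def load(self) -> Optional[List[Dict[str, str]]]:
        # A missing file is reported as None so the caller can
        # tell "first run" apart from "file present but empty".
        if not self.exists():
            return None
        with self._path.open(newline="") as f:
            return list(csv.DictReader(f))

    def save(self, rows: List[Dict[str, str]]) -> None:
        self._path.parent.mkdir(parents=True, exist_ok=True)
        with self._path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=self._fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```

A node consuming such an input would then branch on None for the first run, rather than relying on a placeholder file.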