Is there a way to have files in the Kedro catalog that are missing?

Question

I have a Kedro pipeline that generates a file, and the same pipeline reads that file again on its next run. On the first run the file does not exist yet, and a node in the pipeline is meant to handle that case. Kedro throws a missing-file error at this point.
Is there a way to handle this through Kedro? For example, a catalog parameter such as missing=True or optional=True, so that Kedro can safely ignore the file?

My current workaround is to create an empty file and, in my node, check whether it loads as an empty dataframe.
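
For reference, the empty-file workaround described above could look roughly like the sketch below. It assumes the file is loaded as a pandas DataFrame; the function name update_history and the columns id and value are made up for illustration and are not part of Kedro. Wiring the function into a Kedro node and catalog is not shown.

```python
import pandas as pd


def update_history(previous: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Append this run's rows to the output of the previous run.

    `previous` is loaded from a file that was seeded as an empty
    (header-only) placeholder before the first run, so an empty frame
    is treated as "there was no earlier run".
    """
    if previous.empty:
        # First run: nothing to merge with, so start from the new rows only.
        return new_rows.reset_index(drop=True)
    # Later runs: stack the new rows under the rows from the previous run.
    return pd.concat([previous, new_rows], ignore_index=True)


# Simulate the first run (placeholder input) followed by a second run.
placeholder = pd.DataFrame(columns=["id", "value"])
first = update_history(placeholder, pd.DataFrame({"id": [1], "value": [10]}))
second = update_history(first, pd.DataFrame({"id": [2], "value": [20]}))
print(len(first), len(second))
```

The node itself decides what "first run" means, so nothing special is needed in the catalog, but the placeholder file still has to be created by hand before the first run.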

Answer 1

Score: 1

I don't think this is possible.

I tried to propose a workaround using hooks to inject a custom MissingDataSet on the fly, but this workflow didn't work: https://github.com/kedro-org/kedro/issues/2690#issuecomment-1607746840

Apparently DataCatalog is not a singleton, so this is not straightforward.


Posted by huangapple on 2023-06-26 22:44:45
Link to this post: https://go.coder-hub.com/76557758.html