Kubeflow - how to pass Tensorflow Dataset and Tensors from one component to another?
Question
I am implementing a Kubeflow pipeline in Vertex AI. Basically I have two components: prepare_data and train_model:
@component(
    packages_to_install = [
        "pandas==1.3.4",
        "numpy==1.20.3",
        "unidecode",
        "nltk==3.6.5",
        "gcsfs==2023.1.0"
    ],
)
def prepare_data(dataset: str,
                 data_artifact: Output[Dataset]) -> NamedTuple("Outputs", [("ratings", Dataset), ("movies", Dataset), ("train", Dataset), ("test", Dataset)]):
and...
@component(
    packages_to_install = [
        "tensorflow-recommenders==0.7.0",
        "tensorflow==2.9.1",
    ],
)
def train_model(epochs: int,
                ratings: Input[Dataset],
                movies: Input[Dataset],
                train: Input[Dataset],
                test: Input[Dataset],
                model_artifact: Output[Model]) -> NamedTuple("Outputs", [("model_artifact", Model)]):
prepare_data is generating four Tensorflow datasets (movies, ratings, train and test) that will be used inside the train_model component.
How do I save (or reference) these datasets from prepare_data so they can be used inside train_model? For instance, I get the following error:
AttributeError: 'Dataset' object has no attribute 'map'
For this line of code:
user_ids = ratings.map(lambda x: x["requisito"])
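As far as I can tell, what arrives inside train_model is the KFP Dataset artifact itself (an object exposing .uri, .path and .metadata), not a tf.data.Dataset, which is presumably why .map is not available:
# Inside the train_model component body:
print(type(ratings))    # a KFP Dataset artifact, not a tf.data.Dataset
print(ratings.uri)      # gs://... location backing the artifact
print(ratings.path)     # local (GCS-fused) path to that same location
user_ids = ratings.map(lambda x: x["requisito"])   # AttributeError: an artifact has no .map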
My pipeline looks like this:
@dsl.pipeline(
    pipeline_root=PIPELINE_ROOT + "data-pipeline",
    name="pipeline-with-deployment",
)
def pipeline():
    prepare_data_op = prepare_data('gs://bucket-777/data.csv').set_cpu_limit('16').set_memory_limit('32G').set_caching_options(False)
    training_op = train_model(3, prepare_data_op.outputs["ratings"], prepare_data_op.outputs["movies"], prepare_data_op.outputs["train"], prepare_data_op.outputs["test"]).set_cpu_limit('16').set_memory_limit('32G').set_caching_options(False)
    deploy_op = deploy_model(training_op.outputs["model_artifact"], "projectid", "us-central1")
training_op.outputs["model_artifact"] is an index for similarity search. The whole thing works perfectly as a single data-train piece, but when I split it into components, the datasets do not keep their properties.
Any ideas on how to overcome this issue are welcome.
I checked this Stack Overflow question (here) but I am unsure how to do this with Tensorflow Datasets and Tensors.
Answer 1
Score: 0
This will be implemented by the Kubeflow team in the future. It's a planned feature, as seen here:
https://github.com/kubeflow/pipelines/issues/8899#issuecomment-1452764426
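In the meantime, a minimal sketch of one possible workaround, assuming each dataset is declared as an Output[Dataset] parameter (rather than a NamedTuple return value) and serialized with tf.data.experimental.save / tf.data.experimental.load (available in TF 2.9), would look roughly like this:
@component(packages_to_install=["tensorflow==2.9.1"])
def prepare_data(dataset: str,
                 ratings: Output[Dataset],
                 movies: Output[Dataset]):
    import tensorflow as tf
    # ... build ratings_ds and movies_ds as tf.data.Dataset objects ...
    # Serialize each dataset to its output artifact location.
    tf.data.experimental.save(ratings_ds, ratings.path)
    tf.data.experimental.save(movies_ds, movies.path)

@component(packages_to_install=["tensorflow==2.9.1"])
def train_model(epochs: int,
                ratings: Input[Dataset],
                movies: Input[Dataset],
                model_artifact: Output[Model]):
    import tensorflow as tf
    # Reload the serialized datasets; the element_spec is recovered from the saved snapshot.
    ratings_ds = tf.data.experimental.load(ratings.path)
    movies_ds = tf.data.experimental.load(movies.path)
    user_ids = ratings_ds.map(lambda x: x["requisito"])   # .map works again on a real tf.data.Dataset
The pipeline wiring (passing prepare_data_op.outputs["ratings"] etc. into train_model) stays the same as in the question.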