TypeError: Object of type Properties is not JSON serializable (Sagemaker Pipeline)
Question
I am trying to set up a Sagemaker pipeline that has 2 steps: preprocessing, then training an RF model.
The first step produces 3 outputs: scaled_data.csv, train.csv, and test.csv. The second step should take the train and test CSVs to train the RF model. An error arises when running step 2, stating "TypeError: Object of type Properties is not JSON serializable".
Here is my code for setting up the pipeline steps:
# upload data from local path to default bucket with prefix raw_data
WORK_DIRECTORY = "data"
input_data = sagemaker_session.upload_data(
    path="{}/{}".format(WORK_DIRECTORY, "dataset.csv"),
    bucket=bucket,
    key_prefix="{}/{}".format(prefix, "input_data"),
)
- setting up the first step (scaling step)
scaling_processor = SKLearnProcessor(
    framework_version=FRAMEWORK_VERSION,
    instance_type="ml.m5.4xlarge",
    instance_count=processing_instance_count,
    base_job_name="data-process",
    role=role,
    sagemaker_session=pipeline_session,
)

scaling_processor_args = scaling_processor.run(
    inputs=[
        ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="scaled_data", source="/opt/ml/processing/output/scaled_data/"),
        ProcessingOutput(output_name="train", source="/opt/ml/processing/output/train/"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/output/test/"),
    ],
    code="scripts/preprocess.py",
)

step_process = ProcessingStep(name="DataProcess", step_args=scaling_processor_args)
- setting up the 2nd step (RF Training-BYO mode), here is where the error arises:
estimator_cls = sagemaker.sklearn.SKLearn
FRAMEWORK_VERSION = "0.23-1"

rf_processor = FrameworkProcessor(
    estimator_cls,
    FRAMEWORK_VERSION,
    role=role,
    instance_count=1,
    instance_type='ml.m5.2xlarge',
    base_job_name='rf-modelling',
)

rf_processor_args = rf_processor.run(
    inputs=[
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            destination="/opt/ml/processing/input",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/input",
        ),
    ],
    outputs=[
        ProcessingOutput(output_name="rf_model", source="/opt/ml/processing/output/"),
    ],
    code="scripts/train.py",
)

step_train = ProcessingStep(name="RFTrain", step_args=rf_processor_args)
An error arises when running step 2, stating "TypeError: Object of type Properties is not JSON serializable". The problem is with the lines where I set the ProcessingInput sources for the 2nd step in rf_processor_args.
Any ideas what might cause this error?
Answer 1
Score: 1
The run() you call seems to be the incorrect choice: run() is used to run the processing job directly, rather than to define pipeline steps, which is what you apparently want. Use ProcessingStep directly and feed it all the necessary arguments:
# Setup the first step (scaling step)
...

# --> Use ProcessingStep directly and provide all the args
step_process = ProcessingStep(
    name="DataProcess",
    processor=scaling_processor,
    inputs=[
        ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="scaled_data", source="/opt/ml/processing/output/scaled_data/"),
        ProcessingOutput(output_name="train", source="/opt/ml/processing/output/train/"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/output/test/"),
    ],
    code="scripts/preprocess.py",
)

# Setup the 2nd step (RF Training-BYO mode)
...

# --> Use ProcessingStep directly and provide all the args
step_train = ProcessingStep(
    name="RFTrain",
    processor=rf_processor,
    inputs=[
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            destination="/opt/ml/processing/input/train",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/input/test",
        ),
    ],
    outputs=[
        ProcessingOutput(output_name="rf_model", source="/opt/ml/processing/output/"),
    ],
    code="scripts/train.py",
)
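As a follow-up, here is a minimal sketch of how the two steps can then be wired into a pipeline definition and executed. It assumes the role, pipeline_session, and processing_instance_count parameter from the question; the pipeline name is just a placeholder:

from sagemaker.workflow.pipeline import Pipeline

# Register both steps in one pipeline; the step_process.properties references
# in step_train are resolved to concrete S3 URIs at execution time.
pipeline = Pipeline(
    name="rf-pipeline",  # placeholder name
    parameters=[processing_instance_count],
    steps=[step_process, step_train],
    sagemaker_session=pipeline_session,
)

pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # start an execution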