JSON malformed error for Batch Inference Job Input - Amazon Personalize

Question

I have created a solution version using the "similar-items" recipe in Amazon Personalize and am trying to test it with a batch inference job. I followed the AWS documentation, which states that the input should be a list of itemIds, with a maximum of 500 items, and each itemId separated by a new line:

{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...
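(Editorial aside: the format above is plain JSON Lines, one JSON object per line. A minimal sketch that produces it directly with the standard json module, using the sample IDs from the documentation excerpt above:)

```python
import json

# Sample item IDs from the documentation excerpt above
item_ids = ["105", "106", "441"]

# JSON Lines: one JSON object per line, newline-separated
with open("job_input.json", "w") as f:
    for item_id in item_ids:
        f.write(json.dumps({"itemId": item_id}) + "\n")
```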

Accordingly, I wrote the following code to transform my item_ids column into the described JSON format:

# convert item_id column to required JSON format with new lines entered between items
items_json = items_df['ITEM_ID'][1:200].to_json(orient='columns').replace(',', '}\n{')

# write output to JSON file
with open('items_json.json', 'w') as f:
    json.dump(items_json, f)

# write file to S3
from io import StringIO  
import s3fs

# Connect to S3 default profile
s3 = boto3.client('s3')

s3.put_object(
     Body=json.dumps(items_json),
     Bucket='bucket',
     Key='personalize/batch-recommendations-input/items_json.json'
)

Then when I run the batch inference job with that as input, it gives the following error: "User error: Input JSON is malformed."

My sample JSON input looks as follows:

    "{\"itemId\":\"12637\"} {\"itemId\":\"12931\"} {\"itemId\":\"13005\"}"

and after copying it to S3 as follows (adding backslashes to it)- don't know if that's significant in any way:

    "{\"itemId\":\"12637\"}\n{\"itemId\":\"12931\"}\n{\"itemId\":\"13005\"}"

To me, my format looks quite similar to what they asked for; any clue what might be causing the error?
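(Editorial aside: the backslashes in the second sample are a tell. Calling json.dump or json.dumps on a string that already contains serialized JSON wraps it in an outer JSON string literal, escaping the inner quotes and newlines. A quick check with the IDs from the samples above:)

```python
import json

# A string that already holds two newline-separated JSON objects
doc = '{"itemId":"12637"}\n{"itemId":"12931"}'

# Serializing the *string* again produces a quoted, escaped literal
encoded = json.dumps(doc)
print(encoded)
# → "{\"itemId\":\"12637\"}\n{\"itemId\":\"12931\"}"
```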


Answer 1

Score: 1

You just need some small changes to the use of to_json. Specifically, orient should be records and lines should be True.

Full example:

import pandas as pd
import boto3

items_df = pd.read_csv("...")

# Make sure the item ID column name is "itemId"
item_ids_df = items_df.rename(columns={"ITEM_ID": "itemId"})[["itemId"]]

# Write the DataFrame to a file in JSON Lines format
item_ids_df.to_json("job_input.json", orient="records", lines=True)

# Upload to S3
boto3.Session().resource('s3').Bucket(bucket).Object("job_input.json").upload_file("job_input.json")

Lastly, you mentioned that the maximum number of input items is 500. Actually, your input file can have up to 50M input items or a file size of 1GB.
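Once the JSON Lines file is in S3, the job itself is created with the Personalize create_batch_inference_job API. A sketch of the request follows; all ARNs, the bucket name, and the S3 paths are placeholders, not values from this thread:

```python
# Placeholder ARNs and S3 paths; substitute your own resources
params = {
    "jobName": "similar-items-batch-job",
    "solutionVersionArn": "arn:aws:personalize:us-east-1:111122223333:solution/my-solution/version-id",
    "roleArn": "arn:aws:iam::111122223333:role/PersonalizeS3Role",
    "jobInput": {"s3DataSource": {"path": "s3://bucket/personalize/batch-recommendations-input/job_input.json"}},
    "jobOutput": {"s3DataDestination": {"path": "s3://bucket/personalize/batch-recommendations-output/"}},
}

# The actual call requires AWS credentials, so it is shown but not executed:
# import boto3
# personalize = boto3.client("personalize")
# response = personalize.create_batch_inference_job(**params)
```

Note that jobOutput must point to an S3 folder (trailing slash), and the role must be able to read the input and write the output.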


huangapple
  • Posted on May 29, 2023 at 19:10:56
  • When reposting, please keep this article's link: https://go.coder-hub.com/76356828.html