JSON malformed error for Batch Inference Job Input - Amazon Personalize

Question
I have created a solution version using the "similar-items" recipe in Amazon Personalize and am trying to test it with a batch inference job. I followed the AWS documentation, which states that the input should be a list of itemIds, with a maximum of 500 items, and each itemId separated by a new line:
{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...
Accordingly, I wrote the following code to transform my item_ids column into the described JSON format:
import json
import boto3

# convert item_id column to the required JSON format, with new lines between items
items_json = items_df['ITEM_ID'][1:200].to_json(orient='columns').replace(',', '}\n{')

# write output to JSON file
with open('items_json.json', 'w') as f:
    json.dump(items_json, f)

# write file to S3 using the default profile
s3 = boto3.client('s3')
s3.put_object(
    Body=json.dumps(items_json),
    Bucket='bucket',
    Key='personalize/batch-recommendations-input/items_json.json'
)
Then when I run the batch inference job with that as input, it gives the following error: "User error: Input JSON is malformed."
My sample JSON input looks as follows:
"{\"itemId\":\"12637\"} {\"itemId\":\"12931\"} {\"itemId\":\"13005\"}"
and after copying it to S3 it looks as follows (backslashes are added to it; I don't know if that's significant in any way):
"{\"itemId\":\"12637\"}\n{\"itemId\":\"12931\"}\n{\"itemId\":\"13005\"}"
To me, my format looks quite similar to what they asked for, any clue what might be causing the error?
Answer 1
Score: 1
You just need a couple of small changes to the use of to_json. Specifically, orient should be records and lines should be True.
Full example:
import pandas as pd
import boto3

items_df = pd.read_csv("...")

# Make sure the item ID column is named "itemId"
item_ids_df = items_df.rename(columns={"ITEM_ID": "itemId"})[["itemId"]]

# Write the DataFrame to a file in JSON Lines format
item_ids_df.to_json("job_input.json", orient="records", lines=True)

# Upload to S3
boto3.Session().resource('s3').Bucket(bucket).Object("job_input.json").upload_file("job_input.json")
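With those settings, the generated file contains one JSON object per line with no surrounding quotes, which is the format the batch inference job expects. A quick sanity check, using a small hypothetical DataFrame in place of the real data:

```python
import json

import pandas as pd

# Hypothetical sample data standing in for the real items_df
items_df = pd.DataFrame({"ITEM_ID": ["105", "106", "441"]})
item_ids_df = items_df.rename(columns={"ITEM_ID": "itemId"})[["itemId"]]

# orient="records" + lines=True emits one JSON object per row,
# separated by newlines (JSON Lines)
out = item_ids_df.to_json(orient="records", lines=True)
print(out)
# {"itemId":"105"}
# {"itemId":"106"}
# {"itemId":"441"}

# Each line parses as a standalone JSON object
for line in out.strip().splitlines():
    assert "itemId" in json.loads(line)
```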
Lastly, you mentioned that the maximum number of input items is 500. Actually, your input file can contain up to 50M input items or be up to 1GB in size.
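As a side note, the escaped quotes you saw in S3 are a symptom of double encoding: items_json is already a string, so passing it through json.dumps (or json.dump) serializes it as a single JSON string literal instead of writing the raw JSON Lines. A minimal sketch of the effect, using hypothetical itemIds:

```python
import json

# A string that is already valid JSON Lines content
payload = '{"itemId":"12637"}\n{"itemId":"12931"}'

# json.dumps on a str wraps it in quotes and escapes the inner
# quotes and newline, producing exactly the malformed shape seen in S3
print(json.dumps(payload))
# "{\"itemId\":\"12637\"}\n{\"itemId\":\"12931\"}"

# Writing the string directly (no json.dumps) keeps the JSON Lines intact
print(payload)
```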