Is it possible to join reference data into a nested dict in a pandas dataframe?
Question
I am trying to join two pandas data frames: the "left" table, which contains a column with a complex type (an array of dicts), and the "right" table, which is a flat reference table.
A pseudo-table representation of these is as follows:
left_df
parent_id | array_column |
---|---|
1 | [{id: 1}, {id: 3}] |
2 | [{id: 2}, {id: 4}] |
right_df
id | value |
---|---|
1 | one |
2 | two |
3 | three |
4 | four |
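For reproducibility, the two pseudo-tables above could be built as real DataFrames roughly like this. This is a minimal sketch that assumes the ids are integers and that each cell of array_column holds a Python list of dicts; the actual data types in the original frames are not stated.

```python
import pandas as pd

# "Left" table: each row carries a list of dicts (assumed representation).
left_df = pd.DataFrame({
    "parent_id": [1, 2],
    "array_column": [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 4}]],
})

# "Right" table: flat reference data keyed by id.
right_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "value": ["one", "two", "three", "four"],
})
```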
I'm aiming to look up/join the values from the right df into the dicts in the array_column of the left df using their ids, but have found this quite tricky.
Desired outcome:
parent_id | array_column |
---|---|
1 | [{id: 1, value: 'one'}, {id: 3, value: 'three'}] |
2 | [{id: 2, value: 'two'}, {id: 4, value: 'four'}] |
My naive first attempt was to use a merge, as follows:
desired_df = pd.merge(left_df, right_df, how='outer', left_on="array_column.['id']", right_on='id')
Obviously this failed, and I'm not sure how to progress further. Effectively, the aim is to look up reference data for the dicts within an array, but after much searching I haven't been able to articulate the problem well enough for a Google search to turn up anything helpful.
I'd appreciate any guidance anyone can share on this, whether it uses pandas or not. Thank you!
Answer 1
Score: 0
Merge might not be the right approach, since you are storing complex objects (lists of dicts) in a column. That said, you can build an id-to-value lookup from right_df (here, a Series indexed by id) and then use it with map to add the new key-value pair to each dict in left_df:
d = right_df.set_index('id')['value']
left_df['array_column'] = left_df['array_column'].map(lambda x: [{**y, 'value': d.get(y['id'])} for y in x])
Result:
parent_id array_column
0 1 [{'id': 1, 'value': 'one'}, {'id': 3, 'value': 'three'}]
1 2 [{'id': 2, 'value': 'two'}, {'id': 4, 'value': 'four'}]
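The approach above can be sketched as a self-contained script. It assumes the sample frames from the question (integer ids, lists of dicts); the helper name `enrich` and its `default` parameter are hypothetical, introduced only for illustration. The lookup is converted to a plain dict here, so an id that is missing from right_df gets the default (None) rather than raising.

```python
import pandas as pd

left_df = pd.DataFrame({
    "parent_id": [1, 2],
    "array_column": [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 4}]],
})
right_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "value": ["one", "two", "three", "four"],
})

def enrich(items, lookup, default=None):
    """Return copies of the dicts in `items`, each with a 'value' looked up by 'id'."""
    return [{**item, "value": lookup.get(item["id"], default)} for item in items]

# Plain dict mapping id -> value, built from the reference table.
lookup = right_df.set_index("id")["value"].to_dict()

# assign() returns a new frame, so left_df itself is left unchanged.
result = left_df.assign(
    array_column=left_df["array_column"].map(lambda items: enrich(items, lookup))
)
print(result.to_dict("records"))
```

Because each dict is copied with `{**item, ...}`, the original dicts in left_df are not mutated, which makes the step safe to re-run.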
Answer 2
Score: 0
With merge it would look like:
temp = left_df.explode("array_column")
temp = temp.merge(
    right_df, left_on=temp["array_column"].apply(lambda x: x.get("id")), right_on="id"
).drop(columns="id")
temp["array_column"] = temp.apply(
    lambda x: {**x["array_column"], "value": x["value"]}, axis=1
)
out = temp.groupby("parent_id")["array_column"].agg(list).reset_index()
print(out)
parent_id array_column
0 1 [{'id': 1, 'value': 'one'}, {'id': 3, 'value':...
1 2 [{'id': 2, 'value': 'two'}, {'id': 4, 'value':...
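A variant of the explode-and-merge idea, written as a self-contained sketch, might look like the following. It assumes the sample frames from the question and that every list is non-empty (an empty list would explode to NaN and break the id extraction). Rather than passing a Series to left_on, it materialises the id as an ordinary helper column, and it uses how="left" so that ids missing from right_df are kept with a missing value instead of being dropped.

```python
import pandas as pd

left_df = pd.DataFrame({
    "parent_id": [1, 2],
    "array_column": [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 4}]],
})
right_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "value": ["one", "two", "three", "four"],
})

# One row per dict; ignore_index gives the exploded frame a clean 0..n-1 index.
exploded = left_df.explode("array_column", ignore_index=True)

# Pull the join key out of each dict into a regular column.
exploded["id"] = exploded["array_column"].map(lambda d: d["id"])

# Ordinary flat merge against the reference table.
merged = exploded.merge(right_df, on="id", how="left")

# Fold the looked-up value back into a copy of each dict.
merged["array_column"] = [
    {**d, "value": v} for d, v in zip(merged["array_column"], merged["value"])
]

# Regroup the dicts into one list per parent.
out = (
    merged.groupby("parent_id", sort=False)["array_column"]
    .agg(list)
    .reset_index()
)
print(out.to_dict("records"))
```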