Is it possible to join reference data into a nested dict in a pandas dataframe?

Question

I am trying to join two pandas DataFrames: the "left" table, which contains a column with a complex type (an array of dicts), and the "right" table, which is a flat reference table.

A pseudo-table representation of the two follows.

left_df

parent_id  array_column
1          [{id: 1}, {id: 3}]
2          [{id: 2}, {id: 4}]

right_df

id  value
1   one
2   two
3   three
4   four
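For anyone who wants to reproduce this, the pseudo-tables above could be built roughly as follows. This is only a sketch of one plausible construction: the question does not show how the frames were created, so the literal values and dtypes here are assumptions.

```python
import pandas as pd

# Hypothetical reconstruction of the pseudo-tables from the question.
left_df = pd.DataFrame(
    {
        "parent_id": [1, 2],
        # Each cell holds a Python list of dicts.
        "array_column": [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 4}]],
    }
)
right_df = pd.DataFrame(
    {"id": [1, 2, 3, 4], "value": ["one", "two", "three", "four"]}
)

# array_column is stored with dtype object, so pandas sees opaque Python
# objects there rather than a structure it can join on directly.
print(left_df.dtypes)
print(right_df)
```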

I'm aiming to look up/join the values from right_df into the dicts inside array_column of left_df, matching on id, but have found this quite tricky.

Desired outcome

parent_id  array_column
1          [{id: 1, value: 'one'}, {id: 3, value: 'three'}]
2          [{id: 2, value: 'two'}, {id: 4, value: 'four'}]

My naive first approach was to use a merge, as follows.

desired_df = pd.merge(left_df, right_df, how='outer', left_on="array_column.['id']", right_on='id')

Obviously this failed, and I'm not sure how to progress further. Effectively, the aim is to look up reference data onto dicts within an array, but after much searching I haven't been able to articulate the problem well enough for a Google search to turn up anything helpful.
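To make the failure concrete: as far as I can tell, merge treats left_on as a literal column label (or an array of key values), not as a path expression into nested data. The sketch below uses assumed sample frames matching the pseudo-tables and expects a KeyError, because no column with that name exists. The exact message may differ between pandas versions.

```python
import pandas as pd

# Assumed sample data matching the pseudo-tables in the question.
left_df = pd.DataFrame(
    {
        "parent_id": [1, 2],
        "array_column": [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 4}]],
    }
)
right_df = pd.DataFrame(
    {"id": [1, 2, 3, 4], "value": ["one", "two", "three", "four"]}
)

try:
    pd.merge(
        left_df,
        right_df,
        how="outer",
        left_on="array_column.['id']",  # looked up as a column label
        right_on="id",
    )
    failed = False
except KeyError as exc:
    failed = True
    print("KeyError:", exc)
```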

I'd appreciate any guidance anyone can share on this, whether using pandas or not. Thank you!

Answer 1

Score: 0

Merge might not be the right approach, since you are storing complex object types such as a list of dicts. That said, you can build a lookup from right_df and then use it with map to add the new key-value pair to each dict in left_df.

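# Build an id -> value lookup (a Series indexed by id)
d = right_df.set_index('id')['value']
# Rebuild each dict with an added 'value' key; ids missing from d get None
left_df['array_column'] = left_df['array_column'].map(lambda x: [{**y, 'value': d.get(y['id'])} for y in x])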

Result

   parent_id                                              array_column
0          1  [{'id': 1, 'value': 'one'}, {'id': 3, 'value': 'three'}]
1          2   [{'id': 2, 'value': 'two'}, {'id': 4, 'value': 'four'}]
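One thing worth checking with this approach is what happens when an id has no match in right_df. The self-contained sketch below uses a made-up id 5, which is not in the question, and a plain dict for the lookup instead of the Series. With dict.get, the unmatched entry ends up with 'value': None, and the original dicts are left untouched because {**item, ...} creates new ones.

```python
import pandas as pd

left_df = pd.DataFrame(
    {
        "parent_id": [1, 2],
        # id 5 is a hypothetical entry with no match in right_df
        "array_column": [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 5}]],
    }
)
right_df = pd.DataFrame(
    {"id": [1, 2, 3, 4], "value": ["one", "two", "three", "four"]}
)

# Plain dict lookup: id -> value
lookup = right_df.set_index("id")["value"].to_dict()

# Build new dicts rather than mutating the ones stored in left_df.
enriched = left_df["array_column"].map(
    lambda items: [{**item, "value": lookup.get(item["id"])} for item in items]
)
print(enriched.tolist())
```

If unmatched ids should be dropped or flagged instead, the list comprehension is the place to add that condition.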

Answer 2

Score: 0

With merge, it would look like this:

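# One row per dict in array_column
temp = left_df.explode("array_column")
# Join on the id pulled out of each dict, then drop the helper key column
temp = temp.merge(
    right_df, left_on=temp["array_column"].apply(lambda x: x.get("id")), right_on="id"
).drop(columns="id")
# Add the looked-up value to each dict
temp["array_column"] = temp.apply(
    lambda x: {**x["array_column"], "value": x["value"]}, axis=1
)
# Collect the dicts back into one list per parent_id
out = temp.groupby("parent_id")["array_column"].agg(list).reset_index()
print(out)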

   parent_id                                       array_column
0          1  [{'id': 1, 'value': 'one'}, {'id': 3, 'value':...
1          2  [{'id': 2, 'value': 'two'}, {'id': 4, 'value':...
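A variation on the same explode-and-merge idea, sketched under the assumption that the frames look like the pseudo-tables in the question: pull the id into a real column first so that merge can use an ordinary on= key, and use a left join so that dicts without a match are kept. The sample data and the name merged are illustrative only.

```python
import pandas as pd

# Assumed sample data matching the pseudo-tables in the question.
left_df = pd.DataFrame(
    {
        "parent_id": [1, 2],
        "array_column": [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 4}]],
    }
)
right_df = pd.DataFrame(
    {"id": [1, 2, 3, 4], "value": ["one", "two", "three", "four"]}
)

# One row per dict, keeping parent_id alongside it.
exploded = left_df.explode("array_column", ignore_index=True)

# Pull the id out of each dict into a real column so merge has a key.
exploded["id"] = exploded["array_column"].map(lambda item: item["id"])

# A left join keeps dicts whose id has no match (their value is missing).
merged = exploded.merge(right_df, on="id", how="left")

# Add the looked-up value to each dict, then regroup into lists.
merged["array_column"] = [
    {**item, "value": value}
    for item, value in zip(merged["array_column"], merged["value"])
]
out = (
    merged.groupby("parent_id", sort=False)["array_column"]
    .agg(list)
    .reset_index()
)
print(out.to_dict("records"))
```

The trailing "..." in the printed frames above is only pandas truncating wide cells for display; pd.set_option("display.max_colwidth", None) shows them in full.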

huangapple
  • Posted on March 7, 2023 at 03:11:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75654896.html