Create dictionaries from two columns of a pandas DataFrame, depending on another column's values

Question

I would like to create dictionaries from two columns of a pandas DataFrame, one dictionary per distinct value of a third column.

I have a dataframe as below:

           a    b  c
    0  brand    F  0
    1  brand    G  1
    2   seat  yes  1
    3   seat   no  0

I would like to create dictionaries like:

    brand = {'F': 0, 'G': 1}
    seat = {'no': 0, 'yes': 1}

My attempt below does not give the result I want:

    dic = {}
    for x, y in zip(b.values, c.values):
        dic.setdefault(y, []).append(x)

Thank you!

Answer 1

Score: 2

A modification of your approach would be:

    dic = {}
    for a, b, c in zip(df['a'], df['b'], df['c']):
        dic.setdefault(a, {})[b] = c
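
To see why the original loop fell short, here is a small sketch that runs both versions side by side. It uses plain lists in place of the DataFrame columns, and it assumes the bare `b` and `c` in the question stood for `df['b']` and `df['c']`; the names `attempt` and `fixed` are only illustrative.

```python
# Plain lists standing in for the columns of the example DataFrame.
a = ['brand', 'brand', 'seat', 'seat']
b = ['F', 'G', 'yes', 'no']
c = [0, 1, 1, 0]

# The question's loop: keyed on c, column a is never consulted,
# so values from different groups end up mixed together.
attempt = {}
for x, y in zip(b, c):
    attempt.setdefault(y, []).append(x)
print(attempt)  # {0: ['F', 'no'], 1: ['G', 'yes']}

# The modified loop: outer key from a, inner mapping b -> c.
fixed = {}
for group, key, value in zip(a, b, c):
    fixed.setdefault(group, {})[key] = value
print(fixed)  # {'brand': {'F': 0, 'G': 1}, 'seat': {'yes': 1, 'no': 0}}
```

The difference is the extra level of nesting: `setdefault(group, {})` creates one inner dictionary per value of `a`, and the inner assignment fills it.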

The pandas equivalent:

    dic = (df.set_index('b').groupby('a')['c']
             .agg(lambda g: g.to_dict())
             .to_dict()
          )

Output of dic:

    {'brand': {'F': 0, 'G': 1},
     'seat': {'yes': 1, 'no': 0}}
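
The pandas chain can be easier to follow when unrolled. The sketch below is not the answer's exact code: it rebuilds the example DataFrame from the question and swaps the `agg` call for an explicit loop over the groups, with `indexed` and `steps` as illustrative names.

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({
    'a': ['brand', 'brand', 'seat', 'seat'],
    'b': ['F', 'G', 'yes', 'no'],
    'c': [0, 1, 1, 0],
})

# Step 1: make 'b' the index, so Series.to_dict() later maps b -> c.
indexed = df.set_index('b')

# Step 2: visit each group of 'a' and convert that group's 'c' column.
steps = {}
for name, group in indexed.groupby('a'):
    steps[name] = group['c'].to_dict()

print(steps)  # {'brand': {'F': 0, 'G': 1}, 'seat': {'yes': 1, 'no': 0}}
```

Setting the index first is what makes `to_dict()` use the values of `b` as keys; without it, the keys would be the original row labels 0 to 3.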

Answer 2

Score: 1

You can't cleanly assign variable names programmatically from the values in a column. An easy solution is a dictionary whose values are themselves dictionaries.

    import pandas as pd

    df = pd.DataFrame({
        'a': ['brand', 'brand', 'seat', 'seat'],
        'b': ['F', 'G', 'yes', 'no'],
        'c': [0, 1, 1, 0]
    })

    dic = {}
    for k, g in df.groupby('a'):
        # Transpose the (b, c) pairs into two rows, then zip them into a dict
        dic[k] = dict(zip(*g[['b', 'c']].values.T))

    dic
    # returns:
    # {'brand': {'F': 0, 'G': 1}, 'seat': {'yes': 1, 'no': 0}}
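
If the separate `brand` and `seat` names from the question are still wanted, one option is to pull them out of the nested dictionary by hand. This is only a sketch that starts from the result shown above; `column_name` is a hypothetical variable standing for a key that is known only at run time.

```python
# The nested dictionary produced above.
dic = {'brand': {'F': 0, 'G': 1}, 'seat': {'yes': 1, 'no': 0}}

# Explicit names, written out by hand when the keys are known in advance.
brand = dic['brand']
seat = dic['seat']
print(brand)        # {'F': 0, 'G': 1}
print(seat['yes'])  # 1

# When the key is only known at run time, index the outer dictionary
# instead of trying to create a variable with that name.
column_name = 'seat'
print(dic[column_name]['no'])  # 0
```

Keeping everything in one outer dictionary means new values in column `a` need no code changes, which is the main reason to prefer it over generated variable names.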

huangapple
  • Posted on April 13, 2023 at 23:29:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76007263.html