Creating new columns from an existing column in Python

Question

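I have a data frame that looks something like this:

    import numpy as np
    import pandas as pd

    data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
    df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

       ?  Rating  Amount
    0  A       1     100
    1  A       3     100
    2  A       2     100
    3  A       3     100
    4  A       5     100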

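I need to create one new column per Rating value and put the Amount in the column that matches each row's rating, so that the result looks like this:

       ?  Rating  Amount    1    2    3    5
    0  A       1     100  100    0    0    0
    1  A       3     100    0    0  100    0
    2  A       2     100    0  100    0    0
    3  A       3     100    0    0  100    0
    4  A       5     100    0    0    0  100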

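Right now I have this:

    # Distinct ratings become the labels of the new columns
    ratingnames = np.unique(list(df['Rating']))
    ratingnames.sort()
    # Start from an all-zero frame with one row per original row
    d = pd.DataFrame(0, index=np.arange(len(df['Rating'])), columns=ratingnames)
    # Copy each row's Amount into the column named after its Rating
    for i in range(len(df['Rating'])):
        ratingvalue = df.loc[i, 'Rating']
        d.loc[i, ratingvalue] = df.loc[i, 'Amount']
    df = pd.concat([df, d], axis=1)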

But I feel like it could be improved upon. Any suggestions? Thanks!

Here is one possible improved version:

    # Pivot on the row index so that every original row keeps its own Amount
    df_pivot = df.pivot_table(index=df.index, columns='Rating', values='Amount', fill_value=0)
    # Clear the 'Rating' label left on the columns axis
    df_pivot.columns.name = None
    # Attach the new columns to the original data frame by row index
    result_df = df.join(df_pivot)

This version replaces the explicit loop with DataFrame.pivot_table and then joins the result back onto the original data frame. The pivot is keyed on the row index so that each input row yields exactly one output row.
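
One detail of pivot_table that is easy to overlook is that it aggregates, by default with the mean, whenever several rows share the same index and column values. The sketch below uses a made-up frame named `toy` (its amounts are hypothetical and not part of the question's data) to compare grouping by the '?' column with grouping by the row index:

```python
import pandas as pd

# Hypothetical data: two rows share Rating 3 but carry different amounts
toy = pd.DataFrame({'?': ['A', 'A', 'A'],
                    'Rating': [3, 3, 5],
                    'Amount': [100, 300, 50]})

# Keyed on '?': the two Rating-3 rows collapse into their mean, 200
by_group = toy.pivot_table(index='?', columns='Rating', values='Amount', fill_value=0)

# Keyed on the row index: one output row per input row, amounts untouched
by_row = toy.pivot_table(index=toy.index, columns='Rating', values='Amount', fill_value=0)

print(by_group)
print(by_row)
```

With the question's data every Amount is 100, so both keyings would give the same numbers there; the difference only shows up once amounts vary within a rating.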

Answer 1

Score: 2

IIUC, use get_dummies, multiply the result by df['Amount'], and then concat on axis=1:

    output = pd.concat((df, pd.get_dummies(df['Rating']).mul(df['Amount'], axis=0)), axis=1)

       ?  Rating  Amount    1    2    3    5
    0  A       1     100  100    0    0    0
    1  A       3     100    0    0  100    0
    2  A       2     100    0  100    0    0
    3  A       3     100    0    0  100    0
    4  A       5     100    0    0    0  100
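
As a rough, self-contained sketch of the same idea, the one-liner can be unpacked into named steps. The helper name `add_rating_columns` and its parameters are made up for illustration and do not come from the answer. `dtype=int` is passed on the assumption that numeric indicator columns are wanted, since pandas 2.0 and later return booleans from get_dummies by default.

```python
import pandas as pd

def add_rating_columns(df, key="Rating", value="Amount"):
    """Append one column per distinct `key` value, holding `value` on matching rows."""
    # One 0/1 indicator column per distinct rating
    indicators = pd.get_dummies(df[key], dtype=int)
    # Scale each indicator row by that row's amount (aligned on the row index)
    spread = indicators.mul(df[value], axis=0)
    # Put the new columns next to the original ones
    return pd.concat([df, spread], axis=1)

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

output = add_rating_columns(df)
print(output)
```

Because the multiplication is aligned row by row, each row keeps its own Amount even when amounts differ between rows with the same rating.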

Answer 2

Score: 1

This will do the trick:

    df = pd.concat([df, df.apply(lambda x: pd.Series({x["Rating"]: x["Amount"]}), axis=1).fillna(0).astype("int")], axis=1)

Output:

       ?  Rating  Amount    1    2    3    5
    0  A       1     100  100    0    0    0
    1  A       3     100    0    0  100    0
    2  A       2     100    0  100    0    0
    3  A       3     100    0    0  100    0
    4  A       5     100    0    0    0  100
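
To see why this works, it may help to split the one-liner and look at the intermediate frame. The sketch below is only an illustration: the names `sparse`, `filled` and `result` are made up here, and the explicit column sort is an added precaution rather than part of the answer.

```python
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

# Each row turns into a one-element Series labelled by its Rating;
# pandas lines these up into a frame with NaN wherever a label is absent
sparse = df.apply(lambda row: pd.Series({row["Rating"]: row["Amount"]}), axis=1)

# Replace the gaps with 0 and restore an integer dtype
filled = sparse.fillna(0).astype("int")

# Sort the rating columns explicitly instead of relying on construction order
filled = filled.sort_index(axis=1)

result = pd.concat([df, filled], axis=1)
print(result)
```

Since apply with axis=1 calls the lambda once per row, this approach is usually slower on large frames than a vectorised one.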

huangapple
  • Posted on January 4, 2020, 00:05:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59581694.html