Creating new columns from an existing column in Python

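Question

I have a dataframe that looks something like this:

import numpy as np
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

   ?  Rating  Amount
0  A       1     100
1  A       3     100
2  A       2     100
3  A       3     100
4  A       5     100

I need to create one new column per Rating value, with the Amount substituted in, so that the result looks like this:

   ?  Rating  Amount    1    2    3    5
0  A       1     100  100    0    0    0
1  A       3     100    0    0  100    0
2  A       2     100    0  100    0    0
3  A       3     100    0    0  100    0
4  A       5     100    0    0    0  100

Right now I have this:

# Distinct rating values (np.unique already returns them sorted)
ratingnames = np.unique(list(df['Rating']))
ratingnames.sort()

# Zero-filled frame with one column per rating value
d = pd.DataFrame(0, index=np.arange(len(df['Rating'])), columns=ratingnames)

# Copy each row's Amount into the column matching its Rating
for i in range(len(df['Rating'])):
    ratingvalue = df.loc[i, 'Rating']
    d.loc[i, ratingvalue] = df.loc[i, 'Amount']

df = pd.concat([df, d], axis=1)

but I feel like it could be improved upon. Any suggestions? Thanks!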

Here is one possible improved version:

# Pivot on the existing row index so that each row keeps its own line
df_pivot = df.pivot(columns='Rating', values='Amount').fillna(0).astype(int)

# Drop the leftover columns-axis name ('Rating')
df_pivot.columns.name = None

# Join the new columns back onto the original dataframe by row index
result_df = df.join(df_pivot)

This version replaces the explicit loop with a pivot: DataFrame.pivot spreads Amount across one column per Rating value, and the result is joined back onto the original dataframe by row index.


Answer 1

Score: 2

IIUC, use get_dummies and multiply by df['Amount'], then concat along axis=1:

output = pd.concat((df, pd.get_dummies(df['Rating']).mul(df['Amount'], axis=0)), axis=1)

   ?  Rating  Amount    1    2    3    5
0  A       1     100  100    0    0    0
1  A       3     100    0    0  100    0
2  A       2     100    0  100    0    0
3  A       3     100    0    0  100    0
4  A       5     100    0    0    0  100
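
The one-liner above can be unpacked into separate steps. The following is only a sketch of the same idea on the question's data; the names indicators and amounts and the explicit dtype=int argument are assumptions added here (recent pandas versions return boolean indicators by default, and asking for integers keeps the intermediate result easy to read).

```python
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

# One indicator column per distinct Rating value.
# dtype=int is an assumption made here so the indicators are 0/1 integers.
indicators = pd.get_dummies(df['Rating'], dtype=int)

# Scale every row of indicators by that row's Amount
# (axis=0 aligns the Series with the row index).
amounts = indicators.mul(df['Amount'], axis=0)

# Attach the new columns to the right of the original ones.
output = pd.concat([df, amounts], axis=1)
print(output)
```

Because every step works on whole columns, no Python-level loop over rows is needed.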


Answer 2

Score: 1

This will do the trick:

df = pd.concat([df, df.apply(lambda x: pd.Series({x["Rating"]: x["Amount"]}), axis=1).fillna(0).astype("int")], axis=1)

Output:

   ?  Rating  Amount    1    2    3    5
0  A       1     100  100    0    0    0
1  A       3     100    0    0  100    0
2  A       2     100    0  100    0    0
3  A       3     100    0    0  100    0
4  A       5     100    0    0    0  100
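
As a rough sanity check, the row-wise apply approach can be compared with a get_dummies-based computation on data where the amounts differ from row to row. This is a sketch only: the sample values below are hypothetical, and by_apply, by_dummies and same are names introduced for illustration.

```python
import pandas as pd

# Hypothetical data: same layout as the question, but with varying amounts.
df = pd.DataFrame(
    [['A', 1, 100], ['A', 3, 250], ['B', 2, 75], ['B', 3, 40], ['A', 5, 10]],
    columns=['?', 'Rating', 'Amount'],
)

# Row-wise approach: build a single-entry Series for each row.
by_apply = (
    df.apply(lambda x: pd.Series({x['Rating']: x['Amount']}), axis=1)
      .fillna(0)
      .astype('int64')
      .sort_index(axis=1)  # make the column order explicit
)

# Vectorized approach: indicator columns scaled by Amount.
by_dummies = pd.get_dummies(df['Rating'], dtype='int64').mul(df['Amount'], axis=0)

# Compare the raw values of both results.
same = (by_apply.to_numpy() == by_dummies.to_numpy()).all()
print(same)
```

Both approaches should produce the same values; the apply version calls a Python function once per row, so it will usually be slower on large frames.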

huangapple
  • Published on 2020-01-04 00:05:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59581694.html