Creating new columns from an existing column in Python
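Question
I have a data frame that looks something like this:
import numpy as np
import pandas as pd
data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])
   ?  Rating  Amount
0  A       1     100
1  A       3     100
2  A       2     100
3  A       3     100
4  A       5     100
I need to create new columns based on the Rating value, filling in the Amount, so that the result looks like this: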
   ?  Rating  Amount    1    2    3    5
0  A       1     100  100    0    0    0
1  A       3     100    0    0  100    0
2  A       2     100    0  100    0    0
3  A       3     100    0    0  100    0
4  A       5     100    0    0    0  100
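Right now I have this:
# Distinct rating values, used as the names of the new columns
ratingnames = np.unique(list(df['Rating']))
ratingnames.sort()
# Start from an all-zero frame with one column per rating
d = pd.DataFrame(0, index=np.arange(len(df['Rating'])), columns=ratingnames)
# Copy each row's Amount into the column that matches its Rating
for i in range(len(df['Rating'])):
    ratingvalue = df.loc[i, 'Rating']
    d.loc[i, ratingvalue] = df.loc[i, 'Amount']
df = pd.concat([df, d], axis=1)
but I feel like it could be improved upon. Any suggestions? Thanks!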
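Here is an improved version:
# Use pivot to spread Amount into one column per Rating value; the existing row index is kept
df_pivot = df.pivot(columns='Rating', values='Amount')
# Cells with no matching rating are NaN after the pivot, so fill them with 0
df_pivot = df_pivot.fillna(0).astype(int)
# Clear the name of the columns axis
df_pivot.columns.name = None
# Attach the new columns to the original data frame
result_df = pd.concat([df, df_pivot], axis=1)
This version is shorter and avoids the explicit Python loop: pivot spreads the amounts across one column per rating, and the result is then concatenated back onto the original data frame.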
Answer 1
Score: 2
IIUC, use get_dummies and multiply the result by df['Amount'], then concat on axis=1:
output = pd.concat((df, pd.get_dummies(df['Rating']).mul(df['Amount'], axis=0)), axis=1)
   ?  Rating  Amount    1    2    3    5
0  A       1     100  100    0    0    0
1  A       3     100    0    0  100    0
2  A       2     100    0  100    0    0
3  A       3     100    0    0  100    0
4  A       5     100    0    0    0  100
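To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (indicators, amounts) are made up for illustration, and passing dtype=int to get_dummies is an assumption added here so the indicator columns are integers regardless of the pandas version's default.

```python
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

# Step 1: one indicator column per distinct Rating (1 where the row has that rating).
# dtype=int is an explicit choice made for this sketch, not part of the answer above.
indicators = pd.get_dummies(df['Rating'], dtype=int)

# Step 2: scale every row of indicators by that row's Amount (rows are aligned on the index).
amounts = indicators.mul(df['Amount'], axis=0)

# Step 3: place the new columns to the right of the original ones.
output = pd.concat([df, amounts], axis=1)
print(output)
```

Because every step is a whole-column operation, this avoids the row-by-row loop from the question.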
Answer 2
Score: 1
This will do the trick:
df = pd.concat([df, df.apply(lambda x: pd.Series({x["Rating"]: x["Amount"]}), axis=1).fillna(0).astype("int")], axis=1)
Output:
   ?  Rating  Amount    1    2    3    5
0  A       1     100  100    0    0    0
1  A       3     100    0    0  100    0
2  A       2     100    0  100    0    0
3  A       3     100    0    0  100    0
4  A       5     100    0    0    0  100
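For readers wondering why the apply call produces one column per rating, the sketch below unpacks it. The names per_row and result are hypothetical, and the sort_index call is an addition made here so the column order does not depend on how pandas combines the per-row results.

```python
import pandas as pd

data = [['A', 1, 100], ['A', 3, 100], ['A', 2, 100], ['A', 3, 100], ['A', 5, 100]]
df = pd.DataFrame(data, columns=['?', 'Rating', 'Amount'])

# Each row becomes a one-entry Series keyed by its Rating; pandas lines these up
# into columns and leaves NaN wherever a row has no value for a given rating.
per_row = df.apply(lambda row: pd.Series({row['Rating']: row['Amount']}), axis=1)

# Replace the NaN gaps with 0, restore an integer dtype, and order the columns explicitly.
per_row = per_row.fillna(0).astype(int).sort_index(axis=1)

# Attach the new columns to the original data frame.
result = pd.concat([df, per_row], axis=1)
print(result)
```

Note that apply with axis=1 calls the lambda once per row, so on large data frames it will generally be slower than a column-wise approach.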