英文:
Pandas new set of columns from present set of column sliced by a categorical column
问题
Bins | Color | Size |
---|---|---|
A | 'Red' | 50 |
B | 'Blue' | 60 |
英文:
I have table of form
Bins | A_Color | B_Color | A_Size | B_Size. |
---|---|---|---|---|
A | 'Red' | 50 | ||
B | 'Blue' | 60 |
and I wanted to have a common color and size columns instead of a set of columns for each category like this
Bins | Color | Size |
---|---|---|
A | 'Red' | 50 |
B | 'Blue' | 60 |
I tried the below however got NaN values for ['Color', 'Size'] columns
bins = ['A', 'B', 'C', 'D', 'E']
for b in bins:
df.loc[df['Bins'] == b, ['Color', 'Size']] = \
df.loc[df['Bins'] == b, [f'{b}_Color', f'{b}_Size']]
This is just an example, and the real data contains roughly 100K rows and more than 300+ columns.
答案1
得分: 2
你可以使用 pd.wide_to_long
。只需将列重命名以匹配格式(A_Color -> Color_A)
>>> (pd.wide_to_long(df.rename(columns=lambda x: '_'.join(x.split('_')[::-1])),
stubnames=['Color', 'Size'], i='Category', j='Cat',
sep='_', suffix='\w+')
.query('Category == Cat').droplevel('Cat').reset_index())
Category Color Size
0 A Red 50.0
1 B Blue 60.0
详细信息:
# 重命名列
>>> df1 = df.rename(columns=lambda x: '_'.join(x.split('_')[::-1]))
Category Color_A Color_B Size_A Size_B # <- 在这里
0 A Red NaN 50.0 NaN
1 B NaN Blue NaN 60.0
# 重塑数据框
>>> out = pd.wide_to_long(df1, stubnames=['Color', 'Size'], i='Category', j='Cat', sep='_', suffix='\w+')
Color Size
Category Cat
A A Red 50.0 # 保留
B A NaN NaN # 删除
A B NaN NaN # 删除
B B Blue 60.0 # 保留
# 过滤行
>>> out = out.query('Category == Cat')
Color Size
Category Cat
A A Red 50.0
B B Blue 60.0
# 获取最终数据框
>>> out = out.droplevel('Cat').reset_index()
Category Color Size
0 A Red 50.0
1 B Blue 60.0
英文:
You can use pd.wide_to_long
. You just have to rename your columns to match the format (A_Color -> Color_A)
>>> (pd.wide_to_long(df.rename(columns=lambda x: '_'.join(x.split('_')[::-1])),
stubnames=['Color', 'Size'], i='Category', j='Cat',
sep='_', suffix='\w+')
.query('Category == Cat').droplevel('Cat').reset_index())
Category Color Size
0 A Red 50.0
1 B Blue 60.0
Details:
# Rename columns
>>> df1 = df.rename(columns=lambda x: '_'.join(x.split('_')[::-1]))
Category Color_A Color_B Size_A Size_B # <- HERE
0 A Red NaN 50.0 NaN
1 B NaN Blue NaN 60.0
# Reshape dataframe
>>> out = pd.wide_to_long(df1, stubnames=['Color', 'Size'], i='Category', j='Cat', sep='_', suffix='\w+')
Color Size
Category Cat
A A Red 50.0 # Keep
B A NaN NaN # Drop
A B NaN NaN # Drop
B B Blue 60.0 # Keep
# Filter rows
>>> out = out.query('Category == Cat')
Color Size
Category Cat
A A Red 50.0
B B Blue 60.0
# Get final dataframe
>>> out = out.droplevel('Cat').reset_index()
Category Color Size
0 A Red 50.0
1 B Blue 60.0
答案2
得分: 1
代码部分不需要翻译,以下是已翻译的内容:
One idea is get first non missing value by splitted columns with _
:
一个想法是通过使用下划线 _
分割列来获取第一个非缺失值:
df1 = (df.set_index('Category')
.groupby(lambda x: x.split('_')[-1], axis=1)
.first()
.reset_index())
print(df1)
Category Color Size
0 A Red 50.0
1 B Blue 60.0
Solution with lookup
:
使用 lookup
的解决方案:
categories = ['Size','Color']
for c in categories:
idx, cols = pd.factorize(df['Category'].add(f'_{c}'))
df[c] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Your solution should be changed by converting to numpy array:
您的解决方案应通过转换为 NumPy 数组来进行更改:
for b in bins:
df.loc[df['Category'] == b, ['Color', 'Size']] = \
df.loc[df['Category'] == b, [f'{b}_Color', f'{b}_Size']].to_numpy()
英文:
One idea is get first non missing value by splitted columns with _
:
df1 = (df.set_index('Category')
.groupby(lambda x: x.split('_')[-1], axis=1)
.first()
.reset_index())
print (df1)
Category Color Size
0 A Red 50.0
1 B Blue 60.0
Solution with lookup
:
categories = ['Size','Color']
for c in categories:
idx, cols = pd.factorize(df['Category'].add(f'_{c}'))
df[c] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Your solution should be changed by converting to numpy array:
for b in bins:
df.loc[df['Category'] == b, ['Color', 'Size']] = \
df.loc[df['Category'] == b, [f'{b}_Color', f'{b}_Size']].to_numpy()
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论