Reshaping a Dataframe with repeating column names
Question
I have data that looks like this:
dataframe_1:
   week           SITE  LAL        SITE  LAL
0     1  BARTON CHAPEL  1.1  PENASCAL I    1
1     2  BARTON CHAPEL  1.1  PENASCAL I    1
2     3  BARTON CHAPEL  1.1  PENASCAL I    1
And I need the final dataframe to look like this:
dataframe_2:
   week           SITE  LAL
0     1  BARTON CHAPEL  1.1
1     2  BARTON CHAPEL  1.1
2     3  BARTON CHAPEL  1.1
3     1     PENASCAL I    1
4     2     PENASCAL I    1
5     3     PENASCAL I    1
I've tried using 'melt' but I cannot get the desired result. Perhaps I'm using the wrong approach?
Thank you.
Answer 1
Score: 3
If you had unambiguous column names:
   week           SITE  LAL      SITE.1  LAL.1
0     1  BARTON CHAPEL  1.1  PENASCAL I      1
1     2  BARTON CHAPEL  1.1  PENASCAL I      1
2     3  BARTON CHAPEL  1.1  PENASCAL I      1
You could use pandas.lreshape:
pd.lreshape(df, {'SITE': ['SITE', 'SITE.1'], 'LAL': ['LAL', 'LAL.1']})
Output:
   week           SITE  LAL
0     1  BARTON CHAPEL  1.1
1     2  BARTON CHAPEL  1.1
2     3  BARTON CHAPEL  1.1
3     1     PENASCAL I  1.0
4     2     PENASCAL I  1.0
5     3     PENASCAL I  1.0
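If the frame already carries duplicated labels, one way to reach the unambiguous names used above is to suffix the repeats by hand before calling pd.lreshape. The following is only a sketch under that assumption: dedup_columns is a hypothetical helper (not a pandas function) that imitates the SITE.1 / LAL.1 naming, and the frame is rebuilt from the question's sample data.

```python
import pandas as pd

# Rebuild the question's frame, duplicated column labels included.
df = pd.DataFrame(
    [[1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
     [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
     [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1]],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)


def dedup_columns(columns):
    """Hypothetical helper: suffix repeated labels as NAME.1, NAME.2, ..."""
    seen = {}
    new_names = []
    for name in columns:
        count = seen.get(name, 0)
        new_names.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return new_names


df.columns = dedup_columns(df.columns)

# Collect every renamed column under the target name it belongs to.
groups = {
    "SITE": [c for c in df.columns if c.split(".")[0] == "SITE"],
    "LAL": [c for c in df.columns if c.split(".")[0] == "LAL"],
}

long_df = pd.lreshape(df, groups)
print(long_df)
```

Building the groups dictionary from the renamed labels avoids typing each pair by hand, which may matter once there are more than two repeated blocks.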
With your duplicated columns, you can melt + pivot:
out = (df
   .melt('week').assign(idx=lambda d: d.groupby(['week', 'variable']).cumcount())
   .pivot(index=['idx', 'week'], columns='variable', values='value')
   .reset_index('week').rename_axis(index=None, columns=None)
)
Output:
   week  LAL           SITE
0     1  1.1  BARTON CHAPEL
0     2  1.1  BARTON CHAPEL
0     3  1.1  BARTON CHAPEL
1     1    1     PENASCAL I
1     2    1     PENASCAL I
1     3    1     PENASCAL I
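The melt + pivot result comes back with the columns in alphabetical order and a repeated index, and the pivoted values are likely to be object-typed because SITE and LAL pass through one shared value column. A possible follow-up, which is not part of the original answer, is to reorder the columns, reset the index, and cast LAL back to float. The sketch below rebuilds the question's sample frame so that it runs on its own; the name tidy is just illustrative.

```python
import pandas as pd

# Rebuild the question's frame, duplicated column labels included.
df = pd.DataFrame(
    [[1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
     [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
     [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1]],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)

# The chain from the answer: idx numbers each repeat of a (week, variable)
# pair, so the first SITE/LAL block gets 0 and the second block gets 1.
out = (
    df.melt("week")
    .assign(idx=lambda d: d.groupby(["week", "variable"]).cumcount())
    .pivot(index=["idx", "week"], columns="variable", values="value")
    .reset_index("week")
    .rename_axis(index=None, columns=None)
)

# Follow-up (an addition, not from the answer): restore the requested
# column order, use a clean 0..n-1 index, and make LAL numeric again.
tidy = (
    out[["week", "SITE", "LAL"]]
    .reset_index(drop=True)
    .assign(LAL=lambda d: d["LAL"].astype(float))
)
print(tidy)
```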
Answer 2
Score: 1
Not a very generalizable solution, but it will work on your example:
df.groupby('week').apply(lambda _df: pd.concat((_df.iloc[:, 1:3], _df.iloc[:, 3:5]))).reset_index('week')
It groups by week and then reshapes with column selection and concatenation, moving the superfluous 'week' index level back into a column at the end.
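The positional slicing idea can be stretched to any number of repeated blocks without the groupby. The sketch below is one way to do that, assuming the identifier columns come first and every repeated block has the same width; stack_repeated_blocks and its parameters are hypothetical names introduced here, not part of pandas or of the answer.

```python
import pandas as pd


def stack_repeated_blocks(df, n_id_cols=1, block_width=2):
    """Hypothetical helper: stack equally wide column blocks vertically.

    Assumes the first n_id_cols columns identify a row and the remaining
    columns form consecutive blocks of block_width columns each.
    """
    id_part = df.iloc[:, :n_id_cols]
    # Every block takes the labels of the first block, so blocks named
    # SITE/LAL and SITE.1/LAL.1 would line up the same way.
    names = list(df.columns[n_id_cols:n_id_cols + block_width])
    pieces = []
    for start in range(n_id_cols, df.shape[1], block_width):
        block = df.iloc[:, start:start + block_width].copy()
        block.columns = names
        pieces.append(pd.concat([id_part, block], axis=1))
    return pd.concat(pieces, ignore_index=True)


# Rebuild the question's frame, duplicated column labels included.
df = pd.DataFrame(
    [[1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
     [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
     [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1]],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)

result = stack_repeated_blocks(df, n_id_cols=1, block_width=2)
print(result)
```

Selecting by position sidesteps the duplicated labels entirely, at the cost of relying on the column layout staying fixed.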
Answer 3
Score: 0
You can do this by extracting from dataframe_1 the columns you want to move and concatenating them at the bottom:
import numpy as np
import pandas as pd
# create a dummy dataframe with 6 columns and 10 rows
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 6)), columns=list('ABCDEF'))
# copy the columns to move, so they can be renamed independently of df
df_tempo = df[['D', 'E', 'F']].copy()
Then rename the columns of the temporary dataframe to match the ones you are keeping (['A', 'B', 'C'] in this dummy example; ['week', 'SITE', 'LAL'] in your case):
df_tempo.columns = ['A', 'B', 'C']
And then concatenate the temporary dataframe with the subset of the first dataframe you want to keep:
df = pd.concat([df[['A', 'B', 'C']], df_tempo], axis=0, ignore_index=True)
Hope it helps!