Reshaping a Dataframe with repeating column names

Question

I have data that looks like this:

dataframe_1:
   week           SITE  LAL        SITE  LAL
0     1  BARTON CHAPEL  1.1  PENASCAL I    1
1     2  BARTON CHAPEL  1.1  PENASCAL I    1
2     3  BARTON CHAPEL  1.1  PENASCAL I    1

I need the final dataframe to look like this:

dataframe_2:
   week           SITE  LAL
0     1  BARTON CHAPEL  1.1
1     2  BARTON CHAPEL  1.1
2     3  BARTON CHAPEL  1.1
3     1     PENASCAL I    1
4     2     PENASCAL I    1
5     3     PENASCAL I    1

I've tried using melt, but I can't get the desired result. Perhaps I'm using the wrong approach?
Thank you.
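To try the answers, dataframe_1 can be rebuilt directly. A minimal sketch, assuming the values shown in the tables above (the names rows, df and site are illustrative): pandas accepts duplicate column labels, but selecting one of them by name returns every matching column, which is why the answers pick these columns by position or by occurrence rather than by name.

```python
import pandas as pd

# dataframe_1 from the question; pandas allows duplicate column labels
rows = [
    [1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
    [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
    [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
]
df = pd.DataFrame(rows, columns=["week", "SITE", "LAL", "SITE", "LAL"])

# Selecting a duplicated label returns both matching columns as a DataFrame,
# not a single Series, so "the SITE column" is ambiguous.
site = df["SITE"]
print(type(site).__name__, site.shape)    # DataFrame (3, 2)
print(df.columns.duplicated().tolist())   # [False, False, False, True, True]
```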

Answer 1

Score: 3

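If your column names were unambiguous, for example SITE.1 and LAL.1 for the second pair (the names pandas.read_csv gives duplicate headers by default):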

   week           SITE  LAL      SITE.1  LAL.1
0     1  BARTON CHAPEL  1.1  PENASCAL I      1
1     2  BARTON CHAPEL  1.1  PENASCAL I      1
2     3  BARTON CHAPEL  1.1  PENASCAL I      1

You could use pandas.lreshape, which stacks each listed group of columns into a single column:

pd.lreshape(df, {'SITE': ['SITE', 'SITE.1'], 'LAL': ['LAL', 'LAL.1']})

Output:

   week           SITE  LAL
0     1  BARTON CHAPEL  1.1
1     2  BARTON CHAPEL  1.1
2     3  BARTON CHAPEL  1.1
3     1     PENASCAL I  1.0
4     2     PENASCAL I  1.0
5     3     PENASCAL I  1.0
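If the frame already carries duplicate labels, one option is to make them unique first and then apply lreshape. A minimal sketch, assuming the sample data from the question; dedupe_columns is a hypothetical helper (not a pandas function) that appends .1, .2, ... to repeated labels, mirroring the names read_csv gives duplicate headers:

```python
import pandas as pd


def dedupe_columns(columns):
    """Hypothetical helper: rename repeats as SITE, SITE.1, SITE.2, ..."""
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result


rows = [[w, "BARTON CHAPEL", 1.1, "PENASCAL I", 1] for w in (1, 2, 3)]
df = pd.DataFrame(rows, columns=["week", "SITE", "LAL", "SITE", "LAL"])
df.columns = dedupe_columns(df.columns)  # week, SITE, LAL, SITE.1, LAL.1

# Each dict entry becomes one output column built by stacking the listed columns
out = pd.lreshape(df, {"SITE": ["SITE", "SITE.1"], "LAL": ["LAL", "LAL.1"]})
print(out)
```

The helper does not guard against a label such as SITE.1 already existing in the frame; for the data in the question that cannot happen.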

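With your duplicated column names, you can melt to long format, number the repeated labels within each week with groupby().cumcount(), and then pivot back to wide format: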

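out = (df
   # long format: one row per (week, original column)
   .melt('week')
   # number repeated labels per week: 0 for the first SITE/LAL pair, 1 for the second
   .assign(idx=lambda d: d.groupby(['week', 'variable']).cumcount())
   # back to wide: one row per (pair, week), one column per label
   .pivot(index=['idx', 'week'], columns='variable', values='value')
   # keep week as a column and drop the leftover axis names
   .reset_index('week').rename_axis(index=None, columns=None)
)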

Output:

   week  LAL           SITE
0     1  1.1  BARTON CHAPEL
0     2  1.1  BARTON CHAPEL
0     3  1.1  BARTON CHAPEL
1     1    1     PENASCAL I
1     2    1     PENASCAL I
1     3    1     PENASCAL I
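The melt + pivot result keeps the 0/1 pair number as a repeated index, sorts the value columns alphabetically, and leaves LAL with object dtype because it went through the same value column as the site names. A sketch of one way to tidy it into the exact shape of dataframe_2, assuming the question's sample data; the column order and the float dtype for LAL are assumptions about what you want:

```python
import pandas as pd

rows = [[w, "BARTON CHAPEL", 1.1, "PENASCAL I", 1] for w in (1, 2, 3)]
df = pd.DataFrame(rows, columns=["week", "SITE", "LAL", "SITE", "LAL"])

# melt + pivot as in the answer
out = (df
   .melt('week').assign(idx=lambda d: d.groupby(['week', 'variable']).cumcount())
   .pivot(index=['idx', 'week'], columns='variable', values='value')
   .reset_index('week').rename_axis(index=None, columns=None)
)

# Put the columns in dataframe_2's order, renumber rows 0..n-1,
# and turn LAL back into a numeric column.
tidy = out[["week", "SITE", "LAL"]].reset_index(drop=True).astype({"LAL": float})
print(tidy)
```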

Answer 2

Score: 1

This is not a very general solution, but it works for your example:

df.groupby('week').apply(lambda _df: pd.concat((_df.iloc[:, 1:3], _df.iloc[:, 3:5]))).reset_index('week')

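It groups by week, then reshapes each group by selecting the two SITE/LAL pairs by position and concatenating them; reset_index('week') finally moves the week group key from the index back into a column. Note that the rows come out ordered by week (BARTON CHAPEL and PENASCAL I alternating) rather than by site, and the positional iloc slices assume each group still contains week as column 0, which pandas 2.2 deprecated for groupby.apply.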
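The groupby is not strictly needed: slicing each SITE/LAL pair by position and stacking the slices gives the same rows, ordered by site as in dataframe_2 rather than by week. A sketch that extends to any number of pairs, assuming week is the first column and the remaining columns repeat SITE, LAL in order; stack_pairs is a hypothetical helper name:

```python
import pandas as pd


def stack_pairs(df, names=("SITE", "LAL")):
    """Hypothetical helper: stack [week, SITE, LAL] blocks taken by position.

    Assumes column 0 is 'week' and the remaining columns repeat `names` in order.
    """
    width = len(names)
    if (df.shape[1] - 1) % width:
        raise ValueError("columns after 'week' do not split into complete groups")
    blocks = [
        # week plus one SITE/LAL pair, relabelled so every block has the same names
        df.iloc[:, [0, *range(start, start + width)]].set_axis(["week", *names], axis=1)
        for start in range(1, df.shape[1], width)
    ]
    # ignore_index gives a fresh 0..n-1 index, matching dataframe_2
    return pd.concat(blocks, ignore_index=True)


rows = [[w, "BARTON CHAPEL", 1.1, "PENASCAL I", 1] for w in (1, 2, 3)]
df = pd.DataFrame(rows, columns=["week", "SITE", "LAL", "SITE", "LAL"])
out = stack_pairs(df)
print(out)
```

Because the columns are picked by position, the duplicate labels never have to be resolved by name.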

Answer 3

Score: 0

You can do this by extracting the second set of columns from dataframe_1 into a temporary dataframe and concatenating it below the first set:

import numpy as np
import pandas as pd

# create a dummy dataframe with 6 columns and 10 rows
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 6)), columns=list('ABCDEF'))

df_tempo = df[['D','E','F']]

Then rename the columns of the temporary dataframe to the names of the columns you keep (in your case ['week', 'SITE', 'LAL']):

df_tempo.columns = ['A','B','C']

Finally, concatenate the subset of the original dataframe you want to keep with the temporary one:

df = pd.concat([df[['A','B','C']], df_tempo], axis=0, ignore_index=True)

Hope it helps!

huangapple
  • This article was posted on June 1, 2023 22:42:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76383101.html