June 1, 2023, 22:42:48

Reshaping a Dataframe with repeating column names

Question

I have data that looks like this:

dataframe_1:
   week           SITE  LAL        SITE  LAL
0     1  BARTON CHAPEL  1.1  PENASCAL I    1
1     2  BARTON CHAPEL  1.1  PENASCAL I    1
2     3  BARTON CHAPEL  1.1  PENASCAL I    1

I need the final DataFrame to look like this:

dataframe_2:
   week           SITE  LAL
0     1  BARTON CHAPEL  1.1
1     2  BARTON CHAPEL  1.1
2     3  BARTON CHAPEL  1.1
3     1     PENASCAL I    1
4     2     PENASCAL I    1
5     3     PENASCAL I    1

I've tried using 'melt', but I cannot get the desired result. Perhaps I'm using the wrong approach?
Thank you.

Answer 1

Score: 3

If you had unambiguous column names:

   week           SITE  LAL      SITE.1  LAL.1
0     1  BARTON CHAPEL  1.1  PENASCAL I      1
1     2  BARTON CHAPEL  1.1  PENASCAL I      1
2     3  BARTON CHAPEL  1.1  PENASCAL I      1

You could use pandas.lreshape:

pd.lreshape(df, {'SITE': ['SITE', 'SITE.1'], 'LAL': ['LAL', 'LAL.1']})

Output:

   week           SITE  LAL
0     1  BARTON CHAPEL  1.1
1     2  BARTON CHAPEL  1.1
2     3  BARTON CHAPEL  1.1
3     1     PENASCAL I  1.0
4     2     PENASCAL I  1.0
5     3     PENASCAL I  1.0
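
If the DataFrame still carries the duplicated names from the question, one way to reach unambiguous names is to rename the columns before calling pandas.lreshape. The sketch below is only an illustration under stated assumptions: it rebuilds the question's data by hand, and dedupe_names is a hypothetical helper (not part of pandas) that imitates the .1 suffix pandas.read_csv gives to repeated headers.

```python
import pandas as pd

# Rebuild the question's frame, including its repeated column names.
df = pd.DataFrame(
    [
        [1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
    ],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)


def dedupe_names(names):
    """Hypothetical helper: append .1, .2, ... to repeated names."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result


# Columns become: week, SITE, LAL, SITE.1, LAL.1
df.columns = dedupe_names(df.columns)

# Stack each group of columns under a single target name.
out = pd.lreshape(df, {"SITE": ["SITE", "SITE.1"], "LAL": ["LAL", "LAL.1"]})
print(out)
```

Renaming first keeps the reshape step itself identical to the lreshape call above, so only the column labels need special handling.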

With your duplicated columns, you can melt + pivot:

out = (df
   .melt('week').assign(idx=lambda d: d.groupby(['week', 'variable']).cumcount())
   .pivot(index=['idx', 'week'], columns='variable', values='value')
   .reset_index('week').rename_axis(index=None, columns=None)
)

Output:

   week  LAL           SITE
0     1  1.1  BARTON CHAPEL
0     2  1.1  BARTON CHAPEL
0     3  1.1  BARTON CHAPEL
1     1    1     PENASCAL I
1     2    1     PENASCAL I
1     3    1     PENASCAL I

Answer 2

Score: 1

This is not a very generalizable solution, but it works on your example:

df.groupby('week').apply(lambda _df: pd.concat((_df.iloc[:, 1:3], _df.iloc[:, 3:5]))).reset_index('week')

It groups by week, then reshapes each group by selecting the column pairs by position and concatenating them. Finally, reset_index('week') moves the week index level created by the groupby back into a regular column.
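
The same positional idea can be written without groupby so that it handles any number of repeated column blocks. This is a rough sketch rather than a definitive implementation: stack_repeated, id_col, and width are hypothetical names introduced here, and it assumes the repeated blocks sit side by side, each width columns wide, in the same order.

```python
import pandas as pd

# The question's frame, with its repeated column names.
df = pd.DataFrame(
    [
        [1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
    ],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)


def stack_repeated(frame, id_col="week", width=2):
    """Hypothetical helper: stack consecutive blocks of `width` columns."""
    ids = frame[[id_col]]
    rest = frame.drop(columns=id_col)
    blocks = []
    for start in range(0, rest.shape[1], width):
        # Select by position, because the labels are not unique.
        block = rest.iloc[:, start:start + width]
        blocks.append(pd.concat([ids, block], axis=1))
    # Every block has the same column names, so they stack cleanly.
    return pd.concat(blocks, ignore_index=True)


out = stack_repeated(df)
print(out)
```

Because every block is paired with the id column before stacking, the week values repeat once per block and the result gets a fresh 0..n-1 index.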

Answer 3

Score: 0

You can do this by extracting the columns you want from dataframe_1 and concatenating them at the bottom:

import numpy as np
import pandas as pd

# create a dummy DataFrame with 6 columns and 10 rows
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 6)), columns=list('ABCDEF'))

# copy the second group of columns so it can be renamed independently
df_tempo = df[['D', 'E', 'F']].copy()

Then rename the columns of the temporary DataFrame so they match the first group of columns (in your case, ['week', 'SITE', 'LAL']):

df_tempo.columns = ['A', 'B', 'C']

Then concatenate the temporary DataFrame with the subset of the original DataFrame you want to keep:

df = pd.concat([df[['A', 'B', 'C']], df_tempo], axis=0, ignore_index=True)

Hope it helps!
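
Applied to the question's actual data, the extraction step needs positional indexing, because selecting a repeated label such as 'SITE' returns every column with that name. The following is a minimal sketch assuming the column order shown in the question; the variable names first, second, and out are illustrative.

```python
import pandas as pd

# The question's frame, with its repeated column names.
df = pd.DataFrame(
    [
        [1, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [2, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
        [3, "BARTON CHAPEL", 1.1, "PENASCAL I", 1],
    ],
    columns=["week", "SITE", "LAL", "SITE", "LAL"],
)

# Label-based selection is ambiguous here: both SITE columns come back.
print(df["SITE"].shape)

# Positional selection picks exactly one SITE/LAL pair per subset.
first = df.iloc[:, [0, 1, 2]]   # week + first SITE/LAL pair
second = df.iloc[:, [0, 3, 4]]  # week + second SITE/LAL pair

# Both subsets already share the names week, SITE, LAL, so no renaming is needed.
out = pd.concat([first, second], axis=0, ignore_index=True)
print(out)
```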
