June 19, 2023, 03:32:01

Combine two Pandas rows into one with duplicated columns for time series

Question

I have the following problem that I am trying to solve. I have two Pandas Dataframe rows with the same columns:

Column A	Column B
Cell 1	Cell 2
Cell 3	Cell 4

I want to combine both rows into one single row by appending the columns:

Column A_1	Column B_1	Column A_2	Column B_2
Cell 1	Cell 2	Cell 3	Cell 4

This operation creates a time series row with window size 2 for training a machine learning model. I need to run it millions of times, so each call has to be cheap.

Thanks in advance!

I tried using pandas concat, but it is just too slow and requires a lot of RAM.

Answer 1

Score: 3

You can use stack():

out = df.stack().droplevel(0).to_frame().T
out.columns += '_' + out.groupby(level=0, axis=1).cumcount().add(1).astype(str)
print(out)
# Output
  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4
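
For the two-row case, the same layout can also be built without stack() or groupby(), which may matter because newer pandas releases deprecate groupby(..., axis=1). The sketch below is an alternative, not the answerer's method; the frame contents are assumed from the question, and flat, labels and out are illustrative names.

```python
import pandas as pd

# Two-row frame assumed from the question.
df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3"],
                   "Column B": ["Cell 2", "Cell 4"]})

# Row-major flatten: all cells of row 1 first, then all cells of row 2.
flat = df.to_numpy().ravel()

# Build labels in the same row-major order: A_1, B_1, A_2, B_2.
labels = [f"{col}_{row}" for row in range(1, len(df) + 1) for col in df.columns]

# Wrap the flat array in a list so it becomes a single row.
out = pd.DataFrame([flat], columns=labels)
print(out)
```

Because the labels are generated in the same order as the flattened values, each cell stays under the column and row number it came from.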

If you have multiple rows, you can use numpy.reshape:

>>> pd.DataFrame(df.values.reshape(-1, 4)).add_prefix('Col_')
    Col_0   Col_1   Col_2   Col_3
0  Cell 1  Cell 2  Cell 3  Cell 4
1  Cell 1  Cell 2  Cell 3  Cell 4
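
Note that reshape(-1, 4) pairs rows without overlap (rows 1 and 2, then rows 3 and 4). If the training windows should overlap instead (rows 1 and 2, then rows 2 and 3), one possible sketch uses numpy.lib.stride_tricks.sliding_window_view, available in NumPy 1.20 and later. The helper name make_windows and the numeric sample data are assumptions for illustration and do not come from the original answer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(df: pd.DataFrame, window: int = 2) -> pd.DataFrame:
    """Return one row per overlapping window of `window` consecutive rows."""
    values = df.to_numpy()
    n_cols = values.shape[1]
    # Shape (n_rows - window + 1, n_cols, window): the window axis is appended last.
    windows = sliding_window_view(values, window_shape=window, axis=0)
    # Put the window axis before the column axis, then flatten each window
    # so that all columns of step 1 come first, then all columns of step 2, ...
    flat = windows.transpose(0, 2, 1).reshape(-1, window * n_cols)
    columns = [f"{col}_{step}" for step in range(1, window + 1) for col in df.columns]
    # Label each window by the index of its last row.
    return pd.DataFrame(flat, columns=columns, index=df.index[window - 1:])


# Hypothetical numeric series with three time steps.
df = pd.DataFrame({"Column A": [1, 2, 3], "Column B": [10, 20, 30]})
out = make_windows(df, window=2)
print(out)
```

The whole frame is processed in one vectorized pass, so the per-window cost of building millions of small DataFrames with concat is avoided. The final reshape copies the data once.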

Answer 2

Score: 2

I hope I've understood you correctly, but you can try:

x = df.stack().reset_index()
x[''] = x['level_1'] + '_' + (x['level_0'] + 1).astype(str)
x = x[['', 0]].set_index('').T
print(x)

Prints:

  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4

Answer 3

Score: 1

Maybe this helps:

result = df.stack()
result.index = [f"{y}_{x+1}" for x,y in result.index]
result = pd.DataFrame(result).T
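
The snippet above shows no output, so here is a self-contained run of it. The two-row frame is assumed from the question, and the loop variables are renamed to row and col for readability.

```python
import pandas as pd

# Two-row frame assumed from the question.
df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3"],
                   "Column B": ["Cell 2", "Cell 4"]})

# stack() yields a Series indexed by (row position, column name) pairs.
result = df.stack()

# Collapse each (row, column) pair into a single label such as "Column A_1".
result.index = [f"{col}_{row + 1}" for row, col in result.index]

# Turn the Series into a one-row DataFrame.
result = pd.DataFrame(result).T
print(result)
```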

Answer 4

Score: 0

Another possible solution:

# Both variants flatten the frame column by column (A row 1, A row 2, B row 1, B row 2),
# so the labels loop over columns first to stay aligned with the values.
(pd.DataFrame(np.hstack(df.values.T)).T
 .set_axis([f'{x}_{y+1}' for x in df.columns for y in range(2)], axis=1))

Alternatively,

from itertools import chain
(pd.DataFrame(chain(*[df[col] for col in df.columns])).T
 .set_axis([f'{x}_{y}' for x in df.columns for y in range(1,3)], axis=1))

Output:

  Column A_1 Column A_2 Column B_1 Column B_2
0     Cell 1     Cell 3     Cell 2     Cell 4
