Combine two Pandas rows into one with duplicated columns for time series

Question

I have two rows of a pandas DataFrame with the same columns:

Column A   Column B
Cell 1     Cell 2
Cell 3     Cell 4

I want to combine both rows into one single row by appending the columns:

Column A_1   Column B_1   Column A_2   Column B_2
Cell 1       Cell 2       Cell 3       Cell 4

This operation creates a time series row with window size 2 for training a machine learning model. I need to run it millions of times, so each call has to be cheap.

Thanks in advance!

I tried using pandas concat, but it is just too slow and requires a lot of RAM.

Answer 1

Score: 3

You can use stack():

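# Flatten both rows into one Series, drop the row level, and make it a single-row frame
out = df.stack().droplevel(0).to_frame().T
# Number repeated column names in order of appearance: Column A -> Column A_1, Column A_2
out.columns += '_' + out.groupby(level=0, axis=1).cumcount().add(1).astype(str)
print(out)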

# Output
  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4

If you have multiple rows, you can use numpy.reshape:

>>> pd.DataFrame(df.values.reshape(-1, 4)).add_prefix('Col_')
    Col_0   Col_1   Col_2   Col_3
0  Cell 1  Cell 2  Cell 3  Cell 4
1  Cell 1  Cell 2  Cell 3  Cell 4
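
To make the reshape idea concrete, here is a minimal sketch that assumes the frame holds several consecutive, non-overlapping pairs of rows. The four-row sample data and the names `window`, `usable`, `labels` and `pairs` are illustrative and not part of the original answer, and dropping an unpaired trailing row is an assumption:

```python
import pandas as pd

# Hypothetical sample data: four rows, i.e. two consecutive pairs.
df = pd.DataFrame({
    "Column A": ["Cell 1", "Cell 3", "Cell 5", "Cell 7"],
    "Column B": ["Cell 2", "Cell 4", "Cell 6", "Cell 8"],
})

window = 2            # number of input rows merged into each output row
n_cols = df.shape[1]  # number of columns in the input frame

# Assumption: a trailing row that cannot fill a complete pair is discarded.
usable = len(df) - len(df) % window
values = df.to_numpy()[:usable]

# Labels follow the row-major order of the reshaped data: A_1, B_1, A_2, B_2.
labels = [f"{col}_{i}" for i in range(1, window + 1) for col in df.columns]

# Each output row is `window` input rows laid side by side.
pairs = pd.DataFrame(values.reshape(-1, window * n_cols), columns=labels)
print(pairs)
```

Here the first output row holds Cell 1 to Cell 4 and the second holds Cell 5 to Cell 8, with the column names from the question instead of generic `Col_` prefixes.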

Answer 2

Score: 2

I hope I've understood you correctly, but you can try:

x = df.stack().reset_index()
x[''] = x['level_1'] + '_' + (x['level_0'] + 1).astype(str)
x = x[['', 0]].set_index('').T

print(x)

Prints:

  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4

Answer 3

Score: 1

Maybe this helps:

result = df.stack()
result.index = [f"{y}_{x+1}" for x,y in result.index]
result = pd.DataFrame(result).T
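
The comprehension above computes the suffix as `x+1`, which relies on the rows carrying the default integer labels 0 and 1. Time series frames are often indexed by timestamps instead, so here is a minimal sketch that resets the index first. The timestamp index and the sample values are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample data: two rows labelled by timestamps rather than 0 and 1.
df = pd.DataFrame(
    {"Column A": ["Cell 1", "Cell 3"], "Column B": ["Cell 2", "Cell 4"]},
    index=pd.to_datetime(["2023-06-18", "2023-06-19"]),
)

# Replace the timestamp labels with positions 0 and 1, then flatten row by row.
result = df.reset_index(drop=True).stack()

# Each index entry is (row position, column name); pos + 1 is the window slot.
result.index = [f"{col}_{pos + 1}" for pos, col in result.index]

# Turn the flat Series into a single-row DataFrame.
result = result.to_frame().T
print(result)
```

With `reset_index(drop=True)` the suffixes depend only on row position, so the same code works whatever the original index looks like.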


Answer 4

Score: 0

Another possible solution:

# Flatten row by row so the values line up with the A_1, B_1, A_2, B_2 labels
(pd.DataFrame(np.hstack(df.values)).T
 .set_axis([f'{x}_{y+1}' for y in range(2) for x in df.columns], axis=1))

Alternatively,

from itertools import chain

# chain(*df.values) walks the rows in order, giving the same row-major flattening
(pd.DataFrame(chain(*df.values)).T
 .set_axis([f'{x}_{y}' for y in range(1,3) for x in df.columns], axis=1))

Output:

  Column A_1 Column B_1 Column A_2 Column B_2
0     Cell 1     Cell 2     Cell 3     Cell 4
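
Since the stated goal is a window of size 2 over a time series, it may be possible to avoid building each pair separately. The following is a minimal sketch, assuming that every pair of consecutive rows should become one output row (overlapping windows) and that the whole frame fits in memory. The three-row sample data and the names `values`, `windows`, `labels` and `out` are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical sample series with three time steps.
df = pd.DataFrame({
    "Column A": ["Cell 1", "Cell 3", "Cell 5"],
    "Column B": ["Cell 2", "Cell 4", "Cell 6"],
})

values = df.to_numpy()

# Row t of the result is [row t, row t + 1] of the input, for every t at once.
windows = np.hstack([values[:-1], values[1:]])

# Column labels in the same left-to-right order: A_1, B_1, A_2, B_2.
labels = [f"{col}_{i}" for i in (1, 2) for col in df.columns]

out = pd.DataFrame(windows, columns=labels)
print(out)
```

For an input with n rows this yields n - 1 windows from a single NumPy call, which is likely to be cheaper than calling concat once per pair.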

huangapple
  • Posted on June 19, 2023 at 03:32:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76502238.html