Combine two Pandas rows into one with duplicated columns for time series

Question

I have two rows of a Pandas DataFrame with the same columns:

    Column A   Column B
    Cell 1     Cell 2
    Cell 3     Cell 4

I want to combine both rows into a single row by appending the columns:

    Column A_1   Column B_1   Column A_2   Column B_2
    Cell 1       Cell 2       Cell 3       Cell 4

This operation creates a time series row with window size 2 for training a machine learning model. I perform it millions of times, so each call needs to be cheap.

Thanks in advance!

I tried using pandas concat, but it is just too slow and requires a lot of RAM.
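The answers below all operate on a frame named `df`, which the question never constructs. A minimal way to build it for experimenting might look like this, assuming the cells are plain strings (the real data is presumably numeric):

```python
import pandas as pd

# Two rows with identical columns, mirroring the example tables.
# The string cell values are placeholders taken from the question.
df = pd.DataFrame(
    {
        "Column A": ["Cell 1", "Cell 3"],
        "Column B": ["Cell 2", "Cell 4"],
    }
)
```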

Answer 1

Score: 3

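You can use stack():

    out = df.stack().droplevel(0).to_frame().T
    # Number repeated column names in order of appearance: Column A -> Column A_1, Column A_2
    # (groupby with axis=1 is deprecated since pandas 2.1)
    out.columns += '_' + out.groupby(level=0, axis=1).cumcount().add(1).astype(str)
    print(out)

    # Output
      Column A_1 Column B_1 Column A_2 Column B_2
    0     Cell 1     Cell 2     Cell 3     Cell 4

If you have multiple rows, you can use numpy.reshape, which packs each pair of consecutive rows (0 and 1, 2 and 3, and so on) into one row, so the number of rows must be even:

    >>> pd.DataFrame(df.values.reshape(-1, 4)).add_prefix('Col_')
        Col_0   Col_1   Col_2   Col_3
    0  Cell 1  Cell 2  Cell 3  Cell 4
    1  Cell 1  Cell 2  Cell 3  Cell 4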
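For a time series window you may want overlapping pairs (rows 0 and 1, then 1 and 2, and so on) rather than the disjoint pairs that `reshape` produces. The sketch below is one way to do that for the whole frame at once by stacking shifted slices of the underlying array side by side. The helper name `window_rows` and the three-row sample frame are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def window_rows(df: pd.DataFrame, window: int = 2) -> pd.DataFrame:
    """Place each run of `window` consecutive rows side by side in one row."""
    if window < 1:
        raise ValueError("window must be at least 1")
    values = df.to_numpy()
    n_out = max(len(df) - window + 1, 0)  # number of overlapping windows
    # Block i holds the rows that sit at position i within each window.
    blocks = [values[i:i + n_out] for i in range(window)]
    columns = [f"{col}_{i + 1}" for i in range(window) for col in df.columns]
    return pd.DataFrame(np.hstack(blocks), columns=columns)


# Hypothetical three-row frame, so that two overlapping windows exist.
df = pd.DataFrame(
    {
        "Column A": ["Cell 1", "Cell 3", "Cell 5"],
        "Column B": ["Cell 2", "Cell 4", "Cell 6"],
    }
)
out = window_rows(df)
print(out)
```

With this sample, `out` has two rows: cells 1 to 4, then cells 3 to 6. Note that `to_numpy()` falls back to an object array when the columns have mixed dtypes, which would likely cost speed and memory, so it is worth timing on the real data.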

Answer 2

Score: 2

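I hope I've understood you correctly, but you can try:

    # Long format: one row per (row number, column name, value)
    x = df.stack().reset_index()
    # Build labels such as "Column A_1" from the column name and the 1-based row number
    x[''] = x['level_1'] + '_' + (x['level_0'] + 1).astype(str)
    # Keep only labels and values, then transpose into a single wide row
    x = x[['', 0]].set_index('').T
    print(x)

This prints:

      Column A_1 Column B_1 Column A_2 Column B_2
    0     Cell 1     Cell 2     Cell 3     Cell 4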

Answer 3

Score: 1

Maybe this helps:

    result = df.stack()
    # stack() yields a (row, column) MultiIndex; flatten it to labels like "Column A_1"
    result.index = [f"{y}_{x+1}" for x, y in result.index]
    result = pd.DataFrame(result).T
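If the two rows arrive as separate Series (for example from `df.iloc`), the same relabeling can be done without `stack()`. This is a minimal sketch under that assumption; `combine_two_rows` is a hypothetical helper name, and whether it is faster than the snippet above would need to be measured.

```python
import numpy as np
import pandas as pd


def combine_two_rows(row1: pd.Series, row2: pd.Series) -> pd.Series:
    """Concatenate two rows that share the same labels into one wide row."""
    labels = [f"{name}_1" for name in row1.index] + [f"{name}_2" for name in row2.index]
    values = np.concatenate([row1.to_numpy(), row2.to_numpy()])
    return pd.Series(values, index=labels)


df = pd.DataFrame(
    {
        "Column A": ["Cell 1", "Cell 3"],
        "Column B": ["Cell 2", "Cell 4"],
    }
)
wide = combine_two_rows(df.iloc[0], df.iloc[1])
print(wide)
```

Calling a helper like this once per pair in a Python loop still pays per-call overhead, so for millions of windows a whole-frame array operation is probably the better fit.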


Answer 4

Score: 0

Another possible solution, flattening the values row by row so that they line up with the labels:

    import numpy as np
    import pandas as pd

    (pd.DataFrame(np.hstack(df.values)).T
       .set_axis([f'{x}_{y+1}' for y in range(2) for x in df.columns], axis=1))

Alternatively:

    from itertools import chain

    # Chain the rows of the underlying array into one flat sequence
    (pd.DataFrame(chain.from_iterable(df.values)).T
       .set_axis([f'{x}_{y}' for y in range(1, 3) for x in df.columns], axis=1))

Output:

      Column A_1 Column B_1 Column A_2 Column B_2
    0     Cell 1     Cell 2     Cell 3     Cell 4
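The order in which the cells are flattened decides which label each value ends up under, so it is worth checking before attaching column names. The sketch below, which assumes the two-row example frame from the question, compares the two orders that `numpy.ndarray.ravel` offers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Column A": ["Cell 1", "Cell 3"],
        "Column B": ["Cell 2", "Cell 4"],
    }
)
values = df.to_numpy()

row_major = values.ravel(order="C")     # row by row: matches A_1, B_1, A_2, B_2
column_major = values.ravel(order="F")  # column by column: matches A_1, A_2, B_1, B_2

print(row_major.tolist())     # ['Cell 1', 'Cell 2', 'Cell 3', 'Cell 4']
print(column_major.tolist())  # ['Cell 1', 'Cell 3', 'Cell 2', 'Cell 4']
```

For labels generated as `A_1, B_1, A_2, B_2`, the row-major order is the one that keeps each row's cells together.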

huangapple
  • Published on June 19, 2023, 03:32:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76502238.html