Combine two Pandas rows into one with duplicated columns for time series
Question
I have two rows of a pandas DataFrame with the same columns:
Column A | Column B |
---|---|
Cell 1 | Cell 2 |
Cell 3 | Cell 4 |
I want to combine both rows into one single row by appending the columns:
Column A_1 | Column B_1 | Column A_2 | Column B_2 |
---|---|---|---|
Cell 1 | Cell 2 | Cell 3 | Cell 4 |
This operation creates a time series row with window size 2 for training a machine learning model. I need to run it millions of times, so each call has to be cheap.
Thanks in advance!
I tried using pandas concat, but it is just too slow and requires a lot of RAM.
Answer 1
Score: 3
You can use stack():
out = df.stack().droplevel(0).to_frame().T
out.columns += '_' + out.groupby(level=0, axis=1).cumcount().add(1).astype(str)
print(out)
# Output
Column A_1 Column B_1 Column A_2 Column B_2
0 Cell 1 Cell 2 Cell 3 Cell 4
If you have multiple rows, you can use numpy.reshape:
>>> pd.DataFrame(df.values.reshape(-1, 4)).add_prefix('Col_')
Col_0 Col_1 Col_2 Col_3
0 Cell 1 Cell 2 Cell 3 Cell 4
1 Cell 1 Cell 2 Cell 3 Cell 4
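The reshape call above yields generic Col_0 ... Col_3 labels. Here is a minimal sketch that keeps the Column A_1 style names from the question instead. It assumes that non-overlapping groups of consecutive rows are wanted and that the number of rows is a multiple of the window size. The sample data and the names `window`, `names` and `wide` are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data: four rows, i.e. two non-overlapping pairs.
df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3", "Cell 5", "Cell 7"],
                   "Column B": ["Cell 2", "Cell 4", "Cell 6", "Cell 8"]})

window = 2  # rows merged into each output row (assumed to divide len(df))

# Labels in the order the reshaped values appear: A_1, B_1, A_2, B_2.
names = [f"{col}_{i + 1}" for i in range(window) for col in df.columns]

# Row-major reshape: each output row holds `window` consecutive input rows.
wide = pd.DataFrame(df.to_numpy().reshape(-1, window * df.shape[1]),
                    columns=names)
print(wide)
```

Because the whole frame is reshaped in a single call, no per-pair DataFrame is built, which is usually where the cost of repeated concat calls comes from.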
Answer 2
Score: 2
I hope I've understood you correctly, but you can try:
x = df.stack().reset_index()
x[''] = x['level_1'] + '_' + (x['level_0'] + 1).astype(str)
x = x[['', 0]].set_index('').T
print(x)
Prints:
Column A_1 Column B_1 Column A_2 Column B_2
0 Cell 1 Cell 2 Cell 3 Cell 4
Answer 3
Score: 1
Maybe this helps:
result = df.stack()
result.index = [f"{y}_{x+1}" for x,y in result.index]
result = pd.DataFrame(result).T
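If the goal is one output row for every pair of consecutive rows (overlapping windows, as is common for time series), the same naming scheme can be applied to the whole frame at once rather than pair by pair. This is a rough sketch under that assumption. `make_windows` is a hypothetical helper, not a pandas function, and the sample data is made up.

```python
import numpy as np
import pandas as pd

def make_windows(df: pd.DataFrame, window: int = 2) -> pd.DataFrame:
    """Hypothetical helper: one output row per run of `window` consecutive rows."""
    n_out = len(df) - window + 1
    if n_out < 1:
        raise ValueError("DataFrame has fewer rows than the window size")
    values = df.to_numpy()
    # Slice i holds the i-th row of every window; place the slices side by side.
    parts = [values[i:i + n_out] for i in range(window)]
    names = [f"{col}_{i + 1}" for i in range(window) for col in df.columns]
    return pd.DataFrame(np.hstack(parts), columns=names)

# Illustrative data: three rows give two overlapping windows of size 2.
df = pd.DataFrame({"Column A": ["Cell 1", "Cell 3", "Cell 5"],
                   "Column B": ["Cell 2", "Cell 4", "Cell 6"]})
result = make_windows(df)
print(result)
```

Here the second output row starts with the values that ended the first one, so every row of the input except the first and last appears in two windows.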
Answer 4
Score: 0
Another possible solution:
# Flatten row by row so each value lines up with its label
(pd.DataFrame(np.hstack(df.values)).T
.set_axis([f'{x}_{y+1}' for y in range(2) for x in df.columns], axis=1))
Alternatively,
from itertools import chain
# chain(*df.values) also walks the frame row by row
(pd.DataFrame(chain(*df.values)).T
.set_axis([f'{x}_{y}' for y in range(1,3) for x in df.columns], axis=1))
Output:
Column A_1 Column B_1 Column A_2 Column B_2
0 Cell 1 Cell 2 Cell 3 Cell 4