Splitting and flattening a DataFrame into multiple DataFrames
Question
I am trying to derive multiple DataFrames from a single one, as shown below:
D_input:
import numpy as np
import pandas as pd
from numpy import nan
data = {'ID': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1'}, 'hr': {0: 55, 1: 56, 2: 57, 3: 75, 4: 65, 5: 55}, 'hrMax': {0: nan, 1: 60.0, 2: 59.0, 3: nan, 4: 70.0, 5: 79.0}, 'hrMin': {0: nan, 1: 45.0, 2: 45.0, 3: nan, 4: 45.0, 5: 35.0}}
df = pd.DataFrame(data)
D_output: [D1, D2]
D1:
 ID  hr_a  hr_b  hrMax  hrMin
id1    55    56     60     45
id1    55    57     59     45
D2:
 ID  hr_a  hr_b  hrMax  hrMin
id1    75    65     70     45
id1    75    55     79     35
I have tried:
# Select the index labels where hrMax is NaN
index = df['hrMax'].index[df['hrMax'].apply(np.isnan)]
df_index = df.index.values.tolist()
# Get each sub-DataFrame using iloc
for i in range(0, len(index)):
    df_single_observation = df.iloc[df_index.index(i):df_index.index(i+1)-1]
but it does not work. Could anyone help?
Many thanks in advance.
Best regards.
Answer 1
Score: 1
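Try:
# Rows where both hrMax and hrMin are NaN mark the start of a new block
m = df[['hrMax', 'hrMin']].isna().all(axis=1)
# Take hr from those marker rows and forward-fill it over each block
df['hr_a'] = df.loc[m, 'hr']
df['hr_a'] = df['hr_a'].ffill()
# Drop the marker rows, rename hr to hr_b, and order the columns
df = df[~m].rename(columns={'hr': 'hr_b'})[['ID', 'hr_a', 'hr_b', 'hrMax', 'hrMin']]
# Group consecutive runs of rows; each group is one output DataFrame
for _, g in df.groupby((m != m.shift()).cumsum()):
    print(g)
    print('-' * 80)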
Prints:
    ID  hr_a  hr_b  hrMax  hrMin
1  id1  55.0    56   60.0   45.0
2  id1  55.0    57   59.0   45.0
--------------------------------------------------------------------------------
    ID  hr_a  hr_b  hrMax  hrMin
4  id1  75.0    65   70.0   45.0
5  id1  75.0    55   79.0   35.0
--------------------------------------------------------------------------------
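The question asks for a list [D1, D2] rather than printed output, so here is a minimal sketch of the same idea wrapped in a function that returns the pieces. The function name split_on_header_rows and the variable names is_header and block_id are made up for illustration and are not part of the original answer. It also labels blocks with a plain cumulative sum of the marker rows, which is a small variation on the (m != m.shift()).cumsum() grouping used above. It assumes every block starts with a row where both hrMax and hrMin are NaN and that the frame holds a single ID (with several IDs, the forward fill would need to be done per ID).

```python
import pandas as pd
from numpy import nan


def split_on_header_rows(df):
    """Return one DataFrame per block; a block starts at a row where
    both hrMax and hrMin are NaN (hypothetical helper, for illustration)."""
    # Marker ("header") rows: both hrMax and hrMin are missing.
    is_header = df[['hrMax', 'hrMin']].isna().all(axis=1)
    # Each marker row opens a new block, so a running count labels the blocks.
    block_id = is_header.cumsum()

    out = df.copy()  # leave the caller's frame untouched
    # Keep hr only on marker rows, then carry it down over the block as hr_a.
    out['hr_a'] = out['hr'].where(is_header).ffill()
    out = (out[~is_header]
           .rename(columns={'hr': 'hr_b'})
           [['ID', 'hr_a', 'hr_b', 'hrMax', 'hrMin']])
    # One frame per block, each with a fresh 0-based index.
    return [g.reset_index(drop=True)
            for _, g in out.groupby(block_id[~is_header])]


data = {'ID': ['id1'] * 6,
        'hr': [55, 56, 57, 75, 65, 55],
        'hrMax': [nan, 60.0, 59.0, nan, 70.0, 79.0],
        'hrMin': [nan, 45.0, 45.0, nan, 45.0, 35.0]}
df = pd.DataFrame(data)

frames = split_on_header_rows(df)  # frames[0] is D1, frames[1] is D2
```

As in the answer above, hr_a comes back as a float column because it passes through NaN during the forward fill; cast it with astype(int) if integer values are needed.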