Splitting and flattening a DataFrame into multiple DataFrames

Question

I am trying to derive multiple DataFrames from a single one, as shown below:

D_input:

import pandas as pd
from numpy import nan

data = {'ID': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1'}, 'hr': {0: 55, 1: 56, 2: 57, 3: 75, 4: 65, 5: 55}, 'hrMax': {0: nan, 1: 60.0, 2: 59.0, 3: nan, 4: 70.0, 5: 79.0}, 'hrMin': {0: nan, 1: 45.0, 2: 45.0, 3: nan, 4: 45.0, 5: 35.0}}

df = pd.DataFrame(data)

D_output: [D1, D2]

ID   hr_a  hr_b  hrMax  hrMin
id1  55    56    60     45
id1  55    57    59     45

ID   hr_a  hr_b  hrMax  hrMin
id1  75    65    70     45
id1  75    55    79     35

I have tried

# Select the indexes where df is NaN using hrMax
index = df['hrMax'].index[df['hrMax'].apply(np.isnan)]
df_index = df.index.values.tolist()

# get each sub-dataframe using iloc
for i in range(0, len(index)):
    df_single_observation = df.iloc[df_index.index(i):df_index.index(i+1)-1]

but it does not work. Could anyone please help?
Many thanks in advance.
Best regards.

Answer 1

Score: 1

Try:

# True for the "header" rows where both hrMax and hrMin are missing
m = df[['hrMax', 'hrMin']].isna().all(axis=1)

# Copy hr from each header row and carry it forward to the rows below it
df['hr_a'] = df.loc[m, 'hr']
df['hr_a'] = df['hr_a'].ffill()

# Drop the header rows, rename hr to hr_b, and reorder the columns
df = df[~m].rename(columns={'hr':'hr_b'})[['ID', 'hr_a', 'hr_b', 'hrMax', 'hrMin']]

# A new group id starts whenever the mask value changes
for _, g in df.groupby((m != m.shift()).cumsum()):
    print(g)
    print('-'*80)

Prints:

    ID  hr_a  hr_b  hrMax  hrMin
1  id1  55.0    56   60.0   45.0
2  id1  55.0    57   59.0   45.0
--------------------------------------------------------------------------------
    ID  hr_a  hr_b  hrMax  hrMin
4  id1  75.0    65   70.0   45.0
5  id1  75.0    55   79.0   35.0
--------------------------------------------------------------------------------
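
The question asks for the result as a list, D_output: [D1, D2], while the loop above only prints each group. A minimal sketch of collecting the groups into a list might look like the following. The names block_id, out, and frames are illustrative assumptions, not part of the original answer, and the sketch assumes every block begins with a row whose hrMax and hrMin are both NaN.

```python
import pandas as pd
from numpy import nan

data = {
    'ID': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1'},
    'hr': {0: 55, 1: 56, 2: 57, 3: 75, 4: 65, 5: 55},
    'hrMax': {0: nan, 1: 60.0, 2: 59.0, 3: nan, 4: 70.0, 5: 79.0},
    'hrMin': {0: nan, 1: 45.0, 2: 45.0, 3: nan, 4: 45.0, 5: 35.0},
}
df = pd.DataFrame(data)

# True for the "header" rows where both hrMax and hrMin are missing
m = df[['hrMax', 'hrMin']].isna().all(axis=1)

# Each header row increments the counter, so every row gets its block number
block_id = m.cumsum()

# Keep hr only on header rows, then carry it forward as hr_a (assumed name from the question)
out = df.assign(hr_a=df['hr'].where(m).ffill())

# Drop the header rows, rename hr to hr_b, and reorder the columns
out = out[~m].rename(columns={'hr': 'hr_b'})
out = out[['ID', 'hr_a', 'hr_b', 'hrMax', 'hrMin']]

# One DataFrame per block, each with a fresh 0-based index
frames = [g.reset_index(drop=True) for _, g in out.groupby(block_id[~m])]

for frame in frames:
    print(frame)
```

Grouping on m.cumsum() rather than on changes in the mask is a design choice made here for readability; both give the same two groups for this input.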
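
As for why the attempt in the question fails: as posted, np is never imported (only nan is), and df_index.index(i) looks up the loop counter i in the index list rather than the position of a NaN row. With the default index that makes each slice df.iloc[i:i], which is empty. The sketch below keeps the asker's iloc-slicing idea but slices between consecutive NaN positions. The names starts, bounds, and frames are illustrative assumptions, and the sketch assumes each block begins with a row where hrMax is NaN.

```python
import numpy as np
import pandas as pd
from numpy import nan

data = {
    'ID': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1'},
    'hr': {0: 55, 1: 56, 2: 57, 3: 75, 4: 65, 5: 55},
    'hrMax': {0: nan, 1: 60.0, 2: 59.0, 3: nan, 4: 70.0, 5: 79.0},
    'hrMin': {0: nan, 1: 45.0, 2: 45.0, 3: nan, 4: 45.0, 5: 35.0},
}
df = pd.DataFrame(data)

# Integer positions of the header rows (hrMax missing)
starts = np.flatnonzero(df['hrMax'].isna().to_numpy())

# Add the end of the frame so the last block has a stop position
bounds = list(starts) + [len(df)]

frames = []
for start, stop in zip(bounds[:-1], bounds[1:]):
    # Rows after the header, up to (not including) the next header
    body = df.iloc[start + 1:stop]
    block = pd.DataFrame({
        'ID': body['ID'].to_numpy(),
        # hr of the header row, broadcast to every row of the block
        'hr_a': df['hr'].iloc[start],
        'hr_b': body['hr'].to_numpy(),
        'hrMax': body['hrMax'].to_numpy(),
        'hrMin': body['hrMin'].to_numpy(),
    })
    frames.append(block)

for frame in frames:
    print(frame)
```

The explicit loop is slower than the groupby approach on large inputs, but it maps directly onto the slicing logic the question was attempting.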

Posted by huangapple on July 20, 2023, 08:14:40
When reposting, please retain the link to this article: https://go.coder-hub.com/76725927.html