Splitting and flattening a DataFrame into multiple DataFrames

Question

I am trying to derive multiple DataFrames from a single one, as shown below:

D_input:

    import pandas as pd
    from numpy import nan
    data = {'ID': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1'}, 'hr': {0: 55, 1: 56, 2: 57, 3: 75, 4: 65, 5: 55}, 'hrMax': {0: nan, 1: 60.0, 2: 59.0, 3: nan, 4: 70.0, 5: 79.0}, 'hrMin': {0: nan, 1: 45.0, 2: 45.0, 3: nan, 4: 45.0, 5: 35.0}}
    df = pd.DataFrame(data)

D_output: [D1, D2]

    ID   hr_a  hr_b  hrMax  hrMin
    id1    55    56     60     45
    id1    55    57     59     45

    ID   hr_a  hr_b  hrMax  hrMin
    id1    75    65     70     45
    id1    75    55     79     35

I have tried:

    import numpy as np

    # Select the indexes where hrMax is NaN
    index = df['hrMax'].index[df['hrMax'].apply(np.isnan)]
    df_index = df.index.values.tolist()
    # Get each sub-DataFrame using iloc
    for i in range(0, len(index)):
        df_single_observation = df.iloc[df_index.index(i):df_index.index(i+1)-1]

but it does not work. Could anyone help?
Many thanks in advance.
Best regards.

Answer 1

Score: 1

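Try:

    # Rows where both hrMax and hrMin are NaN mark the start of a new block
    m = df[['hrMax', 'hrMin']].isna().all(axis=1)
    # Copy hr from those marker rows and forward-fill it down each block
    df['hr_a'] = df.loc[m, 'hr']
    df['hr_a'] = df['hr_a'].ffill()
    # Drop the marker rows, rename hr to hr_b, and reorder the columns
    df = df[~m].rename(columns={'hr':'hr_b'})[['ID', 'hr_a', 'hr_b', 'hrMax', 'hrMin']]
    # The group key changes each time m flips, so each block is one group
    for _, g in df.groupby((m != m.shift()).cumsum()):
        print(g)
        print('-'*80)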

Prints:

        ID  hr_a  hr_b  hrMax  hrMin
    1  id1  55.0    56   60.0   45.0
    2  id1  55.0    57   59.0   45.0
    --------------------------------------------------------------------------------
        ID  hr_a  hr_b  hrMax  hrMin
    4  id1  75.0    65   70.0   45.0
    5  id1  75.0    55   79.0   35.0
    --------------------------------------------------------------------------------
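
The question asks for a list `[D1, D2]`, not printed groups. The same idea can be wrapped in a function that returns the pieces as a list. This is only a minimal sketch: `split_on_nan_rows` is a hypothetical helper name, not something from the original post, and it assumes every block begins with a row where both `hrMax` and `hrMin` are NaN. It uses a cumulative sum of the marker rows as the group key, which is a variation on the grouping key in the answer above.

```python
import pandas as pd
from numpy import nan


def split_on_nan_rows(df):
    """Return one DataFrame per block that starts with an all-NaN hrMax/hrMin row."""
    # Marker rows: both hrMax and hrMin are missing.
    is_header = df[['hrMax', 'hrMin']].isna().all(axis=1)
    # Every marker row starts a new block, so a running count labels the blocks.
    block_id = is_header.cumsum()

    # rename() returns a new DataFrame, so the caller's df is left untouched.
    out = df.rename(columns={'hr': 'hr_b'})
    # Keep hr only on marker rows, then forward-fill it down each block.
    out['hr_a'] = df['hr'].where(is_header).ffill()
    out = out.loc[~is_header, ['ID', 'hr_a', 'hr_b', 'hrMax', 'hrMin']]

    # One DataFrame per block, each with a fresh 0-based index.
    return [g.reset_index(drop=True)
            for _, g in out.groupby(block_id[~is_header])]


# Same values as the sample data in the question.
df = pd.DataFrame({
    'ID': ['id1'] * 6,
    'hr': [55, 56, 57, 75, 65, 55],
    'hrMax': [nan, 60.0, 59.0, nan, 70.0, 79.0],
    'hrMin': [nan, 45.0, 45.0, nan, 45.0, 35.0],
})

frames = split_on_nan_rows(df)
for frame in frames:
    print(frame)
```

Here `frames[0]` and `frames[1]` play the roles of D1 and D2. As in the answer above, `hr_a` comes out as float because of the intermediate NaN values; append `.astype(int)` to that column if integers are needed.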

huangapple
  • Posted on July 20, 2023, 08:14:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76725927.html