How to create a sequence of DataFrame rows from start and end values defined in columns

Question

I have the following DataFrame:

    import numpy as np
    import pandas as pd

    example_df = pd.DataFrame({'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
                               'seq_start': {0: 0.0, 1: 2800.0, 2: 6400.0, 3: 8400.0, 4: 9800.0},
                               'seq_end': {0: 1400.0, 1: 4700.0, 2: 8400.0, 3: 9800.0, 4: 11400.0}})

For each row, I'd like to obtain the sequence of values from example_df['seq_start'] to example_df['seq_end'] (inclusive, in steps of 100), so that I can later use the newly created column in a join.

So the expected output would look like this:

    out_df = pd.DataFrame({'id': np.concatenate([[0] * 15, [1] * 20, [2] * 21]),
                           'expected_output': np.concatenate([np.arange(0, 1500, 100),
                                                              np.arange(2800, 4800, 100),
                                                              np.arange(6400, 8500, 100)])})

        id  expected_output
    0    0                0
    1    0              100
    2    0              200
    3    0              300
    4    0              400
    5    0              500
    ...
    12   0             1200
    13   0             1300
    14   0             1400
    15   1             2800
    16   1             2900
    17   1             3000
    ...
    31   1             4400
    32   1             4500
    33   1             4600
    34   1             4700
    35   2             6400
    36   2             6500
    37   2             6600
    ...
    54   2             8300
    55   2             8400

How can I approach this?

Answer 1

Score: 2

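Use pandas.DataFrame.explode. First build, for each row, the list of values from seq_start to seq_end, then explode that list column into one row per value:

    def listify(x, step=100, right_closed=True):
        # Sort so the order of the two columns does not matter.
        lower, upper = sorted(x)
        # Extending the upper bound by one step includes seq_end itself.
        return list(range(lower, upper + step * right_closed, step))

    # range() needs integers, hence the cast from float.
    example_df['expected'] = example_df[['seq_start', 'seq_end']].astype(int).apply(listify, axis=1)
    new_df = example_df[['id', 'expected']].explode('expected')
    print(new_df)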

Output:

        id expected
    0    0        0
    0    0      100
    0    0      200
    0    0      300
    0    0      400
    ..  ..      ...
    4    4    11000
    4    4    11100
    4    4    11200
    4    4    11300
    4    4    11400
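
Note that explode keeps the original row label for every generated row (hence the repeated index in the output above) and leaves the exploded column with object dtype. If that gets in the way of the join mentioned in the question, one possible alternative is to build the long table directly with NumPy. The following is only a sketch under some assumptions: the step of 100 is taken from the expected output in the question, and lookup_df and its label column are hypothetical, invented here purely to illustrate the join.

```python
import numpy as np
import pandas as pd

example_df = pd.DataFrame({
    'id': [0, 1, 2, 3, 4],
    'seq_start': [0.0, 2800.0, 6400.0, 8400.0, 9800.0],
    'seq_end': [1400.0, 4700.0, 8400.0, 9800.0, 11400.0],
})

STEP = 100  # assumption: step size inferred from the question's expected output

starts = example_df['seq_start'].astype(int).to_numpy()
ends = example_df['seq_end'].astype(int).to_numpy()

# One integer array per row; "e + 1" makes the end value inclusive
# whenever (e - s) is a multiple of STEP.
seqs = [np.arange(s, e + 1, STEP) for s, e in zip(starts, ends)]

seq_df = pd.DataFrame({
    # Repeat each id once per generated value of its sequence.
    'id': np.repeat(example_df['id'].to_numpy(), [len(s) for s in seqs]),
    'expected': np.concatenate(seqs),
})

# Hypothetical table to join against (not part of the original question).
lookup_df = pd.DataFrame({'expected': [100, 4700, 8400],
                          'label': ['a', 'b', 'c']})

# Left join on the generated column; rows without a match get NaN in 'label'.
joined = seq_df.merge(lookup_df, on='expected', how='left')
print(joined.dropna(subset=['label']))
```

Here seq_df has a fresh 0..n-1 index and an integer expected column, so it can be merged without further casting. The value 8400 appears for both id 2 (as its end) and id 3 (as its start), so a join on expected alone matches both rows; joining on id as well would be needed if that is not intended.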

huangapple
  • Posted on January 3, 2020, 15:43:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59574875.html