How to create a sequence of DataFrame rows based on start and end values defined by columns
Question
I've got the following DataFrame:
import numpy as np
import pandas as pd

example_df = pd.DataFrame({'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
                           'seq_start': {0: 0.0, 1: 2800.0, 2: 6400.0, 3: 8400.0, 4: 9800.0},
                           'seq_end': {0: 1400.0, 1: 4700.0, 2: 8400.0, 3: 9800.0, 4: 11400.0}})
I'd like to obtain a DataFrame that contains, for each id, the sequence of values from example_df['seq_start']
to example_df['seq_end'] (inclusive, in steps of 100)
so that I can later use the newly created column in a join.
So the expected output would look like this:
out_df = pd.DataFrame({'id': np.concatenate([[0] * 15, [1] * 20, [2] * 21]),
                       'expected_output': np.concatenate([np.arange(0, 1500, 100),
                                                          np.arange(2800, 4800, 100),
                                                          np.arange(6400, 8500, 100)])})
    id  expected_output
0    0                0
1    0              100
2    0              200
3    0              300
4    0              400
5    0              500
...
12   0             1200
13   0             1300
14   0             1400
15   1             2800
16   1             2900
17   1             3000
...
31   1             4400
32   1             4500
33   1             4600
34   1             4700
35   2             6400
36   2             6500
37   2             6600
...
54   2             8300
55   2             8400
How can I approach this?
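The hard-coded arrays in the expected output stop at 1500, 4800 and 8500 rather than at the seq_end values because np.arange excludes its stop value. A minimal sketch of deriving the same frame from example_df itself, assuming a fixed step of 100 (inferred from the expected output) and keeping only ids 0 to 2, as the expected output does; the names STEP, rows, parts and derived are illustrative only:

```python
import numpy as np
import pandas as pd

example_df = pd.DataFrame({'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
                           'seq_start': {0: 0.0, 1: 2800.0, 2: 6400.0, 3: 8400.0, 4: 9800.0},
                           'seq_end': {0: 1400.0, 1: 4700.0, 2: 8400.0, 3: 9800.0, 4: 11400.0}})

STEP = 100  # assumption: step inferred from the expected output

# The expected output lists only ids 0-2, so keep just those rows
rows = example_df.head(3)

# np.arange stops before its stop value, so pass seq_end + STEP to include seq_end
parts = [np.arange(int(start), int(end) + STEP, STEP)
         for start, end in zip(rows['seq_start'], rows['seq_end'])]

derived = pd.DataFrame({
    # repeat each id once per element of its own sequence
    'id': np.repeat(rows['id'].to_numpy(), [len(p) for p in parts]),
    'expected_output': np.concatenate(parts),
})

print(len(derived))  # 15 + 20 + 21 = 56
```

np.repeat pairs each id with the length of its own sequence, so the two columns always line up.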
Answer 1
Score: 2
Using pandas.DataFrame.explode:
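import pandas as pd

def listify(x, step=100, right_closed=True):
    # x holds (seq_end, seq_start) for one row; sorted() puts the smaller value first
    lower, upper = sorted(x)
    # extend the stop by one step so that seq_end itself is included when right_closed=True
    return range(lower, upper + step * right_closed, step)

example_df['expected'] = example_df[['seq_end', 'seq_start']].astype(int).apply(listify, axis=1)
new_df = example_df[['id', 'expected']].explode('expected')
print(new_df)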
Output:
    id  expected
0    0         0
0    0       100
0    0       200
0    0       300
0    0       400
..  ..       ...
4    4     11000
4    4     11100
4    4     11200
4    4     11300
4    4     11400
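explode keeps each original index label, so the output above repeats 0, 0, 0 and so on, and the exploded column holds Python objects rather than an integer dtype. Both matter when the column is used as a join key, as the question intends. A minimal sketch, assuming pandas 1.1 or later (for explode's ignore_index parameter) and reusing the answer's listify helper; the events table and its position and label columns are hypothetical, invented only to show the join:

```python
import pandas as pd

example_df = pd.DataFrame({'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
                           'seq_start': {0: 0.0, 1: 2800.0, 2: 6400.0, 3: 8400.0, 4: 9800.0},
                           'seq_end': {0: 1400.0, 1: 4700.0, 2: 8400.0, 3: 9800.0, 4: 11400.0}})

def listify(x, step=100, right_closed=True):
    # Same helper as in the answer: inclusive range between the two row values
    lower, upper = sorted(x)
    return range(lower, upper + step * right_closed, step)

sequences = example_df[['seq_end', 'seq_start']].astype(int).apply(listify, axis=1)

new_df = (example_df[['id']]
          .assign(expected=sequences)
          .explode('expected', ignore_index=True)  # fresh 0..n-1 index instead of repeated labels
          .astype({'expected': int}))              # exploded values come back as object dtype

# Hypothetical lookup table, invented only to illustrate the join
events = pd.DataFrame({'position': [100, 2900, 9900], 'label': ['a', 'b', 'c']})
joined = new_df.merge(events, left_on='expected', right_on='position', how='inner')
print(joined)
```

The inner merge keeps only the generated values that have a match in events; how='left' would keep every generated row instead.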