Calculate median values for every n rows in a pandas DataFrame
Question
I have the following df:

    df = pd.DataFrame({
        "value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    })

I need to calculate the median of every n rows. Ideally, I would write a function that takes a pd.Series and n as arguments. So, if n=2, my function should return:

    median
    15
    35
    55
    75
    95

If n=3, it should return:

    median
    20
    50
    80
    100

In this example with n=3, the last returned value is 100 because only one row is left over. My real dataset has thousands of rows, and I want to set n to 10 or 20. When the number of rows is not a multiple of n, the last value should be the median of the remaining len(df) % n rows.

I included a similar function below just for reference (from link). It calculates the mean in the manner I explained, but I need to tweak it to calculate the median.
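
    import numpy as np
    import pandas as pd

    def find_mean(col, rows):
        """
        col: pd.Series
        rows: number of rows
        """
        if isinstance(col, pd.Series):
            col = col.to_numpy()
        # Number of leftover rows that do not fill a complete block
        mod = col.shape[0] % rows
        if mod != 0:
            exclude = col[-mod:]
            keep = col[: len(col) - mod]
            # One row per complete block, then average across each row
            out = keep.reshape((int(keep.shape[0]/rows), int(rows))).mean(1)
            # Append the mean of the leftover rows as the final value
            out = np.hstack((out, exclude.mean()))
        else:
            out = col.reshape((int(col.shape[0]/rows), int(rows))).mean(1)
        return out
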
Answer 1
Score: 3
You can use groupby:

    N = 3
    df.groupby(np.arange(len(df))//N)['value'].median()

Output:

    0     20.0
    1     50.0
    2     80.0
    3    100.0
    Name: value, dtype: float64
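
Since the question asks for a function that takes a pd.Series and n, the groupby approach can be wrapped up roughly as follows. This is a minimal sketch, not a definitive implementation: the name median_every_n is hypothetical, and the block labels are built from row positions so the result should not depend on the Series index.

```python
import numpy as np
import pandas as pd


def median_every_n(col: pd.Series, n: int) -> pd.Series:
    """Median of each consecutive block of n rows (hypothetical helper).

    When len(col) is not a multiple of n, the final block holds the
    len(col) % n leftover rows, and its median is returned last.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Positional block labels: 0 for the first n rows, 1 for the next n, ...
    block_ids = np.arange(len(col)) // n
    return col.groupby(block_ids).median().rename("median")


df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})
print(median_every_n(df["value"], 2).tolist())  # [15.0, 35.0, 55.0, 75.0, 95.0]
print(median_every_n(df["value"], 3).tolist())  # [20.0, 50.0, 80.0, 100.0]
```

Because groupby handles the short final group on its own, no separate branch for the remainder is needed, unlike the reshape-based find_mean in the question.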