Use Python to sum the data every 24 hours and filter for the data with the maximum sum
Question
I have three rows of data representing a time series for a whole year at hourly resolution (shape (3, 8760)).
I want to sum the values over every 24 hours (24 columns) and, for each of these blocks, keep the values from the row with the maximum sum.
For example:
A = (1,2,3,4,5,6)
B = (0,0,4,5,6,7)
C = (0,0,2,6,14,0)
If I sum the values over every 2 hours (2 columns) and, for each block, keep the values from the row with the maximum sum,
the expected output would be
(1,2,4,5,14,0).
Currently, I am only at the stage of loading the data into Python as a DataFrame.
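For the loading step, here is a minimal sketch. It assumes pandas is installed, and the row labels A, B and C are illustrative. The NumPy approach in the answer below then works on the plain array returned by `DataFrame.to_numpy()`:

```python
import pandas as pd

# Toy data from the question: 3 rows x 6 hourly columns (the real data is 3 x 8760)
df = pd.DataFrame(
    [(1, 2, 3, 4, 5, 6), (0, 0, 4, 5, 6, 7), (0, 0, 2, 6, 14, 0)],
    index=["A", "B", "C"],  # illustrative labels
)

# Plain NumPy array for the reshape-based approach
a = df.to_numpy()
print(a.shape)  # (3, 6)
```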
Answer 1
Score: 0
Taking your simplified example, with shape (3, 6):
```python
import numpy as np

a = np.array([(1,2,3,4,5,6), (0,0,4,5,6,7), (0,0,2,6,14,0)])
# array([[ 1,  2,  3,  4,  5,  6],
#        [ 0,  0,  4,  5,  6,  7],
#        [ 0,  0,  2,  6, 14,  0]])
```
First, reshape the array to (rows, blocks, period) and sum along the new last axis. The period is 2 in the simplified example and 24 in your full case:
```python
b = a.reshape((3,3,2)).sum(axis=2)
# array([[ 3,  7, 11],
#        [ 0,  9, 13],
#        [ 0,  8, 14]])
```
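The reshape groups the right columns because NumPy reshapes in row-major (C) order by default. Column j of a row therefore lands in block j // period at position j % period. Here is a sketch with a made-up (3, 8760) array reshaped into 365 days of 24 hours:

```python
import numpy as np

# Hypothetical stand-in for the real data: 3 rows x 8760 hourly values
hours = np.arange(3 * 8760).reshape(3, 8760)

# (rows, days, hours-per-day): column j goes to day j // 24, hour j % 24
days = hours.reshape(3, 8760 // 24, 24)
print(days.shape)  # (3, 365, 24)

# Day 1 of row 0 holds original columns 24..47
assert (days[0, 1] == hours[0, 24:48]).all()

# One sum per row per day
daily_sums = days.sum(axis=2)
print(daily_sums.shape)  # (3, 365)
```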
This gives you all the partial (block) sums. Now, for each column of `b` (i.e. each block), you can get the index of the row whose sum is largest:
```python
idx = np.argmax(b, axis=0)
# array([0, 1, 2], dtype=int64)
```
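Ties are one detail worth knowing about. `np.argmax` returns the index of the first maximum, so if two rows have the same sum for a block, the earlier row (A before B before C) is kept. A small check with made-up sums:

```python
import numpy as np

# Made-up block sums: rows 0 and 1 tie on the first block (both 5)
b = np.array([[5, 1],
              [5, 9]])

idx = np.argmax(b, axis=0)
print(idx)  # [0 1] -> the tie in block 0 goes to the first row
```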
Now you can select values from the initial array according to this index:
```python
a.reshape((3,3,2))[idx, range(a.shape[1]//2), :].reshape((6,))
# array([ 1,  2,  4,  5, 14,  0])
```
This gives the answer you wanted.
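An alternative for the selection step uses `np.take_along_axis` instead of pairing `idx` with `range(...)`. This is a swapped-in technique, not the one above, sketched on the same toy data. It should give the same result:

```python
import numpy as np

a = np.array([(1, 2, 3, 4, 5, 6), (0, 0, 4, 5, 6, 7), (0, 0, 2, 6, 14, 0)])
period = 2

blocks = a.reshape(a.shape[0], -1, period)  # (rows, blocks, period)
idx = blocks.sum(axis=2).argmax(axis=0)     # winning row for each block

# The indices need the same ndim as `blocks`; the size-1 axes broadcast
chosen = np.take_along_axis(blocks, idx[None, :, None], axis=0)
result = chosen[0].reshape(-1)
print(result)  # [ 1  2  4  5 14  0]
```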
Final solution
You can write it all in this routine:
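```python
def filter_row_cols(a, period=2):
    # (rows, number of blocks, block length); a.shape[1] must be divisible by period
    rshape = (a.shape[0], a.shape[1] // period, period)
    # For every block, the index of the row with the largest block sum
    idx = np.argmax(a.reshape(rshape).sum(axis=2), axis=0)
    # Take that row's values for each block and flatten back to a single row
    result = a.reshape(rshape)[idx, range(a.shape[1] // period), :].reshape((a.shape[1],))
    return result

filter_row_cols(a)
# array([ 1,  2,  4,  5, 14,  0])
```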
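Here is a sketch of the full-year case. It assumes the data is already a (3, 8760) NumPy array (for a DataFrame, `df.to_numpy()` gives one). Random numbers stand in for the real series. The divisibility check is an addition that is not part of the routine above: 8760 hours is exactly 365 blocks of 24, but other periods may not divide evenly.

```python
import numpy as np

def filter_row_cols(a, period=2):
    """For each block of `period` columns, keep the values of the row
    whose block sum is largest."""
    n_rows, n_cols = a.shape
    if n_cols % period != 0:
        raise ValueError("number of columns must be a multiple of period")
    blocks = a.reshape(n_rows, n_cols // period, period)  # (rows, blocks, period)
    idx = np.argmax(blocks.sum(axis=2), axis=0)            # winning row per block
    return blocks[idx, np.arange(n_cols // period), :].reshape(n_cols)

# Hypothetical stand-in for the real data: 3 rows x 8760 hourly values
rng = np.random.default_rng(0)
data = rng.random((3, 8760))

result = filter_row_cols(data, period=24)  # one 24-hour block per day
print(result.shape)  # (8760,)
```

Each day's 24 values in `result` come from whichever row had the largest sum for that day.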