Use Python to sum the data every 24 hours and filter for the data with the maximum sum
Question

I have three rows of data, each a time series for a whole year at hourly resolution, so the array has shape (3, 8760).
I want to sum the values over every 24 hours (columns) and, for each 24-hour block, keep the values from the row with the maximum sum.

For example:
A = (1,2,3,4,5,6)
B = (0,0,4,5,6,7)
C = (0,0,2,6,14,0)

If I want to sum the values every 2 hours/columns and filter the row with the maximum sum,
the expected output would be
(1,2,4,5,14,0).

So far, I have only been trying to load the data into Python and put it into a DataFrame.

Answer 1

Score: 0

Taking your simplified example, which has shape (3, 6):

import numpy as np
a = np.array([(1,2,3,4,5,6), (0,0,4,5,6,7), (0,0,2,6,14,0)])
# a is now:
# array([[ 1,  2,  3,  4,  5,  6],
#        [ 0,  0,  4,  5,  6,  7],
#        [ 0,  0,  2,  6, 14,  0]])

First reshape the array so that each block of columns gets its own axis, then sum along that new axis (the block length is 2 in the simplified example, 24 in your full case):

# Split each row into 3 blocks of 2 columns, then sum within each block
b = a.reshape((3,3,2)).sum(axis=2)
# b is now:
# array([[ 3,  7, 11],
#        [ 0,  9, 13],
#        [ 0,  8, 14]])
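
For the full-year case, the same reshape works because 8760 = 365 * 24. The following is a minimal sketch, assuming random integers as a stand-in for the real data (the names `year`, `daily` and `daily_sums` are illustrative and not from the question):

```python
import numpy as np

# Hypothetical stand-in for the real data: 3 rows of 8760 hourly values.
rng = np.random.default_rng(0)
year = rng.integers(0, 10, size=(3, 8760))

# 8760 = 365 * 24, so each row splits cleanly into 365 blocks of 24 hours.
daily = year.reshape(3, 365, 24)

# Summing over the last axis collapses each 24-hour block to one daily total.
daily_sums = daily.sum(axis=2)
print(daily_sums.shape)  # (3, 365)
```

Each entry `daily_sums[r, d]` is the total of row `r` over day `d`, which is the quantity the rows are compared on in the next step.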

This gives you all partial sums. Now you can get, for each column (that is, each block), the index of the row whose sum is largest:

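# For each block (column of b), the index of the row with the largest sum
idx = np.argmax(b, axis=0)
# idx is now: array([0, 1, 2])  (some platforms also print dtype=int64)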
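
One detail worth knowing when two rows have the same block sum: `np.argmax` returns the index of the first maximum, so the earlier row wins the tie. A small sketch, where the `tied` array is made up purely for illustration:

```python
import numpy as np

# Partial sums from the simplified example.
b = np.array([[3, 7, 11],
              [0, 9, 13],
              [0, 8, 14]])
idx = np.argmax(b, axis=0)
print(idx)  # [0 1 2]

# Made-up sums with ties: rows 0 and 1 tie in column 0, rows 1 and 2 in column 1.
tied = np.array([[5, 1],
                 [5, 2],
                 [0, 2]])
print(np.argmax(tied, axis=0))  # [0 1]
```

If a different tie-breaking rule matters for your data, that would need to be handled separately.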

Now you can select values from the initial array according to this index:

# Pair block j with row idx[j], then flatten the chosen blocks back into one row
a.reshape((3,3,2))[idx, range(a.shape[1]//2), :].reshape((6,))
# result:
# array([ 1,  2,  4,  5, 14,  0])

This gives the answer you wanted.
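
If the indexing expression looks opaque, it may help to see it written as an explicit loop. This is only an illustrative sketch (the names `blocks`, `picked` and `looped` are not part of the answer): for block number `col`, it takes the two values from the row `idx[col]`.

```python
import numpy as np

a = np.array([(1, 2, 3, 4, 5, 6), (0, 0, 4, 5, 6, 7), (0, 0, 2, 6, 14, 0)])
blocks = a.reshape(3, 3, 2)               # (row, block, position inside block)
idx = blocks.sum(axis=2).argmax(axis=0)   # winning row for each block

# Loop version: for every block, pick that block from the winning row.
picked = [blocks[row, col] for col, row in enumerate(idx)]
looped = np.concatenate(picked)

# Vectorised version from the answer, for comparison.
vectorised = blocks[idx, range(a.shape[1] // 2), :].reshape((6,))

print(looped)      # [ 1  2  4  5 14  0]
print(vectorised)  # [ 1  2  4  5 14  0]
```

Both give the same result; the vectorised form simply avoids the Python-level loop, which matters more for 365 blocks than for 3.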

Final solution

You can put it all into one function (use period=24 for your full hourly data):

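def filter_row_cols(a, period=2):
  # View each row as (number of blocks, period) so blocks can be summed and selected
  rshape = (a.shape[0], a.shape[1]//period, period)
  # For each block, the index of the row with the largest block sum
  idx = np.argmax(a.reshape(rshape).sum(axis=2), axis=0)
  # Take block j from row idx[j] and flatten back to a single row
  result = a.reshape(rshape)[idx, range(a.shape[1]//period), :].reshape((a.shape[1],))
  return result

filter_row_cols(a)
# array([ 1,  2,  4,  5, 14,  0])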
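
The reshape only works when the number of columns is an exact multiple of the period. Below is a hedged sketch of a variant that checks this first; `filter_row_cols_checked` is a hypothetical name, and the full-year data is again replaced by random integers as an assumption:

```python
import numpy as np

def filter_row_cols_checked(a, period=24):
    """Hypothetical variant of filter_row_cols that validates the shape first."""
    a = np.asarray(a)
    n_rows, n_cols = a.shape
    if n_cols % period != 0:
        raise ValueError(f"{n_cols} columns cannot be split into blocks of {period}")
    n_blocks = n_cols // period
    blocks = a.reshape(n_rows, n_blocks, period)
    # Winning row per block, then pick that block from the winning row.
    idx = blocks.sum(axis=2).argmax(axis=0)
    return blocks[idx, np.arange(n_blocks)].reshape(n_cols)

# Simplified example from the question, with blocks of 2 columns.
small = np.array([(1, 2, 3, 4, 5, 6), (0, 0, 4, 5, 6, 7), (0, 0, 2, 6, 14, 0)])
print(filter_row_cols_checked(small, period=2))  # [ 1  2  4  5 14  0]

# Stand-in for a full year of hourly data (assumed values, not real data).
rng = np.random.default_rng(0)
year = rng.integers(0, 10, size=(3, 8760))
best = filter_row_cols_checked(year)
print(best.shape)  # (8760,)
```

If your data lives in a DataFrame, converting it to a NumPy array before calling the function should be enough, since the logic only relies on the 2-D shape.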

huangapple
  • Posted on May 22, 2023, 11:16:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76302840.html