Use Python to sum the data every 24 hours and filter for the data with the maximum sum

Question

I have three rows of data, each a time series covering a whole year at hourly resolution, so the array has shape (3, 8760).
I want to sum the values over every 24 hours (24 columns) and, for each 24-hour block, keep the row with the largest sum.

For example:
A = (1,2,3,4,5,6)
B = (0,0,4,5,6,7)
C = (0,0,2,6,14,0)

If I sum the values every 2 hours (2 columns) instead and keep, for each block, the row with the largest sum,
the expected output is
(1,2,4,5,14,0).

So far I have only tried to load the data into Python as a dataframe.

Answer 1

Score: 0

Taking your simplified example with shape (3, 6):

    import numpy as np

    a = np.array([(1, 2, 3, 4, 5, 6), (0, 0, 4, 5, 6, 7), (0, 0, 2, 6, 14, 0)])
    a
    # array([[ 1,  2,  3,  4,  5,  6],
    #        [ 0,  0,  4,  5,  6,  7],
    #        [ 0,  0,  2,  6, 14,  0]])

First, reshape the array so that each block of consecutive columns gets its own axis, then sum along that new axis. The block length is 2 in the simplified example and 24 in your full case:

    b = a.reshape((3, 3, 2)).sum(axis=2)
    b
    # array([[ 3,  7, 11],
    #        [ 0,  9, 13],
    #        [ 0,  8, 14]])
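For the full-year case, the same step might look like the sketch below. The random data and the names hourly, period, n_days and daily_sums are stand-ins made up for illustration, and it assumes 8760 hourly values per row (a 365-day year).

```python
import numpy as np

# Hypothetical stand-in for the real data: 3 rows, 8760 hourly values each.
rng = np.random.default_rng(0)
hourly = rng.integers(0, 10, size=(3, 8760))

period = 24                         # hours per block
n_days = hourly.shape[1] // period  # 365 blocks

# (3, 8760) -> (3, 365, 24), then sum over the 24 hours of each day.
daily_sums = hourly.reshape(3, n_days, period).sum(axis=2)
print(daily_sums.shape)  # (3, 365)
```

The reshape only works when the number of columns is an exact multiple of the block length, which holds for 8760 and 24.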

The array b holds all the partial sums, one column per block. Now you can find, for each column, the index of the row whose sum is largest:

    idx = np.argmax(b, axis=0)
    idx
    # array([0, 1, 2], dtype=int64)
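One detail worth knowing: when two rows tie for the largest sum in a block, np.argmax returns the first of them. A small sketch with made-up partial sums (b_tied and idx_tied are illustrative names, not part of the original answer):

```python
import numpy as np

# Hypothetical partial sums where rows 0 and 2 tie in the first block.
b_tied = np.array([[5, 1],
                   [2, 9],
                   [5, 3]])

# argmax keeps the first occurrence of the maximum, so row 0 wins the tie.
idx_tied = np.argmax(b_tied, axis=0)
print(idx_tied)  # [0 1]
```

If a different tie-breaking rule matters for your data, it would need to be handled separately.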

Now you can select values from the initial array according to this index:

    a.reshape((3, 3, 2))[idx, range(a.shape[1] // 2), :].reshape((6,))
    # array([ 1,  2,  4,  5, 14,  0])

This gives the answer you wanted.
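As an alternative to the integer-array indexing, the same selection can probably be written with np.take_along_axis. This is only a sketch of an equivalent formulation, not the approach from the answer; the names blocks, picked and result are chosen for illustration.

```python
import numpy as np

a = np.array([(1, 2, 3, 4, 5, 6), (0, 0, 4, 5, 6, 7), (0, 0, 2, 6, 14, 0)])
blocks = a.reshape(3, 3, 2)                  # (rows, blocks, period)
idx = np.argmax(blocks.sum(axis=2), axis=0)  # winning row per block

# idx[None, :, None] has shape (1, 3, 1); it is broadcast over the period
# axis, so every value in block j is taken from row idx[j].
picked = np.take_along_axis(blocks, idx[None, :, None], axis=0)
result = picked.reshape(-1)
print(result)  # [ 1  2  4  5 14  0]
```

Both forms pick the same values; which one reads better is mostly a matter of taste.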

Final solution

You can wrap all of this in one function:

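    def filter_row_cols(a, period=2):
        # Split the columns into blocks: (rows, number of blocks, period)
        rshape = (a.shape[0], a.shape[1] // period, period)
        # For each block, index of the row with the largest block sum
        idx = np.argmax(a.reshape(rshape).sum(axis=2), axis=0)
        # Take the winning row's values in each block and flatten to 1-D
        result = a.reshape(rshape)[idx, range(a.shape[1] // period), :].reshape((a.shape[1],))
        return result

    filter_row_cols(a)
    # array([ 1,  2,  4,  5, 14,  0])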
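Applied to a full year of data, the routine could be used roughly as below. This is a sketch under assumptions: pick_max_blocks is a hypothetical variant of the function above with an added divisibility check, and the random hourly array stands in for your real (3, 8760) data.

```python
import numpy as np

def pick_max_blocks(a, period=24):
    """For each block of `period` columns, keep the row with the largest sum."""
    n_rows, n_cols = a.shape
    if n_cols % period != 0:
        raise ValueError("number of columns must be a multiple of period")
    n_blocks = n_cols // period
    blocks = a.reshape(n_rows, n_blocks, period)
    idx = np.argmax(blocks.sum(axis=2), axis=0)
    return blocks[idx, np.arange(n_blocks), :].reshape(-1)

# Hypothetical full-year input: 3 rows of 8760 hourly values.
rng = np.random.default_rng(1)
hourly = rng.integers(0, 100, size=(3, 8760))
selected = pick_max_blocks(hourly, period=24)
print(selected.shape)  # (8760,)

# Cross-check the first day against plain slicing.
first_day = hourly[:, :24]
best_row = first_day.sum(axis=1).argmax()
print(np.array_equal(selected[:24], first_day[best_row]))  # True
```

The explicit check turns a confusing reshape error into a clear message when the column count is not a multiple of the period.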

huangapple
  • Posted on 2023-05-22 11:16:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76302840.html