Check if condition is met three or more times consecutively (Python)
Question
I have a global dataset of daily values, approximately 15000 days x 361 lat x 576 lon. The dataset is binary: it holds 1 at locations/days that meet a criterion and 0 where the condition is not met. I want to keep only the 1s that occur 3 or more days in a row. I am currently working with the data as a NumPy array, but I also work with xarray.
My initial idea was a 3-day rolling sum and checking where it equals 3, but this only finds the middle days of periods lasting three or more days, not the ends.
Any ideas for an efficient way to accomplish this? Ideally without explicitly looping through each item, as that would take a long time. Thanks in advance!
Answer 1
Score: 2
Approach this by first finding the sets of 3, then shifting the matches by up to two elements and combining them with or. Here's the easy-to-understand version:
import numpy as np
np.random.seed(17)
rands = np.random.randint(2, size=30)
# [1 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 0 1 1 0 1 0 1 1 1]
and_rands = rands[:-2] & rands[1:-1] & rands[2:]
# [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
lhs = np.concatenate((and_rands, np.zeros(2,dtype=and_rands.dtype)))
mid = np.concatenate((np.zeros(1,dtype=and_rands.dtype), and_rands, np.zeros(1, dtype=and_rands.dtype)))
rhs = np.concatenate((np.zeros(2,dtype=and_rands.dtype), and_rands))
result = lhs | mid | rhs
# [1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]
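The same AND-then-OR idea can be written for any minimum run length. The following is only a sketch under stated assumptions: the function name keep_runs and its parameter n are made up for illustration and are not part of the answer above, and the time axis is assumed to be axis 0. The Python loops run over the window length n, not over the data.

```python
import numpy as np

def keep_runs(a, n=3):
    """Keep ones that belong to a run of at least n ones along axis 0.

    Hypothetical helper: the name and signature are illustrative only.
    """
    a = np.asarray(a).astype(bool)
    days = a.shape[0]
    out = np.zeros_like(a)
    if days < n:
        return out
    # starts[t] is True when a[t], a[t+1], ..., a[t+n-1] are all ones
    starts = a[: days - n + 1].copy()
    for k in range(1, n):
        starts &= a[k : days - n + 1 + k]
    # Spread every run start forward over the n days it covers
    for k in range(n):
        out[k : days - n + 1 + k] |= starts
    return out

example = np.array([1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1])
print(keep_runs(example).astype(int))
# [1 1 1 0 0 0 0 0 1 1 1 1 0 0]
```

Because only axis 0 is sliced, the same function should work unchanged on a (days, lat, lon) array.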
And here's the same thing, but scaled and a bit more memory efficient:
import numpy as np
np.random.seed(17)
DAYS, DIM2, DIM3 = 15000, 361, 576
rands = np.random.randint(2, size=(DAYS, DIM2, DIM3), dtype='i1')
ret = np.zeros((DAYS, DIM2, DIM3), dtype=rands.dtype)
# ret[t] becomes 1 where days t-2, t-1 and t are all 1 (the last day of a triple)
ret[2:, :, :] |= rands[2:, :, :]
ret[2:, :, :] &= rands[1:-1, :, :]
ret[2:, :, :] &= rands[:-2, :, :]
# spread each match back over the two preceding days
ret[1:-1, :, :] |= ret[2:, :, :]
ret[:-2, :, :] |= ret[1:-1, :, :]
print(rands[:50, 0, 0])
# [1 1 1 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0 1 1 0 0
#  1 1 0 1 1 1 0 1 1 1 0 0 1]
print(ret[:50, 0, 0])
# [1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
#  0 0 0 1 1 1 0 1 1 1 0 0 0]
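Since each location is filtered independently along the time axis, the work can also be split into spatial slabs if the full array plus temporaries does not fit comfortably in memory. This is a rough sketch, not a benchmarked implementation: keep_triples, keep_triples_chunked and lon_chunk are hypothetical names, and the data is assumed to be laid out as (days, lat, lon) with integer or boolean 0/1 values.

```python
import numpy as np

def keep_triples(block):
    """Keep ones in runs of 3 or more along axis 0 (hypothetical helper)."""
    out = np.zeros_like(block)
    if block.shape[0] < 3:
        return out
    # ends[i] is 1 where days i, i+1 and i+2 of the block are all 1
    ends = block[2:] & block[1:-1] & block[:-2]
    out[2:] |= ends    # third day of each triple
    out[1:-1] |= ends  # second day
    out[:-2] |= ends   # first day
    return out

def keep_triples_chunked(data, out, lon_chunk=64):
    """Fill `out` one longitude slab at a time to bound temporary memory."""
    for start in range(0, data.shape[2], lon_chunk):
        sl = slice(start, start + lon_chunk)
        out[:, :, sl] = keep_triples(data[:, :, sl])
    return out

rng = np.random.default_rng(17)
small = rng.integers(0, 2, size=(10, 4, 5), dtype='i1')
chunked = keep_triples_chunked(small, np.zeros_like(small), lon_chunk=2)
print(np.array_equal(chunked, keep_triples(small)))
# True
```

Both arguments only need to support NumPy-style slicing, so something like an np.memmap could presumably be passed for `data` and `out`, although that has not been tried here.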
Answer 2
Score: 1
A bit more extensible method
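from skimage.util import view_as_windows
import numpy as np
def find_blocks(in_arr, window_shape=(3, 1, 1)):
    dims = len(window_shape)
    windowed_view = view_as_windows(in_arr, window_shape)
    # loc is True where a window made entirely of ones starts
    loc = np.logical_and.reduce(windowed_view, tuple(range(-dims, 0)))
    # pad both ends so the result has the same shape as in_arr
    # (padding only the front drops the last window_shape - 1 entries)
    loc = np.pad(loc, tuple((i - 1, i - 1) for i in window_shape))
    windowed_loc = view_as_windows(loc, window_shape)
    # a position is kept if any window covering it is all ones
    return np.logical_or.reduce(windowed_loc, tuple(range(-dims, 0)))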
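If you want NumPy only, I have a recipe here (https://stackoverflow.com/questions/45960192/using-numpy-as-strided-function-to-create-patches-tiles-rolling-or-sliding-w/45960193#45960193) that replicates view_as_windows (with a few added features, such as an axis parameter, so your window_shape doesn't need to have the same number of dimensions as your in_arr):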
def find_blocks_np(in_arr, window=3, axis=0):
    # window_nd is not part of NumPy; it comes from the author's recipe mentioned above
    windowed_view = window_nd(in_arr, window, axis=axis)
    loc = np.logical_and.reduce(windowed_view, axis + 1)
    loc_shape = loc.shape
    padder = tuple((i - j, 0) for i, j in zip(in_arr.shape, loc.shape))
    loc = np.pad(loc, padder)
    windowed_loc = window_nd(loc, loc_shape)
    return np.logical_or.reduce(windowed_loc, 0)
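For a NumPy-only route that does not depend on an external recipe, NumPy 1.20 and later ship numpy.lib.stride_tricks.sliding_window_view. The sketch below swaps that function in for window_nd; it is not the answer author's code, and the name find_blocks_swv is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_blocks_swv(in_arr, window=3, axis=0):
    """Keep ones in runs of at least `window` along `axis` (hypothetical helper)."""
    in_arr = np.asarray(in_arr).astype(bool)
    if in_arr.shape[axis] < window:
        return np.zeros_like(in_arr)
    # The window dimension is appended as the last axis of the view
    starts = sliding_window_view(in_arr, window, axis=axis).all(axis=-1)
    # Pad both ends along `axis` so the output has the input's shape
    pad = [(0, 0)] * in_arr.ndim
    pad[axis] = (window - 1, window - 1)
    padded = np.pad(starts, pad)
    # A position is kept if any window covering it was all ones
    return sliding_window_view(padded, window, axis=axis).any(axis=-1)

a = np.array([0, 1, 1, 1, 1, 0, 1, 1, 0])
print(find_blocks_swv(a).astype(int))
# [0 1 1 1 1 0 0 0 0]
```

The windowed arrays are views, so the main allocations are the boolean copy of the input, the intermediate `starts` array and the result.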