Re-number disjoint sections of an array, by order of appearance

Question

Consider an array of contiguous "sections":

import numpy as np

x = np.asarray([
   1, 1, 1, 1,
   9, 9, 9,
   3, 3, 3, 3, 3,
   5, 5, 5,
])

I don't care about the actual values in the array. I only care that they demarcate disjoint sections of the array. I would like to renumber them so that the first section is all 0, the second section is all 1, and so on:

desired = np.asarray([
   0, 0, 0, 0,
   1, 1, 1,
   2, 2, 2, 2, 2,
   3, 3, 3,
])

What is an elegant way to perform this operation? I don't expect there to be a single best answer, but I think this question could provide interesting opportunities to show off applications of various Numpy and other Python features.

Assume for the sake of this question that the array is 1-dimensional and non-empty.

Answer 1

Score: 1

Combining np.cumsum with np.diff allows you to do this.

a = np.cumsum(np.diff(x, prepend=x[0]) != 0)
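
To make the one-liner easier to follow, here is the same computation split into its intermediate steps. The variable names `steps` and `is_boundary` are illustrative and not part of the original answer. Because the approach relies on subtraction, it presumably only suits numeric dtypes.

```python
import numpy as np

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])

# Difference between each element and its predecessor. Prepending x[0]
# makes the first difference zero and keeps the output as long as x.
steps = np.diff(x, prepend=x[0])

# True exactly at the positions where a new section begins.
is_boundary = steps != 0

# Running count of the boundaries seen so far, i.e. the section index.
a = np.cumsum(is_boundary)
```

Since True counts as 1 and False as 0 under `np.cumsum`, every element inherits the number of section boundaries that precede it.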

Answer 2

Score: 0

Here is a naïve but linear-time implementation using nditer:

def renumber(arr):
    assert arr.ndim == 1

    val_prev = None  # Arbitrary placeholder
    section_number = 0
    result = np.empty_like(arr, dtype=int)
    with np.nditer(
        [arr, result],
        flags=['c_index'],
        op_flags=[['readonly'], ['writeonly']]
    ) as it:
        for val_curr, res in it:
            if it.index > 0 and val_curr != val_prev:
                section_number += 1
            res[...] = section_number
            val_prev = val_curr
    return result

There are certainly fancier ways to do this, but this implementation should serve as a sensible baseline:

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)
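
For comparison, the same baseline idea can probably be written with the standard library's `itertools.groupby`, which groups consecutive equal items. This is a sketch rather than the answer author's method; the function name `renumber_groupby` is hypothetical.

```python
from itertools import groupby

import numpy as np


def renumber_groupby(arr):
    """Number each run of consecutive equal values 0, 1, 2, ..."""
    arr = np.asarray(arr)
    assert arr.ndim == 1

    result = np.empty(arr.shape, dtype=int)
    start = 0
    # groupby yields one (key, run) pair per run of consecutive equal items.
    for section_number, (_, run) in enumerate(groupby(arr.tolist())):
        length = sum(1 for _ in run)
        result[start:start + length] = section_number
        start += length
    return result


x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber_groupby(x), desired)
```

Because sections are detected by adjacency, a value that reappears later starts a new section instead of reusing an earlier number.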

Answer 3

Score: 0

Note: There is a nicer equivalent of this in another answer.

My other answer essentially consists of comparing every value to the value before it, and incrementing a counter whenever the two differ. This can be implemented in a vectorized fashion by taking advantage of the fact that boolean True corresponds to integer 1, and False corresponds to 0.

def renumber(arr):
    assert arr.ndim == 1
    # The first element always belongs to section 0, so prepend False.
    return np.cumsum(np.insert(arr[1:] != arr[:-1], 0, False))

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)

Note that this is a little clunky due to the need to np.insert the first value. I would be very interested to know if there is a more elegant way to achieve this.
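
One possible way to avoid `np.insert`, offered as a sketch and not as the answer author's method, is to prepend the leading False with `np.concatenate`. The function name `renumber_concat` is hypothetical. Since it compares elements instead of subtracting them, it should also work for non-numeric dtypes such as strings.

```python
import numpy as np


def renumber_concat(arr):
    """Number each run of consecutive equal values 0, 1, 2, ..."""
    arr = np.asarray(arr)
    assert arr.ndim == 1

    # True wherever an element differs from its predecessor.
    changed = arr[1:] != arr[:-1]

    # The first element never starts a *new* section, so prepend False;
    # the cumulative sum then counts the boundaries seen so far.
    return np.cumsum(np.concatenate(([False], changed)))


x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber_concat(x), desired)
```

A single-element array also works here, because the comparison yields an empty array and only the prepended False remains.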

huangapple
  • Posted on February 24, 2023, 10:00:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75551991.html