2023-02-24 10:00:32

Re-number disjoint sections of an array, by order of appearance

Question

Consider an array of contiguous "sections":

x = np.asarray([
   1, 1, 1, 1,
   9, 9, 9,
   3, 3, 3, 3, 3,
   5, 5, 5,
])

I don't care about the actual values in the array. I only care that they demarcate disjoint sections of the array. I would like to renumber them so that the first section is all 0, the second section is all 1, and so on:

desired = np.asarray([
   0, 0, 0, 0,
   1, 1, 1,
   2, 2, 2, 2, 2,
   3, 3, 3,
])

What is an elegant way to perform this operation? I don't expect there to be a single best answer, but I think this question could provide interesting opportunities to show off applications of various Numpy and other Python features.

Assume for the sake of this question that the array is 1-dimensional and non-empty.

Answer 1

Score: 1

Combining np.cumsum with np.diff allows you to do this.

a = np.cumsum(np.diff(x, prepend=x[0]) != 0)
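To see why this works, it may help to look at the intermediate arrays. The sketch below splits the one-liner into steps on the example array from the question; the names `steps`, `is_boundary`, `y` and `b` are only illustrative and do not come from the answer.

```python
import numpy as np

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])

# Difference between each element and its predecessor. Prepending x[0]
# makes the first difference zero, so the output has the same length as x.
steps = np.diff(x, prepend=x[0])

# True exactly where a new section starts (never at index 0).
is_boundary = steps != 0

# Summing the booleans counts how many sections have started so far.
a = np.cumsum(is_boundary)

# Only neighbouring values are compared, so a value that reappears
# later starts a new section instead of reusing its earlier number.
y = np.asarray([1, 1, 9, 9, 1, 1])
b = np.cumsum(np.diff(y, prepend=y[0]) != 0)
```

Here `a` matches `desired` from the question, and `b` is `[0, 0, 1, 1, 2, 2]`.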

Answer 2

Score: 0

Here is a naïve but linear-time implementation using nditer:

import numpy as np

def renumber(arr):
    assert arr.ndim == 1
    val_prev = None  # Arbitrary placeholder
    section_number = 0
    result = np.empty_like(arr, dtype=int)
    with np.nditer(
        [arr, result],
        flags=['c_index'],
        op_flags=[['readonly'], ['writeonly']]
    ) as it:
        for val_curr, res in it:
            # A new section starts whenever the value changes.
            if it.index > 0 and val_curr != val_prev:
                section_number += 1
            res[...] = section_number
            val_prev = val_curr
    return result

There are certainly fancier ways to do this, but this implementation should serve as a sensible baseline:

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)

Answer 3

Score: 0

Note: There is a nicer equivalent of this in another answer.

My other answer essentially consists of comparing every value to the value before it, and incrementing a counter whenever the two differ. This can be implemented in vectorized fashion by taking advantage of the fact that boolean True corresponds to integer 1, and False corresponds to 0.

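def renumber(arr):
    assert arr.ndim == 1
    # True wherever a value differs from its predecessor. The first element
    # has no predecessor, so insert False to make the numbering start at 0.
    return np.cumsum(np.insert(arr[1:] != arr[:-1], 0, False))

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber(x), desired)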

Note that this is a little clunky due to the need to np.insert the first value. I would be very interested to know if there is a more elegant way to achieve this.
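One possible way to avoid np.insert, offered as a sketch rather than a definitive answer: allocate the result as zeros and write the cumulative sum into every position after the first. The function name `renumber_prealloc` is hypothetical, and the sketch assumes a 1-dimensional, non-empty input as stated in the question.

```python
import numpy as np

def renumber_prealloc(arr):
    """Number contiguous runs of equal values 0, 1, 2, ... (hypothetical helper)."""
    arr = np.asarray(arr)
    assert arr.ndim == 1 and arr.size > 0
    result = np.zeros(arr.shape, dtype=int)
    # Position 0 stays 0; each later position holds the number of
    # value changes seen up to and including that position.
    result[1:] = np.cumsum(arr[1:] != arr[:-1])
    return result

x = np.asarray([1, 1, 1, 1, 9, 9, 9, 3, 3, 3, 3, 3, 5, 5, 5])
desired = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
np.testing.assert_array_equal(renumber_prealloc(x), desired)
```

Because the slices `arr[1:]` and `arr[:-1]` are simply empty for a single-element array, this version also handles that case without indexing `arr[1]`.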
