2023-05-18 09:26:45

Reorder 2D NumPy array based on two lists of rearranged labels

Question

I have a function that takes a 1D list of numeric labels and returns a 2D array with one column of data per label. The problem is that the columns come back ordered by the sorted labels, and I need to restore the original label order.

For example, I have the following:

import numpy as np

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20                          20     12     11
data_sorted = np.array([[345.3, 361.8 ,347.6],       # reverted: [[347.6, 361.8, 345.3]
                        [383.6, 402.0, 386.2 ],      #            [386.2, 402.0, 383.6]
                        [422.0, 442.2, 424.9 ],      #            [424.9, 442.2, 422.0]
                        [460.4, 482.4, 463.5 ],      #            [463.5, 482.4, 460.4]
                        [498.7, 522.5, 502.1 ]])     #            [502.1, 522.5, 498.7]]

In this case, the desired output has the 1st and 3rd columns swapped. I found a solution that gives the result I want, but it relies mostly on list operations, and I'm concerned it will be slow for larger arrays (e.g., 1000x1000). Is there a more efficient way to do this with NumPy functions?

data_sorted_T = np.transpose(data_sorted)  # transpose array so it can be zipped correctly
combined_sorted = zip(labels_sorted, data_sorted_T)  # pair the labels with each data set
combined_reverted = sorted(combined_sorted, key=lambda s: labels.index(s[0]))  # rearrange order
#data_T = np.fromiter( [label[1] for label in combined_reverted], float)  # doesn't work
data_T = np.array([label[1] for label in combined_reverted])  # unzip
data = np.transpose(data_T)

print(labels_sorted)
print(data_sorted)
print(labels)
print(data)

Answer 1

Score: 1

I don't think the concept of "labels" means much in NumPy itself, so I would reach for a tool built for labeled data, which in this case is pandas:

import numpy as np
import pandas as pd

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20
data_sorted = np.array([[345.3, 361.8 ,347.6],          
                        [383.6, 402.0, 386.2 ],         
                        [422.0, 442.2, 424.9 ],         
                        [460.4, 482.4, 463.5 ],         
                        [498.7, 522.5, 502.1 ]]) 

# Parentheses are needed so the method chain can span several lines
res = (pd.DataFrame(data_sorted, columns=[str(i) for i in labels_sorted])
         .reindex(columns=[str(i) for i in labels])
         .to_numpy())


Out:
array([[347.6, 361.8, 345.3],
       [386.2, 402. , 383.6],
       [424.9, 442.2, 422. ],
       [463.5, 482.4, 460.4],
       [502.1, 522.5, 498.7]])

To get performance that may be comparable to plain NumPy, you can also use Polars:

import polars as pl

# Parentheses are needed so the method chain can span several lines
res = (pl.DataFrame(data_sorted, schema=[str(i) for i in labels_sorted])
         .select([str(i) for i in labels])
         .to_numpy())

Answer 2

Score: 1

You can pass a list of column indices to index the 2D array.

Below is a one-line solution that needs no other libraries.

import numpy as np

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20                             20     12     11
data_sorted = np.array([[345.3, 361.8 ,347.6],          # reverted: [[347.6, 361.8, 345.3]
                        [383.6, 402.0, 386.2 ],         #            [386.2, 402.0, 383.6]
                        [422.0, 442.2, 424.9 ],         #            [424.9, 442.2, 422.0]
                        [460.4, 482.4, 463.5 ],         #            [463.5, 482.4, 460.4]
                        [498.7, 522.5, 502.1 ]])

# # idea (expanded form of the one-liner below)
# index_list = []
# for label in labels:
#     index_list.append(labels_sorted.index(label))
# data_sorted[:, index_list]

# ----------------
#     Solution
# ----------------
data_sorted[:, [labels_sorted.index(label) for label in labels]]
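
For larger arrays, the `list.index` calls above scan the label list once per label. The following is a minimal sketch of a vectorized lookup, assuming `labels_sorted` is ascending with unique values and `labels` holds the same values in a different order. `reorder_columns` is a hypothetical helper name used for illustration, not a NumPy function.

```python
import numpy as np

def reorder_columns(data_sorted, labels_sorted, labels):
    """Rearrange the columns of data_sorted to follow the order of `labels`.

    Hypothetical helper. Assumes labels_sorted is ascending and unique.
    """
    labels_sorted = np.asarray(labels_sorted)
    labels = np.asarray(labels)
    # Binary search for the position of each desired label in the sorted labels
    idx = np.searchsorted(labels_sorted, labels)
    # Clip so that labels larger than every sorted label cannot index out of range
    idx = np.clip(idx, 0, labels_sorted.size - 1)
    # Reject labels that are not actually present in labels_sorted
    if not np.array_equal(labels_sorted[idx], labels):
        raise ValueError("labels must contain the same values as labels_sorted")
    # Fancy indexing on the column axis returns a reordered copy
    return data_sorted[:, idx]

labels = [20, 12, 11]         # desired order
labels_sorted = [11, 12, 20]  # sorted label order
data_sorted = np.array([[345.3, 361.8, 347.6],
                        [383.6, 402.0, 386.2]])

data = reorder_columns(data_sorted, labels_sorted, labels)
print(data)
```

The membership check is optional. It is included here because `np.searchsorted` returns an insertion position even when a label is missing, which would otherwise select the wrong column without any error.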

Answer 3

Score: 1

Get a sorting index array:

In [64]: idx=np.argsort(labels); idx
Out[64]: array([2, 1, 0], dtype=int64)

Apply it to the labels:

In [65]: np.array(labels_sorted)[idx]
Out[65]: array([20, 12, 11])

And to the columns of the data:

In [66]: data_sorted[:, idx]
Out[66]: 
array([[347.6, 361.8, 345.3],
       [386.2, 402. , 383.6],
       [424.9, 442.2, 422. ],
       [463.5, 482.4, 460.4],
       [502.1, 522.5, 498.7]])

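You don't need to use argsort; anything that specifies the column order matching the labels will work. Note that np.argsort(labels) is the right index here only because this particular permutation (a reversal) is its own inverse. For an arbitrary label order, the index you need is the inverse permutation, np.argsort(np.argsort(labels)).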
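
For a label order that is not a simple reversal, the argsort result has to be inverted before it is used as a column index. Here is a small sketch of that general case. The label order `[20, 11, 12]` and the toy data are made up for illustration and are not from the question.

```python
import numpy as np

labels = [20, 11, 12]            # desired order (illustrative, not a reversal)
labels_sorted = sorted(labels)   # [11, 12, 20]

# Toy data whose column values make the reordering easy to see
data_sorted = np.array([[11.0, 12.0, 20.0],
                        [110.0, 120.0, 200.0]])

# order[i] is the position in `labels` of the i-th smallest label,
# so np.array(labels)[order] equals labels_sorted
order = np.argsort(labels)

# Inverting that permutation maps sorted positions back to the desired order
inverse = np.argsort(order)

restored_labels = np.array(labels_sorted)[inverse]
data = data_sorted[:, inverse]

print(restored_labels)
print(data)
```

Indexing with `order` instead of `inverse` would give the columns for `[12, 20, 11]` here, which is why the two only agree when the permutation is its own inverse.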
