Reorder 2D NumPy array based on two lists of rearranged labels

Question

I have a function that takes a 1D list of numeric labels and returns a 2D array (multiple 1D arrays corresponding to each label). The problem is that the returned data is ordered based on a sorted list of the labels. I need to revert it back into the original order of the labels.

For example, I have the following:

import numpy as np

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20                          20     12     11
data_sorted = np.array([[345.3, 361.8 ,347.6],       # reverted: [[347.6, 361.8, 345.3]
                        [383.6, 402.0, 386.2 ],      #            [386.2, 402.0, 383.6]
                        [422.0, 442.2, 424.9 ],      #            [424.9, 442.2, 422.0]
                        [460.4, 482.4, 463.5 ],      #            [463.5, 482.4, 460.4]
                        [498.7, 522.5, 502.1 ]])     #            [502.1, 522.5, 498.7]]

My desired output in this case is for the 1st and 3rd columns to be swapped. I managed to find a solution that gets my desired result. But it mostly uses list operations. I'm concerned that it's going to be slow for larger arrays (e.g., 1000x1000). Is it possible to do this more efficiently with NumPy functions?

data_sorted_T = np.transpose(data_sorted)  # transpose array so it can be zipped correctly
combined_sorted = zip(labels_sorted, data_sorted_T)  # pair the labels with each data set
combined_reverted = sorted(combined_sorted, key=lambda s: labels.index(s[0]))  # rearrange order
#data_T = np.fromiter( [label[1] for label in combined_reverted], float)  # doesn't work
data_T = np.array([label[1] for label in combined_reverted])  # unzip
data = np.transpose(data_T)

print(labels_sorted)
print(data_sorted)
print(labels)
print(data)

Answer 1

Score: 1

I don't think the concept of "labels" has much meaning in NumPy itself, so the best idea is probably to use the right tool for the job, which in this case would be pandas:

import numpy as np
import pandas as pd

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20
data_sorted = np.array([[345.3, 361.8 ,347.6],          
                        [383.6, 402.0, 386.2 ],         
                        [422.0, 442.2, 424.9 ],         
                        [460.4, 482.4, 463.5 ],         
                        [498.7, 522.5, 502.1 ]]) 

# Parentheses are needed so the method chain can span several lines
res = (pd.DataFrame(data_sorted, columns=[str(i) for i in labels_sorted])
         .reindex(columns=[str(i) for i in labels])
         .values)
res


Out:
array([[347.6, 361.8, 345.3],
       [386.2, 402. , 383.6],
       [424.9, 442.2, 422. ],
       [463.5, 482.4, 460.4],
       [502.1, 522.5, 498.7]])
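
As a minimal sketch of the same idea (not part of the original answer), the integer labels can probably be used as column names directly, without converting them to strings. The variable names `df`, `res_selected` and `res_reindexed` are illustrative, and the data is shortened to two rows:

```python
import numpy as np
import pandas as pd

labels = [20, 12, 11]         # desired order
labels_sorted = [11, 12, 20]  # sorted label order

# Shortened to two rows for brevity; columns are in labels_sorted order
data_sorted = np.array([[345.3, 361.8, 347.6],
                        [383.6, 402.0, 386.2]])

# Integer labels work as column names
df = pd.DataFrame(data_sorted, columns=labels_sorted)

# Selecting with a list of labels reorders the columns;
# a label that is not a column raises KeyError
res_selected = df[labels].to_numpy()

# reindex gives the same result here, but would silently
# insert a NaN column for a label that is not present
res_reindexed = df.reindex(columns=labels).to_numpy()

print(res_selected)
```

Selecting with `df[labels]` fails loudly on an unknown label, whereas `reindex` fills the missing column with NaN, so the choice depends on how mismatched labels should be handled.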

If performance matters, you can also use Polars, which may perform similarly to or better than NumPy:

import polars as pl

# Parentheses are needed so the method chain can span several lines
res = (pl.DataFrame(data_sorted, schema=[str(i) for i in labels_sorted])
         .select(pl.col([str(i) for i in labels]))
         .to_numpy())

Answer 2

Score: 1

You can pass a list of column indices to index the 2D array.

Below is a one-line solution that needs no other libraries.

import numpy as np

labels = [20,12,11]         # desired order
labels_sorted = [11,12,20]  # sorted label order

#              labels:    11     12     20                             20     12     11
data_sorted = np.array([[345.3, 361.8 ,347.6],          # reverted: [[347.6, 361.8, 345.3]
                        [383.6, 402.0, 386.2 ],         #            [386.2, 402.0, 383.6]
                        [422.0, 442.2, 424.9 ],         #            [424.9, 442.2, 422.0]
                        [460.4, 482.4, 463.5 ],         #            [463.5, 482.4, 460.4]
                        [498.7, 522.5, 502.1 ]])

# # idea (description of the below answer)
# index_list = []
# for label in labels:
#     index_list.append(labels_sorted.index(label))
# data_sorted[:, index_list]

# ----------------
#     Solution
# ----------------
data_sorted[:, [labels_sorted.index(label) for label in labels]]
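
One possible refinement for long label lists, offered as a sketch rather than as part of the original answer: `labels_sorted.index(label)` scans the list for every label, so building the index list is quadratic in the number of labels. A dictionary lookup built once avoids that. The names `position` and `index_list` are illustrative, and the sketch assumes the labels are unique:

```python
import numpy as np

labels = [20, 12, 11]         # desired order
labels_sorted = [11, 12, 20]  # sorted label order

# Shortened to two rows for brevity; columns are in labels_sorted order
data_sorted = np.array([[345.3, 361.8, 347.6],
                        [383.6, 402.0, 386.2]])

# Map each label to its column position once (assumes unique labels)
position = {label: i for i, label in enumerate(labels_sorted)}

# Column index in data_sorted for each label in the desired order
index_list = [position[label] for label in labels]

# Integer-array indexing on the column axis reorders the columns
data = data_sorted[:, index_list]
print(index_list)
print(data)
```

For a few labels the difference is negligible; it should only matter when the number of columns reaches the thousands.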

Answer 3

Score: 1

Get a sorting index array:

In [64]: idx=np.argsort(labels); idx
Out[64]: array([2, 1, 0], dtype=int64)

Apply it to the labels:

In [65]: np.array(labels_sorted)[idx]
Out[65]: array([20, 12, 11])

And to the columns of the data:

In [66]: data_sorted[:, idx]
Out[66]: 
array([[347.6, 361.8, 345.3],
       [386.2, 402. , 383.6],
       [424.9, 442.2, 422. ],
       [463.5, 482.4, 460.4],
       [502.1, 522.5, 498.7]])

You don't need to use argsort; any index array that specifies the order of the labels and columns will do.
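
One caveat worth checking, shown here as a sketch with a hypothetical label order that is not in the question: `np.argsort(labels)` is the permutation that sorts `labels`, and in the example above it happens to be its own inverse. For an arbitrary order, the index that maps the sorted columns back appears to be the inverse permutation, which can be obtained with a second `argsort` or with `np.searchsorted`. The sketch assumes the labels are unique:

```python
import numpy as np

labels = [20, 11, 12]           # hypothetical desired order, not a simple reversal
labels_sorted = sorted(labels)  # [11, 12, 20]

# Toy data whose columns are in labels_sorted order
data_sorted = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])

# order sorts the labels: np.array(labels)[order] equals labels_sorted
order = np.argsort(labels)

# The inverse permutation undoes the sort:
# np.array(labels_sorted)[inverse] equals labels
inverse = np.argsort(order)

# Equivalent lookup: position of each desired label in the sorted list
via_search = np.searchsorted(labels_sorted, labels)

# Reorder the columns back into the original label order
data = data_sorted[:, inverse]
print(order, inverse, via_search)
print(data)
```

With `labels = [20, 12, 11]` from the question, `order` and `inverse` are both `[2, 1, 0]`, which is why the single `argsort` gives the right answer there.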

huangapple
  • Posted on May 18, 2023 at 09:26:45
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76277174.html