Reorder 2D NumPy array based on two lists of rearranged labels
Question
I have a function that takes a 1D list of numeric labels and returns a 2D array (multiple 1D arrays corresponding to each label). The problem is that the returned data is ordered based on a sorted list of the labels. I need to revert it back into the original order of the labels.
For example, I have the following:
import numpy as np
labels = [20,12,11] # desired order
labels_sorted = [11,12,20] # sorted label order
#                labels:  11     12     20                    20     12     11
data_sorted = np.array([[345.3, 361.8, 347.6],   # reverted: [[347.6, 361.8, 345.3]
                        [383.6, 402.0, 386.2],   #            [386.2, 402.0, 383.6]
                        [422.0, 442.2, 424.9],   #            [424.9, 442.2, 422.0]
                        [460.4, 482.4, 463.5],   #            [463.5, 482.4, 460.4]
                        [498.7, 522.5, 502.1]])  #            [502.1, 522.5, 498.7]]
My desired output in this case is for the 1st and 3rd columns to be swapped. I managed to find a solution that gets my desired result. But it mostly uses list operations. I'm concerned that it's going to be slow for larger arrays (e.g., 1000x1000). Is it possible to do this more efficiently with NumPy functions?
data_sorted_T = np.transpose(data_sorted) # transpose array so it can be zipped correctly
combined_sorted = zip(labels_sorted, data_sorted_T) # pair the labels with each data set
combined_reverted = sorted(combined_sorted, key=lambda s: labels.index(s[0])) # rearrange order
#data_T = np.fromiter( [label[1] for label in combined_reverted], float) # doesn't work
data_T = np.array([label[1] for label in combined_reverted]) # unzip
data = np.transpose(data_T)
print(labels_sorted)
print(data_sorted)
print(labels)
print(data)
Answer 1
Score: 1
I don't think the concept of "labels" has much meaning in NumPy itself, so I'd suggest simply using the right tool for the job, which in this case is pandas:
import numpy as np
import pandas as pd
labels = [20,12,11] # desired order
labels_sorted = [11,12,20] # sorted label order
# labels: 11 12 20
data_sorted = np.array([[345.3, 361.8 ,347.6],
[383.6, 402.0, 386.2 ],
[422.0, 442.2, 424.9 ],
[460.4, 482.4, 463.5 ],
[498.7, 522.5, 502.1 ]])
# Parentheses are needed so the method chain can span several lines.
res = (
    pd.DataFrame(data_sorted, columns=[str(i) for i in labels_sorted])
    .reindex(columns=[str(i) for i in labels])
    .values
)
Out:
array([[347.6, 361.8, 345.3],
[386.2, 402. , 383.6],
[424.9, 442.2, 422. ],
[463.5, 482.4, 460.4],
[502.1, 522.5, 498.7]])
To get performance comparable to or better than NumPy, you can also use Polars:
import polars as pl
# Parentheses are needed so the method chain can span several lines.
res = (
    pl.DataFrame(data_sorted, schema=[str(i) for i in labels_sorted])
    .select(pl.col([str(i) for i in labels]))
    .to_numpy()
)
Answer 2
Score: 1
You can pass a list of column indices to index the 2D array.
Below is a one-line solution that needs no other libraries.
import numpy as np
labels = [20,12,11] # desired order
labels_sorted = [11,12,20] # sorted label order
# labels: 11 12 20 20 12 11
data_sorted = np.array([[345.3, 361.8 ,347.6], # reverted: [[347.6, 361.8, 345.3]
[383.6, 402.0, 386.2 ], # [386.2, 402.0, 383.6]
[422.0, 442.2, 424.9 ], # [424.9, 442.2, 422.0]
[460.4, 482.4, 463.5 ], # [463.5, 482.4, 460.4]
[498.7, 522.5, 502.1 ]])
# Idea (what the one-liner below does):
# index_list = []
# for label in labels:
#     index_list.append(labels_sorted.index(label))
# data_sorted[:, index_list]
# ----------------
# Solution
# ----------------
data_sorted[:, [labels_sorted.index(label) for label in labels]]
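Since the question mentions larger arrays, note that `labels_sorted.index(label)` scans the list once per label. A possible vectorized variant is sketched below; it assumes that `labels_sorted` really is sorted in ascending order and that every entry of `labels` appears in it. The name `col_idx` is only illustrative.

```python
import numpy as np

labels = [20, 12, 11]          # desired order
labels_sorted = [11, 12, 20]   # sorted label order (order of the columns)

data_sorted = np.array([[345.3, 361.8, 347.6],
                        [383.6, 402.0, 386.2],
                        [422.0, 442.2, 424.9],
                        [460.4, 482.4, 463.5],
                        [498.7, 522.5, 502.1]])

# Binary search: position of each desired label within the sorted labels.
# Assumption: labels_sorted is ascending and contains every label.
col_idx = np.searchsorted(labels_sorted, labels)

# Fail loudly if a label was not actually found at the returned position.
assert np.array_equal(np.asarray(labels_sorted)[col_idx], labels)

# Fancy indexing picks the columns in the desired order.
data = data_sorted[:, col_idx]
print(data)
```

For a handful of labels the list comprehension is perfectly fine; the difference should only matter when there are many columns.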
Answer 3
Score: 1
Get a sorting index array:
In [64]: idx=np.argsort(labels); idx
Out[64]: array([2, 1, 0], dtype=int64)
Apply it to the labels:
In [65]: np.array(labels_sorted)[idx]
Out[65]: array([20, 12, 11])
and to the columns of the data:
In [66]: data_sorted[:, idx]
Out[66]:
array([[347.6, 361.8, 345.3],
[386.2, 402. , 383.6],
[424.9, 442.2, 422. ],
[463.5, 482.4, 460.4],
[502.1, 522.5, 498.7]])
You don't need to use argsort; anything that specifies the order of the labels and columns will do.
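One caveat worth checking: in the example above, `idx` is `[2, 1, 0]`, which happens to be its own inverse, so applying it directly restores the original order. For an arbitrary ordering, it is the inverse of the argsort permutation that maps sorted columns back to the original positions. The sketch below uses a hypothetical label order and made-up data to illustrate this; `inv` and `restored_labels` are illustrative names.

```python
import numpy as np

labels = [12, 20, 11]             # hypothetical desired order (not self-inverse)
labels_sorted = sorted(labels)    # [11, 12, 20]

# Made-up data whose columns follow labels_sorted.
data_sorted = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])

idx = np.argsort(labels)          # permutation that sorts the labels

# Invert the permutation: inv[j] is the position of labels[j] in sorted order.
inv = np.empty_like(idx)
inv[idx] = np.arange(idx.size)

restored_labels = np.asarray(labels_sorted)[inv]
data = data_sorted[:, inv]

print(restored_labels)
print(data)
```

`np.argsort(np.argsort(labels))` yields the same `inv`, at the cost of a second sort.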