Sorting of arrays based on another array
Question
I have two arrays:
data_array=[4.4, 7.2, 10.1, 1.1, 5.5, 8.3, 2.2, 6.2, 3.3, 9.1, 1.3]
test_array=[2, 5, 9, 4, 10, 8, 7, 3, 6, 1, 1]
Required Output:
[2.2, 5.5, 9.1, 4.4, 10.1, 8.3, 7.2, 3.3, 6.2, 1.1, 1.3]
That is, I need to rearrange data_array so that each element of test_array is paired, in order, with the closest value of data_array that has not already been used. For example, the value 1 appears twice in test_array: the first occurrence takes 1.1 (smallest difference, 0.1), and the second takes the next-closest remaining value, 1.3 (difference 0.3).
This is only an example. I want to do this for much larger arrays (thousands to millions of elements).
Thank you in advance.
A solution in MATLAB, Python, or any other efficient tool is appreciated!
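import numpy as np
from tqdm import tqdm

def method1(A, B):
    # For each value in A, take the closest value still left in B.
    result = np.empty(len(A), dtype=B.dtype)
    for i, val in tqdm(enumerate(A), desc="Processing", unit="iteration", unit_scale=True):
        closest_idx = np.argmin(np.abs(B - val))
        result[i] = B[closest_idx]
        # Drop the matched value so it cannot be reused.
        B = np.delete(B, closest_idx)
    return result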
import numba as nb

# Numba version: the most efficient method I found on Stack Overflow.
# Note: parallel=True runs on multiple CPU threads, not on the GPU.
# It returns, for each element of A, the index of the nearest value in B,
# and it does not prevent the same value of B from being matched twice.
@nb.njit('int_[:](float32[:],float32[:])', parallel=True)
def method2(A, B):
    mB = B.shape[0]
    output = np.empty(A.shape[0], dtype=np.int_)
    # Parallel loop
    for i in nb.prange(A.shape[0]):
        rowA = A[i]
        rowB = B
        # B is sorted again on every iteration
        index_rowB = np.argsort(rowB)
        sorted_rowB = rowB[index_rowB]
        idxs = np.searchsorted(sorted_rowB, rowA)
        left = np.fabs(rowA - sorted_rowB[np.maximum(idxs-1, 0)])
        right = np.fabs(rowA - sorted_rowB[np.minimum(idxs, mB-1)])
        prev_idx_is_less = (idxs == mB) | (left < right)
        output[i] = index_rowB[idxs - prev_idx_is_less]
    return output
Both methods take far too long to run on arrays of my size.
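One possible way to keep the exact greedy rule from the question (each test value, in order, takes the closest unused data value) without rescanning the whole array is sketched below. It is only a sketch under stated assumptions: the function name greedy_nearest_unique is hypothetical, there must be at least as many data values as test values, and an exact tie is resolved toward the lower data value, which may differ from what np.argmin would pick in method1. The idea is to sort the data once, locate each test value with a binary search, and use two union-find style "skip" tables to jump over values that are already taken.

```python
import bisect

def greedy_nearest_unique(test, data):
    """For each test value, in order, take the closest data value not used yet."""
    values = sorted(data)
    m = len(values)
    if len(test) > m:
        raise ValueError("need at least as many data values as test values")
    # right[i] leads to the smallest unused index >= i (m means "none left").
    right = list(range(m + 1))
    # left[i + 1] leads to the largest unused index <= i (slot 0 means "none left").
    left = list(range(m + 1))

    def find(parent, i):
        # Follow the links to the representative slot, then compress the path.
        root = i
        while parent[root] != root:
            root = parent[root]
        while parent[i] != root:
            parent[i], i = root, parent[i]
        return root

    out = []
    for t in test:
        p = bisect.bisect_left(values, t)   # first sorted position with value >= t
        hi = find(right, p)                 # nearest unused index at or above t
        lo = find(left, p) - 1              # nearest unused index below t
        if lo < 0 or (hi < m and values[hi] - t < t - values[lo]):
            pick = hi
        else:
            pick = lo                       # ties go to the lower value (assumption)
        # Mark the chosen index as used in both skip tables.
        right[pick] = pick + 1
        left[pick + 1] = pick
        out.append(values[pick])
    return out

data_array = [4.4, 7.2, 10.1, 1.1, 5.5, 8.3, 2.2, 6.2, 3.3, 9.1, 1.3]
test_array = [2, 5, 9, 4, 10, 8, 7, 3, 6, 1, 1]
print(greedy_nearest_unique(test_array, data_array))
```

After the initial sort, each lookup costs one binary search plus a few table hops, so the total work should grow roughly like n log n rather than n times m. The loop is still plain Python, so for millions of elements it may be worth compiling it, for example with Numba.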
Answer 1
Score: 1
I think I've got this. Apologies if I've not fully understood your question.
My solution:
- Create dataframe with test data (saves original order of test_array through index)
- Sort the dataframe by the test data
- Add a column with sorted data_array
- Sort the index to get original order of test_array.
This assumes that the smallest unique difference gives the same result as pairing the two arrays by rank (k-th smallest test value with k-th smallest data value). Check that this holds for your use case.
I expect this approach to be quick, since the work is dominated by sorting.
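One way to check the rank-pairing assumption is to compare it with a brute-force version of the greedy rule on small inputs. The sketch below does that; the helper names greedy_unique_nearest and rank_pairing and the two-element counterexample are made up for illustration. On the example from the question both give the same result, but they can differ when an early test value takes a data value that a later, smaller test value would have been ranked with.

```python
import numpy as np

def greedy_unique_nearest(test, data):
    """Brute-force reference: each test value takes the closest unused data value."""
    remaining = list(data)
    out = []
    for t in test:
        j = min(range(len(remaining)), key=lambda k: abs(remaining[k] - t))
        out.append(remaining.pop(j))
    return out

def rank_pairing(test, data):
    """Pair the k-th smallest test value with the k-th smallest data value."""
    order = np.argsort(np.asarray(test), kind="stable")
    out = np.empty(len(test), dtype=float)
    out[order] = np.sort(np.asarray(data, dtype=float))
    return out.tolist()

data_array = [4.4, 7.2, 10.1, 1.1, 5.5, 8.3, 2.2, 6.2, 3.3, 9.1, 1.3]
test_array = [2, 5, 9, 4, 10, 8, 7, 3, 6, 1, 1]
same_on_example = (greedy_unique_nearest(test_array, data_array)
                   == rank_pairing(test_array, data_array))
print(same_on_example)   # True

# Hypothetical counterexample: 3 is processed first and takes 1.5,
# so 1 is left with 10.0, while rank pairing gives 1 -> 1.5 and 3 -> 10.0.
greedy_small = greedy_unique_nearest([3, 1], [1.5, 10.0])
rank_small = rank_pairing([3, 1], [1.5, 10.0])
print(greedy_small, rank_small)   # [1.5, 10.0] [10.0, 1.5]
```

If such cases can occur in the real data and the order-dependent greedy result is what is needed, rank pairing is only an approximation of it.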
Code:
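import pandas as pd

data_array = [4.4, 7.2, 10.1, 1.1, 5.5, 8.3, 2.2, 6.2, 3.3, 9.1, 1.3]
test_array = [2, 5, 9, 4, 10, 8, 7, 3, 6, 1, 1]

# The default integer index records the original order of test_array
df = pd.DataFrame(data={'test': test_array})
# Sort by test value; a stable sort keeps duplicates in their original order
df.sort_values(by='test', kind='mergesort', inplace=True)
# Pair the k-th smallest test value with the k-th smallest data value
df['data'] = sorted(data_array)
# Restore the original order of test_array
df.sort_index(inplace=True)
print(df['data'].tolist())
# output: [2.2, 5.5, 9.1, 4.4, 10.1, 8.3, 7.2, 3.3, 6.2, 1.1, 1.3]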
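The same rank pairing can also be written with NumPy alone, which may be convenient for arrays with millions of elements. This is a minimal sketch, assuming both arrays have the same length; the function name rank_pair_numpy and the random test data are made up for illustration.

```python
import numpy as np

def rank_pair_numpy(test, data):
    """Return data reordered so its k-th smallest value sits where
    the k-th smallest test value is."""
    test = np.asarray(test)
    data = np.asarray(data)
    # Positions of test values from smallest to largest; a stable sort
    # keeps duplicates in their original order.
    order = np.argsort(test, kind="stable")
    result = np.empty_like(data)
    result[order] = np.sort(data)
    return result

data_array = [4.4, 7.2, 10.1, 1.1, 5.5, 8.3, 2.2, 6.2, 3.3, 9.1, 1.3]
test_array = [2, 5, 9, 4, 10, 8, 7, 3, 6, 1, 1]
print(rank_pair_numpy(test_array, data_array).tolist())
# [2.2, 5.5, 9.1, 4.4, 10.1, 8.3, 7.2, 3.3, 6.2, 1.1, 1.3]

# Larger synthetic input (hypothetical data, only to exercise the function)
rng = np.random.default_rng(0)
n = 1_000_000
data_big = rng.random(n) * n
test_big = rng.integers(0, n, size=n)
paired = rank_pair_numpy(test_big, data_big)
```

The work here is two sorts and one scatter assignment, so no Python-level loop is involved.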