March 31, 2023 20:10:24

How can I reduce the time needed to run my code and what is the cause of the slow speed?

Question

The code works well on small datasets, as the example code shows, but the data I have to process consists of two lists, picker and order, each of length 1592798, with 288 and 528510 unique values respectively.
For the sake of the example I have replaced these with two short lists, but the concept is the same. I am wondering whether the long run time is due to the sheer amount of data, or whether the code processes the data inefficiently and can be improved.

The purpose of the code is to group all pickers associated with a unique order into a list (hold) within a list (pairs). The elements of pairs must be ordered by the first entry of each element; for instance, [1, 'a'] must come before [2, 'b', 'k'] because 1 is smaller than 2. Within an element such as [2, 'b', 'k'], the pickers are ordered by where they first occur in the list picker: 'b' comes before 'k' because 'b' has a lower index.
The current code looks like this:


order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
pairs = []

order_picker = list(zip(order, picker))

for x in set(order):
    hold = []
    hold.append(x)
    for i in range(len(order_picker)):
        if x == list(order_picker[i])[0]:
            if list(order_picker[i])[1] not in hold:
                hold.append(list(order_picker[i])[1])
    pairs.append(hold)

print(pairs)

The output of print(pairs):

>>> print(pairs)
[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]

The output must be in this format so that I can later write it to Excel.

I suspect the long run time comes from scanning the entire list of length 1592798 each time a new value must be identified, but I have been unable to create a faster solution. How can I reduce the time required to run the code?
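
The suspicion can be checked with a rough operation count. This is only a back-of-the-envelope sketch: it assumes order is the list with 528510 unique values and that both lists have 1592798 entries, and the variable names are illustrative.

```python
# Rough cost estimate for the nested loop in the question.
# Assumption: `order` holds 528510 unique values across 1592798 entries.
n_rows = 1_592_798          # length of order and picker
n_unique_orders = 528_510   # size of set(order)

# The outer loop runs once per unique order, and the inner loop
# scans every (order, picker) pair each time.
nested_comparisons = n_unique_orders * n_rows

# A single pass over zip(order, picker) touches each pair once.
single_pass_steps = n_rows

print(f"nested loop: {nested_comparisons:,} comparisons")
print(f"single pass: {single_pass_steps:,} steps")
print(f"ratio:       {nested_comparisons // single_pass_steps:,}x")
```

Under these assumptions the nested loop performs roughly 8.4e11 comparisons, which suggests the structure of the code, rather than the data size alone, dominates the run time.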

Answer 1

Score: 2

Perhaps you can speed up your code by looping over the elements of picker and order only once.

In the example I made, I zip the two lists and use a defaultdict of sets to collect the pickers for each order. Finally, the dictionary is converted to your desired output format:

from collections import defaultdict

order = [1,  2,  3,  4,  1,  5,  3,  6,  7,  1,  8,  9,  4,  4,  2,  8,  4,  4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

pairs = defaultdict(set)
for o, p in zip(order, picker):
    pairs[o].add(p)
pairs = [[k, *v] for k, v in pairs.items()]

print(pairs)
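
One caveat with a set-based version: a Python set does not keep insertion order, so the pickers within one order may come out in any order, and the dictionary lists orders by first appearance rather than by value. Below is a minimal sketch of a variant that keeps both orderings, assuming Python 3.7+ (where dict preserves insertion order). The function name group_pickers and the short sample lists are hypothetical.

```python
from collections import defaultdict

def group_pickers(order, picker):
    """Group pickers per order, keeping first-appearance picker order."""
    # dict keys act as an insertion-ordered set (Python 3.7+).
    grouped = defaultdict(dict)
    for o, p in zip(order, picker):
        grouped[o][p] = None
    # Sort by order number so smaller orders come first.
    return [[o, *grouped[o]] for o in sorted(grouped)]

# Hypothetical sample where orders do not first appear in ascending order.
order = [3, 1, 3, 2, 1, 2]
picker = ['k', 'a', 'b', 'c', 'a', 'b']
print(group_pickers(order, picker))
# → [[1, 'a'], [2, 'c', 'b'], [3, 'k', 'b']]
```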

Answer 2

Score: 1

Fast solution with the desired orders:

def pairs(order, picker):
    d = {o: {} for o in sorted(set(order))}
    for o, p in zip(order, picker):
        d[o][p] = None
    return [[o, *p] for o, p in d.items()]

order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

print(pairs(order, picker))

Output (Attempt This Online!):

[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]
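
To gain some confidence that a single-pass approach matches the nested loop from the question, the two can be compared on random data. This is a rough sketch: the names slow_pairs and fast_pairs and the data sizes are illustrative assumptions, and slow_pairs iterates over sorted(set(order)) to make the intended ordering explicit rather than relying on set iteration order.

```python
import random

def slow_pairs(order, picker):
    # Nested-loop reference, following the approach in the question.
    pairs = []
    for x in sorted(set(order)):
        hold = [x]
        for o, p in zip(order, picker):
            if o == x and p not in hold:
                hold.append(p)
        pairs.append(hold)
    return pairs

def fast_pairs(order, picker):
    # Single pass: one insertion-ordered dict of pickers per order.
    d = {o: {} for o in sorted(set(order))}
    for o, p in zip(order, picker):
        d[o][p] = None
    return [[o, *p] for o, p in d.items()]

random.seed(0)  # fixed seed so the run is repeatable
order = [random.randrange(200) for _ in range(2000)]
picker = [random.choice("abcdefghij") for _ in range(2000)]

# Prints True when both approaches produce the same grouping.
print(slow_pairs(order, picker) == fast_pairs(order, picker))
```

A check like this does not prove equivalence, but a mismatch on random input would quickly reveal an ordering or deduplication bug.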

Answer 3

Score: 0

It takes a long time because you iterate over the same data multiple times: once in zip and again in each of the two for loops.

Try to optimize by iterating less. Something like this produces the same output with only one for loop:

order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']


order_indexes = {} # stores indexes of orders
pairs = []

for i in range(0, len(order)):
    order_item = order[i]
    picker_item = picker[i]

    if (order_item not in order_indexes):
        order_indexes[order_item] = len(pairs)
        # the index it will be inserted in
        pairs.append([order_item]) 
        # insertion of new order
  
    if (picker_item not in pairs[order_indexes[order_item]]): 
        pairs[order_indexes[order_item]].append(picker_item)
        # add picker if not already present
        
print(pairs)
