How can I reduce the time needed to run my code and what is the cause of the slow speed?

Question

The code works well on small datasets, as illustrated in the example code, but the data I have to process consists of two lists, picker and order, each of length 1592798, with 288 and 528510 unique values respectively.
For the sake of the example I have replaced these with two short lists, but the concept is the same. I am wondering whether the long run time is due to the sheer amount of data, or whether the code is inefficient at processing the data and can be improved.

The purpose of the code is to group all pickers associated with a unique order into a list (hold) within a list (pairs). The elements of pairs must be ordered by the first entry of each element; for instance, [1, 'a'] must come before [2, 'b', 'k'] because 1 is smaller than 2. Within an element such as [2, 'b', 'k'], the order of 'b' and 'k' is determined by which of them occurs first in the list picker: 'b' comes before 'k' because 'b' has a lower index.
The current code looks like this:


order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
pairs = []

order_picker = list(zip(order, picker))

for x in set(order):
    hold = []
    hold.append(x)
    for i in range(len(order_picker)):
        if x == list(order_picker[i])[0]:
            if list(order_picker[i])[1] not in hold:
                hold.append(list(order_picker[i])[1])
    pairs.append(hold)

print(pairs)

The output from print(pairs):

>>> print(pairs)
[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]

The output must be in this format so that I can later write it to Excel.

I suspect the long run time comes from scanning the entire list of length 1592798 each time a new value must be identified, but I have been unable to create a faster solution. How can I reduce the time required to run the code?
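
A rough back-of-the-envelope count supports that suspicion. The sketch below only multiplies the sizes quoted above; it assumes the 528510 unique values belong to order (the list the outer loop runs over), and the variable names are illustrative rather than taken from the original code.

```python
# Sizes quoted in the question (assumption: the 528510 unique values are in `order`).
n_rows = 1_592_798
n_unique_orders = 528_510

# Nested version: the inner loop scans every row once per unique order.
nested_iterations = n_unique_orders * n_rows

# Single-pass version: every row is visited exactly once.
single_pass_iterations = n_rows

print(f"nested:      {nested_iterations:.2e} iterations")
print(f"single pass: {single_pass_iterations:.2e} iterations")
print(f"ratio:       {nested_iterations // single_pass_iterations}x")
```

Under that assumption the nested loops do several hundred thousand times more work than a single pass, which points to the algorithm rather than the data volume alone.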

Answer 1

Score: 2

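Perhaps you can speed up your code by looping over the elements in picker and order only once.

In the example I made, I zip the two lists and use a defaultdict of sets to collect the pickers for each order. Finally, the dictionary is converted to your desired output format. Note that a set does not preserve insertion order, so the pickers within each group are not guaranteed to appear in order of first occurrence.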

from collections import defaultdict

order = [1,  2,  3,  4,  1,  5,  3,  6,  7,  1,  8,  9,  4,  4,  2,  8,  4,  4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

pairs = defaultdict(set)
for o, p in zip(order, picker):
    pairs[o].add(p)
pairs = [[k, *v] for k, v in pairs.items()]

print(pairs)
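
If the picker order within each group matters, one possible follow-up to the set-based version is to sort each set by the position at which the picker first appears in picker, and to sort the orders numerically. This is only a sketch: first_seen and grouped are hypothetical names, and it assumes "occurs first in the list picker" means the first index in the whole list.

```python
from collections import defaultdict

order  = [1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

# Index at which each picker first appears in `picker` (hypothetical helper dict).
first_seen = {}
for i, p in enumerate(picker):
    first_seen.setdefault(p, i)

# Same single pass as above: collect the distinct pickers per order.
grouped = defaultdict(set)
for o, p in zip(order, picker):
    grouped[o].add(p)

# Sort orders numerically, and each order's pickers by first appearance.
pairs = [[o, *sorted(grouped[o], key=first_seen.__getitem__)] for o in sorted(grouped)]

print(pairs)
```

The extra sorting costs a little time, but each set is small compared with the full list, so the single pass over the data still dominates.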

Answer 2

Score: 1

Fast solution with the desired ordering:

def pairs(order, picker):
    # One empty dict per order, created in ascending order of the order number.
    d = {o: {} for o in sorted(set(order))}
    for o, p in zip(order, picker):
        # Dict keys act as an insertion-ordered set of pickers.
        d[o][p] = None
    return [[o, *p] for o, p in d.items()]

order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

print(pairs(order, picker))

Output (Attempt This Online!):

[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]
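
The solution above uses a plain dict per order as an ordered set: keys are unique, and since Python 3.7 a dict iterates in insertion order. A minimal illustration of that property on its own, using the pickers of order 4 from the example (the variable names are made up for this sketch):

```python
picks = ['d', 'j', 'k', 'j', 'j']  # pickers recorded for order 4, in list order

# dict.fromkeys keeps one key per distinct value, in order of first insertion.
unique_in_order = list(dict.fromkeys(picks))
print(unique_in_order)

# A set removes the same duplicates, but its iteration order is not defined.
same_members = set(picks) == set(unique_in_order)
print(same_members)
```

This is why the values stored in the inner dicts are irrelevant (None here); only the keys and their order are used.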

Answer 3

Score: 0

It takes a long time because you iterate over the same data multiple times: once in zip, and again in the two nested for loops.

Try to optimize by iterating less. Something like this produces the same output with only one for loop:

order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']


order_indexes = {} # stores indexes of orders
pairs = []

for i in range(0, len(order)):
    order_item = order[i]
    picker_item = picker[i]

    if (order_item not in order_indexes):
        order_indexes[order_item] = len(pairs)
        # the index it will be inserted in
        pairs.append([order_item]) 
        # insertion of new order
  
    if (picker_item not in pairs[order_indexes[order_item]]): 
        pairs[order_indexes[order_item]].append(picker_item)
        # add picker if not already present
        
print(pairs)
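
A possible variation on the single-loop idea, offered as a sketch rather than a drop-in replacement: it keeps a set of already-seen pickers per order so the membership test does not scan a list, and it sorts the groups by order number at the end in case the orders do not first appear in ascending order. The function name group_pickers and the helper dicts are hypothetical.

```python
def group_pickers(order, picker):
    groups = {}  # order number -> [order number, picker, picker, ...]
    seen = {}    # order number -> set of pickers already added to that group
    for o, p in zip(order, picker):
        if o not in groups:
            groups[o] = [o]
            seen[o] = set()
        if p not in seen[o]:
            seen[o].add(p)       # constant-time membership check next time
            groups[o].append(p)  # list keeps order of first appearance
    # Emit the groups in ascending order of the order number.
    return [groups[o] for o in sorted(groups)]


order  = [1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

print(group_pickers(order, picker))
```

With only a few pickers per order the list scan in the version above is already cheap, so the per-order set mainly helps if some orders have many distinct pickers.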

huangapple
  • Posted on March 31, 2023 at 20:10:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75898386.html