How can I reduce the time needed to run my code and what is the cause of the slow speed?
Question
The code works well on small datasets, as illustrated in the example code, but the data I have to process consists of two lists, picker and order, each with a length of 1592798, with 288 and 528510 unique values respectively.
For the sake of the example I have replaced these with two short lists, but the concept is the same. I am wondering whether the long run time is due to the sheer amount of data, or whether the code processes the data inefficiently and can be improved.
The purpose of the code is to group all pickers associated with a unique order into a list (hold) within a list (pairs). The order of the elements in pairs is determined by the first entry of each element: for instance, [1, 'a'] must come before [2, 'b', 'k'] because 1 is smaller than 2. Within an element such as [2, 'b', 'k'], the order of 'b' and 'k' is determined by which of them occurs first in the list picker: 'b' comes before 'k' because 'b' has a lower index.
The current code looks like this:
order = [ 1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
pairs = []
order_picker = list(zip(order, picker))
for x in set(order):
    hold = []
    hold.append(x)
    for i in range(len(order_picker)):
        if x == list(order_picker[i])[0]:
            if list(order_picker[i])[1] not in hold:
                hold.append(list(order_picker[i])[1])
    pairs.append(hold)
print(pairs)
The output from print(pairs):
>>> print(pairs)
[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]
The output must be in this format for me to later write it to Excel.
I suspect that the long run time is caused by checking the entire list of length 1592798 each time a new value must be identified, but I have been unable to create a faster solution. How can I reduce the time required to run the code?
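As a rough check on that suspicion, the nested loops perform one comparison for every combination of unique order and row. The sketch below only multiplies the sizes quoted above; it assumes that 528510 is the number of unique values in order and that 1592798 is the length of both lists.

```python
# Sizes taken from the question; which unique count belongs to which
# list is an assumption based on the word "respectively".
n_rows = 1_592_798          # length of order and picker
n_unique_orders = 528_510   # assumed: unique values in order

# The outer loop runs once per unique order, the inner loop once per row.
nested_comparisons = n_unique_orders * n_rows

# A single pass over the zipped lists touches each row once.
single_pass_steps = n_rows

print(f"nested loops: {nested_comparisons:.2e} comparisons")
print(f"single pass:  {single_pass_steps:.2e} steps")
```

Under those assumptions the nested version does on the order of 10^11 to 10^12 comparisons, which suggests the structure of the loops matters more than the raw data size.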
Answer 1
Score: 2
Perhaps you can speed up your code by looping over the elements in picker and order only once.
In the example I made, I zip the two lists and use a defaultdict of sets to collect the pickers for each order. Finally, the dictionary is converted to your desired output format:
from collections import defaultdict
order = [1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
pairs = defaultdict(set)
for o, p in zip(order, picker):
    pairs[o].add(p)
pairs = [[k, *v] for k, v in pairs.items()]
print(pairs)
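One caveat: sets do not keep insertion order, so the pickers within a group may not come out in order of first appearance, and the groups follow the order in which each order number first appears rather than ascending order. A minimal sketch of a variant that keeps both orderings, assuming Python 3.7+ (where dict preserves insertion order); the function name group_pickers is made up for illustration:

```python
from collections import defaultdict

def group_pickers(order, picker):
    # Hypothetical helper: each order number maps to a dict that is
    # used as an insertion-ordered set of pickers.
    groups = defaultdict(dict)
    for o, p in zip(order, picker):
        groups[o][p] = None  # repeats keep their first position
    # Sort by order number so that smaller orders come first.
    return [[o, *groups[o]] for o in sorted(groups)]

order = [1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
print(group_pickers(order, picker))
```

This still reads each row only once; the only extra cost is sorting the unique order numbers at the end.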
Answer 2
Score: 1
Fast solution with the desired orders:
def pairs(order, picker):
    # One empty dict per order number, created in ascending order
    d = {o: {} for o in sorted(set(order))}
    for o, p in zip(order, picker):
        # dict keys act as an insertion-ordered set of pickers
        d[o][p] = None
    return [[o, *p] for o, p in d.items()]
order = [ 1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
print(pairs(order, picker))
Output:
[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]
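To get a feel for the difference, the two approaches can be timed against each other on synthetic data. This is only a sketch: the data sizes, the random seed, and the function names pairs_nested and pairs_single_pass are arbitrary choices for illustration, the nested version is a tidied form of the question's loops with sorted() added, and actual timings depend on the machine.

```python
import random
import time

def pairs_nested(order, picker):
    # Approach from the question: rescan every row for each unique order.
    rows = list(zip(order, picker))
    result = []
    for x in sorted(set(order)):
        hold = [x]
        for o, p in rows:
            if o == x and p not in hold:
                hold.append(p)
        result.append(hold)
    return result

def pairs_single_pass(order, picker):
    # Dict-of-dicts approach: every row is visited exactly once.
    d = {o: {} for o in sorted(set(order))}
    for o, p in zip(order, picker):
        d[o][p] = None
    return [[o, *p] for o, p in d.items()]

# Arbitrary synthetic data, far smaller than the real lists.
random.seed(0)
n = 5_000
order = [random.randrange(500) for _ in range(n)]
picker = [random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(n)]

results = {}
for fn in (pairs_nested, pairs_single_pass):
    start = time.perf_counter()
    results[fn.__name__] = fn(order, picker)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")

# Both approaches should produce identical groupings.
print(results["pairs_nested"] == results["pairs_single_pass"])
```

Increasing n and the number of distinct orders should widen the gap, because the nested version's work grows with their product.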
Answer 3
Score: 0
It takes long because you iterate over the same data multiple times: zip, for, and for.
Try to optimize by iterating less; something like this produces the same output with only one for loop:
order = [ 1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
order_indexes = {}  # stores indexes of orders
pairs = []
for i in range(0, len(order)):
    order_item = order[i]
    picker_item = picker[i]
    if (order_item not in order_indexes):
        # the index it will be inserted in
        order_indexes[order_item] = len(pairs)
        # insertion of new order
        pairs.append([order_item])
    # add picker if not already present
    if (picker_item not in pairs[order_indexes[order_item]]):
        pairs[order_indexes[order_item]].append(picker_item)
print(pairs)
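Note that this loop emits the groups in the order in which each order number first appears. That matches ascending order in the example only because new order numbers happen to show up in increasing sequence. If the real data is not arranged that way, a final sort on the first entry of each group should restore the ordering the question asks for. A small sketch with made-up data:

```python
# Made-up data in which order 3 appears before orders 1 and 2.
order = [3, 1, 3, 2, 1]
picker = ['x', 'y', 'z', 'x', 'w']

order_indexes = {}  # maps each order number to its position in pairs
pairs = []
for order_item, picker_item in zip(order, picker):
    if order_item not in order_indexes:
        order_indexes[order_item] = len(pairs)
        pairs.append([order_item])
    group = pairs[order_indexes[order_item]]
    if picker_item not in group:
        group.append(picker_item)

print(pairs)  # groups in order of first appearance

# Sort by order number; order_indexes is stale after this and not reused.
pairs.sort(key=lambda group: group[0])
print(pairs)
```

The sort is done once after the loop, so the single pass over the rows is unchanged.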