时间复杂度在使用Python字符串时

huangapple go评论57阅读模式
英文:

time complexity when working with python string

问题

I'm analysing the complexity of string concatination for different methods and what I notice.

%%time
sent = ""
word = "hello "
num = 100000
for _ in range(num):
    sent += word

this method takes 20ms

%%time
sent_1 = "".join([word] * num)

the second method takes 4 ms and

%%time
sent_2 = ""
for _ in range(num//5):
    sent_2 = sent_2 + word + word + word + word + word

the third method takes 208 ms

sent==sent_1==sent_2

gives me True

can someone explain why I get such results

英文:

I'm analysing the complexity of string concatination for different methods and what I notice.

%%time
sent = ""
word = "hello "
num = 100000
for _ in range(num):
    sent += word

this method takes 20ms

%%time
sent_1 = "".join([word] * num)

the second method takes 4 ms and

%%time
sent_2 = ""
for _ in range(num//5):
    sent_2 = sent_2 + word + word + word + word + word

the third method takes 208 ms

sent==sent_1==sent_2
gives me True

can someone explain why I get such results

答案1

得分: 1

sent2 + word + word + word + word + word 会创建多个临时的 str 对象,大致等同于:

t1 = sent2 + word
t2 = t1 + word
t3 = t2 + word
t4 = t3 + word
sent_2 = t4 + word

创建所有这些对象(并将一个临时值的不断增长的值复制到另一个对象中)会导致二次运行时间。

另一方面,''.join([word] * num) 会将所有字符串一次性连接在单个列表中,因此它可以分配一个结果字符串并在线性时间内复制子字符串。

第一个示例:

for _ in range(num):
    sent += word

应该 等同于重复的连接。但是,特别是 CPython 通过识别到在计算 sent + word 时没有人能够访问 sent 的值,因此它会“欺骗”并在原地更新 sent,而不会为 sent 分配新的字符串。这意味着你仍然拥有一个线性时间操作,类似于 ''.join,但由于在 Python 中迭代字符串而不是在解释器的 C 级别上进行,因此存在额外的开销。

英文:

sent2 + word + word + word + word + word creates multiple temporary str objects, roughly equivalent to

t1 = sent2 + word
t2 = t1 + word
t3 = t2 + word
t4 = t3 + word
sent_2= t4 + word

Creating all these objects (and copying the ever-growing values of one temporary value into another) results in a quadratic runtime.

''.join([word] * num), on the other hand, gets all the strings to concatenate at once in a single list, so it can allocate one result string and copy the substring in place in linear time.

The first example,

for _ in range(num):
    sent += word

should be equivalent to repeated concatenation. But, CPython in particular optimizes this by recognizing that nobody can access the value of sent while sent + word is being computed, so it "cheats" and updates sent in place, with no new string being allocated to assign back to sent. This means you still have a linear-time operation, like ''.join, but with extra overhead due to iterating over the string in Python rather than at the C level of the interpreter.

huangapple
  • 本文由 发表于 2023年3月7日 03:59:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/75655312.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定