Python - restructuring a list to only increase in increments of 1

# Question

I have a list of numbers (in Python), some of which are repeated. The list looks like this:

Input

```python
values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
```

When a number is not a repeat, it equals the starting number plus the index of the new entry (example: 1833 = 1827 + 6).

Expected output

```python
[1827, 1827, 1828, 1828, 1827, 1827, 1829]
```

Repeated numbers should keep the value they were first given, but when a new number appears, I want it to equal the highest number in the list up to that point, plus 1 (example: 1829 becomes 1827 + 1 = 1828).

Note that in this example, if the number 1833 were to appear again, it would need to be set to 1829.

So far this is my code:

```python
contact_ids = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
output = []
seen_ids = {}

for id in contact_ids:
    if id not in seen_ids:
        # Each new id is stored unchanged, so the output equals the input.
        seen_ids[id] = id
        output.append(id)
    else:
        output.append(seen_ids[id])

print("CONTACT_ID")
for id in output:
    print(id)
```

Corresponding output:

```
CONTACT_ID
1827
1827
1829
1829
1827
1827
1833
```


# Answer 1
**Score**: 2

A simpler and more efficient variant of the self-referential defaultdict trick:

```python
from collections import defaultdict

values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]

# A missing key is assigned the first value plus the number of keys stored so far.
d = defaultdict(lambda: len(d) + values[0])

[d[v] for v in values]
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```
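The trick relies on `defaultdict` calling its factory before the missing key is stored, so `len(d)` still counts only the keys seen earlier. A minimal sketch of the same idea as a reusable function follows; the name `renumber` and the empty-list guard are assumptions added for illustration and are not part of the original answer.

```python
from collections import defaultdict


def renumber(values):
    """Map each distinct value to consecutive integers in order of first appearance.

    Hypothetical helper: the name and the empty-input guard are illustrative.
    """
    if not values:
        return []
    start = values[0]
    # The factory runs before the new key is inserted, so len(mapping)
    # equals the number of distinct values seen before this one.
    mapping = defaultdict(lambda: start + len(mapping))
    return [mapping[v] for v in values]


print(renumber([1827, 1827, 1829, 1829, 1827, 1827, 1833]))
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```

Binding `start` once avoids indexing `values[0]` on every miss, and the guard keeps an empty input from raising `IndexError`.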

# Answer 2
**Score**: 1

This can be done elegantly with a defaultdict and a good lambda:

```python
from collections import defaultdict

values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]

d = defaultdict(lambda: max(list(d.values()) + [min(values) - 1]) + 1)
[d[value] for value in values]
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```

This works because the lambda that supplies the value for a missing key refers to the dictionary itself. Each time a new key appears, the lambda takes the `max` of the values stored so far and adds 1, which yields the right value.

The `[min(values) - 1]` term seeds that maximum so the first key receives the lowest value in the list. Alternatively, replace `min(values)` with `values[0]` to start from the first value in the list.
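The two seeding choices only differ when the first element is not the smallest one. The sketch below makes that visible; the helper name `renumber_with_seed` and the input `[5, 3, 5, 7]` are assumptions chosen for illustration, not taken from the original answer.

```python
from collections import defaultdict


def renumber_with_seed(values, seed):
    """Max-based renumbering in which the first new key receives `seed`.

    Hypothetical helper used only to compare the two seeding choices.
    """
    d = defaultdict(lambda: max(list(d.values()) + [seed - 1]) + 1)
    return [d[v] for v in values]


values = [5, 3, 5, 7]  # illustrative input whose first element is not the minimum

print(renumber_with_seed(values, min(values)))  # [3, 4, 3, 5]
print(renumber_with_seed(values, values[0]))    # [5, 6, 5, 7]
```

For the list in the question both seeds give the same result, because 1827 is both the first and the lowest value.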

I profiled the solutions:

```python
import random
from collections import defaultdict

from performance_measurement import run_performance_comparison


def max_defaultdict(values):
    # @swozny
    d = defaultdict(lambda: max(list(d.values()) + [min(values) - 1]) + 1)
    [d[value] for value in values]


def len_defaultdict(values):
    # @user2390182
    d = defaultdict(lambda: len(d) + values[0])
    [d[v] for v in values]


def dictionary_approach(contact_ids):
    # @John Coleman
    start = contact_ids[0]
    rank = 0
    ranks = {}
    for contact_id in contact_ids:
        if contact_id not in ranks:
            ranks[contact_id] = rank
            rank += 1

    output = [start + ranks[contact_id] for contact_id in contact_ids]


def setup(N):
    # Random walk that never drops below the starting value 1827.
    values = [1827]
    for _ in range(N - 1):
        next_value = values[-1] + random.randint(-2, 2)
        values.append(max(next_value, 1827))
    return [values]


approaches = [dictionary_approach, len_defaultdict, max_defaultdict]
run_performance_comparison(
    approaches, [10_000, 20_000, 30_000, 100_000, 200_000], setup=setup
)
```

[Figure: performance comparison plot of the three approaches, execution time in seconds against N]

The repeated `max` calls put the `max`-based solution far behind the dictionary solution, since every new key triggers a scan over all values stored so far. Interestingly, the `len`-based defaultdict beats even the plain dictionary.
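The benchmarked functions discard their results, so timings alone do not show that the approaches agree. A small sanity check along the following lines could be run first. It is a sketch under assumptions: the function names, the seeded generator `make_values`, and the use of `len(ranks)` in place of a separate counter are illustrative rewrites that return their output rather than the exact benchmarked code.

```python
import random
from collections import defaultdict


def max_based(values):
    d = defaultdict(lambda: max(list(d.values()) + [min(values) - 1]) + 1)
    return [d[v] for v in values]


def len_based(values):
    d = defaultdict(lambda: len(d) + values[0])
    return [d[v] for v in values]


def dict_based(values):
    ranks = {}
    for v in values:
        if v not in ranks:
            ranks[v] = len(ranks)  # rank = number of distinct values seen before
    return [values[0] + ranks[v] for v in values]


def make_values(n, seed=0):
    """Random walk clamped at 1827, seeded so the check is reproducible."""
    rng = random.Random(seed)
    values = [1827]
    for _ in range(n - 1):
        values.append(max(values[-1] + rng.randint(-2, 2), 1827))
    return values


values = make_values(1_000)
results_agree = max_based(values) == len_based(values) == dict_based(values)
print(results_agree)
# True
```

The three only agree because this input starts at its own minimum; on a list whose first element is not the smallest, the `min`-seeded version starts from a different base.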

Profiling code (the `performance_measurement` module imported above):

```python
import timeit
from functools import partial

from typing import List, Dict, Callable

from contextlib import contextmanager
import matplotlib.pyplot as plt


@contextmanager
def data_provider(data_size, setup=lambda N: N, teardown=lambda: None):
    data = setup(data_size)
    yield data
    teardown(*data)


def run_performance_comparison(approaches: List[Callable],
                               data_size: List[int],
                               *,
                               setup=lambda N: [N],
                               teardown=lambda *N: None,
                               number_of_repetitions=5,
                               title='Performance Comparison',
                               data_name='N',
                               yscale='log',
                               xscale='log'):
    approach_times: Dict[Callable, List[float]] = {approach: [] for approach in approaches}
    for N in data_size:
        with data_provider(N, setup, teardown) as data:
            print(f'Running performance comparison for {data_name}={N}')
            for approach in approaches:
                function = partial(approach, *data)
                # Keep the fastest of the repeated single-run timings.
                approach_time = min(timeit.Timer(function).repeat(repeat=number_of_repetitions, number=1))
                approach_times[approach].append(approach_time)

    for approach in approaches:
        plt.plot(data_size, approach_times[approach], label=approach.__name__)
    plt.yscale(yscale)
    plt.xscale(xscale)

    plt.xlabel(data_name)
    plt.ylabel('Execution Time (seconds)')
    plt.title(title)
    plt.legend()
    plt.show()
```

# Answer 3
**Score**: 1

Define the rank of a contact id to be its index in the list of contact ids with duplicates removed. These ranks are the offsets from the start in the intended output. They can be obtained in various ways. The following code does so using a dictionary, which is then used in a list comprehension to get the final output:

```python
contact_ids = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
start = contact_ids[0]
rank = 0
ranks = {}
for contact_id in contact_ids:
    if contact_id not in ranks:
        # First time this id is seen: give it the next free rank.
        ranks[contact_id] = rank
        rank += 1

output = [start + ranks[contact_id] for contact_id in contact_ids]
print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```
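As one of the "various ways" of obtaining the ranks, the definition can be followed literally: remove duplicates while keeping order, then take each id's index. The sketch below assumes Python 3.7 or later, where `dict` preserves insertion order; the variable name `unique_ids` is introduced here for illustration.

```python
contact_ids = [1827, 1827, 1829, 1829, 1827, 1827, 1833]

# dict.fromkeys keeps the first occurrence of each id, in insertion order
# (guaranteed for dicts since Python 3.7).
unique_ids = list(dict.fromkeys(contact_ids))  # [1827, 1829, 1833]

# The rank of an id is its index in the de-duplicated list.
ranks = {contact_id: rank for rank, contact_id in enumerate(unique_ids)}

output = [contact_ids[0] + ranks[contact_id] for contact_id in contact_ids]
print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```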

# Answer 4
**Score**: 0

Based on the statement that non-repeated numbers can be identified as being equal to the first number plus their index, you can determine where each +1 increment should occur. Then use `itertools.accumulate` to compute the running total to apply at each position (starting from the base), and replace the larger values with it:

```python
from itertools import accumulate

values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]

# First term is the base value; later terms are True (1) where a new number appears.
inc = accumulate(p if p == values[0] else p == n
                 for p, n in enumerate(values, values[0]))
*output, = map(min, zip(values, inc))

print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```
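The one-liner packs three steps together. The sketch below separates them so the intermediate values can be inspected; the names `base`, `terms`, and `running` are introduced here for illustration, and the approach still relies on the question's premise that a new number equals the first number plus its index.

```python
from itertools import accumulate

values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
base = values[0]

# Step 1: the first term is the base itself; every later term is True (1)
# where the value equals base + index (a new number) and False (0) otherwise.
terms = [p if p == base else p == n for p, n in enumerate(values, base)]
print(terms)
# [1827, False, True, False, False, False, True]

# Step 2: the running sum is the highest renumbered value reached so far.
running = list(accumulate(terms))
print(running)
# [1827, 1827, 1828, 1828, 1828, 1828, 1829]

# Step 3: taking the minimum restores repeated numbers that are smaller.
output = [min(pair) for pair in zip(values, running)]
print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```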

huangapple
  • Published on July 17, 2023, 18:07:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76703395.html