July 17, 2023 18:07:06

Python - restructuring a list to only increase in increments of 1

# Question

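I have a list of numbers (in Python), some of which are repeated. The list looks like this:

Input

```python
values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
```

When a number is not repeated, it equals the starting number plus the index of the new entry (example: 1833 = 1827 + 6).

Expected output

```python
[1827, 1827, 1828, 1828, 1827, 1827, 1829]
```

Repeated numbers should not change. When a new number comes into play, I want it to equal the highest number in the output up to that point + 1 (example: 1829 should become 1827 + 1 = 1828).

It is worth noting that in this example, if the number 1833 were to appear again, it would need to be set to 1829.

So far this is my code:

```python
contact_ids = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
output = []
seen_ids = {}

for id in contact_ids:
    if id not in seen_ids:
        seen_ids[id] = id
        output.append(id)
    else:
        output.append(seen_ids[id])

print("CONTACT_ID")
for id in output:
    print(id)
```

Corresponding output:

```
CONTACT_ID
1827
1827
1829
1829
1827
1827
1833
```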
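For reference, a minimal adjustment of this loop that follows the stated rule might look like the sketch below. It assumes the first element is the starting value and introduces a hypothetical `next_value` variable holding "highest output so far + 1"; each id is stored in `seen_ids` the first time it appears.

```python
contact_ids = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
output = []
seen_ids = {}
# Hypothetical helper variable: the value the next unseen id will receive,
# i.e. the highest output value so far + 1 (assumes a non-empty list).
next_value = contact_ids[0]

for contact_id in contact_ids:
    if contact_id not in seen_ids:
        # First appearance: map this id to the next free value
        seen_ids[contact_id] = next_value
        next_value += 1
    # Repeats reuse whatever value their first appearance received
    output.append(seen_ids[contact_id])

print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```

Because the mapping is stored per id, a later 1833 would again come out as 1829.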


# Answer 1
**Score**: 2
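The self-referential `defaultdict` trick, done another (simpler and more efficient) way:

```python
from collections import defaultdict

values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
# On a missing key the factory runs before the key is inserted,
# so len(d) is the number of distinct values seen so far.
d = defaultdict(lambda: len(d) + values[0])
print([d[v] for v in values])
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```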
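Spelled out without `defaultdict`, the same idea is a plain dictionary in which each first-seen value is mapped to the first value plus the number of distinct values seen before it. A minimal sketch; the function name `renumber` is hypothetical. Since `values[0]` is only read inside the loop, an empty list simply returns `[]`.

```python
def renumber(values):
    # Hypothetical helper: explicit version of the len(d) defaultdict trick
    mapping = {}
    for v in values:
        if v not in mapping:
            # len(mapping) counts the distinct values seen before v
            mapping[v] = values[0] + len(mapping)
    return [mapping[v] for v in values]


print(renumber([1827, 1827, 1829, 1829, 1827, 1827, 1833, 1833]))
# [1827, 1827, 1828, 1828, 1827, 1827, 1829, 1829]
```

The trailing repeat of 1833 shows the requirement from the question: a repeated new number keeps the value it was first given (1829).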


# Answer 2

**Score**: 1


This can be done elegantly with a defaultdict and a good lambda:

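```python
from collections import defaultdict

values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
# The factory reads d itself: one more than the largest value assigned so far,
# seeded with min(values) - 1 so that the first new key gets min(values).
d = defaultdict(lambda: max(list(d.values()) + [min(values) - 1]) + 1)
print([d[value] for value in values])
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```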

This works because the lambda refers to the dictionary itself: when a key is missing, the factory takes the maximum of the values assigned so far and adds 1.

The maximum needs a starting point, so it is seeded with the lowest value in the list minus 1, which makes the first new key map to `min(values)`. Alternatively, you can start from the first value in the list by using `values[0]` in place of `min(values)`.
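The two seeds only give different results when the first element is not the smallest. A small sketch comparing them; both function names and the `sample` list are made up for illustration:

```python
from collections import defaultdict


def renumber_from_min(values):
    # Seed with min(values) - 1: the first new key maps to min(values)
    d = defaultdict(lambda: max(list(d.values()) + [min(values) - 1]) + 1)
    return [d[value] for value in values]


def renumber_from_first(values):
    # Seed with values[0] - 1: the first new key maps to values[0]
    d = defaultdict(lambda: max(list(d.values()) + [values[0] - 1]) + 1)
    return [d[value] for value in values]


sample = [5, 3, 5, 9]
print(renumber_from_min(sample))    # [3, 4, 3, 5]
print(renumber_from_first(sample))  # [5, 6, 5, 7]
```

For the question's data the first value is also the minimum, so both seeds produce the expected output.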

I profiled the solutions:

```python
import random
from collections import defaultdict


def max_defaultdict(values):
    # @swozny
    d = defaultdict(lambda: max(list(d.values()) + [min(values) - 1]) + 1)
    return [d[value] for value in values]


def len_defaultdict(values):
    # @user2390182
    d = defaultdict(lambda: len(d) + values[0])
    return [d[v] for v in values]


def dictionary_approach(contact_ids):
    # @John Coleman
    start = contact_ids[0]
    rank = 0
    ranks = {}
    for contact_id in contact_ids:
        if contact_id not in ranks:
            ranks[contact_id] = rank
            rank += 1
    return [start + ranks[contact_id] for contact_id in contact_ids]


def setup(N):
    # Random walk starting at 1827 with steps in [-2, 2], never below 1827
    values = [1827]
    for _ in range(N - 1):
        next_value = values[-1] + random.randint(-2, 2)
        values.append(max(next_value, 1827))
    # Wrapped in a list because the runner unpacks it as positional arguments
    return [values]


# Module containing the code shown under "Profiling Code"
from performance_measurement import run_performance_comparison

approaches = [dictionary_approach, len_defaultdict, max_defaultdict]
run_performance_comparison(
    approaches, [10_000, 20_000, 30_000, 100_000, 200_000], setup=setup
)
```

The repeated `max` calls (each new key rescans every value assigned so far, and `min(values)` rescans the whole input) put this solution far behind the dictionary solution. Interestingly, the `len`-based defaultdict beats even the plain dictionary loop.
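Timing says nothing about correctness, so it may be worth checking that the three approaches agree on the same input. This sketch re-implements the three functions (returning their results) and reuses the random-walk idea from `setup`; `random_walk`, the seed, and the size are arbitrary choices for illustration. Because the walk starts at its floor, `values[0]` is also the minimum, so all three seeds coincide.

```python
import random
from collections import defaultdict


def max_defaultdict(values):
    d = defaultdict(lambda: max(list(d.values()) + [min(values) - 1]) + 1)
    return [d[value] for value in values]


def len_defaultdict(values):
    d = defaultdict(lambda: len(d) + values[0])
    return [d[v] for v in values]


def dictionary_approach(contact_ids):
    start = contact_ids[0]
    rank = 0
    ranks = {}
    for contact_id in contact_ids:
        if contact_id not in ranks:
            ranks[contact_id] = rank
            rank += 1
    return [start + ranks[contact_id] for contact_id in contact_ids]


def random_walk(n, start=1827, seed=0):
    # Same shape as setup(): steps in [-2, 2], never below start
    rng = random.Random(seed)
    values = [start]
    for _ in range(n - 1):
        values.append(max(values[-1] + rng.randint(-2, 2), start))
    return values


data = random_walk(2_000)
results = [f(data) for f in (dictionary_approach, len_defaultdict, max_defaultdict)]
print(all(r == results[0] for r in results))  # True
```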

Profiling Code

```python
import timeit
from contextlib import contextmanager
from functools import partial
from typing import Callable, Dict, List

import matplotlib.pyplot as plt


@contextmanager
def data_provider(data_size, setup=lambda N: [N], teardown=lambda *data: None):
    # setup must return a sequence of positional arguments for each approach
    data = setup(data_size)
    yield data
    teardown(*data)


def run_performance_comparison(approaches: List[Callable],
                               data_size: List[int],
                               *,
                               setup=lambda N: [N],
                               teardown=lambda *N: None,
                               number_of_repetitions=5,
                               title='Performance Comparison',
                               data_name='N',
                               yscale='log',
                               xscale='log'):
    approach_times: Dict[Callable, List[float]] = {approach: [] for approach in approaches}
    for N in data_size:
        with data_provider(N, setup, teardown) as data:
            print(f'Running performance comparison for {data_name}={N}')
            for approach in approaches:
                function = partial(approach, *data)
                # Best of several single runs, to reduce timing noise
                approach_time = min(timeit.Timer(function).repeat(repeat=number_of_repetitions, number=1))
                approach_times[approach].append(approach_time)

    for approach in approaches:
        plt.plot(data_size, approach_times[approach], label=approach.__name__)
    plt.yscale(yscale)
    plt.xscale(xscale)
    plt.xlabel(data_name)
    plt.ylabel('Execution Time (seconds)')
    plt.title(title)
    plt.legend()
    plt.show()
```

# Answer 3

**Score**: 1

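Define the rank of a contact id to be its index in the list of contact ids with duplicates removed. These ranks are the offsets from the start in the intended output. They can be obtained in various ways. The following code does so using a dictionary, which is then used in a list comprehension to get the final output:

```python
contact_ids = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
start = contact_ids[0]
rank = 0
ranks = {}
for contact_id in contact_ids:
    if contact_id not in ranks:
        # First occurrence: assign the next rank
        ranks[contact_id] = rank
        rank += 1
output = [start + ranks[contact_id] for contact_id in contact_ids]
print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```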
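Since the ranks are just positions in the de-duplicated list, another way to get them is `dict.fromkeys`, which drops duplicates while keeping first-occurrence order. A minimal variant, assuming Python 3.7+ (where dicts preserve insertion order):

```python
contact_ids = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
start = contact_ids[0]
# dict.fromkeys keeps each id once, in first-occurrence order,
# so enumerating it yields each id's rank.
ranks = {contact_id: rank for rank, contact_id in enumerate(dict.fromkeys(contact_ids))}
output = [start + ranks[contact_id] for contact_id in contact_ids]
print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```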


# Answer 4

**Score**: 0

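Based on the statement that non-repeated numbers equal the first number plus their index, you can determine where the +1 increments should occur. Then use `itertools.accumulate` to compute the total increment to apply at each position (starting from the base) and replace the larger values with it:

```python
from itertools import accumulate

values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
# enumerate(values, values[0]) pairs each value n with p = values[0] + index.
# The first term seeds the running total with the base; after that,
# p == n is True (counted as 1) exactly where a new number appears.
inc = accumulate(p if p == values[0] else p == n
                 for p, n in enumerate(values, values[0]))
# Cap each value at the running total so that new numbers are pulled down
output = list(map(min, zip(values, inc)))
print(output)
# [1827, 1827, 1828, 1828, 1827, 1827, 1829]
```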
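To see what the `min` step works with, it helps to print the running totals. The sketch below wraps the answer's expression in a hypothetical `renumber_accumulate` function; the extra `1829` appended at the end is an illustrative input, not from the question. Because each output is `min(value, running total)`, a number that repeats after a larger new number has appeared is not mapped back: the final `1829` comes out as `1829`, whereas the rule in the question (and the dictionary-based answers) would give `1828`.

```python
from itertools import accumulate


def renumber_accumulate(values):
    # Answer 4's technique wrapped in a function (hypothetical name)
    inc = accumulate(p if p == values[0] else p == n
                     for p, n in enumerate(values, values[0]))
    return list(map(min, zip(values, inc)))


values = [1827, 1827, 1829, 1829, 1827, 1827, 1833]
# Running totals before the min() step
increments = list(accumulate(p if p == values[0] else p == n
                             for p, n in enumerate(values, values[0])))
print(increments)  # [1827, 1827, 1828, 1828, 1828, 1828, 1829]

# A later repeat of 1829 is capped at 1829 rather than mapped back to 1828
print(renumber_accumulate(values + [1829]))
# [1827, 1827, 1828, 1828, 1827, 1827, 1829, 1829]
```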

