How to implement a specialized overload of the List.RemoveAll method, with an index parameter in the predicate?

Question

List<T>.RemoveAll is a very useful method that allows removing multiple items from a list efficiently. Unfortunately, in some scenarios I need extra features that the method doesn't have, and guarantees that the documentation doesn't provide. It also behaves questionably when the match predicate fails, which makes me anxious. So in this question I am asking for an implementation of the same method, in the form of an extension method, with these features and characteristics:

  1. Instead of a Predicate<T> it accepts a Func<T, int, bool> delegate, where the int is the zero-based index of the T item.
  2. It guarantees that the predicate will be invoked exactly once for each item, in strictly ascending order.
  3. In case the predicate returns true for some items and then fails for another item, the items that have been elected for removal are removed from the list before the propagation of the exception.

Here is the signature of the extension method that I am trying to implement:

public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate);

It returns the number of elements that were removed.

I attempted to implement it using the existing implementation as a starting point, but it has some performance optimizations that make it quite complex, and injecting the desired "exceptional" behavior is not obvious. I am interested in an implementation that is simple and reasonably efficient. Using LINQ in the implementation is not desirable, because it implies memory allocations that I would like to avoid.


Context: I should demonstrate the behavior of the built-in List<T>.RemoveAll method, and explain why I don't like it. In case the match predicate fails for an item in the middle of the list, the items that have already been elected for removal are either not removed, or they are replaced with duplicates of other elements. In all cases the list retains its original size. Here is a minimal demo:

List<int> list = new(Enumerable.Range(1, 15));
Console.WriteLine($"Before RemoveAll: [{String.Join(", ", list)}]");
try
{
    list.RemoveAll(item =>
    {
        if (item == 10) throw new Exception();
        bool removeIt = item % 2 == 1;
        if (removeIt) Console.WriteLine($"Removing #{item}");
        return removeIt;
    });
}
catch (Exception ex) { Console.WriteLine(ex); }
finally
{
    Console.WriteLine($"After RemoveAll: [{String.Join(", ", list)}]");
}

The list has 15 numbers, and the intention is to remove the odd numbers from the list. The predicate fails for the 10th number.

Output:

Before RemoveAll: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Removing #1
Removing #3
Removing #5
Removing #7
Removing #9
System.Exception: Exception of type 'System.Exception' was thrown.
   at Program.<>c.<Main>b__0_0(Int32 item)
   at System.Collections.Generic.List`1.RemoveAll(Predicate`1 match)
   at Program.Main()
After RemoveAll: [2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

Online demo.

As you can see, the numbers 1 and 3 have been removed, the numbers 5, 7 and 9 are still there, and the numbers 6 and 8 have been duplicated (there are two occurrences of each). By contrast, the output that I expected to see is:

After RemoveAll: [2, 4, 6, 8, 10, 11, 12, 13, 14, 15]

This would be a reasonable and predictable behavior that I could count on. It keeps the danger at a manageable level. I would not be risking, for example, duplicating items in a virtual shopping cart, or printing some PDF documents from a selection twice. The existing behavior stretches my comfort level a bit too far.

I have reported this behavior to Microsoft, and the feedback I got is that in case of failure the outcome is undefined. From their point of view there is no difference between the two outputs above (the actual and the expected). Both are equally corrupted, because both represent a state that is neither the original state nor the final/correct state after a successful execution. So they don't think that there is any bug that needs to be fixed, and making changes that could negatively affect the performance of successful executions is not justified. They also believe that the existing behavior is not surprising or unexpected, so there is no reason to document it.
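The corrupted state in the demo can be reproduced with a simplified model of in-place compaction. The sketch below is only an illustration under that assumption, not the runtime's actual source; CompactionModel and NaiveRemoveAll are hypothetical names. Kept items are copied toward the front during the scan, and the tail is trimmed only after the scan completes, so a throwing predicate leaves the already-copied items in place and the list at its original length.

```csharp
using System;
using System.Collections.Generic;

public static class CompactionModel
{
    // Hypothetical helper, not the runtime's source: kept items are copied
    // toward the front while scanning, and the tail is trimmed only at the end.
    public static int NaiveRemoveAll<T>(List<T> list, Predicate<T> match)
    {
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            T item = list[read];
            if (match(item)) continue;   // a throw here abandons the scan
            list[write++] = item;        // overwrites a slot vacated earlier
        }
        int removed = list.Count - write;
        list.RemoveRange(write, removed); // skipped entirely if match threw
        return removed;
    }
}
```

With the 15-number list from the demo and a predicate that throws on 10, this model leaves the list as [2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], the same state shown in the output above.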

5: https://github.com/dotnet/runtime/issues/66255 "Not documented that the List<T>.RemoveAll method can corrupt the list"

Answer 1

Score: 1

This solution is based on the idea of separating the selection of the items to be removed from the removal itself.

This has the following advantages:

  • If during the selection process, an exception occurs, the list will be left untouched
  • The removal process can only fail in catastrophic cases (OutOfMemoryException etc.)

But of course there are also some disadvantages:

  • it requires extra memory to store the intermediate selection result
  • some optimizations might not be as effective

Because of the mentioned optimizations, I chose to base the selection result on ranges instead of individual indexes, so we can use List.RemoveRange, which is more efficient than individual RemoveAt calls (assuming that there are in fact ranges with more than one element).

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, 
    Func<T, int, bool> predicate)
{
    var result = new List<(int start, int count)>();
    int start = -1;
    for (var i = 0; i < list.Count; i++)
    {
        // see note 1 below
        bool toBeRemoved = predicate(list[i], i);
        if (toBeRemoved)
        {
            if (start < 0)
                start = i; // new range starts
        }
        else if (start >= 0)
        {
            // range finished
            result.Add((start, i - start));
            start = -1;
        }
    }
    if (start >= 0)
    {
        // orphan range at the end
        result.Add((start, list.Count - start));
    }
    return result;
}

public static int RemoveIndexRanges<T>(this List<T> list, 
    List<(int start, int count)> ranges)
{
    var removed = 0;
    foreach (var range in ranges)
    {
        // the "- removed" is there to take into account 
        // that deletion moves the indexes.
        list.RemoveRange(range.start - removed, range.count);
        removed += range.count;
    }
    return removed;
}

Usage:

var ranges = list.GetIndexRanges((item, index) =>
    {
        //if (item == 10) throw new Exception();
        return item % 2 == 1;
    });
// See note 2 below
list.RemoveIndexRanges(ranges);

Note 1: As is, an exception in the predicate would just be propagated during the selection process, with no change to the collection. To give the caller more control over this, GetIndexRanges could be extended to still return everything collected so far, and in addition return any exception as an out parameter:

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, 
    Func<T, int, bool> predicate, out Exception exception)
{
    var result = new List<(int start, int count)>();
    int start = -1;
    exception = null; // an out parameter must be assigned on every path
    int i = 0; // declared outside the loop so we know where the scan stopped
    for (; i < list.Count; i++)
    {
        bool toBeRemoved = false;
        try 
        { 
            toBeRemoved = predicate(list[i], i); 
        }
        catch (Exception e) 
        { 
            exception = e;
            break; // omit this line to continue with the selection process
        }
        if (toBeRemoved)
        {
            if (start < 0)
                start = i; // new range starts
        }
        else if (start >= 0)
        {
            // range finished
            result.Add((start, i - start));
            start = -1;
        }
    }
    if (start >= 0)
    {
        // orphan range: ends where the scan stopped (the end of the list, or
        // the failing item), so the failing item and the unscanned items are kept
        result.Add((start, i - start));
    }
    return result;
}

// requires: using System.Runtime.ExceptionServices;
var ranges = list.GetIndexRanges((item, index) =>
    {
        if (item == 10) throw new Exception();
        return item % 2 == 1;
    }, out var exception);

// to fulfil requirement #3, we remove the ranges collected so far
// even in case of an exception
list.RemoveIndexRanges(ranges);

// and then throw the exception afterwards
if (exception != null) 
    ExceptionDispatchInfo.Capture(exception).Throw();

Note 2: As this is now a two-step process, it will fail if the list changes between the calls.
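One way to narrow the window described in Note 2 is to run selection and removal inside a single call, so the caller cannot modify the list between the two steps (the predicate itself still must not modify it). The following is only a sketch of that idea; RangeRemovalSketch and RemoveAllByRanges are hypothetical names, and it assumes the scan should stop at the first predicate failure, as in the break variant above.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.ExceptionServices;

public static class RangeRemovalSketch
{
    // Hypothetical name: selects ranges, removes them, then rethrows any
    // predicate failure, all within one call.
    public static int RemoveAllByRanges<T>(this List<T> list, Func<T, int, bool> predicate)
    {
        var ranges = new List<(int start, int count)>();
        ExceptionDispatchInfo failure = null;
        int start = -1;
        int i = 0;
        for (; i < list.Count; i++)
        {
            bool toBeRemoved;
            try { toBeRemoved = predicate(list[i], i); }
            catch (Exception e)
            {
                failure = ExceptionDispatchInfo.Capture(e);
                break; // i stays at the failing index
            }
            if (toBeRemoved)
            {
                if (start < 0) start = i;        // new range starts
            }
            else if (start >= 0)
            {
                ranges.Add((start, i - start));  // range finished
                start = -1;
            }
        }
        // Close a range still open where the scan stopped (end of list or failure).
        if (start >= 0) ranges.Add((start, i - start));

        int removed = 0;
        foreach (var (rangeStart, count) in ranges)
        {
            list.RemoveRange(rangeStart - removed, count); // earlier removals shift indexes
            removed += count;
        }
        failure?.Throw(); // rethrow with the original stack trace, after the removal
        return removed;
    }
}
```

The extra memory for the ranges list is still required, so this keeps the trade-off listed among the disadvantages; it only removes the need for the caller to coordinate two calls.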

Answer 2

Score: 1

I think that I've managed to come up with an implementation that satisfies all three requirements:

/// <summary>
/// Removes all the elements that match the conditions defined by the specified
/// predicate. In case the predicate fails, the integrity of the list is preserved.
/// </summary>
public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
{
    ArgumentNullException.ThrowIfNull(list);
    ArgumentNullException.ThrowIfNull(predicate);

    Span<T> span = CollectionsMarshal.AsSpan(list);
    int i = 0, j = 0;
    try
    {
        for (; i < span.Length; i++)
        {
            if (predicate(span[i], i)) continue;
            if (j < i) span[j] = span[i];
            j++;
        }
    }
    finally
    {
        if (j < i)
        {
            for (; i < span.Length; i++, j++)
                span[j] = span[i];
            list.RemoveRange(j, span.Length - j);
        }
    }
    return i - j;
}

For better performance it uses the CollectionsMarshal.AsSpan method (.NET 5) to get a Span&lt;T&gt; out of the list. The algorithm works just as well by using the indexer of the list instead of the span, and replacing the span.Length with list.Count.

Online demo.

I haven't benchmarked this implementation, but I expect it to be only marginally slower than the native implementation.
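The indexer-based variant mentioned above could look roughly like the sketch below, which may be useful on targets where CollectionsMarshal.AsSpan is not available. The names IndexerRemoveAllSketch and RemoveAllIndexed are hypothetical, chosen only to avoid clashing with the span-based RemoveAll, and the sketch assumes the predicate does not modify the list.

```csharp
using System;
using System.Collections.Generic;

public static class IndexerRemoveAllSketch
{
    // Hypothetical name; same algorithm as the span-based version, using
    // the list indexer and list.Count instead of a Span<T>.
    public static int RemoveAllIndexed<T>(this List<T> list, Func<T, int, bool> predicate)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        int i = 0, j = 0; // i: read position, j: write position
        try
        {
            for (; i < list.Count; i++)
            {
                if (predicate(list[i], i)) continue; // elected for removal
                if (j < i) list[j] = list[i];        // compact kept item forward
                j++;
            }
        }
        finally
        {
            if (j < i)
            {
                // Runs on success and on failure: move the unscanned tail
                // (if any) forward, then trim the leftover slots.
                for (; i < list.Count; i++, j++)
                    list[j] = list[i];
                list.RemoveRange(j, list.Count - j);
            }
        }
        return i - j; // reached only on success, where i is the original count
    }
}
```

With the 15-number list from the question and a predicate that throws on 10, this leaves [2, 4, 6, 8, 10, 11, 12, 13, 14, 15] before the exception propagates.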

Answer 3

Score: 0

> So they don't think that there is any bug that needs to be fixed. They also believe that this behavior is not surprising or unexpected, so there is no need to document it.

They're correct. The method is documented as:

> Removes all the elements that match the conditions defined by the specified predicate.

This supports two scenarios: the predicate returning true, removing an element, or false for leaving it as-is. A predicate throwing an exception is not a use case intended to be supported.

If you want to be able to pass a predicate that may throw, you could wrap it like this:

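public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
{
    Exception? caught = null;
    int index = 0;
    int removed = 0;

    // Note: passing the right index relies on List<T>.RemoveAll invoking the
    // predicate once per item in ascending order, which is not a documented guarantee.
    list.RemoveAll(item =>
    {
        // Ignore the rest of the list once thrown
        if (caught != null) return false;

        // Advance the index before invoking the predicate; an increment placed
        // after the try/catch would be unreachable, since every path returns.
        int current = index++;
        try
        {
            var remove = predicate(item, current);
            if (remove)
            {
                removed++;
            }

            return remove;
        }
        catch (Exception e)
        {
            caught = e;
            return false;
        }
    });

    if (caught != null)
    {
        throw caught;
    }

    return removed;
}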

Answer 4

Score: -2

I don't know how Microsoft wrote this method.

I tried a few code snippets and found the cause.

The actual problem is your throw new Exception(). If you leave that statement out, your code runs perfectly. The exception triggers some other code path, but I don't know what it is.

if (item >= 10) return false;
bool removeIt = item % 2 == 1;
if (removeIt) Console.WriteLine($"Removing #{item}");
return removeIt;

This is what I found. EDIT

Actually, the Func<T, int, bool> delegate does not delete any item itself. It returns a boolean: if it returns true, the item is removed from the list; if it returns false, the item is not removed.

huangapple
  • Posted on January 6, 2023, 13:58:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75027446.html