删除列表中连续重复的条目的尾递归方法

huangapple go评论65阅读模式
英文:

Tail recursive remove duplicate consecutive entries in list

问题

I'm here to provide the translated content as requested. Here's the translation of the text you provided:

我正在尝试解决99个OCaml问题中的问题8,该问题要求编写一个名为compress的函数,用于移除列表中的连续重复项:

assert (
  compress
    [ "a"; "a"; "a"; "a"; "b"; "c"; "c"; "a"; "a"; "d"; "e"; "e"; "e"; "e" ]
  = [ "a"; "b"; "c"; "a"; "d"; "e" ])

我提出了以下解决方案:

let head = function x :: _ -> Some x | [] -> None

let compress list =
  let rec fn old_list new_list =
    match (old_list, new_list) with
    | h :: t, _ ->
        fn t (if Some h = head new_list then new_list else h :: new_list)
    | _ -> new_list
  in
  List.rev (fn list [])
;;

网站提供的示例解决方案如下:

let rec compress = function
    | a :: (b :: _ as t) -> if a = b then compress t else a :: compress t
    | smaller -> smaller;;

起初,我认为我的解决方案更高效,因为它是尾递归的,而提供的解决方案显然不是(需要在堆栈上保留a :: compress t中的a)。然而,当我测试我的代码是否尾递归时:

assert (
  (compress [@tailcall])
    [ "a"; "a"; "a"; "a"; "b"; "c"; "c"; "a"; "a"; "d"; "e"; "e"; "e"; "e" ]
  = [ "a"; "b"; "c"; "a"; "d"; "e" ])

它给我一个警告,说它不是尾递归的。为什么?

从我的理解来看,我的解决方案不需要在堆栈上保留任何状态,这应该使它成为尾递归的。

编辑
还尝试将[@tailcall]应用于fn,通过List.rev ((fn [@tailcall]) list []),但收到相同的警告。

英文:

I'm attempting problem 8 of the 99 OCaml problems, which asks you to write a function compress that removes consecutive duplicate entires in a list:

assert (
  compress
    [ "a"; "a"; "a"; "a"; "b"; "c"; "c"; "a"; "a"; "d"; "e"; "e"; "e"; "e" ]
  = [ "a"; "b"; "c"; "a"; "d"; "e" ])

I came to the following solution:

let head = function x :: _ -> Some x | [] -> None

let compress list =
  let rec fn old_list new_list =
    match (old_list, new_list) with
    | h :: t, _ ->
        fn t (if Some h = head new_list then new_list else h :: new_list)
    | _ -> new_list
  in
  List.rev (fn list [])
;;

The provided example solution by the website is as follows:

let rec compress = function
    | a :: (b :: _ as t) -> if a = b then compress t else a :: compress t
    | smaller -> smaller;;

At first, I thought that my solution was more efficient as it is tail recursive, while the provided solution is clearly not (requires us to keep the a in a :: compress t on the stack). However, when I test if my code is tail recursive:

assert (
  (compress [@tailcall])
    [ "a"; "a"; "a"; "a"; "b"; "c"; "c"; "a"; "a"; "d"; "e"; "e"; "e"; "e" ]
  = [ "a"; "b"; "c"; "a"; "d"; "e" ])

It gives me a warning saying it's not tail recursive. Why?

From my understanding, my solution doesn't require keeping any state on the stack, which should make it tail recursive.

EDIT
Also tried applying the [@tailcall] to fn directly via List.rev ((fn [@tailcall]) list []), get the same warning.

答案1

得分: 2

Here are the translated portions of your text:

当你进行断言时,compress 不在尾位置,(=) 。 这与你的 compress 函数实现无关。

同样,在表达式 List.rev ((fn [@tailcall]) list []) 中,fn 的调用不在尾调用位置。

你可以通过尝试以下方式进行测试:

请注意,尾递归并不总是意味着_更高效_。尾递归函数通常_不太_高效,但它们可以用于大型数据而不会导致栈溢出。如果你处理可能导致这种情况的数据,那么建议重新评估你正在使用的数据结构。

截至 OCaml 4.14 版本,我们也可以使用 tail_mod_cons 使 compress 成为尾递归。

或者,你可以考虑使用延续传递方式来实现这个功能。

作为另一种有趣的尾递归替代方法,你可以编写一个生成 sequencecompress 函数。为了做到这一点,我们的 aux 函数需要接受一个参数来跟踪上次看到的值。因为在开始时不会有上次看到的值,所以使用 option 类型是有意义的。

以下是翻译好的部分。如果需要更多翻译,请提供具体的文本。

英文:

When you make your assertion, compress isn't in tail position, (=) is. This has no bearing on your compress function implementation.

# assert (
  ((=) [@tailcall]) 
    (compress ["a"; "a"; "b"]) 
    ["a"; "b"]
);;
- : unit = ()

Similarly, in the expression List.rev ((fn [@tailcall]) list []), the call of fn is not in the tail call position.

You can test this by trying:

let compress list =
  let rec fn old_list new_list =
    match (old_list, new_list) with
    | h :: t, _ ->
        (fn[@tailcall]) t (if Some h = head new_list then new_list else h :: new_list)
    | _ -> new_list
  in
  List.rev (fn list [])

Note also that tail-recursive does not always mean more efficient. Tail-recursive functions are often less efficient, but they can be used on large data without a stack overflow. If you are dealing with data likely to cause this, it suggests you may need to re-evaluate the data structure you're using.

We could also, as of OCaml 4.14 make compress tail-recursive with tail_mod_cons.

let[@tail_mod_cons] rec compress = 
  function
  | a :: (b :: _ as t) -> 
      if a = b then compress t 
      else a :: compress t
  | smaller -> smaller;;

Alternatively, you might implement this with continuation passing.

let compress lst =
  let rec aux k = function
    | ([] | [_]) as lst -> k lst
    | a::(b::_ as t) when a = b -> aux k t
    | a::t -> aux (fun i -> k (a :: i)) t
  in
  aux Fun.id lst

As yet another fun alternative that is tail-recursive, you might write a compress function which generates a sequence. In order to do this, our aux function needs to take an argument to keep track of the last value seen. Because at the beginning there will not be a last value seen, the option type makes sense.

# let compress lst =
    let rec aux lst last_seen () =
      match lst, last_seen with
      | [], _ -> Seq.Nil
      | x::xs, Some x' when x = x' -> aux xs last_seen ()
      | x::xs, _ -> Seq.Cons (x, aux xs (Some x)) 
    in
    aux lst None;;
val compress : 'a list -> 'a Seq.t = <fun>
# compress [1;1;1;3;3;4;6;6;7;4] 
  |> Seq.take 3 
  |> List.of_seq;;
- : int list = [1; 3; 4]

答案2

得分: 1

以下是已翻译的内容:

"Okay, so I figured it out.

1) Is it tail recursive?

To test if the functions were tail recursive, I decided to just try and break them by inducing a stack overflow:

let _ = compress (List.init 10_000_000 (fun x -> Some x))

For my implementation, this works just fine. The provided solution, on the other hand, results in a segvault, which I'm assuming is from stack overflow:

[1]    70387 segmentation fault  ./a.out

So we can conclude that my implementation is indeed tail recursive, while the other one is not.

2) Which one is faster?

I used the following to test the speed of both functions:

let _ =
  let l = List.init 250000 (fun x -> x) in
  let t = Sys.time () in
  let f = compress l in
  Printf.printf "Execution time: %fs\n" (Sys.time () -. t);
  f

Note, 250000 is the limit on my machine before things stack-overflowed.

For the tail recursive implementation, it was around 0.018s.

For the non-tail recursive implementation, it was around 0.013s.

So seems that the overhead of function calls is not enough to make the non-tail recursive implementation slower than the tail recursive one, which requires 2 passes of the list.

I should also be noted, this is also for the worst case, where our input list List.init 250000 (fun x -> x) is a list with all unique elements. The amount of stack space required for the non-tail recursive implementation is proportional to the number of unique elements, not the number of elements in the list, because in the left branch of if a = b then compress t else a :: compress t, we don't use any stack space. I tested this by changing the list to only hold a constant and made the list much larger List.init 100000000 (fun x -> 0), and the non-tail recursive implementation no longer seg faults. It seems the OCaml compiler is smart enough to know that if the left branch of if a = b then compress t else a :: compress t is tail recursive, and only allocates a stack if the right branch is hit."

英文:

Okay, so I figured it out.

1) Is it tail recursive?

To test if the functions were tail recursive, I decided to just try and break them by inducing a stack overflow:

let _ = compress (List.init 10_000_000 (fun x -> Some x))

For my implementation, this works just fine. The provided solution, on the other hand, results in a segvault, which I'm assuming is from stack overflow:

[1]    70387 segmentation fault  ./a.out

So we can conclude that my implementation is indeed tail recursive, while the other one is not.

2) Which one is faster?

I used the following to test the speed of both functions:

let _ =
  let l = List.init 250000 (fun x -> x) in
  let t = Sys.time () in
  let f = compress l in
  Printf.printf "Execution time: %fs\n" (Sys.time () -. t);
  f

Note, 250000 is the limit on my machine before things stack-overflowed.

For the tail recursive implementation, it was around 0.018s.

For the non-tail recursive implementation, it was around 0.013s.

So seems that the overhead of function calls is not enough to make the non-tail recursive implementation slower than the tail recursive one, which requires 2 passes of the list.

I should also be noted, this is also for the worst case, where our input list List.init 250000 (fun x -> x) is a list with all unique elements. The amount of stack space required for the non-tail recursive implementation is proportional to the number of unique elements, not the number of elements in the list, because in the left branch of if a = b then compress t else a :: compress t, we don't use any stack space. I tested this by changing the list to only hold a constant and made the list much larger List.init 100000000 (fun x -> 0), and the non-tail recursive implementation no longer seg faults. It seems the OCaml compiler is smart enough to know that if the left branch of if a = b then compress t else a :: compress t is tail recursive, and only allocates a stack if the right branch is hit.

huangapple
  • 本文由 发表于 2023年6月30日 03:29:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76584105.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定