June 12, 2023 06:50:09


Unexpected warnings with case_when and regex conditions suggest too many cases are matching

# Question


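I have a data set with messy date/time formatting: some values are in hh:mm formats and some are Excel serial numbers. So I've coerced everything into a string, and I'm using `stringr` and `readr` within a large `case_when` block to identify the different formats and process them properly. I think I'm misunderstanding either my `stringr` functions or `case_when`, because I'm getting the output I expect, but it's throwing warnings about parsing failures and `NA` coercion that don't show up in the final product.
Here are some dummy data with an example of each of the formats in my data set: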
```R
library(dplyr)  # case_when(), mutate(); also re-exports tibble()

dummy <- tibble(x = c("13:15:21", "02:03:17+01:00", "12:03", "0.1234"))
```

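I've made a function to identify and parse each of these formats. It calls another function to convert Excel serial numbers into times. It uses regular expressions which I think are correct, but to summarise:

- `^0\\.` should identify Excel serial numbers by the fact that they are all decimals < 1
- `\\+` identifies the times with the DST timezone indicator by searching for the plus sign, then
- `.+(?=\\+)` extracts everything before the plus sign to parse
- `:` tests for a colon to make sure the result is some kind of time. This is a broader test, so it comes last, after the plus-sign cases have already been matched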
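
Each pattern can be sanity-checked on its own against the dummy values before it goes into `case_when()`. This is a small sketch using the same `stringr` calls as above; the comments show the logical results for the four dummy values. Note that `:` also matches the offset time in row 2, so the order of the conditions matters: `case_when()` uses the first condition that is `TRUE` for each element.

```R
library(stringr)

x <- c("13:15:21", "02:03:17+01:00", "12:03", "0.1234")

excel  <- str_detect(x, "^0\\.")  # FALSE FALSE FALSE  TRUE
offset <- str_detect(x, "\\+")    # FALSE  TRUE FALSE FALSE
clock  <- str_detect(x, ":")      #  TRUE  TRUE  TRUE FALSE

# The lookahead keeps everything before "+"; elements without "+" give NA
before_plus <- str_extract(x, ".+(?=\\+)")  # NA "02:03:17" NA NA
```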

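```R
library(stringr)   # str_detect(), str_extract()
library(readr)     # parse_time()
library(lubridate) # as_datetime()

convert_time <- function(x){
  case_when(str_detect(x, "^0\\.") ~ convert_excel_time(x),                   # Excel serial number
            str_detect(x, "\\+")   ~ parse_time(str_extract(x, ".+(?=\\+)")), # drop the "+hh:mm" suffix
            str_detect(x, ":")     ~ parse_time(x),                           # plain hh:mm or hh:mm:ss
            .default = NA)
}

convert_excel_time <- function(x){
  # Excel stores times as fractions of a day, so convert to seconds first.
  # The parentheses matter: `%>%` binds more tightly than `*`.
  (as.numeric(x) * 24 * 60 * 60) %>%
    as_datetime() %>%
    hms::as_hms()
}
```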
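
A side note on `convert_excel_time()`: in R, `%>%` (like any `%op%` operator) binds more tightly than `*`, so an unparenthesised `as.numeric(x) * 24 * 60 * 60 %>% as_datetime()` pipes only the final `60`. The snippet below is a small illustration using `sqrt()` and `round()` as stand-ins for the real conversion steps; it is not part of the original code.

```R
library(magrittr)

# `%>%` has higher precedence than `*`, so the pipe only receives the last 60
a <- 0.1234 * 24 * 60 * 60 %>% sqrt()
b <- 0.1234 * 24 * 60 * sqrt(60)
identical(a, b)  # TRUE

# Parenthesising the arithmetic pipes the whole product instead
secs <- (0.1234 * 24 * 60 * 60) %>% round(2)
secs             # 10661.76 seconds, i.e. 02:57:41.76
```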

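When I run it, I get the expected output, but the warnings that come along with it suggest to me that I'm not understanding what's happening under the hood.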

```
> dummy %>%
+   mutate(new = convert_time(x))
# A tibble: 4 × 2
  x              new        
  <chr>          <time>     
1 13:15:21       13:15:21.00
2 02:03:17+01:00 02:03:17.00
3 12:03          12:03:00.00
4 0.1234         02:57:41.76
```

These are the warnings I get:

```
[[1]]
<warning/rlang_warning>
Warning in `mutate()`:
ℹ In argument: `new = convert_time(x)`.
Caused by warning in `convert_excel_time()`:
! NAs introduced by coercion
---
Backtrace:
    ▆
 1. ├─dummy %>% mutate(new = convert_time(x))
 2. ├─dplyr::mutate(., new = convert_time(x))
 3. └─dplyr:::mutate.data.frame(., new = convert_time(x))
[[2]]
<warning/rlang_warning>
Warning in `mutate()`:
ℹ In argument: `new = convert_time(x)`.
Caused by warning:
! 2 parsing failures.
row col   expected         actual
  2  -- time like  02:03:17+01:00
  4  -- time like  0.1234        
---
Backtrace:
    ▆
 1. ├─dummy %>% mutate(new = convert_time(x))
 2. ├─dplyr::mutate(., new = convert_time(x))
 3. └─dplyr:::mutate.data.frame(., new = convert_time(x))
```

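It seems to me that `convert_time` shouldn't be trying to parse those two observations at all, since they are excluded by the left-hand side of the `case_when` block. Similarly, I didn't expect `NA` coercion, since the left-hand side of the `case_when` prevents `convert_excel_time()` from seeing the hh:mm strings. Many thanks.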


# Answer 1
**Score**: 1
Gah! I didn't read all the documentation. The `dplyr` reference (https://dplyr.tidyverse.org/reference/case_when.html) clearly says that `case_when()` always evaluates all of the right-hand-side expressions, which is why they throw warnings, but only uses the elements that match the left-hand-side conditions:

```R
# `case_when()` evaluates all RHS expressions, and then constructs its
# result by extracting the selected (via the LHS expressions) parts.
# In particular `NaN`s are produced in this case:
y <- seq(-2, 2, by = .5)
case_when(
  y >= 0 ~ sqrt(y),
  .default = y
)
#> Warning: NaNs produced
#> [1] -2.0000000 -1.5000000 -1.0000000 -0.5000000  0.0000000  0.7071068
#> [7]  1.0000000  1.2247449  1.4142136
```
