How can I identify Roman numerals and convert them to integers in "mixed" observations in R?

Question

I have a data frame with a column whose observations mix words and Roman numerals. The column also contains integers, plain words (like the observation "Apple"), and NAs, all of which I want to leave unchanged.

So it has observations like:

x <- data.frame(col = c("15", "NA", "0", "Red", "iv", "Logic", "ix. Sweet", "VIII - Apple",
                        "Big XVI", "WeirdVII", "XI: Small"))

I want to convert every observation that contains a Roman numeral, including the ones mixed with words, into the corresponding integer. Following the example, the resulting data frame would look like this:

15
NA
0
Red
4
Logic
9
8
16
7
11

Is there any way to do this?

What I have attempted is:

library(stringr)
library(gtools)

roman <- str_extract(x$col, "([IVXivx]+)")
roman_to_int <- roman2int(roman)
x$col <- ifelse(!is.na(roman_to_int), roman_to_int, x$col)

However, this does not work: observations that are plain words with no Roman numeral are also converted whenever they happen to contain one of those letters. For example, "Logic" becomes "1" because the pattern picks up its "i". I want to avoid this.
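
A quick check, reusing the same `str_extract` and `roman2int` calls as the attempt above, shows where the false positive comes from. The character class has no word boundaries, so it matches the lone "i" inside "Logic". The variable names `hit` and `value` are only illustrative.

```r
library(stringr)
library(gtools)

# The unanchored character class matches the first run of i/v/x letters
# anywhere in the string, including inside an ordinary word.
hit <- str_extract("Logic", "[IVXivx]+")
hit    # "i"

# roman2int() then treats that single letter as a valid numeral.
value <- as.integer(roman2int(hit))
value  # 1
```

This suggests the pattern needs word boundaries, or a minimum length, before a match is treated as a numeral.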

Answer 1

Score: 2

library(stringr)

# Match runs of two or more uppercase numeral letters anywhere, or whole
# words made up only of numeral letters (all lowercase or all uppercase).
pat <- "[IVXLCDM]{2,}|\\b[ivxlcdm]+\\b|\\b[IVXLCDM]+\\b"

# as.character() keeps the replacement a character vector.
str_replace_all(x$col, pat, function(m) as.character(gtools::roman2int(m)))

 [1] "15"        "NA"        "0"         "Red"       "4"
 [6] "Logic"     "9. Sweet"  "8 - Apple" "Big 16"    "Weird7"
[11] "11: Small"
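
The output above keeps the surrounding words ("9. Sweet", "Big 16"), whereas the question asks for the integer alone. One possible variation, sketched here under the assumption that each observation contains at most one numeral, extracts the first match and overwrites the whole cell. The helper names `roman` and `has_roman` are illustrative and not part of the original answer.

```r
library(stringr)
library(gtools)

x <- data.frame(col = c("15", "NA", "0", "Red", "iv", "Logic", "ix. Sweet",
                        "VIII - Apple", "Big XVI", "WeirdVII", "XI: Small"))

# Same pattern as in the answer.
pat <- "[IVXLCDM]{2,}|\\b[ivxlcdm]+\\b|\\b[IVXLCDM]+\\b"

# First numeral-like match per observation; NA where nothing matches.
roman <- str_extract(x$col, pat)
has_roman <- !is.na(roman)

# Overwrite only the rows with a match; the column stays character
# because values such as "Red" and "Logic" are kept as they are.
x$col[has_roman] <- as.character(roman2int(roman[has_roman]))

x$col
# "15" "NA" "0" "Red" "4" "Logic" "9" "8" "16" "7" "11"
```

Note that the pattern also matches ordinary words made up entirely of numeral letters, such as "mix", so real data may need an extra check before the replacement.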

  • Posted by huangapple on March 1, 2023, 09:42:10
  • Link to this article: https://go.coder-hub.com/75598866.html