Reverse the order of non-NA values in an r dataframe

Question

I am looking to reverse the order of numeric values in multiple columns of an R dataframe (so that the highest number becomes the lowest, and so forth), whilst leaving the NA values as they are.

An example of my dataframe:

my_data <- data.frame (animal  = c("fox", "rabbit", "cow", "sheep", "pig", "mole"),
                        x = c("1", "2", "1", "3", "NA", 'NA'),
                        y = c('NA','NA','1','3','2','NA'),
                        z = c('1','2','3','4','NA','5'),
                        area = c("field","field","farm","farm","farm","farm"))

and then, what I am trying to achieve:

my_ideal_data <- data.frame (animal  = c("fox", "rabbit", "cow", "sheep", "pig", "mole"),
                             x = c("3", "2", "3", "1", "NA", 'NA'),
                             y = c('NA','NA','3','1','2','NA'),
                             z = c('5','4','3','2','NA','1'),
                             area = c("field","field","farm","farm","farm","farm"))

The 'animal' and 'area' columns remain the same, as do all of the NAs, but I need the values in x, y, and z to be placed in reverse order within each column.

Any help would be greatly appreciated!

Thank you

Answer 1

Score: 1

For these data, you could simply subtract the z column from 6 after converting it to numeric:

my_data$z <- 6 - as.numeric(my_data$z)

#> my_data
#  animal  x  y  z  area
#1    fox  3 NA  5 field
#2 rabbit  2 NA  4 field
#3    cow  3  3  3  farm
#4  sheep  1  1  2  farm
#5    pig NA  2 NA  farm
#6   mole NA NA  1  farm
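
The constant 6 is just the smallest value plus the largest value in z (1 + 5). A minimal sketch, assuming the column holds a numeric scale stored as strings, that derives the constant from the data instead of hard-coding it; the names `z_num` and `z_rev` are illustrative only:

```r
# Sample column from the question, with "NA" stored as a string
z <- c("1", "2", "3", "4", "NA", "5")

# Turn the "NA" strings into real missing values before converting,
# which avoids the "NAs introduced by coercion" warning
z[z == "NA"] <- NA
z_num <- as.numeric(z)

# On a scale running from min to max, (min + max) - value flips the scale;
# for z this works out to 6 - z, and NA stays NA
z_rev <- min(z_num, na.rm = TRUE) + max(z_num, na.rm = TRUE) - z_num
z_rev
```

For z this gives 5 4 3 2 NA 1, matching the z column of my_ideal_data.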

If these sample data are too simplified, an alternative is to find the positions of the non-NA values with grep, sort those values in decreasing order with gtools::mixedsort(), and then assign them back to the same positions by indexing. This might be a little more scalable, and you don't have to convert to numeric.

idx <- grep("\\d+", my_data$z)
vals <- gtools::mixedsort(my_data$z[idx], decreasing = TRUE)
my_data$z[idx] <- vals

#  animal  x  y  z  area
#1    fox  3 NA  5 field
#2 rabbit  2 NA  4 field
#3    cow  3  3  3  farm
#4  sheep  1  1  2  farm
#5    pig NA  2 NA  farm
#6   mole NA NA  1  farm

If you wanted to apply it to multiple columns, you could wrap it in a function and use lapply:

myfun <- function(x){
  a <- grep("\\d+", x)
  x[a] <- gtools::mixedsort(x[a], decreasing = TRUE)
  x
}

my_data[c("x", "y", "z")] <- lapply(my_data[c("x", "y", "z")], myfun)
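
One thing worth checking before relying on the sorting approach: writing the values back in decreasing order gives the same result as reverse-coding only when the non-NA values already appear in ascending order, as they do in z. The sketch below compares the two on column x. It uses base `sort()` on numeric values rather than `gtools::mixedsort()` so that it runs without extra packages (for plain digit strings the ordering should be the same), and the helper names `sort_desc` and `reverse_code` are hypothetical.

```r
# Sort-based version: put the non-NA values back in decreasing order
sort_desc <- function(v) {
  idx <- grep("\\d+", v)                       # positions of non-NA entries
  v[idx] <- sort(as.numeric(v[idx]), decreasing = TRUE)
  v
}

# Mapping-based version: the k-th smallest distinct value becomes
# the k-th largest, and every entry keeps its own position
reverse_code <- function(v) {
  idx <- grep("\\d+", v)
  vals <- as.numeric(v[idx])
  lev <- sort(unique(vals))                    # distinct values, ascending
  v[idx] <- rev(lev)[match(vals, lev)]
  v
}

x <- c("1", "2", "1", "3", "NA", "NA")
sort_desc(x)     # "3" "2" "1" "1" "NA" "NA"
reverse_code(x)  # "3" "2" "3" "1" "NA" "NA"
```

Only the mapping-based result matches the x column of my_ideal_data, so the sort-based function fits the question's goal only for columns like z.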

Answer 2

Score: 1

You might use a for loop here.

First, replace all "NA" strings with real NA values.

my_data[my_data == "NA"] <- NA

Then define a vector containing the columns that you want to sort.

target_col <- c("x", "y", "z")

Then loop over the target columns with a for loop, replacing each non-NA value with (column maximum + 1) minus that value.

my_data[my_data == "NA"] <- NA

target_col <- c("x", "y", "z")
for (i in target_col) {
  not_na <- !is.na(my_data[, i])            # rows holding a real value
  vals <- as.integer(my_data[not_na, i])
  # Reverse-code as (max + 1) - value; max() of the integers avoids string comparison
  my_data[not_na, i] <- max(vals) + 1L - vals
}

  animal    x    y    z  area
1    fox    3 <NA>    5 field
2 rabbit    2 <NA>    4 field
3    cow    3    3    3  farm
4  sheep    1    1    2  farm
5    pig <NA>    2 <NA>  farm
6   mole <NA> <NA>    1  farm
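
The same idea can be sketched without the explicit loop by putting the per-column step into a small function and applying it with lapply. This is only an illustration: `reverse_col` is a hypothetical name, and unlike the loop above it leaves x, y, and z as integer columns rather than character columns.

```r
my_data <- data.frame(
  animal = c("fox", "rabbit", "cow", "sheep", "pig", "mole"),
  x = c("1", "2", "1", "3", "NA", "NA"),
  y = c("NA", "NA", "1", "3", "2", "NA"),
  z = c("1", "2", "3", "4", "NA", "5"),
  area = c("field", "field", "farm", "farm", "farm", "farm")
)

# Hypothetical helper: reverse-code one column on a 1..max scale
reverse_col <- function(v) {
  v[v %in% "NA"] <- NA             # "NA" strings become real NA
  v <- as.integer(v)
  max(v, na.rm = TRUE) + 1L - v    # NA stays NA
}

target_col <- c("x", "y", "z")
my_data[target_col] <- lapply(my_data[target_col], reverse_col)
my_data
```

Because the columns end up as integers, the missing values print as NA instead of <NA>.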

Answer 3

Score: 0

With `dplyr`, using `across`:

library(dplyr)

Cols <- c("x", "y", "z")

# Convert the target columns to numeric; "NA" strings become NA (with a coercion warning)
my_data[,Cols] <- Vectorize(\(x) as.numeric(x))(my_data[,Cols])

# Reverse-code each column as (column max) - value + 1; NA stays NA
my_data %>%
  mutate(across(!!Cols, ~ max(.x[!is.na(.x)]) - .x + 1))

  animal  x  y  z  area
1    fox  3 NA  5 field
2 rabbit  2 NA  4 field
3    cow  3  3  3  farm
4  sheep  1  1  2  farm
5    pig NA  2 NA  farm
6   mole NA NA  1  farm



huangapple
  • Posted on May 10, 2023 at 19:36:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76217921.html