Reverse the order of non-NA values in an R dataframe

Question

I am looking to reverse the order of numeric values in multiple columns of an R dataframe (so that the highest number becomes the lowest, and so forth), whilst leaving the NA values as they are.

An example of my dataframe:

    my_data <- data.frame(animal = c("fox", "rabbit", "cow", "sheep", "pig", "mole"),
                          x = c("1", "2", "1", "3", "NA", "NA"),
                          y = c("NA", "NA", "1", "3", "2", "NA"),
                          z = c("1", "2", "3", "4", "NA", "5"),
                          area = c("field", "field", "farm", "farm", "farm", "farm"))

and then, what I am trying to achieve:

    my_ideal_data <- data.frame(animal = c("fox", "rabbit", "cow", "sheep", "pig", "mole"),
                                x = c("3", "2", "3", "1", "NA", "NA"),
                                y = c("NA", "NA", "3", "1", "2", "NA"),
                                z = c("5", "4", "3", "2", "NA", "1"),
                                area = c("field", "field", "farm", "farm", "farm", "farm"))

The 'animal' and 'area' columns remain the same, as do all of the NAs, but I need the values in x, y, and z to be reversed within each column.

Any help would be greatly appreciated!

Thank you

Answer 1

Score: 1

In these data, you could simply subtract the z column from 6 after converting it to numeric:

    my_data$z <- 6 - as.numeric(my_data$z)
    # > my_data
    # (only z has changed; x and y are untouched)
    #   animal  x  y  z  area
    # 1    fox  1 NA  5 field
    # 2 rabbit  2 NA  4 field
    # 3    cow  1  1  3  farm
    # 4  sheep  3  3  2  farm
    # 5    pig NA  2 NA  farm
    # 6   mole NA NA  1  farm
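
The constant 6 here is just the column's maximum plus its minimum (5 + 1). As a minimal sketch, assuming you want the same idea without hard-coding the constant, a small helper could compute it from the data. `reflect_values` is a hypothetical name used for illustration, not part of the answer above:

```r
# Column z from the question; "NA" is stored as a string.
z <- c("1", "2", "3", "4", "NA", "5")

# Hypothetical helper: reflect each value around the midpoint of the
# column's range, so the largest value becomes the smallest and vice versa.
reflect_values <- function(v) {
  num <- as.numeric(v)  # the string "NA" becomes a real NA
  max(num, na.rm = TRUE) + min(num, na.rm = TRUE) - num
}

z_reversed <- reflect_values(z)
print(z_reversed)
```

For column x the constant works out to 4 (3 + 1), which turns 1, 2, 1, 3 into 3, 2, 3, 1, as in the desired output.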

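An alternative, if these sample data are too simplified, is to index the non-NA values with grep, sort them in decreasing order with gtools::mixedsort(), and then put them back using that index. This might be a little more scalable, and you don't have to convert to numeric. Note that this sorts the values rather than reflecting each one, so it reproduces the desired output only when the non-NA values are already in ascending order, as they are in z.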

    idx <- grep("\\d+", my_data$z)                                 # positions of the non-NA values
    vals <- gtools::mixedsort(my_data$z[idx], decreasing = TRUE)   # sort them in decreasing order
    my_data$z[idx] <- vals                                         # put them back in place
    #   animal  x  y  z  area
    # 1    fox  1 NA  5 field
    # 2 rabbit  2 NA  4 field
    # 3    cow  1  1  3  farm
    # 4  sheep  3  3  2  farm
    # 5    pig NA  2 NA  farm
    # 6   mole NA NA  1  farm

If you wanted to apply it to multiple columns, you could wrap it in a function and use lapply:

    myfun <- function(x) {
      a <- grep("\\d+", x)                                  # positions of the non-NA values
      x[a] <- gtools::mixedsort(x[a], decreasing = TRUE)
      x
    }
    my_data[c("x", "y", "z")] <- lapply(my_data[c("x", "y", "z")], myfun)
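
To see how sorting in place differs from reversing the scale, here is a small check on column x from the question. It is only a sketch: base `sort()` stands in for `gtools::mixedsort()` (an assumption that should hold for these single-digit strings), and `sorted_x` and `ideal_x` are illustrative names.

```r
# Column x from the question and the result the question asks for.
x       <- c("1", "2", "1", "3", "NA", "NA")
ideal_x <- c("3", "2", "3", "1", "NA", "NA")

# Sort the non-NA values in decreasing order and put them back in place.
idx <- grep("\\d+", x)
sorted_x <- x
sorted_x[idx] <- sort(x[idx], decreasing = TRUE)

print(sorted_x)                      # "3" "2" "1" "1" "NA" "NA"
print(identical(sorted_x, ideal_x))  # FALSE
```

The sorted column holds the same values in decreasing order, but the third and fourth entries no longer line up with the rows they came from, so a sort-based approach fits only when the input is already ordered.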

Answer 2

Score: 1

You might use a for loop here.

First, replace all "NA" strings with real NA values:

    my_data[my_data == "NA"] <- NA
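
Why this step matters: the string "NA" is not a missing value as far as R is concerned, so `is.na()` and `na.rm = TRUE` ignore it. A quick sketch on a made-up two-row data frame (`df` is illustrative only):

```r
# A tiny data frame where a missing value was entered as the string "NA".
df <- data.frame(animal = c("fox", "pig"), x = c("1", "NA"))

n_missing_before <- sum(is.na(df))  # the string "NA" is not counted
df[df == "NA"] <- NA                # turn the strings into real missing values
n_missing_after <- sum(is.na(df))

print(c(before = n_missing_before, after = n_missing_after))
```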

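Then define a vector containing the columns that you want to reverse:

    target_col <- c("x", "y", "z")

Finally, use a for loop to go over the target columns and replace each non-NA value with the column maximum plus 1, minus that value: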

    my_data[my_data == "NA"] <- NA
    target_col <- c("x", "y", "z")
    for (i in target_col) {
      vals <- as.integer(my_data[, i])   # convert first so max() compares numbers, not strings
      keep <- !is.na(vals)
      my_data[keep, i] <- max(vals, na.rm = TRUE) + 1L - vals[keep]
    }

      animal    x    y    z  area
    1    fox    3 <NA>    5 field
    2 rabbit    2 <NA>    4 field
    3    cow    3    3    3  farm
    4  sheep    1    1    2  farm
    5    pig <NA>    2 <NA>  farm
    6   mole <NA> <NA>    1  farm
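
One thing to watch in a loop like this: the columns are character, and `max()` on a character vector compares strings rather than numbers. That happens to work for single digits but not once a column contains values of 10 or more. A short illustration with made-up values (`vals` is not from the question's data):

```r
# Made-up values including a two-digit number.
vals <- c("9", "10", NA)

chr_max <- max(vals, na.rm = TRUE)              # string comparison
int_max <- max(as.integer(vals), na.rm = TRUE)  # numeric comparison

print(chr_max)  # "9"
print(int_max)  # 10
```

Converting with `as.integer()` or `as.numeric()` before calling `max()` avoids the problem.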

Answer 3

Score: 0

With dplyr, using across:

    library(dplyr)
    Cols <- c("x", "y", "z")
    # Convert the target columns to numeric (the "NA" strings become real NA)
    my_data[Cols] <- lapply(my_data[Cols], as.numeric)
    my_data %>%
      mutate(across(all_of(Cols), ~ max(.x, na.rm = TRUE) - .x + 1))

      animal  x  y  z  area
    1    fox  3 NA  5 field
    2 rabbit  2 NA  4 field
    3    cow  3  3  3  farm
    4  sheep  1  1  2  farm
    5    pig NA  2 NA  farm
    6   mole NA NA  1  farm
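
Note that `mutate()` returns a new data frame rather than changing `my_data` in place, so the result has to be assigned if you want to keep it. A self-contained sketch of the same approach on the question's sample data follows. `flip` and `result` are hypothetical names used only for illustration:

```r
library(dplyr)

my_data <- data.frame(
  animal = c("fox", "rabbit", "cow", "sheep", "pig", "mole"),
  x = c("1", "2", "1", "3", "NA", "NA"),
  y = c("NA", "NA", "1", "3", "2", "NA"),
  z = c("1", "2", "3", "4", "NA", "5"),
  area = c("field", "field", "farm", "farm", "farm", "farm")
)
cols <- c("x", "y", "z")

# Hypothetical helper: convert to numeric, then apply max - value + 1.
# This assumes the smallest value in each column is 1, as in the sample data.
flip <- function(v) {
  num <- as.numeric(v)
  max(num, na.rm = TRUE) - num + 1
}

# Assign the result; my_data itself is left unchanged.
result <- my_data %>%
  mutate(across(all_of(cols), flip))

print(result)
```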

huangapple
  • Posted on May 10, 2023, 19:36:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76217921.html