Find the column name of a specific value in a data frame (in every row) in R

Question


I have a data frame with 100 columns and more than 10,000 rows. I added two columns holding each row's lowest and second-lowest values. Now I want to add two more columns containing the names of the columns where those values (lowest and second lowest) were found.
This is part of my data frame so far:

    df['2lowest'] <- apply(df, 1, function(x) sort(x, decreasing = FALSE)[2])
    df['lowest'] <- apply(df, 1, function(x) sort(x, decreasing = FALSE)[1])

       x  y 2lowest lowest
    1 23  3      23      3
    2 41 12      41     12
    3 32 33      33     32
    4 58 38      58     38

And I want something like this:

       x  y 2lowest lowest pos(2lowest) pos(lowest)
    1 23  3      23      3            x           y
    2 41 12      41     12            x           y
    3 32 33      33     32            y           x
    4 58 38      58     38            x           y

Do you have any idea how I can solve this?

I have searched, but I could not find anything similar.

Thank you very much!

Answer 1

Score: 2


I'd do it all at once in a single apply() call. I added a column to the input to make it a little more interesting and to illustrate the behavior with ties.

    df = read.table(text = '   x  y  z
    1 23  3  4
    2 41 12 11
    3 32 33 32
    4 58 38 58')

    # For each row, sort the values and keep the two smallest;
    # the names of the sorted vector are the source column names.
    result = apply(df, 1, \(x) {
      sx = sort(x)[1:2]
      c(as.list(sx), as.list(names(sx))) |>
        setNames(c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)"))
    })

    cbind(df, do.call(rbind, result))
    #    x  y  z lowest 2lowest pos(lowest) pos(2lowest)
    # 1 23  3  4      3       4           y            z
    # 2 41 12 11     11      12           z            y
    # 3 32 33 32     32      32           x            z
    # 4 58 38 58     38      58           y            x
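
A variation on the same idea, offered as a sketch rather than a drop-in replacement: order() returns column positions instead of sorted values, so the values and the names can be stored as plain numeric and character columns. The names value_cols, idx, rows and m are assumptions introduced for this example; value_cols records the original columns before any result columns are added.

```r
df <- data.frame(x = c(23, 41, 32, 58),
                 y = c(3, 12, 33, 38),
                 z = c(4, 11, 32, 58))

# Assumption: these are the only columns that should be compared.
value_cols <- names(df)

# For each row, order() lists column positions from smallest to largest value.
# Keeping the first two gives a 2 x nrow(df) integer matrix.
idx <- apply(df[value_cols], 1, function(r) order(r)[1:2])

rows <- seq_len(nrow(df))
m <- as.matrix(df[value_cols])

# Matrix indexing with (row, column) pairs pulls one value per row.
df$lowest         <- m[cbind(rows, idx[1, ])]
df$`2lowest`      <- m[cbind(rows, idx[2, ])]
df$`pos(lowest)`  <- value_cols[idx[1, ]]
df$`pos(2lowest)` <- value_cols[idx[2, ]]
```

On ties, order() keeps the original column order, so the leftmost tied column is reported first. With 100 columns, restricting the work to value_cols also keeps previously added summary columns such as lowest out of the comparison.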

Answer 2

Score: 0


I would reshape the data to long format to check for the corresponding variable name and then reshape it back to the wide format:

    library(dplyr)
    library(data.table)

    df <- tribble(
      ~x, ~y, ~`2lowest`, ~lowest,
      23,  3, 23,  3,
      41, 12, 41, 12,
      32, 33, 33, 32,
      58, 38, 58, 38
    )

    as.data.table(df) %>%
      .[, ID := 1:.N] %>%
      melt(id.vars = c("ID", "2lowest", "lowest")) %>%
      .[, ":=" (`pos(lowest)` = variable[lowest == value],
                `pos(2lowest)` = variable[`2lowest` == value]), by = ID] %>%
      dcast(ID + ... ~ variable, value.var = "value")
    #    ID 2lowest lowest pos(lowest) pos(2lowest)  x  y
    # 1:  1      23      3           y            x 23  3
    # 2:  2      41     12           y            x 41 12
    # 3:  3      33     32           x            y 32 33
    # 4:  4      58     38           y            x 58 38
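
The same long-format idea can also be sketched in base R, for readers without dplyr or data.table. This is an illustrative alternative and not the code from the answer above: the names long and rank, and the x/y/z sample data, are assumptions made for the example. Values are sorted within each row ID and ranked, so when two columns tie, the leftmost one is picked first.

```r
df <- data.frame(x = c(23, 41, 32, 58),
                 y = c(3, 12, 33, 38),
                 z = c(4, 11, 32, 58))

# Long format: one row per (ID, variable, value) triple.
long <- data.frame(
  ID       = rep(seq_len(nrow(df)), times = ncol(df)),
  variable = rep(names(df), each = nrow(df)),
  value    = unlist(df, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Sort by row ID, then by value; ties keep the original column order.
long <- long[order(long$ID, long$value), ]

# Rank the values within each ID (1 = lowest, 2 = second lowest, ...).
long$rank <- ave(long$value, long$ID, FUN = seq_along)

lowest <- long[long$rank == 1, ]
second <- long[long$rank == 2, ]

# Both subsets are already in ID order, which matches the row order of df.
df$lowest         <- lowest$value
df$`2lowest`      <- second$value
df$`pos(lowest)`  <- lowest$variable
df$`pos(2lowest)` <- second$variable
```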

Answer 3

Score: 0


We can use a vectorized operation in base R:

    j1 <- max.col(-df, "first")
    i1 <- seq_len(nrow(df))
    m1 <- cbind(i1, j1)
    j2 <- max.col(-replace(df, m1, Inf), "first")
    m2 <- cbind(i1, j2)
    df[c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)")] <- list(df[m1],
        df[m2], names(df)[j1], names(df)[j2])

Output:

    > df
       x  y  z lowest 2lowest pos(lowest) pos(2lowest)
    1 23  3  4      3       4           y            z
    2 41 12 11     11      12           z            y
    3 32 33 32     32      32           x            z
    4 58 38 58     38      58           y            x

Data:

    df <- structure(list(x = c(23L, 41L, 32L, 58L), y = c(3L, 12L, 33L,
    38L), z = c(4L, 11L, 32L, 58L)), class = "data.frame", row.names = c("1",
    "2", "3", "4"))
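
For the data frame in the question, which already contains the lowest and 2lowest columns, the max.col() technique might be applied to the original value columns only. The sketch below assumes those columns are known by name; value_cols, m and rows are hypothetical names chosen for the example.

```r
# The question's data: value columns x and y, plus the summary columns already added.
df <- data.frame(x = c(23, 41, 32, 58), y = c(3, 12, 33, 38))
df$`2lowest` <- c(23, 41, 33, 58)
df$lowest    <- c(3, 12, 32, 38)

# Assumption: the original value columns are known by name.
value_cols <- c("x", "y")
m <- as.matrix(df[value_cols])
rows <- seq_len(nrow(m))

# Column index of each row's minimum (the maximum of the negated values).
j1 <- max.col(-m, ties.method = "first")

# Mask each row's minimum so the next pass finds the second lowest.
m[cbind(rows, j1)] <- Inf
j2 <- max.col(-m, ties.method = "first")

df$`pos(2lowest)` <- value_cols[j2]
df$`pos(lowest)`  <- value_cols[j1]
```

Excluding the summary columns matters here: if lowest itself were part of the matrix, it would tie with the column it was copied from.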

huangapple
  • Posted on February 24, 2023, at 04:28:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75549999.html