Find the column name of a specific value in a data frame (in every row) in R

Question

I have a data frame with 100 columns and more than 10k rows. I added two columns holding each row's lowest and second-lowest values. Now I want to add two more columns containing the names of the columns where those values (lowest and second lowest) were found.
This is what my data frame looks like so far:

df['2lowest']<-apply(df, 1, function(x) sort(x, decreasing = FALSE)[2])
df['lowest']<-apply(df, 1, function(x) sort(x, decreasing = FALSE)[1])

   x   y 2lowest lowest 
1 23   3      23      3    
2 41  12      41     12   
3 32  33      33     32  
4 58  38      58     38 

And I want something like this:

   x   y 2lowest lowest pos(2lowest) pos(lowest)
1 23   3      23      3    x                y
2 41  12      41     12    x                y
3 32  33      33     32    y                x
4 58  38      58     38    x                y

Do you have any idea how I can solve this?

I have searched but could not find anything similar.

Thank you very much!
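For reference, a minimal sketch of how the question's own apply() + sort() approach can also return the positions, assuming the value columns are x and y as in the sample. sort() keeps the names of the row vector, so names(sort(x)) gives the column names in sorted order. The sketch restricts the work to the original value columns: otherwise a later apply() call also sees the newly added lowest/2lowest columns and can report one of those as a position. It sorts every row four times, so the single-pass answers below are likely faster on a large data frame.

```r
# Sample data from the question (original value columns only)
df <- data.frame(x = c(23, 41, 32, 58), y = c(3, 12, 33, 38))

# Sort over the value columns only, so the columns added below never
# take part. Each row arrives as a numeric vector named by column,
# and sort() keeps those names.
vals <- df[c("x", "y")]

df[["2lowest"]] <- apply(vals, 1, function(x) sort(x)[2])
df[["lowest"]] <- apply(vals, 1, function(x) sort(x)[1])
df[["pos(2lowest)"]] <- apply(vals, 1, function(x) names(sort(x))[2])
df[["pos(lowest)"]] <- apply(vals, 1, function(x) names(sort(x))[1])

print(df)
```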

Answer 1

Score: 2

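I'd do it all at once in a single apply() call. I added a third column, z, to the input to make it a little more interesting and to illustrate the behavior with ties: as rows 3 and 4 show, when two columns hold the same value, they are reported in column order.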


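# read.table() uses the first line as the header because it has one field
# fewer than the data lines; the first column then becomes the row names
df = read.table(text = '   x   y z
1 23   3      4
2 41  12      11
3 32  33      32
4 58  38 58')

# The \(x) lambda and the |> pipe need R >= 4.1.0
result = apply(df, 1, \(x) {
  sx = sort(x)[1:2]  # the two smallest values, still named by column
  c(as.list(sx), as.list(names(sx))) |>
    setNames(c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)"))
})

# result holds one named list per row; stack them and append to df
cbind(df, do.call(rbind, result))
#    x  y  z lowest 2lowest pos(lowest) pos(2lowest)
# 1 23  3  4      3       4           y            z
# 2 41 12 11     11      12           z            y
# 3 32 33 32     32      32           x            z
# 4 58 38 58     38      58           y            x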
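A variant of the same idea, sketched under the assumption that every column of df is numeric. rbind() on lists produces a matrix of mode list, so the columns added above may end up as list columns. Here, apply(m, 1, order) returns each row's column indices sorted by value. The first two indices give the positions directly, and matrix indexing with cbind(row, col) gives the values, so the new columns are plain numeric and character vectors.

```r
df <- data.frame(x = c(23, 41, 32, 58), y = c(3, 12, 33, 38), z = c(4, 11, 32, 58))

m <- as.matrix(df)                # numeric matrix, one row per observation
ord <- t(apply(m, 1, order))      # row i: column indices of row i sorted by value
rows <- seq_len(nrow(m))

# order() is stable, so tied values keep their column order
df[["lowest"]] <- m[cbind(rows, ord[, 1])]
df[["2lowest"]] <- m[cbind(rows, ord[, 2])]
df[["pos(lowest)"]] <- colnames(m)[ord[, 1]]
df[["pos(2lowest)"]] <- colnames(m)[ord[, 2]]
df
```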

Answer 2

Score: 0

I would reshape the data to long format to look up the corresponding variable name, and then reshape it back to wide format:

library(dplyr)       # tribble() and the %>% pipe
library(data.table)  # melt(), dcast() and :=

df <- tribble(
  ~x, ~y, ~`2lowest`, ~lowest, 
  23,   3,        23,       3,    
  41,  12,        41,      12,   
  32,  33,        33,      32,  
  58,  38,        58,      38
)

as.data.table(df) %>%
  .[, ID := 1:.N] %>%                                   # row identifier
  melt(id.vars = c("ID", "2lowest", "lowest")) %>%      # one row per (ID, column)
  .[, ":=" (`pos(lowest)` = variable[lowest == value],  # column whose value matches
            `pos(2lowest)` = variable[`2lowest` == value]), by = ID] %>%
  dcast(ID + ... ~ variable, value.var = "value")       # back to wide format
#    ID 2lowest lowest pos(lowest) pos(2lowest)  x  y
# 1:  1      23      3           y            x 23  3
# 2:  2      41     12           y            x 41 12
# 3:  3      33     32           x            y 32 33
# 4:  4      58     38           y            x 58 38
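The same long-format idea can be sketched in base R, swapped in here for illustration rather than as a translation of the data.table code above. Instead of matching values back against precomputed lowest/2lowest columns (where a tie can give more than one matching column), it sorts the long table by ID and value and takes the first two rows of each ID. It assumes df holds only the value columns.

```r
# Wide data: one row per observation, one column per candidate
df <- data.frame(x = c(23, 41, 32, 58), y = c(3, 12, 33, 38))

# Long format: one row per (observation, column) pair
long <- data.frame(
  ID = rep(seq_len(nrow(df)), times = ncol(df)),
  variable = rep(names(df), each = nrow(df)),
  value = unlist(df, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Sort by observation, then value; order() is stable, so ties stay in column order
long <- long[order(long$ID, long$value), ]
# Rank within each observation (1 = lowest)
long$rank <- ave(long$value, long$ID, FUN = seq_along)

first <- long[long$rank == 1, ]
second <- long[long$rank == 2, ]

# Both subsets are sorted by ID, so they line up with the rows of df
df[["lowest"]] <- first$value
df[["2lowest"]] <- second$value
df[["pos(lowest)"]] <- first$variable
df[["pos(2lowest)"]] <- second$variable
df
```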


Answer 3

Score: 0

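We may use a vectorized operation in base R: max.col() on the negated data finds the column of each row's minimum, and masking that cell with Inf before a second max.col() call finds the second-lowest.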

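# df must contain only the value columns at this point
j1 <- max.col(-df, "first")                    # column of each row's minimum (first on ties)
i1 <- seq_len(nrow(df))
m1 <- cbind(i1, j1)                            # (row, column) of each minimum
j2 <- max.col(-replace(df, m1, Inf), 'first')  # hide the minimum, find the next-smallest
m2 <- cbind(i1, j2)
df[c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)")] <- list(df[m1], 
     df[m2], names(df)[j1], names(df)[j2])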

-output

> df
   x  y  z lowest 2lowest pos(lowest) pos(2lowest)
1 23  3  4      3       4           y            z
2 41 12 11     11      12           z            y
3 32 33 32     32      32           x            z
4 58 38 58     38      58           y            x

data (run this before the code above)

df <- structure(list(x = c(23L, 41L, 32L, 58L), y = c(3L, 12L, 33L, 
38L), z = c(4L, 11L, 32L, 58L)), class = "data.frame", row.names = c("1", 
"2", "3", "4"))
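The same max.col() steps can be wrapped in a small function that only looks at chosen columns. This matters for the question's data frame, which already contains lowest and 2lowest columns that would otherwise be treated as candidates. add_two_lowest and its cols argument are hypothetical names for this sketch, which assumes all selected columns are numeric. It passes ties.method explicitly because max.col() breaks ties at random by default, which is why the answer uses "first".

```r
# Hypothetical helper: add the two lowest values among `cols`, and the
# names of the columns holding them, to `df`
add_two_lowest <- function(df, cols = names(df)) {
  v <- as.matrix(df[cols])                       # candidate columns only
  rows <- seq_len(nrow(v))
  j1 <- max.col(-v, ties.method = "first")       # column of each row's minimum
  masked <- v
  masked[cbind(rows, j1)] <- Inf                 # hide the minimum ...
  j2 <- max.col(-masked, ties.method = "first")  # ... to find the next-smallest
  df[["lowest"]] <- v[cbind(rows, j1)]
  df[["2lowest"]] <- v[cbind(rows, j2)]
  df[["pos(lowest)"]] <- cols[j1]
  df[["pos(2lowest)"]] <- cols[j2]
  df
}

# The question's data already has lowest/2lowest columns, so restrict to x and y
df <- data.frame(x = c(23, 41, 32, 58), y = c(3, 12, 33, 38),
                 `2lowest` = c(23, 41, 33, 58), lowest = c(3, 12, 32, 38),
                 check.names = FALSE)
res <- add_two_lowest(df, cols = c("x", "y"))
res
```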

huangapple
  • This article was published on 24 February 2023 at 04:28:55
  • If reposting, please keep the link to this article: https://go.coder-hub.com/75549999.html