February 24, 2023, 04:28:55

Find the column name of a specific value on a data frame (in every row) R

Question

I have a data frame with 100 columns and more than 10k rows. I added two columns holding each row's lowest and second-lowest values. Now I want to add two more columns containing the names of the columns where these values (lowest and second lowest) were found.
This is part of my data frame so far:

df['2lowest'] <- apply(df, 1, function(x) sort(x, decreasing = FALSE)[2])
df['lowest'] <- apply(df, 1, function(x) sort(x, decreasing = FALSE)[1])
   x   y 2lowest lowest 
1 23   3      23      3    
2 41  12      41     12   
3 32  33      33     32  
4 58  38      58     38

and I want something like this.

   x   y 2lowest lowest pos(2lowest) pos(lowest)
1 23   3      23      3    x                y
2 41  12      41     12    x                y
3 32  33      33     32    y                x
4 58  38      58     38    x                y

Do you have any idea how I can solve this?

I have searched, but I could not find anything similar.

Thank you very much!

Answer 1

Score: 2

I'd do it all at once in a single apply. I added a column to the input to make it a little more interesting and illustrate behavior with ties.


df = read.table(text = '   x   y z
1 23   3      4
2 41  12      11
3 32  33      32
4 58  38 58')
# For each row, sort the values and keep the two smallest (names are kept)
result = apply(df, 1, \(x) {
  sx = sort(x)[1:2]
  # Combine the two values and their column names into one named list
  c(as.list(sx), as.list(names(sx))) |>
    setNames(c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)"))
})
# Stack the per-row lists and bind them to the original data
cbind(df, do.call(rbind, result))
#    x  y  z lowest 2lowest pos(lowest) pos(2lowest)
# 1 23  3  4      3       4           y            z
# 2 41 12 11     11      12           z            y
# 3 32 33 32     32      32           x            z
# 4 58 38 58     38      58           y            x
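
A variant of the same idea, offered as a sketch rather than as part of the answer above: it uses order() instead of sort(), so the new columns come out as ordinary numeric and character vectors instead of list columns. The names idx, vals, rows, extra, and out_df are illustrative, and the data is the three-column example from this answer. Ties are resolved in column order, as with sort().

```r
# Same example data as above (assumed to hold only the value columns)
df <- data.frame(x = c(23, 41, 32, 58),
                 y = c(3, 12, 33, 38),
                 z = c(4, 11, 32, 58))

# For each row, the column indices ordered from smallest to largest value;
# keep only the first two (lowest and second lowest)
idx <- t(apply(df, 1, order))[, 1:2, drop = FALSE]

vals <- as.matrix(df)
rows <- seq_len(nrow(df))

# Look up values by (row, column) pairs and names by column index
extra <- data.frame(
  lowest         = vals[cbind(rows, idx[, 1])],
  `2lowest`      = vals[cbind(rows, idx[, 2])],
  `pos(lowest)`  = names(df)[idx[, 1]],
  `pos(2lowest)` = names(df)[idx[, 2]],
  check.names = FALSE,
  stringsAsFactors = FALSE
)

out_df <- cbind(df, extra)
out_df
```

Because order() returns positions rather than values, the lookup of values and names stays separate, which keeps each new column a plain atomic vector.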

Answer 2

Score: 0


I would reshape the data to long format to check for the corresponding variable name and then reshape it back to the wide format:


library(dplyr)
library(data.table)
df <- tribble(
  ~x, ~y, ~`2lowest`, ~lowest, 
  23,   3,        23,       3,    
  41,  12,        41,      12,   
  32,  33,        33,      32,  
  58,  38,        58,      38
)
as.data.table(df) %>%
  .[, ID := 1:.N] %>%
  melt(id.vars = c("ID", "2lowest", "lowest")) %>%
  .[, ":=" (`pos(lowest)` = variable[lowest == value],
            `pos(2lowest)` = variable[`2lowest` == value]), by = ID] %>%
  dcast(ID + ... ~ variable, value.var = "value")

   ID 2lowest lowest pos(lowest) pos(2lowest)  x  y
1:  1      23      3           y            x 23  3
2:  2      41     12           y            x 41 12
3:  3      33     32           x            y 32 33
4:  4      58     38           y            x 58 38

Answer 3

Score: 0

We may use a vectorized operation in base R

j1 <- max.col(-df, "first")
i1 <- seq_len(nrow(df))
m1 <- cbind(i1, j1)
j2 <- max.col(-replace(df, m1, Inf), 'first')
m2 <- cbind(i1, j2)
df[c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)")] <- list(df[m1], 
     df[m2], names(df)[j1], names(df)[j2])

Output

> df
   x  y  z lowest 2lowest pos(lowest) pos(2lowest)
1 23  3  4      3       4           y            z
2 41 12 11     11      12           z            y
3 32 33 32     32      32           x            z
4 58 38 58     38      58           y            x

Data

df <- structure(list(x = c(23L, 41L, 32L, 58L), y = c(3L, 12L, 33L, 
38L), z = c(4L, 11L, 32L, 58L)), class = "data.frame", row.names = c("1", 
"2", "3", "4"))
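
In the question, the data frame already contains the lowest and 2lowest columns, so the search has to be limited to the original value columns. The following is a minimal sketch of the max.col() approach wrapped in a function. The helper add_lowest_two and its value_cols argument are hypothetical names introduced here for illustration, and the example uses the two-column data from the question.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answers).
# value_cols: names of the columns to search; other columns are ignored.
add_lowest_two <- function(df, value_cols = names(df)) {
  m <- as.matrix(df[value_cols])             # numeric matrix of the searched columns
  rows <- seq_len(nrow(m))

  # max.col() on the negated matrix gives the column index of each row minimum
  j1 <- max.col(-m, ties.method = "first")

  # Mask each row's minimum with Inf, then repeat to find the second lowest
  m_masked <- m
  m_masked[cbind(rows, j1)] <- Inf
  j2 <- max.col(-m_masked, ties.method = "first")

  df[["lowest"]]       <- m[cbind(rows, j1)]
  df[["2lowest"]]      <- m[cbind(rows, j2)]
  df[["pos(lowest)"]]  <- value_cols[j1]
  df[["pos(2lowest)"]] <- value_cols[j2]
  df
}

# The question's example, with the two helper columns already present
df <- data.frame(x = c(23, 41, 32, 58),
                 y = c(3, 12, 33, 38),
                 `2lowest` = c(23, 41, 33, 58),
                 lowest = c(3, 12, 32, 38),
                 check.names = FALSE)

res <- add_lowest_two(df, value_cols = c("x", "y"))
res
```

For a frame with 100 value columns, value_cols could be built with something like setdiff(names(df), c("lowest", "2lowest")), assuming every remaining column is numeric.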
