Find the column name of a specific value in a data frame (for every row) in R
Question
I have a data frame with 100 columns and more than 10k rows. I added two columns holding each row's lowest and second-lowest values. Now I want to add two more columns containing the names of the columns where those values (lowest and second lowest) were found.
This is part of my data frame so far:
df['2lowest']<-apply(df, 1, function(x) sort(x, decreasing = FALSE)[2])
df['lowest']<-apply(df, 1, function(x) sort(x, decreasing = FALSE)[1])
   x  y 2lowest lowest
1 23  3      23      3
2 41 12      41     12
3 32 33      33     32
4 58 38      58     38
And I want something like this:
   x  y 2lowest lowest pos(2lowest) pos(lowest)
1 23  3      23      3            x           y
2 41 12      41     12            x           y
3 32 33      33     32            y           x
4 58 38      58     38            x           y
Do you have any idea how I can solve this?
I have searched, but I could not find anything similar.
Thank you very much!
Answer 1
Score: 2
I'd do it all at once in a single apply. I added a column to the input to make it a little more interesting and to illustrate the behavior with ties.
df = read.table(text = '   x  y  z
1 23  3  4
2 41 12 11
3 32 33 32
4 58 38 58')
result = apply(df, 1, \(x) {
  sx = sort(x)[1:2]
  c(as.list(sx), as.list(names(sx))) |>
    setNames(c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)"))
})
cbind(df, do.call(rbind, result))
#    x  y  z lowest 2lowest pos(lowest) pos(2lowest)
# 1 23  3  4      3       4           y            z
# 2 41 12 11     11      12           z            y
# 3 32 33 32     32      32           x            z
# 4 58 38 58     38      58           y            x
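If plain numeric and character columns are preferred over the list matrix that do.call(rbind, result) produces, one variation on the same idea is to take order() of each row and use it to index the matrix. This is only a minimal sketch, not the answer's own code: the names m, idx, rows and out, and the output column names second_lowest, pos_lowest and pos_second_lowest, are made up for illustration. It assumes every value column is numeric and relies on order() keeping tied values in their original column order.

```r
# Same example data as above, built directly as a data frame
df <- data.frame(x = c(23, 41, 32, 58),
                 y = c(3, 12, 33, 38),
                 z = c(4, 11, 32, 58))

m <- as.matrix(df)

# For each row, the column indices of the two smallest values
# (apply() returns a 2 x n matrix here, so transpose it to n x 2)
idx <- t(apply(m, 1, function(r) order(r)[1:2]))
rows <- seq_len(nrow(m))

# Matrix indexing with (row, column) pairs pulls out the values
out <- data.frame(
  df,
  lowest            = m[cbind(rows, idx[, 1])],
  second_lowest     = m[cbind(rows, idx[, 2])],
  pos_lowest        = colnames(m)[idx[, 1]],
  pos_second_lowest = colnames(m)[idx[, 2]]
)

print(out)
```

On a real data frame, m should be built only from the original value columns, so that previously added helper columns do not take part in the ranking.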
Answer 2
Score: 0
I would reshape the data to long format to check for the corresponding variable name and then reshape it back to the wide format:
library(dplyr)
library(data.table)
df <- tribble(
  ~x, ~y, ~`2lowest`, ~lowest,
  23,  3, 23,  3,
  41, 12, 41, 12,
  32, 33, 33, 32,
  58, 38, 58, 38
)
as.data.table(df) %>%
  .[, ID := 1:.N] %>%
  melt(id.vars = c("ID", "2lowest", "lowest")) %>%
  .[, ":=" (`pos(lowest)` = variable[lowest == value],
            `pos(2lowest)` = variable[`2lowest` == value]), by = ID] %>%
  dcast(ID + ... ~ variable, value.var = "value")
   ID 2lowest lowest pos(lowest) pos(2lowest)  x  y
1:  1      23      3           y            x 23  3
2:  2      41     12           y            x 41 12
3:  3      33     32           x            y 32 33
4:  4      58     38           y            x 58 38
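One thing to watch when matching on equality, as in variable[lowest == value]: if a row contains tied values, the comparison is true for more than one column, so it no longer picks out a single name. A small base R sketch of the difference, using a made-up row with a tie (the vector name row_vals is hypothetical, not part of the answer):

```r
# A single row with a tie for the lowest value (illustrative data)
row_vals <- c(x = 32, y = 33, z = 32)

# Equality matching returns every column that holds the minimum
all_matches <- names(row_vals)[row_vals == min(row_vals)]

# which.min() returns only the first position of the minimum
first_match <- names(row_vals)[which.min(row_vals)]

print(all_matches)
print(first_match)
```

If the data can contain ties, it is probably worth deciding up front whether the first matching column is enough or whether all of them should be reported.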
Answer 3
Score: 0
We may use a vectorized operation in base R:
j1 <- max.col(-df, "first")
i1 <- seq_len(nrow(df))
m1 <- cbind(i1, j1)
j2 <- max.col(-replace(df, m1, Inf), 'first')
m2 <- cbind(i1, j2)
df[c("lowest", "2lowest", "pos(lowest)", "pos(2lowest)")] <- list(df[m1],
    df[m2], names(df)[j1], names(df)[j2])
Output:
> df
   x  y  z lowest 2lowest pos(lowest) pos(2lowest)
1 23  3  4      3       4           y            z
2 41 12 11     11      12           z            y
3 32 33 32     32      32           x            z
4 58 38 58     38      58           y            x
Data:
df <- structure(list(x = c(23L, 41L, 32L, 58L), y = c(3L, 12L, 33L,
    38L), z = c(4L, 11L, 32L, 58L)), class = "data.frame", row.names = c("1",
    "2", "3", "4"))
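The "first" argument matters here: max.col() breaks ties at random by default, so without it the reported column could change between runs for rows such as row 3. The sketch below is a small check, under the assumption that all columns are numeric, that max.col() on the negated values with ties.method = "first" agrees with a row-wise which.min(). The names m, j_first and j_ref are made up for illustration.

```r
df <- data.frame(x = c(23L, 41L, 32L, 58L),
                 y = c(3L, 12L, 33L, 38L),
                 z = c(4L, 11L, 32L, 58L))
m <- as.matrix(df)

# Negating turns "largest per row" into "smallest per row";
# ties.method = "first" picks the leftmost column when values tie
j_first <- max.col(-m, ties.method = "first")

# Reference: which.min() per row also returns the first minimum
j_ref <- unname(apply(m, 1, which.min))

print(colnames(m)[j_first])
print(all(j_first == j_ref))
```

The help page for max.col() also describes a small relative tolerance when deciding what counts as a tie, so it is worth reading ?max.col if the values in a row can be very close together.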