How do I find the indices of a given value in a base R by object?
Question
This might seem whimsical, because a similar problem is easily solved in dplyr. But I still want to know how to do it in base R.
To illustrate, imagine I am looking at employee data and the goal is to find how many records there are for a given employee-date pair.
# Mock employee data
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)
# The by object counts rows for each unique employee-date pair
out <- by(
  data = df,
  INDICES = df[, c("person_id", "record_date")],
  FUN = nrow
)
Now the task is to find all employee-date pairs where the calculated number of rows is greater than 1. I couldn't find an answer on the web, since "by" makes a poor search term. What I can do is something like:
out > 1
#          record_date
# person_id 2020-01-01
#         1       TRUE
#         2      FALSE
But I am not sure how to get (1, "2020-01-01").
Answer 1
Score: 2
If you only want the person/date combinations with more than one record, you can do:
subset(as.data.frame(with(df, table(person_id, record_date))), Freq > 1)
#>   person_id record_date Freq
#> 1         1  2020-01-01    2
Or, if you want all the counts, just drop the subset() call:
as.data.frame(with(df, table(person_id, record_date)))
#>   person_id record_date Freq
#> 1         1  2020-01-01    2
#> 2         2  2020-01-01    1
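If you would rather work with the by object itself, one possible approach is to treat it as the array it is underneath and ask which() for array indices. This is only a sketch under the question's setup: it assumes the grouping column is person_id (as the printed output suggests), and the names idx and pairs are introduced here for illustration.

```r
# Mock data from the question
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)
out <- by(df, df[, c("person_id", "record_date")], nrow)

# Drop the "by" class so the comparison yields a plain logical array,
# then locate the TRUE cells as (row, column) positions
idx <- which(unclass(out) > 1, arr.ind = TRUE)

# Map the positions back to the labels stored in the dimnames
# (the labels are character strings, not the original types)
pairs <- data.frame(
  person_id = dimnames(out)[["person_id"]][idx[, 1]],
  record_date = dimnames(out)[["record_date"]][idx[, 2]],
  stringsAsFactors = FALSE
)
pairs
```

Note that the labels come back as character, so convert them with as.numeric() or as.Date() if you need the original types.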
Answer 2
Score: 1
You can use ave.
transform(df, flag=ave(person_id, person_id, record_date, FUN=\(x) length(x) > 1))
#   person_id record_date salary flag
# 1         1  2020-01-01    100    1
# 2         2  2020-01-01    110    0
# 3         1  2020-01-01    109    1
You can also use it in subset.
subset(df, ave(person_id, person_id, record_date, FUN=\(x) length(x) > 1) == 1)
#   person_id record_date salary
# 1         1  2020-01-01    100
# 3         1  2020-01-01    109
Note that ave works group-wise much like by does, although internally it groups with interaction() and split() rather than calling by. Because the result is written back into the numeric person_id vector, the logical flag is coerced to 1/0.
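To get just the offending pair, such as (1, "2020-01-01"), rather than every matching row, a small variation on the ave idea may help. This is a sketch, not the answerer's code: n and dup_pairs are names made up here, and FUN = length is swapped in so that each row carries its group size.

```r
# Mock data from the question
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)

# Group size for every row, computed per (person_id, record_date) pair
n <- ave(df$person_id, df$person_id, df$record_date, FUN = length)

# Keep the key columns of rows in groups larger than 1, one row per pair
dup_pairs <- unique(df[n > 1, c("person_id", "record_date")])
dup_pairs
```

Unlike the table() route, this keeps person_id numeric and record_date as a Date.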
Answer 3
Score: 0
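Use duplicated in base R:
subset(df, duplicated(person_id) | duplicated(person_id, fromLast = TRUE))
Output:
  person_id record_date salary
1         1  2020-01-01    100
3         1  2020-01-01    109
Note that this checks person_id only, which is sufficient here because every row shares the same record_date.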
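When dates differ between rows, the check probably needs to cover both key columns. The sketch below passes a two-column data frame to duplicated(); the fourth row of the data is a hypothetical addition (not in the question) that shows where the two approaches diverge, and key, by_pair and by_person are illustrative names.

```r
# Question's mock data plus one hypothetical extra row:
# person 2 also has a record on a different date
df <- data.frame(
  person_id = c(1, 2, 1, 2),
  record_date = as.Date(c("2020-01-01", "2020-01-01",
                          "2020-01-01", "2020-01-02")),
  salary = c(100, 110, 109, 111)
)

# Flag rows whose (person_id, record_date) pair occurs more than once
key <- df[c("person_id", "record_date")]
by_pair <- duplicated(key) | duplicated(key, fromLast = TRUE)

# For comparison: the person_id-only check also flags person 2
by_person <- duplicated(df$person_id) |
  duplicated(df$person_id, fromLast = TRUE)

df[by_pair, ]
```

Here by_pair selects only the two rows for person 1 on 2020-01-01, whereas by_person flags all four rows.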