How do I find the indices of a given value in a by object in base R?

Question

This might seem whimsical to you, because a similar problem is easily solved in dplyr. But I still want to know how to do it in base R.
To illustrate, imagine I am looking at employee data and the goal is to find how many records there are for a given employee-date pair.

  # Mockup employee data
  df <- data.frame(
    person_id = c(1, 2, 1),
    record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
    salary = c(100, 110, 109)
  )
  # The by object counts rows for each unique employee-date pair
  out <- by(
    data = df,
    INDICES = df[, c("person_id", "record_date")],
    FUN = nrow
  )

Now the task is to find all those employee-date pairs where the calculated number of rows is more than 1. I couldn't find answers on the web yet, because "by" makes a bad search word. What I can do is something like:

  out > 1
  #          record_date
  # person_id 2020-01-01
  #         1       TRUE
  #         2      FALSE

But I am not sure how to get (1, "2020-01-01").
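
One possible way to get from the logical comparison back to the pair itself, sketched under the assumption that out is the by object built above. A by object is an array with one dimension per grouping variable, so which() with arr.ind = TRUE returns array positions and dimnames(out) maps them back to labels. The names idx, dn, and hits are illustrative and not part of the original post; the labels come back as character strings.

```r
# Rebuild the example data and the by object from the question
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)
out <- by(df, df[, c("person_id", "record_date")], nrow)

# Array positions of the cells whose count exceeds 1; unclass() drops the
# "by" class so the comparison is handled as a plain array
idx <- which(unclass(out) > 1, arr.ind = TRUE)

# Translate the positions into labels through the dimnames
dn <- dimnames(out)
hits <- data.frame(
  person_id = dn$person_id[idx[, 1]],
  record_date = dn$record_date[idx[, 2]]
)
hits
```

Combinations that never occur in the data hold NA in the by object, and which() skips NA values, so they would not show up in hits.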

Answer 1

Score: 2

If you only want the person/date combinations with more than one record, you can do:

  subset(as.data.frame(with(df, table(person_id, record_date))), Freq > 1)
  #>   person_id record_date Freq
  #> 1         1  2020-01-01    2

Or if you want all the counts, just remove the subset:

  as.data.frame(with(df, table(person_id, record_date)))
  #>   person_id record_date Freq
  #> 1         1  2020-01-01    2
  #> 2         2  2020-01-01    1
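
A caveat with the table route is that the grouping columns come back as factors rather than as the original numeric and Date types. The following is a minimal sketch of converting them back, assuming the same df as in the question; the names counts and multi are illustrative only.

```r
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)

# stringsAsFactors = FALSE keeps the table labels as character strings
counts <- as.data.frame(
  with(df, table(person_id, record_date)),
  stringsAsFactors = FALSE
)

# Restore the original column types from the character labels
counts$person_id <- as.numeric(counts$person_id)
counts$record_date <- as.Date(counts$record_date)

# Keep only the pairs that occur more than once
multi <- subset(counts, Freq > 1)
str(multi)
```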

Answer 2

Score: 1

You can use ave.

  # \(x) is the R >= 4.1 shorthand for function(x)
  transform(df, flag=ave(person_id, person_id, record_date, FUN=\(x) length(x) > 1))
  #   person_id record_date salary flag
  # 1         1  2020-01-01    100    1
  # 2         2  2020-01-01    110    0
  # 3         1  2020-01-01    109    1

You can also use it in subset.

  subset(df, ave(person_id, person_id, record_date, FUN=\(x) length(x) > 1) == 1)
  #   person_id record_date salary
  # 1         1  2020-01-01    100
  # 3         1  2020-01-01    109

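Note that ave does not call by internally; it groups with interaction and split, then writes each group's result back into the original row positions.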
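
If the group size itself is of interest rather than a 0/1 flag, ave can return it directly. This is a small sketch assuming the same df as in the question; n and dup_pairs are illustrative names, and function(x)-style code is avoided entirely by passing length as FUN.

```r
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)

# Group size for every row; seq_along() supplies an integer vector so the
# integer result of length() is stored without type coercion
n <- ave(seq_along(df$person_id), df$person_id, df$record_date, FUN = length)

# Distinct employee-date pairs that occur more than once
dup_pairs <- unique(df[n > 1, c("person_id", "record_date")])
dup_pairs
```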

Answer 3

Score: 0

Use duplicated in base R:

  subset(df, duplicated(person_id) | duplicated(person_id, fromLast = TRUE))

Output:

    person_id record_date salary
  1         1  2020-01-01    100
  3         1  2020-01-01    109
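
The call above keys on person_id alone, which happens to give the right rows here because every record shares one date. A sketch of keying on the employee-date pair instead follows; df2 is a hypothetical extension of the example data with one extra row (person 2 on a second date), added only to show where the two approaches differ, and key, by_id, and by_pair are illustrative names.

```r
# Hypothetical data: the question's three rows plus person 2 on another date
df2 <- data.frame(
  person_id = c(1, 2, 1, 2),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02")),
  salary = c(100, 110, 109, 120)
)

# Keyed on person_id only: flags every row, since both ids repeat
by_id <- duplicated(df2$person_id) | duplicated(df2$person_id, fromLast = TRUE)

# Keyed on the pair: duplicated() on a data frame compares whole rows
key <- df2[c("person_id", "record_date")]
by_pair <- duplicated(key) | duplicated(key, fromLast = TRUE)

df2[by_pair, ]
```

With the pair as the key, only the two records of person 1 on 2020-01-01 are kept, while person 2's records on different dates are left out.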

huangapple
  • Published on March 7, 2023, 22:17:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75663152.html