March 7, 2023, 22:17:24

How do I find the indices for a given value in a base R by object?

Question

This might seem whimsical to you, because a similar problem is easily solved in dplyr. But I would still like to know how to do it in base R.
To illustrate, imagine I am looking at employee data and the goal is to find how many records there are for a given employee-date pair.

# Mock-up employee data
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)
# The by object counts rows for each unique employee-date pair
# (the grouping columns must exist in df, so person_id rather than "win")
out <- by(
  data = df,
  INDICES = df[, c("person_id", "record_date")],
  FUN = nrow
)

Now the task is to find all employee-date pairs where the calculated number of rows is greater than 1. I haven't found an answer on the web yet, since "by" makes a poor search term. What I can do is something like:

out > 1
#          record_date
# person_id 2020-01-01
#         1       TRUE
#         2      FALSE

But I am not sure how to get (1, "2020-01-01").

Answer 1

Score: 2

If you only want the person/date combinations with more than one record, you can do:

subset(as.data.frame(with(df, table(person_id, record_date))), Freq > 1)
#>   person_id record_date Freq
#> 1         1  2020-01-01    2

Or, if you want all the counts, just remove the subset:

as.data.frame(with(df, table(person_id, record_date)))
#>   person_id record_date Freq
#> 1         1  2020-01-01    2
#> 2         2  2020-01-01    1
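
The matching pairs can also be read straight off the by object from the question. The following is a minimal sketch, assuming out was built with INDICES = df[, c("person_id", "record_date")]; the names idx and pairs are illustrative and do not come from the original post.

```r
# Mock-up data and by object from the question (grouping on person_id)
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)
out <- by(
  data = df,
  INDICES = df[, c("person_id", "record_date")],
  FUN = nrow
)

# Array positions (row, column) of the cells whose count exceeds 1;
# which() skips NA cells, i.e. pairs that never occur in the data
idx <- which(out > 1, arr.ind = TRUE)

# Translate the positions back into the group labels kept in dimnames
# (idx and pairs are illustrative names, not part of the original post)
pairs <- data.frame(
  person_id = dimnames(out)[[1]][idx[, 1]],
  record_date = dimnames(out)[[2]][idx[, 2]]
)
pairs
```

Both columns of pairs hold the dimnames labels, so they come back as character values and may need as.numeric() or as.Date() before further use.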

Answer 2

Score: 1

You can use ave.

transform(df, flag = ave(person_id, person_id, record_date, FUN = \(x) length(x) > 1))
#   person_id record_date salary flag
# 1         1  2020-01-01    100    1
# 2         2  2020-01-01    110    0
# 3         1  2020-01-01    109    1

You can also use it in subset.

subset(df, ave(person_id, person_id, record_date, FUN = \(x) length(x) > 1) == 1)
#   person_id record_date salary
# 1         1  2020-01-01    100
# 3         1  2020-01-01    109

Note that ave does not call by internally; it groups the values with interaction and split.

Answer 3

Score: 0

Use duplicated in base R:

subset(df, duplicated(person_id) | duplicated(person_id, fromLast = TRUE))

Output:

  person_id record_date salary
1         1  2020-01-01    100
3         1  2020-01-01    109
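
The call above checks person_id only, which coincides with the employee-date pair here because every row in the mock-up data shares the same date. A hedged sketch of the same idea applied to both key columns follows; key, dup, rows and pairs are illustrative names, not part of the original answer.

```r
# Mock-up data from the question
df <- data.frame(
  person_id = c(1, 2, 1),
  record_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  salary = c(100, 110, 109)
)

# Columns that together define an employee-date pair
key <- df[, c("person_id", "record_date")]

# Flag rows whose pair occurs more than once; scanning from both ends
# makes sure the first occurrence of each pair is flagged as well
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# All records belonging to a repeated pair
rows <- subset(df, dup)

# The repeated pairs themselves, one row per pair
pairs <- unique(key[dup, ])
pairs
```

Because duplicated has a data frame method that compares whole rows, this keeps working when an employee has records on several different dates.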
