April 13, 2023 19:40:19

Filtering out spouses from respondent-spouse groups in survey data

Question

Here is a small data frame that is a simplification of what I am working with:

df <- data.frame(resp = seq(1, 10), spouse = c(2, 1, 5, NA, 3, 3, NA, 10, NA, 8), outcome = seq(11, 20, 1))
df <- df[sample(1:nrow(df)), ]

Each respondent is identified by a unique identifier in the resp column. However, some respondents are spouses of other respondents, and so I must remove them to prevent reverse causality later on.
This is what I want to get in the end (not in order):
ideal_df <- df %>% filter(resp %in% c(1, 3, 4, 7, 8, 9))
I cannot just use filter because that would remove all spouses in the group and leave me with the singles and that third wheel. Like so:

df %>% filter(!(resp %in% spouse))

I can't group by spouse and respondent groups either since spouses and respondents have different identifiers.
For the full dataset, I am using the RAND HRS dataset distributed by University of Michigan.
I could not just upload the entire set since I have no idea where the spouses are located, and so all I can do is subset by row and that might just leave all the spouses out.
Please let me know what I can do to either improve this question/where I can go/what I should do. Thank you very much for your help.

Answer 1

Score: 1

If you just want to keep one person from each pair and it does not matter which one, add an extra condition to your filter:

df %>% filter(!((resp %in% spouse) & (spouse < resp)))

The output is all the singles plus one individual from each couple. Respondent 6, whose listed spouse (3) is paired with 5, is also kept:

# OUTPUT
  resp spouse outcome
1    9     NA      19
2    6      3      16
3    1      2      11
4    8     10      18
5    4     NA      14
6    3      5      13
7    7     NA      17
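
The filter above still keeps respondent 6, who is not in the asker's ideal_df. The following is a minimal sketch, not part of the original answer, of one way to drop such one-way links as well. It uses base R only, so it runs without dplyr. It assumes that a couple counts only when both members list each other. The names spouse_of_spouse, reciprocal, single, keep and kept_df are made up for illustration.

```r
# Toy data from the question (rows left unshuffled for readability).
df <- data.frame(
  resp    = 1:10,
  spouse  = c(2, 1, 5, NA, 3, 3, NA, 10, NA, 8),
  outcome = 11:20
)

# For each respondent, look up whom their listed spouse names as spouse.
# match() gives NA when spouse is missing or is not a respondent.
spouse_of_spouse <- df$spouse[match(df$spouse, df$resp)]

# A link is reciprocal when the spouse points back at the respondent.
reciprocal <- !is.na(spouse_of_spouse) & spouse_of_spouse == df$resp

single <- is.na(df$spouse)

# Keep singles plus the lower-id member of each reciprocal couple.
# Assumption: respondents with a one-way link (such as 6) are dropped.
keep <- single | (reciprocal & df$resp < df$spouse)

kept_df <- df[keep, ]
sort(kept_df$resp)
# 1 3 4 7 8 9
```

On the toy data this reproduces the ids in ideal_df. Whether a one-way link should be dropped or kept in the real HRS data is a judgment call that this sketch does not settle.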

Answer 2

Score: 0

You can add a couple id and keep the distinct ones with dplyr's distinct().

With the ifelse statements below you get:

The same couple id for resp and spouse, whichever of the two is in the resp column (the larger id is always written first),
and <resp>_single if spouse is missing.

resp	spouse	couple
1	2	2_1
2	1	2_1
3	NA	3_single

library(dplyr)
df %>%
  mutate(couple = ifelse(resp > spouse,
                         paste0(resp, "_", spouse),
                         paste0(spouse, "_", resp))) %>%
  mutate(couple = ifelse(is.na(couple), paste0(resp, "_single"), couple)) %>%
  distinct(couple, .keep_all = TRUE)
#>   resp spouse outcome   couple
#> 1    1      2      11      2_1
#> 2    3      5      13      5_3
#> 3    4     NA      14 4_single
#> 4    6      3      16      6_3
#> 5    7     NA      17 7_single
#> 6    8     10      18     10_8
#> 7    9     NA      19 9_single

Created on 2023-04-13 with reprex v2.0.2
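
For comparison, here is a rough base R sketch of the same couple-key idea. It is not from the original answer. It builds the key with pmax() and pmin() instead of nested ifelse() calls and uses duplicated() in place of distinct(). The names couple, single and deduped are chosen for illustration only.

```r
# Toy data from the question (rows left unshuffled for readability).
df <- data.frame(
  resp    = 1:10,
  spouse  = c(2, 1, 5, NA, 3, 3, NA, 10, NA, 8),
  outcome = 11:20
)

# Order-independent couple key: larger id first, smaller id second.
couple <- paste(pmax(df$resp, df$spouse), pmin(df$resp, df$spouse), sep = "_")

# pmax()/pmin() return NA for singles, which paste() turns into "NA_NA",
# so overwrite those keys with "<resp>_single".
single <- is.na(df$spouse)
couple[single] <- paste0(df$resp[single], "_single")

# Keep the first row seen for each key (base R analogue of distinct()).
first_seen <- !duplicated(couple)
deduped <- df[first_seen, ]
deduped$couple <- couple[first_seen]
deduped$resp
# 1 3 4 6 7 8 9
```

As in the dplyr version, respondent 6 survives under the key 6_3, because 6 lists 3 as spouse while 3 lists 5. The key pairs each row with its own listed spouse and does not check that the link is mutual.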
