Filtering out spouses from respondent-spouse groups in survey data

Question

Here is a small data frame that is a simplification of what I am working with:

df <- data.frame(resp = seq(1, 10), spouse = c(2, 1, 5, NA, 3, 3, NA, 10, NA, 8), outcome = seq(11, 20, 1))
df <- df[sample(1:nrow(df)), ]  # shuffle the rows

Each respondent is identified by a unique identifier in the resp column. However, some respondents are spouses of other respondents, and so I must remove them to prevent reverse causality later on.
This is what I want to get in the end (not in order):
ideal_df <- df %>% filter(resp %in% c(1, 3, 4, 7, 8, 9))
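I cannot simply filter out every respondent who appears in the spouse column, because that would remove both members of each couple and leave me with only the singles and the third wheel (respondent 6, who lists 3 as a spouse but is not listed by anyone). Like so:

df %>% filter(!(resp %in% spouse))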

I can't group by spouse and respondent groups either since spouses and respondents have different identifiers.
For the full dataset, I am using the RAND HRS dataset distributed by University of Michigan.
I could not just upload the entire set since I have no idea where the spouses are located, and so all I can do is subset by row and that might just leave all the spouses out.
Please let me know how I can improve this question, where I can go, or what I should do. Thank you very much for your help.

Answer 1

Score: 1

If you just want to keep one member of each pair and it does not matter which one, you can add a condition to your filter:

# drop a respondent only if someone lists them as a spouse and their own spouse has the lower ID
df %>% filter(!((resp %in% spouse) & (spouse < resp)))

The output is all the singles plus one individual from each couple (respondent 6, whom nobody lists as a spouse, is kept as well):

# OUTPUT
  resp spouse outcome
1    9     NA      19
2    6      3      16
3    1      2      11
4    8     10      18
5    4     NA      14
6    3      5      13
7    7     NA      17
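
If the third wheel should go as well (the ideal_df in the question drops respondent 6), one possible variant is to keep only the singles and the respondents whose ID is lower than their listed spouse's. This is a sketch, not a general solution: it assumes that keeping the lower ID is acceptable, that every listed spouse is also a respondent, and that links are at most one hop deep as in the toy data. The name kept is only illustrative.

```r
library(dplyr)

# Toy data from the question (left unshuffled; the filter does not depend on row order)
df <- data.frame(resp = seq(1, 10),
                 spouse = c(2, 1, 5, NA, 3, 3, NA, 10, NA, 8),
                 outcome = seq(11, 20, 1))

# Keep singles (spouse is NA) and anyone whose own ID is lower than their spouse's ID.
# Assumption: "lower ID stays" is an arbitrary but acceptable tie-break.
kept <- df %>% filter(is.na(spouse) | resp < spouse)

sort(kept$resp)
# keeps resp 1, 3, 4, 7, 8, 9
```

On the real data it would be worth checking first whether longer chains exist (A lists B, B lists C), since this rule would then keep both A and B.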

Answer 2

Score: 0

You can add a couple ID and keep the distinct ones with dplyr's distinct().

With the ifelse statements below you get:

  • the same couple ID for resp and spouse, whichever of the two IDs comes first

  • id_single if spouse is missing.

resp  spouse  couple
1     2       2_1
2     1       2_1
3     NA      3_single

library(dplyr)

df %>%
  # build an order-independent key: larger ID first, then the smaller one
  mutate(couple = ifelse(resp > spouse,
                         paste0(resp, "_", spouse),
                         paste0(spouse, "_", resp))) %>%
  # singles have an NA key at this point, so give them their own ID
  mutate(couple = ifelse(is.na(couple), paste0(resp, "_single"), couple)) %>%
  # keep the first row seen for each key
  distinct(couple, .keep_all = TRUE)
#>   resp spouse outcome   couple
#> 1    1      2      11      2_1
#> 2    3      5      13      5_3
#> 3    4     NA      14 4_single
#> 4    6      3      16      6_3
#> 5    7     NA      17 7_single
#> 6    8     10      18     10_8
#> 7    9     NA      19 9_single

Created on 2023-04-13 with reprex v2.0.2
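
With the couple key above, respondent 6 gets its own key (6_3) and survives. If it should be folded into respondent 3's group instead, a possible alternative is a household key taken as the smaller of resp and spouse. This is only a sketch: household and one_per_household are made-up names, and it assumes links are at most one hop deep as in the toy data (longer chains would need a proper connected-components step).

```r
library(dplyr)

# Toy data from the question
df <- data.frame(resp = seq(1, 10),
                 spouse = c(2, 1, 5, NA, 3, 3, NA, 10, NA, 8),
                 outcome = seq(11, 20, 1))

one_per_household <- df %>%
  # smaller of the two IDs; na.rm = TRUE makes singles fall back to their own ID
  mutate(household = pmin(resp, spouse, na.rm = TRUE)) %>%
  # sort so the lowest resp in each household is the row that gets kept
  arrange(resp) %>%
  distinct(household, .keep_all = TRUE)

one_per_household$resp
# keeps resp 1, 3, 4, 7, 8, 9
```

Here 3, 5 and 6 all map to household 3, so only one of them is kept, which matches the ideal_df from the question.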

huangapple
  • Posted on 2023-04-13 19:40:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76004992.html