Assign value of 1 per each group in dataset.

Question

In the dataset provided below, each ID repeats three times, once for each of three possible alternatives, meaning that a respondent has to choose 1 of 3 alternatives. The last column, chosen, contains a dummy variable which has 0 in each row as a default value.

Problem: I do not understand how I can randomly assign a value of 1 (indicating that the first, second or third alternative was chosen) within each group. For example, when ID == 1, the value 1 has to be randomly assigned to either the first, second or third row, and so on for the rest of the data.

Here's what I've tried:

    for (i in seq(1, nrow(sim_data), 3)) { # loop through each group of three rows
      chosen_index <- sample(i:(i+2), 1) # generate a random index within the group
      sim_data$Chosen[chosen_index] <- 1 # assign 1 to the chosen index
    }

Since my data has 300 rows and the cycle goes up to 298, it didn't work out.
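
A possible explanation, offered as a sketch rather than a confirmed diagnosis: `seq(1, nrow(sim_data), 3)` stopping at 298 is expected, because `298:300` is the last group of three. Where the attempt differs from the data is the column name: it writes to `Chosen`, while the data frame defines `chosen`, and R names are case-sensitive. Assuming that is the only issue, the same loop with the lowercase name would look like this (the dataset code is repeated so the block runs on its own):

```r
# Rebuild the simulated data from the question so the sketch is self-contained.
attributes <- expand.grid(
  Company = c("Metalac", "NikolaTeslaAirport", "Jedinstvo", "Energoprojekt"),
  Return_rate = c(0, 0.05, 0.10, 0.15),
  Dividend = c(0, 1.5, 3.0, 4.5, 6),
  Trend = c("Trend1", "Trend2", "Trend3")
)
set.seed(123)
sim_data <- data.frame(
  ID = rep(1:100, each = 3),
  alternative = rep(1:3, times = 100),
  attributes[sample(nrow(attributes), size = 100 * 3, replace = TRUE), ],
  chosen = 0
)

# The loop starts at rows 1, 4, ..., 298; each start covers rows i to i + 2.
for (i in seq(1, nrow(sim_data), by = 3)) {
  chosen_index <- sample(i:(i + 2), 1)  # one random row within the group
  sim_data$chosen[chosen_index] <- 1    # lowercase `chosen`, matching the column
}

# Tabulate how many alternatives were chosen per ID.
table(tapply(sim_data$chosen, sim_data$ID, sum))
```

The final line tabulates the per-ID sums of chosen; if every respondent has exactly one chosen alternative, all of those sums are 1.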

Dataset:

    attributes <- expand.grid(
      Company = c("Metalac", "NikolaTeslaAirport", "Jedinstvo", "Energoprojekt"),
      Return_rate = c(0, 0.05, 0.10, 0.15),
      Dividend = c(0, 1.5, 3.0, 4.5, 6),
      Trend = c("Trend1", "Trend2", "Trend3")
    )
    # Generate simulated data for 100 respondents
    set.seed(123) # for reproducibility
    sim_data <- data.frame(
      ID = rep(1:100, each = 3), # three alternatives per respondent
      alternative = rep(1:3, times = 100), # alternative number
      attributes[sample(nrow(attributes), size = 100 * 3, replace = TRUE), ],
      chosen = 0
    )

Answer 1

Score: 0

I'm sure there are more elegant ways, but one dplyr solution is to use slice_sample() to randomly sample 1 row per group, flag it, and then join the sampled data frame back to the full data frame so that the flagged rows get a value of 1. After that I do some cleanup: sorting back to the original order and dropping the temporary variable.

    library(dplyr)

    # dplyr >= 1.1.0: group inside slice_sample() with `by`
    sim_data %>%
      slice_sample(n = 1, by = ID) %>%
      mutate(temp = TRUE) %>%
      full_join(sim_data) %>%
      mutate(chosen = case_when(temp ~ 1, TRUE ~ chosen)) %>%
      arrange(ID, alternative) %>% select(-temp)

    # or with older dplyr versions
    sim_data %>%
      group_by(ID) %>%
      slice_sample(n = 1) %>%
      mutate(temp = TRUE) %>%
      full_join(sim_data) %>%
      mutate(chosen = case_when(temp ~ 1, TRUE ~ chosen)) %>%
      arrange(ID, alternative) %>% select(-temp)

Output

      ID alternative            Company Return_rate Dividend  Trend chosen
    1  1           1          Jedinstvo        0.15      6.0 Trend2      1
    2  1           2          Jedinstvo        0.15      3.0 Trend3      0
    3  1           3          Jedinstvo        0.00      1.5 Trend3      0
    4  2           1 NikolaTeslaAirport        0.15      0.0 Trend1      0
    5  2           2          Jedinstvo        0.00      3.0 Trend3      1
    6  2           3 NikolaTeslaAirport        0.10      0.0 Trend3      0
    7  3           1 NikolaTeslaAirport        0.00      4.5 Trend1      1
    8  3           2 NikolaTeslaAirport        0.05      3.0 Trend2      0
    9  3           3          Jedinstvo        0.10      3.0 Trend1      0
    # .....
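
For comparison, here is a minimal base R sketch that avoids the join. It is not the method from the answer above: it swaps in `ave()` to apply a function within each ID group, and `pick_one` is a hypothetical helper name introduced only for this illustration. The dataset code from the question is repeated so the block is self-contained.

```r
# Rebuild the simulated data from the question.
attributes <- expand.grid(
  Company = c("Metalac", "NikolaTeslaAirport", "Jedinstvo", "Energoprojekt"),
  Return_rate = c(0, 0.05, 0.10, 0.15),
  Dividend = c(0, 1.5, 3.0, 4.5, 6),
  Trend = c("Trend1", "Trend2", "Trend3")
)
set.seed(123)
sim_data <- data.frame(
  ID = rep(1:100, each = 3),
  alternative = rep(1:3, times = 100),
  attributes[sample(nrow(attributes), size = 100 * 3, replace = TRUE), ],
  chosen = 0
)

# Hypothetical helper: returns a 0/1 vector with a single 1 at a random position.
pick_one <- function(x) {
  out <- numeric(length(x))           # start from all zeros
  out[sample.int(length(x), 1)] <- 1  # flag one random position in the group
  out
}

# ave() splits `chosen` by ID, applies pick_one to each piece,
# and returns the results in the original row order.
sim_data$chosen <- ave(sim_data$chosen, sim_data$ID, FUN = pick_one)

head(sim_data[, c("ID", "alternative", "chosen")], 9)
```

Using `sample.int(length(x), 1)` rather than a hard-coded 3 means the helper does not assume every group has exactly three rows.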

huangapple
  • Posted on May 11, 2023, 01:26:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76221128.html