Assign value of 1 per each group in dataset.

Question

In the dataset below, each ID is repeated three times, once for each of three possible alternatives, meaning that a respondent has to choose one of the three alternatives. The last column, chosen, contains a dummy variable that defaults to 0 in every row.

Problem: I do not understand how I can randomly assign a value of 1 (indicating that the first, second or third alternative was chosen) within each group. For example, when ID == 1, the value 1 has to be randomly assigned to either the first, second or third row, and so on for the rest of the data.

Here's what I've tried:

for (i in seq(1, nrow(sim_data), 3)) {  # loop through each group of three rows
  chosen_index <- sample(i:(i+2), 1)  # generate a random index within the group
  sim_data$Chosen[chosen_index] <- 1  # assign 1 to the chosen index
}

Since my data has 300 rows and the loop only goes up to 298, it didn't work.

Dataset:

attributes <- expand.grid(
  Company = c("Metalac", "NikolaTeslaAirport", "Jedinstvo", "Energoprojekt"),
  Return_rate = c(0, 0.05, 0.10, 0.15),
  Dividend = c(0, 1.5, 3.0, 4.5, 6),
  Trend = c("Trend1", "Trend2", "Trend3")
)

# Generate simulated data for 100 respondents
set.seed(123) # for reproducibility
sim_data <- data.frame(
  ID = rep(1:100, each = 3), # three alternatives per respondent
  alternative = rep(1:3, times = 100), # alternative number
  attributes[sample(nrow(attributes), size = 100 * 3, replace = TRUE), ],
  chosen = 0
)
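
A note on the attempted loop, offered as a sketch rather than a definitive diagnosis: seq(1, nrow(sim_data), 3) does end at 298, but i:(i+2) then covers rows 298 to 300, so the last group is not skipped. The more likely culprit is that the loop writes to Chosen (capital C) while the column is named chosen, so the existing chosen column is never touched. The example below uses a small hypothetical stand-in called toy instead of the full sim_data.

```r
# Toy stand-in for sim_data: 4 respondents x 3 alternatives (hypothetical data)
set.seed(123)
toy <- data.frame(
  ID = rep(1:4, each = 3),
  alternative = rep(1:3, times = 4),
  chosen = 0
)

# seq() stops at the first row of the last group; i:(i + 2) covers the rest of it
starts <- seq(1, nrow(toy), by = 3)

for (i in starts) {
  chosen_index <- sample(i:(i + 2), 1)  # one random row within the group
  toy$chosen[chosen_index] <- 1         # lowercase name matches the existing column
}

# Each ID should now have exactly one row with chosen == 1
tapply(toy$chosen, toy$ID, sum)
```

This relies on every ID occupying exactly three consecutive rows, as in the question's data; it would need adjusting for unequal group sizes.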

Answer 1

Score: 0

I'm sure there are more elegant ways, but one dplyr solution is to use slice_sample() to randomly sample one row per group, flag it, and then join the sampled data frame back to the full data frame so that the flagged row gets chosen = 1 (and then I do some cleanup: sorting back to the original order and dropping the temporary variable).

library(dplyr)

# the `by` argument of slice_sample() requires dplyr >= 1.1.0
sim_data %>% 
  slice_sample(n = 1, by = ID) %>%   # pick one random row per ID
  mutate(temp = TRUE) %>%            # flag the sampled rows
  full_join(sim_data) %>%            # add back the unsampled rows (temp is NA there)
  mutate(chosen = case_when(temp ~ 1, TRUE ~ chosen)) %>%
  arrange(ID, alternative) %>% select(-temp)

# or with older dplyr versions

sim_data %>% 
  group_by(ID) %>%
  slice_sample(n = 1) %>%
  mutate(temp = TRUE) %>%
  full_join(sim_data) %>% 
  mutate(chosen = case_when(temp ~ 1, TRUE ~ chosen)) %>%
  arrange(ID, alternative) %>% select(-temp)

Output

  ID alternative            Company Return_rate Dividend  Trend chosen
1  1           1          Jedinstvo        0.15      6.0 Trend2      1
2  1           2          Jedinstvo        0.15      3.0 Trend3      0
3  1           3          Jedinstvo        0.00      1.5 Trend3      0
4  2           1 NikolaTeslaAirport        0.15      0.0 Trend1      0
5  2           2          Jedinstvo        0.00      3.0 Trend3      1
6  2           3 NikolaTeslaAirport        0.10      0.0 Trend3      0
7  3           1 NikolaTeslaAirport        0.00      4.5 Trend1      1
8  3           2 NikolaTeslaAirport        0.05      3.0 Trend2      0
9  3           3          Jedinstvo        0.10      3.0 Trend1      0
# .....
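
As a possible alternative to the sample-then-join approach, the flag can also be set in a single grouped mutate(). This is only a sketch, not the answer author's method: toy and toy_marked are hypothetical names, and the data is a small stand-in for sim_data.

```r
library(dplyr)

# Toy stand-in for sim_data: 5 respondents x 3 alternatives (hypothetical data)
set.seed(123)
toy <- data.frame(
  ID = rep(1:5, each = 3),
  alternative = rep(1:3, times = 5),
  chosen = 0
)

# Within each ID, draw one row position at random and flag that row with 1
toy_marked <- toy %>%
  group_by(ID) %>%
  mutate(chosen = as.numeric(row_number() == sample(n(), 1))) %>%
  ungroup()

# Sanity check: the sum of chosen (column n) should be 1 for every ID
toy_marked %>% count(ID, wt = chosen)
```

Because sample(n(), 1) is evaluated once per group, this also works when groups have different sizes, and the row order of the data is left unchanged.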

huangapple
  • Posted on May 11, 2023, 01:26:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76221128.html