2023-05-11 01:26:08


Assign value of 1 per each group in dataset

Question

In the dataset below, each ID is repeated three times, once per alternative, meaning that each respondent has to choose one of three alternatives. The last column, chosen, is a dummy variable that defaults to 0 in every row.

Problem: I do not understand how to randomly assign a value of 1 (indicating that the first, second, or third alternative was chosen) within each group. For example, when ID == 1, the value 1 has to be randomly assigned to either the first, second, or third row, and so on for the rest of the data.

Here's what I've tried:

for (i in seq(1, nrow(sim_data), 3)) {  # loop through each group of three rows
  chosen_index <- sample(i:(i+2), 1)  # generate a random index within the group
  sim_data$Chosen[chosen_index] <- 1  # assign 1 to the chosen index
}

Since my data has 300 rows and the loop only goes up to 298, it didn't work.

Dataset:

attributes <- expand.grid(
  Company = c("Metalac", "NikolaTeslaAirport", "Jedinstvo", "Energoprojekt"),
  Return_rate = c(0, 0.05, 0.10, 0.15),
  Dividend = c(0, 1.5, 3.0, 4.5, 6),
  Trend = c("Trend1", "Trend2", "Trend3")
)
# Generate simulated data for 100 respondents
set.seed(123) # for reproducibility
sim_data <- data.frame(
  ID = rep(1:100, each = 3), # three alternatives per respondent
  alternative = rep(1:3, times = 100), # alternative number
  attributes[sample(nrow(attributes), size = 100 * 3, replace = TRUE), ],
  chosen = 0
)
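
A note on the loop in the question, offered as a sketch rather than a confirmed diagnosis. That seq(1, nrow(sim_data), 3) stops at 298 is expected: the last group occupies rows 298 to 300, and sample(i:(i+2), 1) can still reach row 300. A more likely culprit is the column name. The loop writes to sim_data$Chosen, while the data frame defines chosen, and R names are case-sensitive, so the existing chosen column is never updated. The version below only changes that name; the reduced data frame (ID, alternative, chosen) is a simplification assumed here to keep the example short.

```r
set.seed(123)  # for reproducibility

# Reduced stand-in for sim_data: the attribute columns are left out (assumption)
sim_data <- data.frame(
  ID = rep(1:100, each = 3),
  alternative = rep(1:3, times = 100),
  chosen = 0
)

for (i in seq(1, nrow(sim_data), by = 3)) {  # i = 1, 4, ..., 298
  chosen_index <- sample(i:(i + 2), 1)       # one random row among i, i+1, i+2
  sim_data$chosen[chosen_index] <- 1         # lower-case `chosen`, the existing column
}

# Every respondent should now have exactly one chosen alternative
stopifnot(all(tapply(sim_data$chosen, sim_data$ID, sum) == 1))
```

The loop relies on the rows being sorted by ID with exactly three rows per respondent, as in the simulated data.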

Answer 1

Score: 0

I'm sure there are more elegant ways, but one dplyr solution is to use slice_sample() to randomly sample one row per group, flag it, and then join the sampled rows back to the full data frame. The remaining steps are cleanup: sorting the rows back into their original order and dropping the temporary column.

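library(dplyr)

# the `by` argument of slice_sample() requires dplyr >= 1.1.0
sim_data %>%
  slice_sample(n = 1, by = ID) %>%  # draw one random row per ID
  mutate(temp = TRUE) %>%           # flag the sampled rows
  full_join(sim_data) %>%           # bring back the rows that were not sampled
  mutate(chosen = case_when(temp ~ 1, TRUE ~ chosen)) %>%
  arrange(ID, alternative) %>%      # restore the original row order
  select(-temp)                     # drop the helper column

# or with older dplyr versions
sim_data %>%
  group_by(ID) %>%
  slice_sample(n = 1) %>%
  ungroup() %>%                     # grouping is no longer needed after sampling
  mutate(temp = TRUE) %>%
  full_join(sim_data) %>%
  mutate(chosen = case_when(temp ~ 1, TRUE ~ chosen)) %>%
  arrange(ID, alternative) %>%
  select(-temp)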

Output

  ID alternative            Company Return_rate Dividend  Trend chosen
1  1           1          Jedinstvo        0.15      6.0 Trend2      1
2  1           2          Jedinstvo        0.15      3.0 Trend3      0
3  1           3          Jedinstvo        0.00      1.5 Trend3      0
4  2           1 NikolaTeslaAirport        0.15      0.0 Trend1      0
5  2           2          Jedinstvo        0.00      3.0 Trend3      1
6  2           3 NikolaTeslaAirport        0.10      0.0 Trend3      0
7  3           1 NikolaTeslaAirport        0.00      4.5 Trend1      1
8  3           2 NikolaTeslaAirport        0.05      3.0 Trend2      0
9  3           3          Jedinstvo        0.10      3.0 Trend1      0
# .....
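
For comparison, the same result can probably be reached without a join. The sketch below is not the answer's method: it swaps in base R's ave(), which applies a function within each ID group and puts the results back in the original row order. Within each group it shuffles a vector holding a single 1 and otherwise 0s. The reduced data frame is again an assumption made to keep the example self-contained.

```r
set.seed(123)  # for reproducibility

# Reduced stand-in for sim_data: the attribute columns are left out (assumption)
sim_data <- data.frame(
  ID = rep(1:100, each = 3),
  alternative = rep(1:3, times = 100),
  chosen = 0
)

# For each ID, build c(1, 0, 0, ...) of the group's length and shuffle it
sim_data$chosen <- ave(
  sim_data$chosen, sim_data$ID,
  FUN = function(x) sample(c(1, rep(0, length(x) - 1)))
)

# Sanity check: exactly one chosen alternative per respondent
stopifnot(all(tapply(sim_data$chosen, sim_data$ID, sum) == 1))
```

Because ave() groups by value rather than by position, this variant does not depend on the rows being sorted by ID or on every respondent having the same number of alternatives.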
