2020-01-04 01:42:38

Split list based on rows of list items

Question

I'm trying to split my list of data frames into subgroups, such as a nested list or several lists. The split should be based on the number of rows per data frame, so data frames with the same number of rows should end up in the same list.

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

There are now two data frames with nrow() == 10, so they should end up in their own list or sublist.

I tried something like this, but I don't think split is applicable for lists:

sublist <- lapply(full_list, function(x) split(full_list, f = nrow(x)))

BTW: The greater goal is to split all data frames into a training and a test data set for machine learning with the function below. sample will be used to create the subsets, but I want the same sample_vector for data frames of same length. Therefore, I want to split the full list into sub lists beforehand. Afterwards I will put all data frames together again for further processing (kind of split - apply - combine). Just mentioning if I might be overcomplicating things here.

# function to split data frames in each sub list into train and test data frames
counter <- 0
train_test_list <- list()
for (x_table in sublist) {
  counter <- counter + 1
  current_name <- paste(names(sublist)[counter], sep = "_")
  sample_vector <- sample.int(n = nrow(x_table),
    size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set  <- x_table[-sample_vector, ]

  train_test_list[[current_name]] <- list(
    train_set = train_set, test_set = test_set,
    table_name = names(sublist)[counter]
  )
}
# combine all lists with test and train pairs back into one list
full_train_test_list <- c(train_test_list1, train_test_list2, train_test_list3, ...)

Answer 1

Score: 4

We can get the number of rows of each list element with sapply and split the list based on that vector.

new_list <- split(full_list, sapply(full_list, nrow))
str(new_list)
#List of 3
# $ 10:List of 2
#  ..$ df1: int [1:10, 1:10] 1 0 0 1 1 0 1 0 0 1 ...
#  ..$ df4: int [1:10, 1:10] 1 0 1 1 1 0 0 0 1 1 ...
# $ 15:List of 1
#  ..$ df2: int [1:15, 1:10] 0 1 1 0 0 0 0 0 0 1 ...
# $ 20:List of 1
#  ..$ df3: int [1:20, 1:10] 1 1 0 1 0 1 1 1 0 1 ...
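
The grouping can be double-checked with a short sketch. It assumes a fixed seed (not part of the original post) and swaps sapply for vapply, which is stricter about the return type; row_counts is a name introduced here for illustration only. The key point is that split needs one grouping value per list element, which is why passing a single nrow(x) inside lapply does not do what the question intended.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

# One row count per list element; vapply() errors if nrow() does not
# return exactly one integer for some element
row_counts <- vapply(full_list, nrow, integer(1))

# split() groups the list elements by the values in row_counts
new_list <- split(full_list, row_counts)

names(new_list)          # group labels: the row counts as character strings
lengths(new_list)        # how many tables landed in each group
lapply(new_list, names)  # which tables are in which group
```

Because the group labels are character strings, a single group can be pulled out with new_list[["10"]].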

As the result is a nested list, we can process the inner lists by calling lapply inside the outer lapply.

traintestlst <- lapply(new_list, function(sublst) lapply(sublst, function(x_table) {
      sample_vector <- sample.int(n = nrow(x_table),
                 size = floor(0.8 * nrow(x_table)), replace = FALSE)
      train_set <- x_table[sample_vector, ]
      test_set  <- x_table[-sample_vector, ]
      list(train_set = train_set, test_set = test_set)
     })
    )
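
Note that the nested lapply calls sample.int once per table, so two tables with the same number of rows get different sample vectors. If the goal stated in the question (one shared sample_vector per row count) matters, one possible variation is to draw the indices once per sublist and reuse them. This is a minimal sketch, not the answer's code: traintest_shared and train_rows are hypothetical names, and the seed is an assumption added for reproducibility.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

traintest_shared <- lapply(new_list, function(sublst) {
  # Every table in this sublist has the same number of rows,
  # so the row indices are drawn once, outside the inner lapply
  n_rows <- nrow(sublst[[1]])
  sample_vector <- sample.int(n = n_rows, size = floor(0.8 * n_rows),
                              replace = FALSE)
  lapply(sublst, function(x_table) {
    list(
      # drop = FALSE keeps a matrix even if only one row is selected
      train_set  = x_table[sample_vector, , drop = FALSE],
      test_set   = x_table[-sample_vector, , drop = FALSE],
      train_rows = sample_vector  # kept only to make the sharing visible
    )
  })
})

# Both 10-row tables were split on the same row indices
identical(traintest_shared[["10"]]$df1$train_rows,
          traintest_shared[["10"]]$df4$train_rows)
```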

Checking the output:

traintestlst[[1]]$df1
#$train_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1    1    0    1    0    0    1    1    1     0
#[2,]    1    0    1    1    1    0    0    0    1     0
#[3,]    0    1    0    0    1    1    0    1    1     0
#[4,]    1    1    0    1    0    0    1    0    0     1
#[5,]    0    0    0    1    0    0    1    0    1     0
#[6,]    0    1    1    0    1    0    1    0    1     0
#[7,]    1    0    1    1    0    0    0    0    0     1
#[8,]    0    1    0    0    0    1    0    0    1     0
#$test_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    0    0    0    0    0    1    0    1    0     1
#[2,]    1    0    0    0    0    0    0    1    1     0
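
For the "combine" step mentioned in the question, the nested result can be flattened back into a single list. The following is a sketch under assumptions: flat_list and restored are names introduced here, the seed is added for reproducibility, and the train/test split is rebuilt in compact form so the block runs on its own.

```r
set.seed(7)  # assumption: fixed seed so the example is reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

traintestlst <- lapply(new_list, function(sublst) lapply(sublst, function(x_table) {
  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  list(train_set = x_table[sample_vector, ], test_set = x_table[-sample_vector, ])
}))

# Remove one level of nesting; names become "<row count>.<table name>"
flat_list <- unlist(traintestlst, recursive = FALSE)
names(flat_list)

# Optionally restore the original table names and the original order
restored <- flat_list
names(restored) <- unlist(lapply(traintestlst, names), use.names = FALSE)
restored <- restored[names(full_list)]
names(restored)
```

Each element of restored is still a list with train_set and test_set, so downstream code can loop over one flat list instead of the nested structure.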
