# Question

I'm trying to split my list of data frames into subgroups, either as a nested list or as several separate lists. The split should be based on the number of rows per data frame, so that data frames with the same number of rows end up in the same list.

``````
full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
``````

Here there are two data frames with `nrow() == 10`, so they should end up together in their own list or sublist.

I tried something like this, but I don't think `split` is applicable for lists:

``````
sublist <- lapply(full_list, function(x) split(full_list, f = nrow(x)))
``````

BTW: the larger goal is to split every data frame into a training and a test data set for machine learning, using the code below. `sample` will be used to create the subsets, but I want the same `sample_vector` for data frames of the same length, which is why I want to split the full list into sublists beforehand. Afterwards I will put all the data frames back together for further processing (a kind of split-apply-combine). I mention this in case I am overcomplicating things.

``````
# loop that splits the data frames in each sublist into train and test data frames
counter <- 0
train_test_list <- list()
for (x_table in sublist) {
  counter <- counter + 1
  # use the name of the current element, not the whole vector of names
  current_name <- names(sublist)[counter]

  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set  <- x_table[-sample_vector, ]

  train_test_list[[current_name]] <- list(
    train_set = train_set, test_set = test_set,
    table_name = current_name
  )
}
# combine all lists with test and train pairs back into one list
full_train_test_list <- c(train_test_list1, train_test_list2, train_test_list3, ...)
``````

# Answer 1

We can get the number of rows of each element with `sapply` and pass the result to `split` as the grouping variable:

``````
new_list <- split(full_list, sapply(full_list, nrow))
str(new_list)
#List of 3
# $ 10:List of 2
#  ..$ df1: int [1:10, 1:10] 1 0 0 1 1 0 1 0 0 1 ...
#  ..$ df4: int [1:10, 1:10] 1 0 1 1 1 0 0 0 1 1 ...
# $ 15:List of 1
#  ..$ df2: int [1:15, 1:10] 0 1 1 0 0 0 0 0 0 1 ...
# $ 20:List of 1
#  ..$ df3: int [1:20, 1:10] 1 1 0 1 0 1 1 1 0 1 ...
``````
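As a small variation on the grouping step, here is a minimal sketch that uses `vapply` instead of `sapply` so the row counts are guaranteed to be one integer per element. The names `row_counts` and `groups` and the `set.seed` call are assumptions added for illustration; they are not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed, only to make the example reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

# vapply() behaves like sapply() but enforces the result type:
# exactly one integer (the row count) per list element
row_counts <- vapply(full_list, nrow, integer(1))

# split() also works on lists: elements with equal row counts
# end up in the same group, and groups are named after the row count
groups <- split(full_list, row_counts)

lengths(groups)        # number of tables in each group
lapply(groups, names)  # which tables ended up in which group
```

The group names are the row counts converted to character, so a group can be accessed with, for example, `groups[["10"]]`.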

As the result is a nested `list`, we can process the elements of each inner `list` by calling `lapply` inside an outer `lapply`:

``````
traintestlst <- lapply(new_list, function(sublst) lapply(sublst, function(x_table) {
  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set  <- x_table[-sample_vector, ]
  list(train_set = train_set, test_set = test_set)
}))
``````
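The question asks for the same `sample_vector` for tables with the same number of rows, whereas the nested `lapply` above draws a fresh sample for every table. One possible way to share the sample within a group is to draw it once in the outer function and reuse it in the inner one. This is only a sketch: `split_group`, `train_frac`, `shared_lst`, and the extra `rows` element are hypothetical names introduced here, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed, only to make the example reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

# Hypothetical helper: draws ONE sample vector per group and
# applies it to every table in that group
split_group <- function(sublst, train_frac = 0.8) {
  n <- nrow(sublst[[1]])  # every table in the group has n rows
  sample_vector <- sample.int(n = n, size = floor(train_frac * n), replace = FALSE)
  lapply(sublst, function(x_table) {
    list(
      train_set = x_table[sample_vector, , drop = FALSE],
      test_set  = x_table[-sample_vector, , drop = FALSE],
      rows      = sample_vector  # kept so the shared sample can be inspected
    )
  })
}

shared_lst <- lapply(new_list, split_group)

# Both 10-row tables were split with the same row indices
identical(shared_lst[["10"]]$df1$rows, shared_lst[["10"]]$df4$rows)
```

The result has the same nested shape as `traintestlst`, so any later processing of that structure should apply unchanged.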

Checking the output:

``````
traintestlst[[1]]$df1
#$train_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1    1    0    1    0    0    1    1    1     0
#[2,]    1    0    1    1    1    0    0    0    1     0
#[3,]    0    1    0    0    1    1    0    1    1     0
#[4,]    1    1    0    1    0    0    1    0    0     1
#[5,]    0    0    0    1    0    0    1    0    1     0
#[6,]    0    1    1    0    1    0    1    0    1     0
#[7,]    1    0    1    1    0    0    0    0    0     1
#[8,]    0    1    0    0    0    1    0    0    1     0

#$test_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    0    0    0    0    0    1    0    1    0     1
#[2,]    1    0    0    0    0    0    0    1    1     0
``````
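The question also mentions putting everything back into one list afterwards (the "combine" step). A minimal sketch of one way to do that is shown below; it rebuilds `traintestlst` as in the answer so that it runs on its own, and the `set.seed` call is an assumption added for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only to make the example reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

traintestlst <- lapply(new_list, function(sublst) lapply(sublst, function(x_table) {
  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  list(train_set = x_table[sample_vector, ], test_set = x_table[-sample_vector, ])
}))

# unname() drops the group names ("10", "15", "20") so that c() keeps
# the plain table names instead of prefixing them with the group name
full_train_test_list <- do.call(c, unname(traintestlst))

# The groups are concatenated in group order, so restore the original order
full_train_test_list <- full_train_test_list[names(full_list)]

names(full_train_test_list)
```

Each element of `full_train_test_list` is still a list holding `train_set` and `test_set`, now one level deep and indexed by table name.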

