# Split list based on rows of list items

# Question

I'm trying to split my list of data frames into some kind of sub groups like a nested list or several lists. The split should be based on the number of rows per data frame, so data frames with the same number of rows should end up in the same list.

``````
full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
``````

There are now two data frames with `nrow() == 10`, so they should end up together in their own list or sublist.

I tried something like this, but I don't think `split` is applicable for lists:

``````
sublist <- lapply(full_list, function(x) split(full_list, f = nrow(x)))
``````

BTW: The greater goal is to split all data frames into a training and a test data set for machine learning with the function below. `sample` will be used to create the subsets, but I want the same `sample_vector` for data frames of the same length. Therefore, I want to split the full list into sub lists beforehand. Afterwards I will put all data frames together again for further processing (kind of split - apply - combine). I'm mentioning this in case I'm overcomplicating things here.

``````
# function to split data frames in each sub list into train and test data frames
counter <- 0
train_test_list <- list()
for (x_table in sublist) {
  counter <- counter + 1
  current_name <- paste(names(sublist)[counter], sep = "_")

  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set  <- x_table[-sample_vector, ]

  train_test_list[[current_name]] <- list(
    train_set = train_set, test_set = test_set,
    table_name = names(sublist)[counter]
  )
}
# combine all lists with test and train pairs back into one list
full_train_test_list <- c(train_test_list1, train_test_list2, train_test_list3, ...)
``````

# Answer 1

We can get the number of rows of each list element with `sapply` and then `split` the list on those counts:

``````
new_list <- split(full_list, sapply(full_list, nrow))
str(new_list)
#List of 3
# $ 10:List of 2
#  ..$ df1: int [1:10, 1:10] 1 0 0 1 1 0 1 0 0 1 ...
#  ..$ df4: int [1:10, 1:10] 1 0 1 1 1 0 0 0 1 1 ...
# $ 15:List of 1
#  ..$ df2: int [1:15, 1:10] 0 1 1 0 0 0 0 0 0 1 ...
# $ 20:List of 1
#  ..$ df3: int [1:20, 1:10] 1 1 0 1 0 1 1 1 0 1 ...
``````
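As a small variation, `vapply` could be used in place of `sapply`. It does the same job but declares the expected return type, and `lengths` then gives a quick check of how many tables landed in each group. This is only a sketch: the seed and the name `n_rows` are assumptions added for illustration.

```r
set.seed(42)  # assumed seed, only to make the example reproducible

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)

# Row count per table; vapply() stops with an error if nrow()
# does not return exactly one integer for some element
n_rows <- vapply(full_list, nrow, integer(1))

# split() treats the row counts as a grouping factor
new_list <- split(full_list, n_rows)

names(new_list)    # "10" "15" "20"
lengths(new_list)  # 2 tables in group "10", 1 each in "15" and "20"
```

The group names are the row counts converted to character, so a group can be picked out with, for example, `new_list[["10"]]`.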

Since the result is a nested `list`, each inner `list` can be processed by calling `lapply` inside an outer `lapply`:

``````
traintestlst <- lapply(new_list, function(sublst) lapply(sublst, function(x_table) {
  # draw 80% of the row indices for training, keep the rest for testing
  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  train_set <- x_table[sample_vector, ]
  test_set  <- x_table[-sample_vector, ]
  list(train_set = train_set, test_set = test_set)
}))
``````
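Note that the nested `lapply` above draws a fresh `sample_vector` for every table. If the same row indices should be reused for all tables with the same number of rows, as the question asks, one possible approach is to sample once per group and apply those indices to each table in it. The sketch below assumes this reading of the requirement; the helper `split_group`, its `train_frac` argument, the extra `train_rows` element, and the seed are hypothetical names introduced for illustration.

```r
set.seed(1)  # assumed seed for reproducibility

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

# Hypothetical helper: draw the row indices once per group, then reuse them
split_group <- function(sublst, train_frac = 0.8) {
  n <- nrow(sublst[[1]])  # every table in a group has the same row count
  sample_vector <- sample.int(n = n, size = floor(train_frac * n), replace = FALSE)
  lapply(sublst, function(x_table) {
    list(
      train_set  = x_table[sample_vector, , drop = FALSE],
      test_set   = x_table[-sample_vector, , drop = FALSE],
      train_rows = sample_vector  # kept only so the shared indices can be inspected
    )
  })
}

shared_split <- lapply(new_list, split_group)

# df1 and df4 both have 10 rows, so they share one set of training rows
identical(shared_split[["10"]]$df1$train_rows,
          shared_split[["10"]]$df4$train_rows)
```

Using `drop = FALSE` keeps the subsets two-dimensional even if a split ever ends up with a single row.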

Checking the output:

``````
traintestlst[[1]]$df1
#$train_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    1    1    0    1    0    0    1    1    1     0
#[2,]    1    0    1    1    1    0    0    0    1     0
#[3,]    0    1    0    0    1    1    0    1    1     0
#[4,]    1    1    0    1    0    0    1    0    0     1
#[5,]    0    0    0    1    0    0    1    0    1     0
#[6,]    0    1    1    0    1    0    1    0    1     0
#[7,]    1    0    1    1    0    0    0    0    0     1
#[8,]    0    1    0    0    0    1    0    0    1     0

#$test_set
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,]    0    0    0    0    0    1    0    1    0     1
#[2,]    1    0    0    0    0    0    0    1    1     0
``````
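For the final "combine" step mentioned in the question, the grouping level can be dropped again so that all train/test pairs sit in one flat list. The following is a minimal sketch, assuming the original table names are unique; the name `flat` and the seed are assumptions for illustration.

```r
set.seed(7)  # assumed seed for reproducibility

full_list <- list(
  df1 = replicate(10, sample(0:1, 10, replace = TRUE)),
  df2 = replicate(10, sample(0:1, 15, replace = TRUE)),
  df3 = replicate(10, sample(0:1, 20, replace = TRUE)),
  df4 = replicate(10, sample(0:1, 10, replace = TRUE))
)
new_list <- split(full_list, sapply(full_list, nrow))

traintestlst <- lapply(new_list, function(sublst) lapply(sublst, function(x_table) {
  sample_vector <- sample.int(n = nrow(x_table),
                              size = floor(0.8 * nrow(x_table)), replace = FALSE)
  list(train_set = x_table[sample_vector, ], test_set = x_table[-sample_vector, ])
}))

# Remove one level of nesting. unname() drops the group names first;
# without it the result names would be prefixed, e.g. "10.df1".
flat <- unlist(unname(traintestlst), recursive = FALSE)

# Restore the original order of the tables
flat <- flat[names(full_list)]
names(flat)  # "df1" "df2" "df3" "df4"
```

Each element of `flat` is still a `list` holding `train_set` and `test_set` for one table.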

• Posted on January 4, 2020, 01:42:38
• When reposting, please keep the link to this article: https://go.coder-hub.com/59583018.html
• list
• r
• split
