Read an Excel file with separate ranges of cells
Question
I am trying to import separate ranges of cells from an xlsx file using the functions I use most often. I know you can import all the data and then select, but I guess it's possible to do it the other way, at read time. These are my attempts:
#Option 1
dat <- readxl::read_xlsx("C:/Users/javie/Documents/Estadística/Vitamina B12/Exoma_B12.xlsx", col_names = F, sheet = 2, skip = 1, range = cell_cols(c("A:L", "AP:BA")))
#Option 2
dat <- readxl::read_xlsx("C:/Users/javie/Documents/Estadística/Vitamina B12/Exoma_B12.xlsx", col_names = F, sheet = 2, skip = 1, range = c("A:L", "AP:BA"))
#Option 3
dat <- readxl::read_excel("C:/Users/javie/Documents/Estadística/Vitamina B12/Exoma_B12.xlsx", col_names = F, sheet = 2, skip = 1, range = cell_cols(c("A:L", "AP:BA")))
#Option 4
dat <- readxl::read_excel("C:/Users/javie/Documents/Estadística/Vitamina B12/Exoma_B12.xlsx", col_names = F, sheet = 2, skip = 1, range = c("A:L", "AP:BA"))
I guess this is easy, but as far as I could find on SO there is no clear solution.
Answer 1
Score: 1
If I understand the question correctly, you want to read specific columns of a workbook. This can be achieved using openxlsx2 release 0.6.1 as follows:
library(openxlsx2)
# create example file
tmp <- temp_xlsx()
write_xlsx(list(head(iris), head(mtcars)), tmp)
# read from second sheet, start at row 2 and avoid column names
# column L is included even though it is not written
read_xlsx(tmp, sheet = 2, startRow = 2, colNames = FALSE, cols = c("A:C", "J:L"))
#> A B C J K L
#> 2 21.0 6 160 4 4 NA
#> 3 21.0 6 160 4 4 NA
#> 4 22.8 4 108 4 1 NA
#> 5 21.4 6 258 3 1 NA
#> 6 18.7 8 360 3 2 NA
#> 7 18.1 6 225 3 1 NA
Answer 2
Score: 0
readxl almost covers this use case (link), but does not support non-contiguous columns (i.e. c('a:b','e:f')).
For example:
library(readxl)
library(cellranger)
example_file <- readxl_example("datasets.xlsx")
# range takes precedence over skip in readxl, so skip=1 has no effect here
read_excel(example_file, range=cell_cols('c:d'), sheet=2, skip=1, col_names=TRUE) -> foo
Output:
> foo
# A tibble: 32 × 2
disp hp
<dbl> <dbl>
1 160 110
2 160 110
3 108 93
4 258 110
5 360 175
6 225 105
7 360 245
8 147. 62
9 141. 95
10 168. 123
# … with 22 more rows
# ℹ Use `print(n = ...)` to see more rows
openxlsx::read.xlsx also kind of works, but it messes up the column ordering:
> read.xlsx(example_file, sheet=2, cols=c(1,3,4))
mpg disp hp
1 21.0 160.0 110
2 21.0 160.0 110
3 22.8 108.0 93
4 21.4 258.0 110
5 18.7 360.0 175
6 18.1 225.0 105
7 14.3 360.0 245
mpg is column 1 in the raw data, and it is always returned as the left-most column.
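If the order matters, one way around this is to reorder by name after the read. The lines below are only a minimal sketch: `dat` is a stand-in built from the built-in mtcars data rather than the result of an actual read.xlsx() call, and `wanted` is a hypothetical name for the desired order.

```r
# Stand-in for the data frame returned by the read.xlsx() call above;
# mtcars ships with base R, so no workbook is needed for this sketch.
dat <- head(mtcars[, c("mpg", "disp", "hp")], 3)

# The columns arrive in sheet order; index by name to impose another order.
wanted <- c("hp", "mpg", "disp")
dat <- dat[, wanted]
names(dat)
```

This is the "import, then select" route mentioned in the question, so it works around the ordering rather than changing how the file is read.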
As a third option, it is also relatively straightforward to wrap read_excel in another function that supports non-contiguous columns. Here's my attempt at that:
library(cellranger); library(readxl)

# Expand specs like c('a:c','h') into single letters: "a" "b" "c" "h".
# Works on character codes, so only single-letter columns (A to Z) are supported.
expand_sequence <- function(vec) {
  vec <- strsplit(vec, ":")
  vec <- unlist(lapply(vec, function(x) {
    if (length(x) > 1) {
      sapply(seq(from = as.integer(charToRaw(x[1])),
                 to = as.integer(charToRaw(x[2]))),
             function(y) rawToChar(as.raw(y)))
    } else {
      x
    }
  }))
  return(vec)
}

# Read each requested column separately, then bind the results side by side.
# readxl gives range precedence over skip, so skip has no effect on these reads.
better_read_excel <- function(file, columns, sheet, skip) {
  columnvec <- expand_sequence(columns)
  data_list <- lapply(columnvec, function(i) {
    read_excel(file, range = cell_cols(i), sheet = sheet, skip = skip, col_names = TRUE)
  })
  table <- do.call(cbind, data_list)
  return(table)
}
better_read_excel(example_file, columns=c('a:c','h','i'), sheet=2, skip=1)
> better_read_excel(example_file, columns=c('a:c','h','i'), sheet=2, skip=1)
mpg cyl disp vs am
1 21.0 6 160.0 0 1
2 21.0 6 160.0 0 1
3 22.8 4 108.0 1 1
4 21.4 6 258.0 1 0
5 18.7 8 360.0 0 0
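One caveat: expand_sequence steps through single character codes, so a spec such as "AP:BA" from the question would not expand correctly. Below is a rough base-R sketch of how multi-letter labels could be turned into column numbers instead. `col_to_num` and `expand_cols` are hypothetical helper names made up for this example, not functions from readxl, cellranger, or openxlsx.

```r
# Convert one Excel column label ("A", "Z", "AP") to its 1-based index.
# Hypothetical helper, not part of any of the packages discussed here.
col_to_num <- function(label) {
  digits <- utf8ToInt(toupper(label)) - utf8ToInt("A") + 1L
  # Treat the letters as base-26 digits, most significant first.
  Reduce(function(acc, d) acc * 26L + d, digits, 0L)
}

# Expand specs such as c("A:L", "AP:BA") into one integer vector of indices.
expand_cols <- function(specs) {
  unlist(lapply(strsplit(specs, ":", fixed = TRUE), function(ends) {
    idx <- vapply(ends, col_to_num, integer(1), USE.NAMES = FALSE)
    if (length(idx) == 1L) idx else seq(idx[1], idx[2])
  }))
}

expand_cols(c("A:L", "AP:BA"))  # columns 1 to 12 followed by 42 to 53
```

The resulting numbers could then be used wherever numeric column positions are accepted, for example the cols argument of read.xlsx shown earlier, or to index a data frame after reading the whole sheet.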