Create a new column with the file name in R
Question
I am scraping a table from a web page, and I would like to add a new column holding the name of the output file to every row that contains information (like an ID on each row). For example, if the file name is "16-12-19.xlsx", I want a new column with "16-12-19" written on each row that contains information.
(I will customize the date format later.)
The whole code is:
library("openxlsx")
library("rvest")

start <- as.Date("16-12-19", format = "%d-%m-%y")
end <- as.Date("05-01-20", format = "%d-%m-%y")
theDate <- start

while (theDate <= end) {
  url <- paste0("http://www.b3.com.br/pt_br/produtos-e-servicos/emprestimo-de-ativos/renda-variavel/emprestimos-registrados/renda-variavel-8AE490CA64CD50310164D1EFD6412F1C.htm?data=", format(theDate, "%d/%m/%y"), "&f=0")
  site <- read_html(url)
  Info_Ajuste_HTML <- html_nodes(site, "table")
  Info_ajuste <- html_text(Info_Ajuste_HTML)
  head(Info_ajuste, 20)
  if (length(Info_Ajuste_HTML) > 0) { ### <- Added a check here
    head(Info_Ajuste_HTML)
    lista_tabela <- site %>%
      html_nodes("table") %>%
      html_table(fill = TRUE)
    str(lista_tabela)
    head(lista_tabela[[1]], 10)
    AJUSTE <- lista_tabela[[1]]
    #View(AJUSTE)
    write.xlsx(AJUSTE, file = paste0("C:/Users/Jessé/Desktop/R/XLS/", format(theDate, "%d-%m-%y"), ".xlsx"), col.names = FALSE)
    theDate <- theDate + 1
  } else {
    theDate <- theDate + 1
  }
}
Answer 1
Score: 0
You can use df[,c(filename)] = rep(filename, nrow(df)), where df is the data frame containing the data from this file and filename contains the name of the file.
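As a minimal sketch of how this could fit the question's loop: the small data frame below is a made-up stand-in for the scraped table (in the real code it would come from html_table()), and the column name file_name and the variable file_id are hypothetical names chosen for illustration. Note that the form in this answer names the new column after the file itself, whereas the sketch uses one fixed column name, which is probably easier to work with if the files are combined later.

```r
# Made-up stand-in for the scraped table; the real AJUSTE comes from html_table().
AJUSTE <- data.frame(
  ticker = c("ABCD3", "EFGH4"),
  quantity = c(100, 250),
  stringsAsFactors = FALSE
)

theDate <- as.Date("16-12-19", format = "%d-%m-%y")

# Same string the question uses to build the file name, without ".xlsx".
file_id <- format(theDate, "%d-%m-%y")

# A length-one value is recycled across every row of the data frame.
AJUSTE$file_name <- file_id

print(AJUSTE)
```

In the loop from the question, the two lines that build file_id and assign AJUSTE$file_name would go just before the write.xlsx() call, so each saved file carries its own date in every row.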