Create a new column with the file name in R


Question

I'm scraping a table from a web page, and I would like to add a new column containing the name of the file to every row that holds data (like an ID on each row).

For example, if the file name is "16-12-19.xlsx", I want a new column with "16-12-19" written on every row that contains information.


*I'll customize the date format later.
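If the ID should come from an existing file name instead of being rebuilt from the date, a minimal base-R sketch is to drop the directory with `basename()` and the extension with `tools::file_path_sans_ext()`. The first name is the one from the question; the second path is a made-up example.

```r
# File names whose IDs we want; the second path is hypothetical
paths <- c("16-12-19.xlsx", "C:/Users/me/Desktop/R/XLS/17-12-19.xlsx")

# basename() strips the directory part, file_path_sans_ext() strips ".xlsx"
file_ids <- tools::file_path_sans_ext(basename(paths))
print(file_ids)
# [1] "16-12-19" "17-12-19"
```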

The full code is:

```r
library("openxlsx")
library("rvest")

start <- as.Date("16-12-19", format = "%d-%m-%y")
end   <- as.Date("05-01-20", format = "%d-%m-%y")
theDate <- start

while (theDate <= end)
{
  url <- paste0("http://www.b3.com.br/pt_br/produtos-e-servicos/emprestimo-de-ativos/renda-variavel/emprestimos-registrados/renda-variavel-8AE490CA64CD50310164D1EFD6412F1C.htm?data=", format(theDate, "%d/%m/%y"), "&f=0")
  site <- read_html(url)
  Info_Ajuste_HTML <- html_nodes(site, "table")

  Info_ajuste <- html_text(Info_Ajuste_HTML)
  head(Info_ajuste, 20)
  if (length(Info_Ajuste_HTML) > 0) { ### <- Added a check here
    head(Info_Ajuste_HTML)

    lista_tabela <- site %>%
      html_nodes("table") %>%
      html_table(fill = TRUE)
    str(lista_tabela)
    head(lista_tabela[[1]], 10)
    AJUSTE <- lista_tabela[[1]]
    #View(AJUSTE)
    write.xlsx(AJUSTE, file = paste0("C:/Users/Jessé/Desktop/R/XLS/", paste0(format(theDate, "%d-%m-%y"), ".xlsx")), col.names = FALSE)
    theDate <- theDate + 1
  } else {
    theDate <- theDate + 1
  }
}
```
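Since the loop already builds each file name from `format(theDate, "%d-%m-%y")`, one option is to reuse that same string as the ID and assign it to a new column just before `write.xlsx()`. The sketch below runs offline: the helper `add_file_id()`, the column name `file_id`, and the toy table standing in for `AJUSTE` are all hypothetical, and dropping rows whose cells are all empty or `NA` is an assumption about what "rows that contain information" means.

```r
# Hypothetical helper: keep rows that hold data and tag them with an ID
# built the same way the loop builds the file name.
add_file_id <- function(df, the_date, col = "file_id") {
  # Assumption: a row "contains information" if any cell is non-NA and non-blank
  has_info <- apply(df, 1, function(row) any(!is.na(row) & trimws(row) != ""))
  df <- df[has_info, , drop = FALSE]
  # A length-one value is recycled to every remaining row
  df[[col]] <- format(the_date, "%d-%m-%y")
  df
}

# Toy stand-in for AJUSTE (the real table comes from html_table())
AJUSTE <- data.frame(
  Codigo = c("PETR4", "VALE3", ""),
  Qtde   = c("100", "250", NA),
  stringsAsFactors = FALSE
)
theDate <- as.Date("16-12-19", format = "%d-%m-%y")

AJUSTE <- add_file_id(AJUSTE, theDate)
print(AJUSTE)
```

In the original loop, the call would sit between `AJUSTE <- lista_tabela[[1]]` and `write.xlsx(...)`, for example `AJUSTE <- add_file_id(AJUSTE, theDate)`.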

Answer 1

Score: 0


You can use `df[, c(filename)] = rep(filename, nrow(df))`, where `df` is the data frame holding the data from this file and `filename` holds the file's name. Note that this also uses the file name as the name of the new column.
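To see what that one-liner does, here is a small sketch on a made-up data frame. Because `filename` serves as both the column index and the value, the new column ends up named after the file. Assigning the single string to a fixed column name (the name `source_file` is hypothetical) gives the same values, since R recycles a length-one value, so `rep()` is optional.

```r
filename <- "16-12-19"
df <- data.frame(Codigo = c("PETR4", "VALE3"), stringsAsFactors = FALSE)

# The answer's form: adds a column whose *name* is the file name
df[, c(filename)] <- rep(filename, nrow(df))
print(names(df))

# Same values under a fixed column name; the single string is recycled
df$source_file <- filename
print(df)
```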

Posted on 6 January 2020, 02:38:51.
Original link: https://go.coder-hub.com/59603033.html