Create a new column with the file name in R

Question


I am scraping a table from a web page, and I would like to add a new column to every row that contains data. The column should hold the name of the file the table is saved to (like an ID on each row). For example:

The file name is "16-12-19.xlsx", so I want a new column with "16-12-19" written on each row that contains data.

Example: [image not available]

*I will customize the data format later.

The whole code is:

    library("openxlsx")
    library('rvest')
    start <- as.Date("16-12-19", format = "%d-%m-%y")
    end <- as.Date("05-01-20", format = "%d-%m-%y")
    theDate <- start
    while (theDate <= end)
    {
      url <- (paste0("http://www.b3.com.br/pt_br/produtos-e-servicos/emprestimo-de-ativos/renda-variavel/emprestimos-registrados/renda-variavel-8AE490CA64CD50310164D1EFD6412F1C.htm?data=", format(theDate, "%d/%m/%y"), "&f=0"))
      site <- read_html(url)
      Info_Ajuste_HTML <- html_nodes(site, 'table')
      Info_ajuste <- html_text(Info_Ajuste_HTML)
      head(Info_ajuste, 20)
      if (length(Info_Ajuste_HTML) > 0) { ### <- Added a check here
        head(Info_Ajuste_HTML)
        lista_tabela <- site %>%
          html_nodes("table") %>%
          html_table(fill = TRUE)
        str(lista_tabela)
        head(lista_tabela[[1]], 10)
        AJUSTE <- lista_tabela[[1]]
        #View(AJUSTE)
        write.xlsx(AJUSTE, file = paste0("C:/Users/Jessé/Desktop/R/XLS/", paste0(format(theDate, "%d-%m-%y"), ".xlsx")), col.names = (FALSE))
        theDate <- theDate + 1
      } else {
        theDate <- theDate + 1
      }
    }

Answer 1

Score: 0

You can use df[,c(filename)] = rep(filename, nrow(df)), where df is the data frame containing the data from this file and filename contains the name of the file. This adds a column whose name and values are both the file name, repeated once per row.
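As a rough sketch of how this idea might slot into the loop from the question: the label can be built from theDate with the same format() call that names the .xlsx file, then assigned to a column before write.xlsx() is called. The column name file_id and the toy data frame below are assumptions made for illustration; they stand in for the scraped AJUSTE table and are not part of the original post.

```r
# Toy stand-in for the scraped table (hypothetical data, not from the site).
AJUSTE <- data.frame(ticker = c("AAA1", "BBB2"), qty = c(10, 20))

# The date that drives the loop in the question.
theDate <- as.Date("16-12-19", format = "%d-%m-%y")

# Build the label once so the column and the file name cannot drift apart.
file_id <- format(theDate, "%d-%m-%y")

# rep() makes the value one-per-row and also copes with a table of zero rows.
# "file_id" is a hypothetical, fixed column name chosen for this sketch.
AJUSTE$file_id <- rep(file_id, nrow(AJUSTE))

print(AJUSTE)
```

In the original loop, the two assignment lines would go just before the write.xlsx() call. A fixed column name (rather than one named after the file, as in the answer's df[,c(filename)] form) may be easier to work with if the daily files are later stacked into one table.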

huangapple
  • Posted on January 6, 2020 at 02:38:51
  • Link to this post: https://go.coder-hub.com/59603033.html