Want to use grep to get dataframe names from environment and then stack the rows with rbind function in R

Question

I have thousands of data frames and want to grep their names into a character vector, then use that vector in a call to rbind. Any suggestions?

    dat1lkq6 <- data.frame(color = c('COLOR: RED', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE'))
    dat1ah2 <- data.frame(style = c('SPORTY', 'HYBRID', 'FORMAL', 'CASUAL', 'CASUAL'))
    dat29fg <- data.frame(color = c('COLOR: RED', 'COLOR: CYAN', 'COLOR: BLUE', 'COLOR: RED', 'COLOR: BLUE'))
    dat2xl <- data.frame(color = c('COLOR: RED', 'COLOR: CYAN', 'COLOR: BLUE', 'COLOR: RED', 'COLOR: BLUE'))
    dat3g49 <- data.frame(color = c('COLOR: PURPLE', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE'))
    skus4 <- data.frame(sku = c('SKU: 1849354', 'SKU: 392856', 'SKU: 921385', 'SKU: 6395474', 'SKU: 8532449', 'SKU: 0285468', 'SKU: 2948327'))

    # grep to get only "dat" dataframe names
    all_dat_df <- base::ls(all.names = TRUE)[base::grep("^dat", base::ls(all.names = TRUE))]

    # want to stack all the "dat" df's into one df, but not working
    # result dataframe should have 25 rows
    rbind(all_dat_df)

    # tried various incarnations of dput, gsub, paste, noquote to no success

Answer 1

Score: 2

Use grep() with value = TRUE to get the matching object names. Then use mget() to get a list of the associated data frames from the environment. Finally, use dplyr::bind_rows(), or possibly do.call(rbind), to combine them into one data frame.

    library(dplyr)

    grep("^dat", ls(), value = TRUE) |>
      mget() |>
      bind_rows()

        style         color
    1  SPORTY          <NA>
    2  HYBRID          <NA>
    3  FORMAL          <NA>
    4  CASUAL          <NA>
    5  CASUAL          <NA>
    6    <NA>    COLOR: RED
    7    <NA>    COLOR: RED
    8    <NA>   COLOR: BLUE
    9    <NA>  COLOR: GREEN
    10   <NA>   COLOR: BLUE
    11   <NA>    COLOR: RED
    12   <NA>   COLOR: CYAN
    13   <NA>   COLOR: BLUE
    14   <NA>    COLOR: RED
    15   <NA>   COLOR: BLUE
    16   <NA>    COLOR: RED
    17   <NA>   COLOR: CYAN
    18   <NA>   COLOR: BLUE
    19   <NA>    COLOR: RED
    20   <NA>   COLOR: BLUE
    21   <NA> COLOR: PURPLE
    22   <NA>    COLOR: RED
    23   <NA>   COLOR: BLUE
    24   <NA>  COLOR: GREEN
    25   <NA>   COLOR: BLUE

I used dplyr::bind_rows() because it handles cases where column names differ across data frames, as in your example data. If the column names are the same in every data frame and you prefer a base R solution, you could use do.call(rbind).

    grep("^dat", ls(), value = TRUE) |>
      mget() |>
      do.call(what = rbind)
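
As a rough illustration of the base R route, here is a minimal sketch that only works because every data frame has the same column. The environment `env` and the objects `dat1`, `dat2` and `stacked` are made up for this example (they are not from the question), and the data lives in a scratch environment so the sketch does not depend on what is in your workspace.

```r
# Hypothetical data in a scratch environment: two "dat" data frames that
# share the column `color`, plus one unrelated object that must be skipped.
env <- new.env()
assign("dat1", data.frame(color = c("RED", "BLUE")), envir = env)
assign("dat2", data.frame(color = c("CYAN", "RED")), envir = env)
assign("skus4", data.frame(sku = c("1849354", "392856")), envir = env)

# Keep only the names that start with "dat"; skus4 is left out.
dat_names <- grep("^dat", ls(env), value = TRUE)

# mget() returns a named list of the objects behind those names.
dat_list <- mget(dat_names, envir = env)

# rbind() succeeds here only because the column names match everywhere.
stacked <- do.call(rbind, dat_list)
nrow(stacked)
```

Because the list passed to rbind is named, the row names of the result are built from the data frame names; wrap the list in unname() if you would rather have plain row numbers.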

Answer 2

Score: 2

You can set the pattern argument of ls() to ^dat to get only the objects whose names start with dat.

    all_dat_df <- ls(pattern = "^dat")
    all_dat_df
    #[1] "dat1ah2"  "dat1lkq6" "dat29fg"  "dat2xl"   "dat3g49"

    #do.call(rbind, mget(all_dat_df)) #Does not work, as the names are not the same
    do.call(dplyr::bind_rows, mget(all_dat_df))
    #    style         color
    #1  SPORTY          <NA>
    #2  HYBRID          <NA>
    #3  FORMAL          <NA>
    #4  CASUAL          <NA>
    #5  CASUAL          <NA>
    #6    <NA>    COLOR: RED
    #7    <NA>    COLOR: RED
    #8    <NA>   COLOR: BLUE
    #9    <NA>  COLOR: GREEN
    #10   <NA>   COLOR: BLUE
    #11   <NA>    COLOR: RED
    #12   <NA>   COLOR: CYAN
    #13   <NA>   COLOR: BLUE
    #14   <NA>    COLOR: RED
    #15   <NA>   COLOR: BLUE
    #16   <NA>    COLOR: RED
    #17   <NA>   COLOR: CYAN
    #18   <NA>   COLOR: BLUE
    #19   <NA>    COLOR: RED
    #20   <NA>   COLOR: BLUE
    #21   <NA> COLOR: PURPLE
    #22   <NA>    COLOR: RED
    #23   <NA>   COLOR: BLUE
    #24   <NA>  COLOR: GREEN
    #25   <NA>   COLOR: BLUE
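
If you cannot use dplyr, one possible base R workaround for mismatched column names is to pad each data frame with the columns it lacks before calling rbind. This is only a sketch: `rbind_fill` is a hypothetical helper written for this example, not a function from base R or dplyr, and `a`, `b` and `combined` are made-up names.

```r
# Hypothetical helper (not part of base R or dplyr): stack data frames
# whose column names differ by filling the absent columns with NA.
rbind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))  # add the missing column as NA
    }
    df[all_cols]                      # same column order in every piece
  })
  do.call(rbind, unname(padded))
}

# Small made-up inputs mirroring the style/color split in the question.
a <- data.frame(style = c("SPORTY", "HYBRID"))
b <- data.frame(color = c("COLOR: RED", "COLOR: CYAN", "COLOR: BLUE"))

combined <- rbind_fill(list(a, b))
dim(combined)
```

With thousands of data frames, dplyr::bind_rows() is likely the more convenient choice; the helper is only meant to show what the padding step involves.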

Answer 3

Score: 1

First, you use the grep() function to get the names of all objects that start with dat (the ones that hold your data.frames), as you are already doing:

    all_objects <- base::ls()
    all_dat_df <- all_objects[base::grep("^dat", all_objects)]

Now, the all_dat_df object is a character vector that holds the object names. But these are just names: the vector has no reference to where each object lives, what values it holds, and so on.

So you need to turn these names into the actual objects (data.frames) you want to combine. To do that, you ask R to collect these objects into a list with the lapply() function.

The lapply() function applies get() to each object name in all_dat_df. get() looks up the name and returns the object it refers to, and lapply() stores the results of get() in an R list.

    list_of_data_frames <- lapply(all_dat_df, get, envir = globalenv())
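
To make the difference between names and objects concrete, here is a small sketch. The environment `env` and the objects `dat_a` and `dat_b` are invented for the example; it uses a scratch environment instead of globalenv() so that it is self-contained.

```r
# Scratch environment with two made-up data frames.
env <- new.env()
assign("dat_a", data.frame(color = c("RED", "BLUE")), envir = env)
assign("dat_b", data.frame(color = "GREEN"), envir = env)

dat_names <- ls(env, pattern = "^dat")

# rbind() on the names alone just binds character strings into a
# one-row matrix; no data frame is involved.
names_only <- rbind(dat_names)

# get() looks up one name; lapply() repeats the lookup for every name.
via_lapply <- lapply(dat_names, get, envir = env)

# mget() does the same lookups in one call and keeps the names on the list.
via_mget <- mget(dat_names, envir = env)
```

Both lists hold the same data frames; the only difference is that the mget() result is named, which can be handy if you later want to know which rows came from which object.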

After that, you just need to apply the dplyr::bind_rows() function over the list of data.frames you collected:

    big_data_frame <- dplyr::bind_rows(list_of_data_frames)

Now, the big_data_frame object holds a single data.frame that has all the rows from all the data.frames that grep() found in your global environment.
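
With thousands of inputs you may also want to know which data.frame each row came from. The following is a base R sketch under the assumption that all inputs share the same columns; the `source` column and the objects `dat_a` and `dat_b` are made up for the example. dplyr::bind_rows() offers an .id argument for a similar purpose when it is given a named list.

```r
# Scratch environment with two made-up data frames sharing one column.
env <- new.env()
assign("dat_a", data.frame(color = c("RED", "BLUE")), envir = env)
assign("dat_b", data.frame(color = "GREEN"), envir = env)

# Named list of data frames, keyed by object name.
dat_list <- mget(ls(env, pattern = "^dat"), envir = env)

# Prepend a `source` column holding the name of the originating object.
tagged <- Map(function(df, nm) cbind(source = nm, df),
              dat_list, names(dat_list))

# Stack the tagged pieces; unname() keeps plain row numbers.
big_data_frame <- do.call(rbind, unname(tagged))
```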

NOTE: In order to use the dplyr::bind_rows() function, you need to have the dplyr package installed on your machine. If you do not have it installed, use the code below to install it:

    install.packages("dplyr")

huangapple
  • Published on April 17, 2023, 07:45:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76030874.html