Want to use grep to get dataframe names from environment and then stack the rows with rbind function in R

Question

I have thousands of data frames and want to grep their names into a character vector, then use that vector to supply the data frames to rbind(). Any suggestions?

dat1lkq6 <- data.frame(color = c('COLOR: RED', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE'))
dat1ah2 <- data.frame(style = c('SPORTY', 'HYBRID', 'FORMAL', 'CASUAL', 'CASUAL'))
dat29fg <- data.frame(color = c('COLOR: RED', 'COLOR: CYAN', 'COLOR: BLUE', 'COLOR: RED', 'COLOR: BLUE'))
dat2xl <- data.frame(color = c('COLOR: RED', 'COLOR: CYAN', 'COLOR: BLUE', 'COLOR: RED', 'COLOR: BLUE'))
dat3g49 <- data.frame(color = c('COLOR: PURPLE', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE'))
skus4 <- data.frame(sku = c('SKU: 1849354', 'SKU: 392856', 'SKU: 921385', 'SKU: 6395474', 'SKU: 8532449', 'SKU: 0285468', 'SKU: 2948327'))

#grep to get only "dat" dataframe names
all_dat_df <- base::ls(all.names = TRUE)[base::grep("^dat", base::ls(all.names = TRUE))]

#want to stack all the "dat" df's into one df, but not working
#result dataframe should have 25 rows
rbind(all_dat_df)

#tried various incarnations of dput, gsub, paste, noquote to no success
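Why the last call fails: rbind(all_dat_df) receives a single character vector, so it binds the strings themselves rather than the data frames they name. A minimal base R reproduction (the objects dat_a and dat_b are made up for this sketch):

```r
# Two throwaway data frames whose names start with "dat_" (illustrative only)
dat_a <- data.frame(x = 1:2)
dat_b <- data.frame(x = 3:5)

# ls() returns the *names* of the objects, as a character vector
nm <- ls(pattern = "^dat_")

# rbind() on that vector just turns the strings into a 1-row character matrix;
# the data frames themselves are never looked up
res <- rbind(nm)
print(res)
dim(res)  # 1 2
```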

Answer 1

Score: 2

Use grep() with value = TRUE to get the matching object names themselves. Then use mget() to fetch a named list of the corresponding data frames from the environment. Finally, combine them into one data frame with dplyr::bind_rows() or, possibly, do.call(rbind).

library(dplyr)

grep("^dat", ls(), value = TRUE) |>
  mget() |>
  bind_rows()
    style         color
1  SPORTY          <NA>
2  HYBRID          <NA>
3  FORMAL          <NA>
4  CASUAL          <NA>
5  CASUAL          <NA>
6    <NA>    COLOR: RED
7    <NA>    COLOR: RED
8    <NA>   COLOR: BLUE
9    <NA>  COLOR: GREEN
10   <NA>   COLOR: BLUE
11   <NA>    COLOR: RED
12   <NA>   COLOR: CYAN
13   <NA>   COLOR: BLUE
14   <NA>    COLOR: RED
15   <NA>   COLOR: BLUE
16   <NA>    COLOR: RED
17   <NA>   COLOR: CYAN
18   <NA>   COLOR: BLUE
19   <NA>    COLOR: RED
20   <NA>   COLOR: BLUE
21   <NA> COLOR: PURPLE
22   <NA>    COLOR: RED
23   <NA>   COLOR: BLUE
24   <NA>  COLOR: GREEN
25   <NA>   COLOR: BLUE
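For reference, grep(value = TRUE) returns the matching strings directly, which is the same as indexing with the positions that a plain grep() returns (the idiom used in the question). A quick check on a hand-written character vector standing in for ls() output:

```r
# A stand-in for ls() output (names taken from the question)
nms <- c("dat1ah2", "dat1lkq6", "skus4", "dat2xl")

idx  <- grep("^dat", nms)                # integer positions of the matches
vals <- grep("^dat", nms, value = TRUE)  # the matching strings themselves

idx                        # 1 2 4
vals                       # "dat1ah2" "dat1lkq6" "dat2xl"
identical(nms[idx], vals)  # TRUE
```

Note that "^dat" also matches names such as "data_raw"; if other objects in your session start with "dat", a tighter pattern such as "^dat[0-9]" (which fits every name in the question) may be safer.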

I used dplyr::bind_rows() because it handles data frames whose column names differ, as in your example data, filling the missing columns with NA. If all the data frames have the same column names and you prefer a base R solution, you could use do.call(rbind) instead:

grep("^dat", ls(), value = TRUE) |>
  mget() |>
  do.call(what = rbind)
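If dplyr is not an option but the column names do differ, one base R workaround is to pad every data frame with the columns it is missing (as NA) before calling do.call(rbind). The sketch below does that; bind_rows_base() is a hypothetical helper written for this example, not a function from base R or dplyr, and it does not replicate every feature of bind_rows().

```r
# Hypothetical helper: a rough base-R stand-in for dplyr::bind_rows().
# It pads each data frame with its missing columns (filled with NA),
# puts the columns in a common order, then stacks them with rbind().
bind_rows_base <- function(dfs) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names), use.names = FALSE))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))  # absent column -> all NA
    }
    df[all_cols]                      # same column order everywhere
  })
  out <- do.call(rbind, unname(padded))
  rownames(out) <- NULL               # plain 1..n row names
  out
}

# Usage with the objects from the question:
# grep("^dat", ls(), value = TRUE) |> mget() |> bind_rows_base()
```

Because rbind() coerces as it stacks, a column that is all NA in the first data frame takes on the type of the values that arrive later (character here).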

Answer 2

Score: 2

You can set the pattern argument of ls() to "^dat" to list only the objects whose names start with dat.

all_dat_df <- ls(pattern = "^dat")
all_dat_df
#[1] "dat1ah2"  "dat1lkq6" "dat29fg"  "dat2xl"   "dat3g49"

#do.call(rbind, mget(all_dat_df)) #Does not work, as the column names are not the same
do.call(dplyr::bind_rows, mget(all_dat_df))
#    style         color
#1  SPORTY          <NA>
#2  HYBRID          <NA>
#3  FORMAL          <NA>
#4  CASUAL          <NA>
#5  CASUAL          <NA>
#6    <NA>    COLOR: RED
#7    <NA>    COLOR: RED
#8    <NA>   COLOR: BLUE
#9    <NA>  COLOR: GREEN
#10   <NA>   COLOR: BLUE
#11   <NA>    COLOR: RED
#12   <NA>   COLOR: CYAN
#13   <NA>   COLOR: BLUE
#14   <NA>    COLOR: RED
#15   <NA>   COLOR: BLUE
#16   <NA>    COLOR: RED
#17   <NA>   COLOR: CYAN
#18   <NA>   COLOR: BLUE
#19   <NA>    COLOR: RED
#20   <NA>   COLOR: BLUE
#21   <NA> COLOR: PURPLE
#22   <NA>    COLOR: RED
#23   <NA>   COLOR: BLUE
#24   <NA>  COLOR: GREEN
#25   <NA>   COLOR: BLUE
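The commented-out base R line fails because rbind() for data frames matches columns by name and stops when they differ. A small sketch with made-up data frames shows the error, and that the same do.call(rbind, ...) call works once every data frame shares the same column names:

```r
# Two data frames with different column names, as in the question
a <- data.frame(color = c("COLOR: RED", "COLOR: BLUE"))
b <- data.frame(style = c("SPORTY", "HYBRID"))

# Base rbind() matches columns by name, so this call errors out;
# tryCatch() captures the error message instead of stopping
err <- tryCatch(do.call(rbind, list(a, b)),
                error = function(e) conditionMessage(e))
err  # "names do not match previous names"

# With identical column names, base R stacks them without complaint
a2 <- data.frame(color = c("COLOR: CYAN", "COLOR: RED", "COLOR: GREEN"))
stacked <- do.call(rbind, list(a, a2))
nrow(stacked)  # 5
```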

Answer 3

Score: 1

First, use grep() to get the names of the data frames you want (here, all objects whose names start with "dat"), as you are already doing:

all_objects <- base::ls()
all_dat_df <- all_objects[base::grep("^dat", all_objects)]

Now all_dat_df is a character vector of object names. These are only names, though: the vector holds no reference to where each object lives or what values it contains.

So you need to turn these names into the actual data frames you want to combine. To do that, you can ask R to collect these objects into a list with the lapply() function.

lapply() applies get() to each object name in all_dat_df. get() looks up the object with that name (here, in the global environment) and returns it, and lapply() stores the results in an R list.

list_of_data_frames <- lapply(all_dat_df, get, envir = globalenv())
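As a side note, lapply(..., get) and mget() fetch the same objects; the difference is that mget() names the list after the objects, which keeps track of which data frame each piece came from. A small check, using a dedicated environment e (an assumption for this sketch, so the result does not depend on what is in your global environment):

```r
# A separate environment holding two small data frames (illustrative)
e <- new.env()
e$dat_x <- data.frame(color = c("COLOR: RED", "COLOR: BLUE"))
e$dat_y <- data.frame(color = "COLOR: CYAN")

nms <- ls(envir = e, pattern = "^dat_")  # "dat_x" "dat_y"

# lapply() + get(): an unnamed list of the data frames
via_get <- lapply(nms, get, envir = e)
# mget(): the same data frames, in a list named after the objects
via_mget <- mget(nms, envir = e)

identical(via_get, unname(via_mget))  # TRUE
names(via_mget)                       # "dat_x" "dat_y"
```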

After that, apply dplyr::bind_rows() to the list of data frames you collected:

big_data_frame <- dplyr::bind_rows(list_of_data_frames)

Now big_data_frame is a single data frame containing all the rows from every data frame that grep() found in your global environment.

NOTE: To use dplyr::bind_rows(), you need the dplyr package installed on your machine. If it is not installed yet, install it with:

install.packages("dplyr")

huangapple
  • Posted on 17 April 2023, 07:45:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76030874.html