Is it possible to update multiple datasets using lapply in R?

Question

I am currently trying to update multiple datasets by adding a new column to each of them.

I read the solution to this question.
However, running

lapply(list(annual_2022_v2, bottom_2022_v2, q1_2022_v2, q2_2022_v2, q3_2022_v2, q4_2022_v2, top_2022_v2), transform, start_hour = hour(started_at))

only printed the correct output; it did not update my original datasets or add the new column to them.

To test it on an individual dataset, I ran

lapply(list(q1_2022_v2), transform, start_hour = hour(started_at))

Although it did print the correct dataset with the new column, it didn't update it.

I am trying to figure out the "optimal" way to be able to write some sort of loop, rather than hard-coding 8 different datasets, such as

q1_2022_v2$start_hour <- hour(q1_2022_v2$started_at)
q2_2022_v2$start_hour <- hour(q2_2022_v2$started_at)
q3_2022_v2$start_hour <- hour(q3_2022_v2$started_at)
q4_2022_v2$start_hour <- hour(q4_2022_v2$started_at)

I have also seen solutions using Map() and cbind(), but I am confused about how they work.


I eventually decided not to complicate things and just work with one dataset.

Answer 1

Score: 2

If you don't assign it, lapply's return value is lost. lapply is not a for loop; it is a functional that returns a new list and leaves its inputs unchanged. What you see printed is that return value.
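To make the point about assignment concrete, here is a minimal sketch. The data frame is a made-up stand-in for q1_2022_v2, and as.POSIXlt(started_at)$hour is used as a base-R substitute for lubridate's hour() so that the example runs without extra packages; both are assumptions for illustration only.

```r
# Toy stand-in for one of the real datasets (values are made up).
q1_2022_v2 <- data.frame(
  started_at = as.POSIXct(c("2022-01-15 08:30:00", "2022-01-15 17:05:00"), tz = "UTC")
)

# Not assigned: the result is computed (and printed at the console),
# then thrown away. q1_2022_v2 itself is unchanged.
lapply(list(q1_2022_v2), transform, start_hour = as.POSIXlt(started_at)$hour)
"start_hour" %in% names(q1_2022_v2)

# Assigned: the modified copy is kept in `result`.
result <- lapply(
  list(q1_2022_v2 = q1_2022_v2),
  transform,
  start_hour = as.POSIXlt(started_at)$hour
)
names(result$q1_2022_v2)
```

Note that list(q1_2022_v2) holds a copy of the data frame, so even the assigned version does not touch the original object; the new column lives only in the elements of `result`.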

Start by putting these datasets into a list. I strongly suspect they all have the same structure, which means they should never have been separate objects, i.e. put them into a list when they are created/imported.

all_2022_v2 <- mget(ls(pattern = glob2rx("*_2022_v2")))

all_2022_v2 <- lapply(all_2022_v2, transform, start_hour = hour(started_at))
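A self-contained sketch of the same idea, under a few assumptions: the two data frames are toy stand-ins, as.POSIXlt(started_at)$hour replaces lubridate's hour() to avoid a package dependency, and the list is called `dfs` (a hypothetical name) because a list named all_2022_v2 would itself match the "*_2022_v2" pattern if the code were run a second time. The list2env() step is optional and is only one possible way to copy the results back over the original objects.

```r
# Toy stand-ins for two of the real datasets (values are made up).
q1_2022_v2 <- data.frame(started_at = as.POSIXct("2022-01-15 08:30:00", tz = "UTC"))
q2_2022_v2 <- data.frame(started_at = as.POSIXct("2022-04-15 17:05:00", tz = "UTC"))

# Collect every object whose name ends in "_2022_v2" into a named list.
dfs <- mget(ls(pattern = glob2rx("*_2022_v2")))

# Add the new column to each element and keep the result.
dfs <- lapply(dfs, transform, start_hour = as.POSIXlt(started_at)$hour)

# Optional: write the modified data frames back over the originals,
# using the list names as object names in the current environment.
invisible(list2env(dfs, envir = environment()))

names(q1_2022_v2)
```

Working with the list directly is usually the simpler design; writing back with list2env() is mainly useful when later code still refers to the individual object names.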

You should probably rbind the four quarterly datasets into one and keep the quarter (q) as a grouping column.
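One way this could look, as a rough sketch: the list `quarters`, its toy contents, and the column name `q` are assumptions for illustration. Map() calls the function once per pair of (data frame, name), and cbind() attaches the name as an extra column before the pieces are stacked with rbind.

```r
# Toy quarterly data frames with identical columns (values are made up).
quarters <- list(
  q1 = data.frame(started_at = as.POSIXct("2022-01-15 08:30:00", tz = "UTC")),
  q2 = data.frame(started_at = as.POSIXct("2022-04-15 17:05:00", tz = "UTC"))
)

# Map() walks over the data frames and their names in parallel;
# cbind() adds the name as a grouping column `q` to each data frame.
tagged <- Map(function(df, q) cbind(df, q = q), quarters, names(quarters))

# Stack all tagged data frames row-wise into one data frame.
all_quarters <- do.call(rbind, tagged)

# A single transform() call now covers every quarter.
all_quarters <- transform(all_quarters, start_hour = as.POSIXlt(started_at)$hour)
```

With everything in one data frame, per-quarter work becomes a matter of grouping or subsetting on `q` rather than repeating code for each object.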

Answer 2

Score: 0

I think you need to assign the result of that code to a new object. Try this:

df <- lapply(list(data), transform, newcol = somevalue)
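One caveat worth illustrating: with list(data) as input, the result is a list of length one rather than a data frame. A small sketch, where `data`, the column `x`, and the expression `x * 2` are hypothetical placeholders:

```r
# Hypothetical single data frame.
data <- data.frame(x = 1:3)

# lapply() over a one-element list returns a one-element list.
res <- lapply(list(data), transform, newcol = x * 2)
class(res)

# Extract the data frame from the list.
df <- res[[1]]

# For a single data frame, calling transform() directly is simpler.
df2 <- transform(data, newcol = x * 2)
```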

huangapple
  • Posted on March 7, 2023, 14:05:08
  • When reposting, please be sure to keep the link to this article: https://go.coder-hub.com/75658514.html