Pulling data from a row up to a new column for each ID
# Question
I have a data frame that contains values across multiple years for different IDs. It can be recreated with the following code:
df<-data.frame(WLH_ID=c("15-7318","15-7318","15-7319","15-7319","15-7320","15-7320","15-7320"),
year=c("2017","2018","2017","2018","2017","2018","2019"),
overlap_95=c("1","1","0.626311190934023","0.968386735773874","0.713286882088087","0.824103998854928","0.451493099154607"))
I would like to reshape it so that each year gets its own column while each ID stays on a single row, which would let me compare an ID's values across years.
Some IDs may have more years' worth of data than others, and in such cases, I would like the extra years to be represented as NAs or NULLs.
I assume this can be done, but I don't know where to start. I couldn't find an existing answer to this question either, though I may have worded my search poorly.
Thanks in advance!
# Answer 1
**Score**: 1
I think you just need `tidyr::pivot_wider()`:

    > library(tidyr)
    > df %>%
        pivot_wider(names_from = year, values_from = overlap_95, names_prefix = "overlap_95_")
    # A tibble: 3 × 4
      WLH_ID  overlap_95_2017   overlap_95_2018   overlap_95_2019
      <chr>   <chr>             <chr>             <chr>
    1 15-7318 1                 1                 NA
    2 15-7319 0.626311190934023 0.968386735773874 NA
    3 15-7320 0.713286882088087 0.824103998854928 0.451493099154607

IDs with no value for a given year get `NA` in that column. The value columns are `<chr>` because `overlap_95` was created from character strings.
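Since `overlap_95` is stored as strings in the example data, the reshaped columns are character and cannot be compared arithmetically. A minimal sketch, assuming every value is a valid number, converts the column before pivoting. The names `wide` and `change_17_18` are illustrative and not from the original post.

```R
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  WLH_ID = c("15-7318", "15-7318", "15-7319", "15-7319",
             "15-7320", "15-7320", "15-7320"),
  year = c("2017", "2018", "2017", "2018", "2017", "2018", "2019"),
  overlap_95 = c("1", "1", "0.626311190934023", "0.968386735773874",
                 "0.713286882088087", "0.824103998854928", "0.451493099154607")
)

wide <- df %>%
  mutate(overlap_95 = as.numeric(overlap_95)) %>%  # strings -> doubles
  pivot_wider(names_from = year, values_from = overlap_95,
              names_prefix = "overlap_95_") %>%
  # Difference between years; NA wherever either year is missing
  mutate(change_17_18 = overlap_95_2018 - overlap_95_2017)

wide
```

With numeric columns, missing years stay `NA`, so any comparison that involves them is also `NA` rather than silently wrong.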
# Answer 2
**Score**: 0
One way is to use `pivot_wider()`, then `rename()` the columns, and finally `replace()` the missing values with blank cells:
```R
library(tidyverse)

dfw <- df %>%
  pivot_wider(names_from = "year", values_from = "overlap_95") %>%
  # Year columns have non-syntactic names, so they are quoted with backticks
  rename(overlap_95Y1 = `2017`, overlap_95Y2 = `2018`, overlap_95Y3 = `2019`) %>%
  # Filling with "" works here because the value columns are character
  replace(is.na(.), "")

dfw        # print the reshaped data
View(dfw)  # or open it in a spreadsheet-style viewer
```
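As a possible alternative to a separate `replace()` step, `pivot_wider()` accepts a `values_fill` argument that fills the cells for missing ID/year combinations during the reshape. This is a minimal sketch that only makes sense while the value columns stay character; `dfw_blank` is an illustrative name, not from the original answers.

```R
library(tidyr)

# Example data from the question
df <- data.frame(
  WLH_ID = c("15-7318", "15-7318", "15-7319", "15-7319",
             "15-7320", "15-7320", "15-7320"),
  year = c("2017", "2018", "2017", "2018", "2017", "2018", "2019"),
  overlap_95 = c("1", "1", "0.626311190934023", "0.968386735773874",
                 "0.713286882088087", "0.824103998854928", "0.451493099154607")
)

dfw_blank <- df %>%
  pivot_wider(names_from = year, values_from = overlap_95,
              names_prefix = "overlap_95_",
              values_fill = "")  # blank string instead of NA for missing years

dfw_blank
```

Blank strings are convenient for display or export, but keeping `NA` is usually the better choice if the values will be used in calculations.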