Pulling data from a row up to a new column for each ID

# Question

I have a data frame that contains values across multiple years for different IDs. It can be recreated with the following code:

    df <- data.frame(WLH_ID = c("15-7318", "15-7318", "15-7319", "15-7319", "15-7320", "15-7320", "15-7320"),
                     year = c("2017", "2018", "2017", "2018", "2017", "2018", "2019"),
                     overlap_95 = c("1", "1", "0.626311190934023", "0.968386735773874", "0.713286882088087", "0.824103998854928", "0.451493099154607"))

I hope to reshape it so that each year is in its own column, with each value kept in the same row as its ID, so that I can compare values across years for each ID.

Some IDs may have more years' worth of data than others, and in such cases, I would like the extra years to be represented as NAs or NULLs.

I assume this can be done; I just don't know where to start. I couldn't find an existing answer to this question either, though I may have worded my search poorly.

Thanks in advance!

# Answer 1
**Score**: 1

I think you may just need `tidyr::pivot_wider()`:

    > library(tidyr)
    > df %>%
       pivot_wider(names_from = year, values_from = overlap_95, names_prefix = "overlap_95_")

    # A tibble: 3 × 4
      WLH_ID  overlap_95_2017   overlap_95_2018   overlap_95_2019  
      <chr>   <chr>             <chr>             <chr>            
    1 15-7318 1                 1                 NA               
    2 15-7319 0.626311190934023 0.968386735773874 NA               
    3 15-7320 0.713286882088087 0.824103998854928 0.451493099154607
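Note that `overlap_95` is stored as character in the example data, which is why the pivoted columns above are `<chr>`. If the goal is to compare years numerically, one option is to convert the column before pivoting. The following is a minimal sketch under that assumption; the derived column `change_2017_2018` is a hypothetical name used only for illustration and is not part of the original question.

```R
library(dplyr)
library(tidyr)

# Example data from the question (all columns are character)
df <- data.frame(
  WLH_ID = c("15-7318", "15-7318", "15-7319", "15-7319", "15-7320", "15-7320", "15-7320"),
  year = c("2017", "2018", "2017", "2018", "2017", "2018", "2019"),
  overlap_95 = c("1", "1", "0.626311190934023", "0.968386735773874",
                 "0.713286882088087", "0.824103998854928", "0.451493099154607")
)

wide <- df %>%
  # Assumption: the values are meant to be numeric, so convert before reshaping
  mutate(overlap_95 = as.numeric(overlap_95)) %>%
  # One column per year; IDs without a given year get NA in that column
  pivot_wider(
    names_from = year,
    values_from = overlap_95,
    names_prefix = "overlap_95_"
  ) %>%
  # Hypothetical derived column: change from 2017 to 2018 for each ID
  mutate(change_2017_2018 = overlap_95_2018 - overlap_95_2017)

print(wide)
```

With numeric columns, IDs that lack a given year still get `NA` in that year's column, and any arithmetic involving that column yields `NA` for those IDs as well.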



# Answer 2
**Score**: 0

One way is to use `pivot_wider()`, then `rename()` the columns, and finally `replace()` to replace missing values with blank cells:

```R
library(tidyverse)

dfw <- df %>%
  # One column per year, named after the year value
  pivot_wider(names_from = "year", values_from = "overlap_95") %>%
  # Give the year columns descriptive names (new_name = old_name)
  rename(overlap_95Y1 = "2017", overlap_95Y2 = "2018", overlap_95Y3 = "2019") %>%
  # Replace NA with empty strings (the columns are character)
  replace(is.na(.), "")

# Inspect the result
View(dfw)
```

huangapple
  • Published on March 12, 2023 at 08:21:42
  • When reposting, please retain the link to this article: https://go.coder-hub.com/75710372.html