Pivot IDs from one column into several columns and pair them with the values of another character column

Question

My problem is the following. I have this data frame:

    ID <- c(1, 2, NA, 3, NA, 4, NA, NA, 5, NA, NA, NA)
    Col_1 <- c(NA, 45, NA, 23, 1, 2, 8, NA, 78, 12, NA, 19)
    Objekt.Nr. <- c(1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 4)
    Fahrzeugart <- c("E-Bike", "Fahrrad", "Fahrrad", "Fahrrad", "Bus", "Bus", "Fahrrad", "Auto", "E-Bike", "Fahrrad", "Fahrrad", "Fahrrad")
    Col_2 <- c(1, 2, 3, 4, NA, 5, 6, 7, NA, 89, 10, 12)
    df <- data.frame(ID, Col_1, Objekt.Nr., Fahrzeugart, Col_2)

I need to transform it so that there is only one row per ID instead of several, as there are now. To do that, I need to pivot the data frame so that each value of Objekt.Nr. becomes a new column holding the corresponding Fahrzeugart.

My goal is that the data frame will look like this:

    ID <- c(1, 2, 3, 4, 5)
    Fahrzeug_1 <- c("E-Bike", "Fahrrad", "Fahrrad", "Bus", "E-Bike")
    Fahrzeug_2 <- c(NA, "Fahrrad", "Bus", "Fahrrad", "Fahrrad")
    Fahrzeug_3 <- c(NA, NA, NA, "Auto", "Fahrrad")
    Fahrzeug_4 <- c(NA, NA, NA, NA, "Fahrrad")
    # col_1 <- c(1,(2,3)...)  # merged for every ID
    # same for Col_2
    df_wanted <- data.frame(ID, Fahrzeug_1, Fahrzeug_2, Fahrzeug_3, Fahrzeug_4)

I tried the following code, but it only returns binary values for "Fahrzeugart":

    df_melted <- melt(df, id.vars = c("ID"), measure.vars = c("Fahrzeugart"))
    df_wanted <- dcast(df_melted, ID ~ Objekt.Nr., value.var = "Fahrzeugart")

Thank you very much!

Answer 1

Score: 2


You can use fill() from the tidyr package to fill in the missing ID values and then pivot_wider() also from the tidyr package to change from long to wide-form.

    library(dplyr)
    library(tidyr)
    ID <- c(1, 2, NA, 3, NA, 4, NA, NA, 5, NA, NA, NA)
    Objekt.Nr. <- c(1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 4)
    Fahrzeugart <- c("E-Bike", "Fahrrad", "Fahrrad", "Fahrrad", "Bus", "Bus", "Fahrrad", "Auto", "E-Bike", "Fahrrad", "Fahrrad", "Fahrrad")
    df <- data.frame(ID, Objekt.Nr., Fahrzeugart)
    df %>%
      fill(ID, .direction = "down") %>%
      pivot_wider(names_from = "Objekt.Nr.", values_from = "Fahrzeugart", names_prefix = "Fahrzeugart_")
    #> # A tibble: 5 × 5
    #>      ID Fahrzeugart_1 Fahrzeugart_2 Fahrzeugart_3 Fahrzeugart_4
    #>   <dbl> <chr>         <chr>         <chr>         <chr>
    #> 1     1 E-Bike        <NA>          <NA>          <NA>
    #> 2     2 Fahrrad       Fahrrad       <NA>          <NA>
    #> 3     3 Fahrrad       Bus           <NA>          <NA>
    #> 4     4 Bus           Fahrrad       Auto          <NA>
    #> 5     5 E-Bike        Fahrrad       Fahrrad       Fahrrad

<sup>Created on 2023-02-14 by the reprex package (v2.0.1)</sup>
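To make the first step easier to follow, here is a minimal sketch of what `fill()` does on its own. The `small` data frame is a made-up, cut-down version of the question's data, and it assumes that an `NA` in `ID` means "same ID as the row above":

```r
library(tidyr)

# Hypothetical cut-down data: NA in ID stands for "same ID as the previous row"
small <- data.frame(
  ID = c(1, 2, NA, 3, NA),
  Fahrzeugart = c("E-Bike", "Fahrrad", "Fahrrad", "Fahrrad", "Bus")
)

# Carry the last non-missing ID downwards into the NA rows
filled <- fill(small, ID, .direction = "down")

print(filled)
```

Once every row carries its ID, `pivot_wider()` can group the rows by ID and spread them into columns.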


Edit: what if there are other columns

If you're alright having some list columns in your data, you could do the following:

    library(dplyr)
    library(tidyr)
    ID <- c(1, 2, NA, 3, NA, 4, NA, NA, 5, NA, NA, NA)
    Col_1 <- c(NA, 45, NA, 23, 1, 2, 8, NA, 78, 12, NA, 19)
    Objekt.Nr. <- c(1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 4)
    Fahrzeugart <- c("E-Bike", "Fahrrad", "Fahrrad", "Fahrrad", "Bus", "Bus", "Fahrrad", "Auto", "E-Bike", "Fahrrad", "Fahrrad", "Fahrrad")
    Col_2 <- c(1, 2, 3, 4, NA, 5, 6, 7, NA, 89, 10, 12)
    df <- data.frame(ID, Col_1, Objekt.Nr., Fahrzeugart, Col_2)
    df %>%
      fill(ID, .direction = "down") %>%
      pivot_wider(id_cols = ID,
                  names_from = "Objekt.Nr.",
                  values_from = "Fahrzeugart",
                  names_prefix = "Fahrzeugart_",
                  unused_fn = list)
    #> # A tibble: 5 × 7
    #>      ID Fahrzeugart_1 Fahrzeugart_2 Fahrzeugart_3 Fahrzeugart_4 Col_1     Col_2
    #>   <dbl> <chr>         <chr>         <chr>         <chr>         <list>    <list>
    #> 1     1 E-Bike        <NA>          <NA>          <NA>          <dbl [1]> <dbl>
    #> 2     2 Fahrrad       Fahrrad       <NA>          <NA>          <dbl [2]> <dbl>
    #> 3     3 Fahrrad       Bus           <NA>          <NA>          <dbl [2]> <dbl>
    #> 4     4 Bus           Fahrrad       Auto          <NA>          <dbl [3]> <dbl>
    #> 5     5 E-Bike        Fahrrad       Fahrrad       Fahrrad       <dbl [4]> <dbl>

<sup>Created on 2023-02-14 by the reprex package (v2.0.1)</sup>
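If list columns are inconvenient, one possible variation (not part of the original answer) is to pass `unused_fn` a function that collapses each ID's values into a single string. The sketch below assumes tidyr 1.2.0 or later, where `unused_fn` is available; `collapse_values` is a hypothetical helper written for this example, and dropping `NA` values before pasting is an assumption about what "merged" should mean:

```r
library(tidyr)

df <- data.frame(
  ID = c(1, 2, NA, 3, NA, 4, NA, NA, 5, NA, NA, NA),
  Col_1 = c(NA, 45, NA, 23, 1, 2, 8, NA, 78, 12, NA, 19),
  Objekt.Nr. = c(1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 4),
  Fahrzeugart = c("E-Bike", "Fahrrad", "Fahrrad", "Fahrrad", "Bus", "Bus",
                  "Fahrrad", "Auto", "E-Bike", "Fahrrad", "Fahrrad", "Fahrrad"),
  Col_2 = c(1, 2, 3, 4, NA, 5, 6, 7, NA, 89, 10, 12)
)

# Hypothetical helper: drop NAs and join the remaining values into one string.
# An ID whose values are all NA ends up with an empty string.
collapse_values <- function(x) paste(x[!is.na(x)], collapse = ", ")

wide <- df %>%
  fill(ID, .direction = "down") %>%
  pivot_wider(id_cols = ID,
              names_from = "Objekt.Nr.",
              values_from = "Fahrzeugart",
              names_prefix = "Fahrzeugart_",
              unused_fn = collapse_values)

print(wide)
```

With this variation `Col_1` and `Col_2` become ordinary character columns, at the cost of losing the numeric type. Keeping `unused_fn = list` is the better choice if the values are needed for further calculations.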

huangapple
  • Posted on February 14, 2023, 22:24:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75449214.html