Generate new rows and assign values using data from specific parameters in a data frame

Question

I want to fill in values for years with no data, using data I already have. For example, I have land-use data for specific lakes for the year 2004, but I do not have land-use data for these lakes for 2002 and 2003. Is there a way to add new rows for 2002-2003 that reuse the 2004 land-use values? Here is what my data looks like now:

Lake year urban agriculture wetlands forest shrub
A 2004 90 5 1 1 3
B 2004 80 1 4 5 10
C 2004 20 20 20 20 20

This is what I would like my data frame to look like:

Lake year urban agriculture wetlands forest shrub
A 2002 90 5 1 1 3
A 2003 90 5 1 1 3
A 2004 90 5 1 1 3
B 2002 80 1 4 5 10
B 2003 80 1 4 5 10
B 2004 80 1 4 5 10
C 2002 20 20 20 20 20
C 2003 20 20 20 20 20
C 2004 20 20 20 20 20

Any help is appreciated. Thanks again!

I do not know where to start so I have not tried anything.

Answer 1

Score: 1

No doubt there are lots of ways to do this. If this is really your use case, then for each row, you could turn year into a list with the values 2002:2004. Then, just unnest() the year list and all the other data will be replicated for all values of year:

  library(dplyr)
  library(tidyr)

  tab <- read.table(header=TRUE, textConnection("Lake year urban agriculture wetlands forest shrub
  A 2004 90 5 1 1 3
  B 2004 80 1 4 5 10
  C 2004 20 20 20 20 20"))

  tab %>%
    rowwise() %>%
    mutate(year = list(2002:2004)) %>%
    unnest(year)
  #> # A tibble: 9 × 7
  #>   Lake   year urban agriculture wetlands forest shrub
  #>   <chr> <int> <int>       <int>    <int>  <int> <int>
  #> 1 A      2002    90           5        1      1     3
  #> 2 A      2003    90           5        1      1     3
  #> 3 A      2004    90           5        1      1     3
  #> 4 B      2002    80           1        4      5    10
  #> 5 B      2003    80           1        4      5    10
  #> 6 B      2004    80           1        4      5    10
  #> 7 C      2002    20          20       20     20    20
  #> 8 C      2003    20          20       20     20    20
  #> 9 C      2004    20          20       20     20    20

Created on 2023-06-08 with reprex v2.0.2
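
For comparison, here is a minimal base R sketch of the same idea that needs no packages. It is not part of the answer above; the variable names `years`, `idx`, and `expanded` are just illustrative, and it assumes every lake should get the same set of years.

```r
# Land-use data for a single year (same values as in the question).
tab <- data.frame(
  Lake = c("A", "B", "C"),
  year = 2004L,
  urban = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands = c(1L, 4L, 20L),
  forest = c(1L, 5L, 20L),
  shrub = c(3L, 10L, 20L)
)

years <- 2002:2004  # target years (assumption: identical for every lake)

# Repeat each row index once per target year: 1 1 1 2 2 2 3 3 3.
idx <- rep(seq_len(nrow(tab)), each = length(years))
expanded <- tab[idx, ]

# Overwrite year so that it cycles through the target years within each lake.
expanded$year <- rep(years, times = nrow(tab))
rownames(expanded) <- NULL

expanded
```

The result has 9 rows, with the land-use columns copied unchanged and only `year` varying within each lake.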

Answer 2

Score: 1

  library(dplyr)
  library(tidyr)

  df |>
    uncount(3, .id = "year") |>
    mutate(year = year + 2001)

This assumes a sequence of 3 consecutive years starting right after 2001 (i.e. 2002, 2003, 2004).

  • uncount() duplicates each row 3 times and, because of .id = "year", fills the year column with the sequence 1:3 to identify each copy (replacing the original 2004 values).
  • mutate() then shifts this generated year by adding 2001, which turns 1:3 into 2002:2004.
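
To make the role of `.id` easier to see, here is a small sketch on a toy data frame. The column name `copy` and the two-lake data are made up for illustration; only `uncount()` and its `.id` argument come from the answer.

```r
library(tidyr)

# Toy data with no year column, so the id column is easy to spot.
toy <- data.frame(Lake = c("A", "B"), urban = c(90L, 80L))

# Repeat every row 3 times; `copy` numbers the repeats within each original row.
out <- uncount(toy, 3, .id = "copy")
out
```

Here `copy` runs 1, 2, 3 for lake A and then 1, 2, 3 again for lake B, which is why adding 2001 to it yields 2002:2004 for each lake.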

Output

    Lake year urban agriculture wetlands forest shrub
  1    A 2002    90           5        1      1     3
  2    A 2003    90           5        1      1     3
  3    A 2004    90           5        1      1     3
  4    B 2002    80           1        4      5    10
  5    B 2003    80           1        4      5    10
  6    B 2004    80           1        4      5    10
  7    C 2002    20          20       20     20    20
  8    C 2003    20          20       20     20    20
  9    C 2004    20          20       20     20    20

If you want non-consecutive years, you could do something like:

  library(dplyr)
  library(tidyr)

  years <- c(2002, 2004) # edit this for years you want

  df |>
    uncount(length(years)) |>
    mutate(year = rep(years, times = nrow(df)))
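
Note that the snippet above refers to `df` by name inside `mutate()`, so it only lines up if the piped data is exactly `df`. One possible way to avoid that is to wrap the steps in a function. This is only a sketch: `expand_years()` is a hypothetical helper, not something from dplyr or tidyr, and the three-column `lakes` data is a cut-down stand-in for the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: repeat every row of `data` once per element of `years`
# and write the matching year into the year column.
expand_years <- function(data, years) {
  n_rows <- nrow(data)  # row count of the input, taken before it is expanded
  data |>
    uncount(length(years)) |>
    mutate(year = rep(years, times = n_rows))
}

lakes <- data.frame(
  Lake = c("A", "B", "C"),
  year = 2004L,
  urban = c(90L, 80L, 20L)
)

res <- expand_years(lakes, c(2002, 2004))
res
```

Because the row count is taken from the function argument, the same call should also work on a filtered or otherwise modified copy of the data.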

Data

  df <- structure(list(Lake = c("A", "B", "C"), year = c(2004L, 2004L,
  2004L), urban = c(90L, 80L, 20L), agriculture = c(5L, 1L, 20L
  ), wetlands = c(1L, 4L, 20L), forest = c(1L, 5L, 20L), shrub = c(3L,
  10L, 20L)), class = "data.frame", row.names = c(NA, -3L))

huangapple
  • Posted on June 9, 2023, 02:28:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76434731.html