Generate new rows and assign values using data from specific parameters in dataframe

Question

I want to fill in values for missing years using data I already have. For example, I have land use data for specific lakes for the year 2004, but no land use data for these lakes for 2002 and 2003. Is there a way to add new rows for 2002 and 2003 that reuse the 2004 land use values? This is what my data looks like now:

Lake year urban agriculture wetlands forest shrub
A 2004 90 5 1 1 3
B 2004 80 1 4 5 10
C 2004 20 20 20 20 20

This is what I would like my data frame to look like:

Lake year urban agriculture wetlands forest shrub
A 2002 90 5 1 1 3
A 2003 90 5 1 1 3
A 2004 90 5 1 1 3
B 2002 80 1 4 5 10
B 2003 80 1 4 5 10
B 2004 80 1 4 5 10
C 2002 20 20 20 20 20
C 2003 20 20 20 20 20
C 2004 20 20 20 20 20

Any help is appreciated. Thanks again!

I do not know where to start so I have not tried anything.

Answer 1

Score: 1

No doubt there are lots of ways to do this. If this is really your use case, then for each row, you could turn year into a list with the values 2002:2004. Then, just unnest() the year list and all the other data will be replicated for all values of year:

library(dplyr)
library(tidyr)
tab <- read.table(header=TRUE, textConnection("Lake year    urban   agriculture wetlands    forest  shrub
A   2004    90  5   1   1   3
B   2004    80  1   4   5   10
C   2004    20  20  20  20  20"))

tab %>%
  rowwise() %>%
  mutate(year = list(2002:2004)) %>%
  unnest(year)
#> # A tibble: 9 × 7
#>   Lake   year urban agriculture wetlands forest shrub
#>   <chr> <int> <int>       <int>    <int>  <int> <int>
#> 1 A      2002    90           5        1      1     3
#> 2 A      2003    90           5        1      1     3
#> 3 A      2004    90           5        1      1     3
#> 4 B      2002    80           1        4      5    10
#> 5 B      2003    80           1        4      5    10
#> 6 B      2004    80           1        4      5    10
#> 7 C      2002    20          20       20     20    20
#> 8 C      2003    20          20       20     20    20
#> 9 C      2004    20          20       20     20    20

Created on 2023-06-08 with reprex v2.0.2
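For comparison, here is a minimal base R sketch of the same replication idea. It is not part of the answer above: it assumes the same three-lake table, and the names `land_use`, `years` and `expanded` are made up for illustration. With no shared columns, `merge()` returns the Cartesian product, so every lake is paired with every year.

```r
# Same example data as in the answer, built directly as a data frame.
tab <- data.frame(
  Lake        = c("A", "B", "C"),
  year        = c(2004L, 2004L, 2004L),
  urban       = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands    = c(1L, 4L, 20L),
  forest      = c(1L, 5L, 20L),
  shrub       = c(3L, 10L, 20L)
)

# Hypothetical helper objects: the land use columns without year,
# and the years we want every lake to have.
land_use <- tab[, setdiff(names(tab), "year")]
years    <- data.frame(year = 2002:2004)

# by = NULL requests the Cartesian product (every lake x every year).
expanded <- merge(land_use, years, by = NULL)

# Restore the original column order and sort by lake, then year.
expanded <- expanded[order(expanded$Lake, expanded$year), names(tab)]
rownames(expanded) <- NULL
expanded
```

This avoids list-columns entirely, at the cost of having to reorder rows and columns by hand.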

Answer 2

Score: 1

library(dplyr)
library(tidyr)

df |>
  uncount(3, .id = "year") |>
  mutate(year = year + 2001)

This assumes a sequence of 3 consecutive years starting right after 2001 (i.e. 2002, 2003, 2004).

  • uncount repeats each row 3 times and, via .id = "year", writes a year column that is just the sequence 1:3 identifying each copy (here it replaces the existing year values).
  • mutate then shifts this year column by adding 2001, which turns 1:3 into 2002:2004.
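The two steps above can be inspected one at a time. This is only a sketch: `step1` and `step2` are illustrative names, and `df` is the data given at the end of this answer, rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the "Data" section of this answer.
df <- data.frame(
  Lake        = c("A", "B", "C"),
  year        = c(2004L, 2004L, 2004L),
  urban       = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands    = c(1L, 4L, 20L),
  forest      = c(1L, 5L, 20L),
  shrub       = c(3L, 10L, 20L)
)

# Step 1: repeat every row 3 times; year now holds the copy index.
step1 <- df |> uncount(3, .id = "year")
step1$year
# 1 2 3 1 2 3 1 2 3

# Step 2: shift the copy index so that 1:3 becomes 2002:2004.
step2 <- step1 |> mutate(year = year + 2001)
step2$year
# 2002 2003 2004 2002 2003 2004 2002 2003 2004
```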

Output

  Lake year urban agriculture wetlands forest shrub
1    A 2002    90           5        1      1     3
2    A 2003    90           5        1      1     3
3    A 2004    90           5        1      1     3
4    B 2002    80           1        4      5    10
5    B 2003    80           1        4      5    10
6    B 2004    80           1        4      5    10
7    C 2002    20          20       20     20    20
8    C 2003    20          20       20     20    20
9    C 2004    20          20       20     20    20

If you wanted to do non-sequential years you could do something like:

library(dplyr)
library(tidyr)

years <- c(2002, 2004) # edit this for years you want

df |>
  uncount(length(years)) |>
  mutate(year = rep(years, times = nrow(df)))
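Note that the pipeline above refers to `df` by name inside `mutate()`, so it only works when `df` is also the input of the pipe. One possible way around that is to wrap the steps in a small function. This is a sketch; `expand_years` is a hypothetical helper name, not something provided by dplyr or tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: repeat every row of `data` once per requested year
# and overwrite the year column with those years.
expand_years <- function(data, years) {
  data |>
    uncount(length(years)) |>
    mutate(year = rep(years, times = nrow(data)))
}

# Same data as in the "Data" section of this answer.
df <- data.frame(
  Lake        = c("A", "B", "C"),
  year        = c(2004L, 2004L, 2004L),
  urban       = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands    = c(1L, 4L, 20L),
  forest      = c(1L, 5L, 20L),
  shrub       = c(3L, 10L, 20L)
)

result <- expand_years(df, c(2002, 2004))
result
```

Because uncount keeps the copies of each row next to each other, repeating `years` once per original row lines the years up with the copies.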

Data

df <- structure(list(Lake = c("A", "B", "C"), year = c(2004L, 2004L, 
2004L), urban = c(90L, 80L, 20L), agriculture = c(5L, 1L, 20L
), wetlands = c(1L, 4L, 20L), forest = c(1L, 5L, 20L), shrub = c(3L, 
10L, 20L)), class = "data.frame", row.names = c(NA, -3L))
