Generate new rows and assign values using data from specific parameters in dataframe
Question
I would like to fill in values for missing years using data I already have. For example, I have land use data for specific lakes for the year 2004, but I do not have land use data for these lakes for 2002 and 2003. Is there a way to add new rows for 2002 and 2003 that reuse the 2004 land use values? Here is what my data looks like now:
Lake | year | urban | agriculture | wetlands | forest | shrub |
---|---|---|---|---|---|---|
A | 2004 | 90 | 5 | 1 | 1 | 3 |
B | 2004 | 80 | 1 | 4 | 5 | 10 |
C | 2004 | 20 | 20 | 20 | 20 | 20 |
This is what I would like my data frame to look like:
Lake | year | urban | agriculture | wetlands | forest | shrub |
---|---|---|---|---|---|---|
A | 2002 | 90 | 5 | 1 | 1 | 3 |
A | 2003 | 90 | 5 | 1 | 1 | 3 |
A | 2004 | 90 | 5 | 1 | 1 | 3 |
B | 2002 | 80 | 1 | 4 | 5 | 10 |
B | 2003 | 80 | 1 | 4 | 5 | 10 |
B | 2004 | 80 | 1 | 4 | 5 | 10 |
C | 2002 | 20 | 20 | 20 | 20 | 20 |
C | 2003 | 20 | 20 | 20 | 20 | 20 |
C | 2004 | 20 | 20 | 20 | 20 | 20 |
Any help is appreciated. Thanks again!
I do not know where to start so I have not tried anything.
Answer 1
Score: 1
No doubt there are lots of ways to do this. If this is really your use case, then for each row you could turn `year` into a list with the values `2002:2004`. Then just `unnest()` the `year` list, and all the other data will be replicated for every value of `year`:
library(dplyr)
library(tidyr)
tab <- read.table(header=TRUE, textConnection("Lake year urban agriculture wetlands forest shrub
A 2004 90 5 1 1 3
B 2004 80 1 4 5 10
C 2004 20 20 20 20 20"))
tab %>%
  rowwise() %>%
  mutate(year = list(2002:2004)) %>%
  unnest(year)
#> # A tibble: 9 × 7
#>   Lake   year urban agriculture wetlands forest shrub
#>   <chr> <int> <int>       <int>    <int>  <int> <int>
#> 1 A      2002    90           5        1      1     3
#> 2 A      2003    90           5        1      1     3
#> 3 A      2004    90           5        1      1     3
#> 4 B      2002    80           1        4      5    10
#> 5 B      2003    80           1        4      5    10
#> 6 B      2004    80           1        4      5    10
#> 7 C      2002    20          20       20     20    20
#> 8 C      2003    20          20       20     20    20
#> 9 C      2004    20          20       20     20    20
<sup>Created on 2023-06-08 with reprex v2.0.2</sup>
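As a hedged sketch of the same list-column idea, the steps can be wrapped in a small helper so that the target years become a parameter. The name `replicate_years` and its arguments are hypothetical (they are not part of dplyr or tidyr), and the sketch assumes one row per lake and no existing column called `target_years`. It also leaves out `rowwise()`, relying on `mutate()` recycling a length-one list to every row.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built directly as a data frame
tab <- data.frame(
  Lake        = c("A", "B", "C"),
  year        = c(2004L, 2004L, 2004L),
  urban       = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands    = c(1L, 4L, 20L),
  forest      = c(1L, 5L, 20L),
  shrub       = c(3L, 10L, 20L)
)

# Hypothetical helper: copy every row once per requested year
replicate_years <- function(data, target_years) {
  data %>%
    # list(target_years) has length 1, so it is recycled to every row
    mutate(year = list(target_years)) %>%
    # one output row per element of each row's year vector
    unnest(year)
}

result <- replicate_years(tab, 2002:2004)
print(result)
```

Because the years are passed in as a vector, the same call should also work for a non-consecutive set such as `c(2002, 2004)`.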
Answer 2
Score: 1
library(dplyr)
library(tidyr)
df |>
  uncount(3, .id = "year") |>   # repeat each row 3 times; year holds the copy index 1:3
  mutate(year = year + 2001)    # shift 1:3 to 2002:2004
This assumes a sequence of 3 consecutive years starting after 2001 (e.g. 2002, 2003, 2004).
`uncount` duplicates each row 3 times and adds the column `year`, which is just a sequence `1:3` identifying each duplicated row (here it takes the place of the existing `year` values). We then use `mutate` to adjust this created `year` by adding 2001, which transforms `1:3` into `2002:2004`.
Output
  Lake year urban agriculture wetlands forest shrub
1    A 2002    90           5        1      1     3
2    A 2003    90           5        1      1     3
3    A 2004    90           5        1      1     3
4    B 2002    80           1        4      5    10
5    B 2003    80           1        4      5    10
6    B 2004    80           1        4      5    10
7    C 2002    20          20       20     20    20
8    C 2003    20          20       20     20    20
9    C 2004    20          20       20     20    20
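The two steps described above can be looked at separately. The following is a minimal sketch, assuming the same `df` as in the Data section; the variables `first_year` and `n_years` are illustrative names introduced here so the hard-coded 2001 offset becomes explicit.

```r
library(dplyr)
library(tidyr)

# Same data as in the Data section of this answer
df <- data.frame(
  Lake        = c("A", "B", "C"),
  year        = c(2004L, 2004L, 2004L),
  urban       = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands    = c(1L, 4L, 20L),
  forest      = c(1L, 5L, 20L),
  shrub       = c(3L, 10L, 20L)
)

# Step 1 only: each row is repeated 3 times and year holds the copy index
step1 <- df |> uncount(3, .id = "year")
print(step1$year)

# Steps 1 and 2 with the offset derived from an assumed first year
first_year <- 2002L  # assumption: first year of the consecutive range
n_years    <- 3L     # assumption: number of consecutive years wanted

filled <- df |>
  uncount(n_years, .id = "year") |>
  mutate(year = year + first_year - 1L)  # index 1 maps to first_year
print(filled)
```

Changing `first_year` or `n_years` should then be enough to target a different consecutive range.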
If you wanted to do non-sequential years you could do something like:
library(dplyr)
library(tidyr)
years <- c(2002, 2004) # edit this for years you want
df |>
  uncount(length(years)) |>                      # one copy of each row per requested year
  mutate(year = rep(years, times = nrow(df)))    # cycle through years once per original row
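One possible variation, offered only as a sketch: the `rep()` call above refers back to `df`, so it depends on the piped data having exactly `nrow(df)` original rows. Using the `.id` index from `uncount()` to look up the year avoids that dependency. The temporary column name `idx` is an arbitrary choice made here, and the sketch assumes the data has no existing column with that name.

```r
library(dplyr)
library(tidyr)

# Same data as in the Data section of this answer
df <- data.frame(
  Lake        = c("A", "B", "C"),
  year        = c(2004L, 2004L, 2004L),
  urban       = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands    = c(1L, 4L, 20L),
  forest      = c(1L, 5L, 20L),
  shrub       = c(3L, 10L, 20L)
)

years <- c(2002, 2004)  # edit this for the years you want

filled <- df |>
  uncount(length(years), .id = "idx") |>  # idx is the copy number within each lake
  mutate(year = years[idx]) |>            # look up the year for each copy
  select(-idx)                            # drop the temporary index
print(filled)
```

Since the lookup only uses the per-row index, the pipeline should keep working if rows are filtered out before `uncount()`.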
Data
df <- structure(list(Lake = c("A", "B", "C"), year = c(2004L, 2004L,
2004L), urban = c(90L, 80L, 20L), agriculture = c(5L, 1L, 20L
), wetlands = c(1L, 4L, 20L), forest = c(1L, 5L, 20L), shrub = c(3L,
10L, 20L)), class = "data.frame", row.names = c(NA, -3L))