June 9, 2023 02:28:32

Generate new rows and assign values using data from specific parameters in dataframe

Question

I want to fill in values for missing years using data I already have. For example, I have land use data for specific lakes for 2004, but not for 2002 and 2003. Is there a way to add new rows for 2002 and 2003 that reuse the 2004 land use values? Here is what my data looks like now:

Lake	year	urban	agriculture	wetlands	forest	shrub
A	2004	90	5	1	1	3
B	2004	80	1	4	5	10
C	2004	20	20	20	20	20

This is what I would like my data frame to look like:

Lake	year	urban	agriculture	wetlands	forest	shrub
A	2002	90	5	1	1	3
A	2003	90	5	1	1	3
A	2004	90	5	1	1	3
B	2002	80	1	4	5	10
B	2003	80	1	4	5	10
B	2004	80	1	4	5	10
C	2002	20	20	20	20	20
C	2003	20	20	20	20	20
C	2004	20	20	20	20	20

Any help is appreciated. Thanks again!

I do not know where to start so I have not tried anything.

Answer 1

Score: 1

No doubt there are lots of ways to do this. If this is really your use case, then for each row, you could turn year into a list with the values 2002:2004. Then, just unnest() the year list and all the other data will be replicated for all values of year:

library(dplyr)
library(tidyr)
tab <- read.table(header=TRUE, textConnection("Lake year    urban   agriculture wetlands    forest  shrub
A   2004    90  5   1   1   3
B   2004    80  1   4   5   10
C   2004    20  20  20  20  20"))
tab %>%
  rowwise() %>%
  mutate(year = list(2002:2004)) %>%
  unnest(year)
#> # A tibble: 9 × 7
#>   Lake   year urban agriculture wetlands forest shrub
#>   <chr> <int> <int>       <int>    <int>  <int> <int>
#> 1 A      2002    90           5        1      1     3
#> 2 A      2003    90           5        1      1     3
#> 3 A      2004    90           5        1      1     3
#> 4 B      2002    80           1        4      5    10
#> 5 B      2003    80           1        4      5    10
#> 6 B      2004    80           1        4      5    10
#> 7 C      2002    20          20       20     20    20
#> 8 C      2003    20          20       20     20    20
#> 9 C      2004    20          20       20     20    20

Created on 2023-06-08 with reprex v2.0.2
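
For comparison, the same row replication can be sketched in base R, without dplyr or tidyr. This is not the approach from the answer above, only an alternative under the assumption that every lake should get the same set of years. The names `years`, `idx`, and `out` are illustrative.

```r
# Example data from the question (one 2004 row per lake)
tab <- data.frame(
  Lake = c("A", "B", "C"),
  year = 2004L,
  urban = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands = c(1L, 4L, 20L),
  forest = c(1L, 5L, 20L),
  shrub = c(3L, 10L, 20L)
)

years <- 2002:2004  # assumed target years

# Repeat each row index once per target year: 1 1 1 2 2 2 3 3 3
idx <- rep(seq_len(nrow(tab)), each = length(years))
out <- tab[idx, ]

# Overwrite year so that each lake cycles through the target years
out$year <- rep(years, times = nrow(tab))
rownames(out) <- NULL  # reset the row names left over from indexing

out
```

Because the rows are repeated with `each` and the years with `times`, the two vectors line up lake by lake; swapping those arguments would pair the wrong years with the wrong rows.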

Answer 2

Score: 1

library(dplyr)
library(tidyr)
df |>
  uncount(3, .id = "year") |>
  mutate(year = year + 2001)

This assumes a sequence of 3 consecutive years starting right after 2001 (i.e. 2002, 2003, 2004).

uncount() repeats each row 3 times and, because of .id = "year", fills the year column with the sequence 1:3 to identify each copy (replacing the original 2004 values).
mutate() then adds 2001 to that column, which turns 1:3 into 2002:2004.

Output

  Lake year urban agriculture wetlands forest shrub
1    A 2002    90           5        1      1     3
2    A 2003    90           5        1      1     3
3    A 2004    90           5        1      1     3
4    B 2002    80           1        4      5    10
5    B 2003    80           1        4      5    10
6    B 2004    80           1        4      5    10
7    C 2002    20          20       20     20    20
8    C 2003    20          20       20     20    20
9    C 2004    20          20       20     20    20
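
To make the two steps described above easier to follow, here is a minimal sketch that stores the intermediate result instead of piping straight through. The names `step1` and `step2` are illustrative, and `df` is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question (one 2004 row per lake)
df <- data.frame(
  Lake = c("A", "B", "C"),
  year = 2004L,
  urban = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands = c(1L, 4L, 20L),
  forest = c(1L, 5L, 20L),
  shrub = c(3L, 10L, 20L)
)

# Step 1: repeat each row 3 times; .id = "year" stores the copy index 1, 2, 3
step1 <- uncount(df, 3, .id = "year")

# Step 2: shift the copy index so that 1:3 becomes 2002:2004
step2 <- mutate(step1, year = year + 2001L)

step1$year  # copy index within each lake
step2$year  # final years
```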

If you wanted to do non-sequential years you could do something like:

library(dplyr)
library(tidyr)
years <- c(2002, 2004) # edit this for years you want
df |>
  uncount(length(years)) |>
  mutate(year = rep(years, times = nrow(df)))
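
A possible variation, sketched here as an assumption rather than as part of the original answer, is to keep the `.id` index from `uncount()` and use it to look up the year. That way the year vector does not have to be repeated by hand to match the number of rows. The name `out` is illustrative, and `df` is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question (one 2004 row per lake)
df <- data.frame(
  Lake = c("A", "B", "C"),
  year = 2004L,
  urban = c(90L, 80L, 20L),
  agriculture = c(5L, 1L, 20L),
  wetlands = c(1L, 4L, 20L),
  forest = c(1L, 5L, 20L),
  shrub = c(3L, 10L, 20L)
)

years <- c(2002, 2004)  # any set of years, in the order wanted per lake

out <- df |>
  uncount(length(years), .id = "year") |>  # year becomes the copy index 1, 2
  mutate(year = years[year])               # look up the actual year by index

out
```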

Data

df <- structure(list(Lake = c("A", "B", "C"), year = c(2004L, 2004L, 
2004L), urban = c(90L, 80L, 20L), agriculture = c(5L, 1L, 20L
), wetlands = c(1L, 4L, 20L), forest = c(1L, 5L, 20L), shrub = c(3L, 
10L, 20L)), class = "data.frame", row.names = c(NA, -3L))
