Posted: 2023-06-26 18:55:20

Find max time difference within each year in R

Question

I have a function that calculates the average, min and max time differences for each year in my dataframe, then merges them to output the all-time average, min and max values. Each year needs to be calculated separately first because my dates only cover the months of April through August. If I didn't group by year, differences would be calculated between August of one year and April of the next. I want to avoid this.

Example dataframe:

date            NDVI        cloud_cover    field_id
23/04/2017      0.6494          12           KM60        
23/04/2017      0.5683          0            KM1
05/05/2017      0.3467          0            KM60
31/07/2017      0.6743          05           KM60
31/07/2017        NA            97           KM1
31/07/2017      0.3456          07           LM27
01/04/2018        NA            100          KM60
03/06/2018      0.6743          11           KM60
03/06/2018      0.2346          12           KM1
04/05/2019        NA            99           KM60
05/05/2019      0.5432          20           KM60

NDVI and cloud_cover shouldn't influence the calculations. The field_ids usually share the same dates, but this shouldn't influence them either.

This is the current code:

calculate_time_diff <- function(df) {
  # Convert "date" column to datetime
  df$date <- as.POSIXct(df$date)
  
  # Group the data by year
  df_calc <- split(df, format(df$date, "%Y"))
  
  # Calculate time differences between consecutive observations for each year
  time_diffs <- lapply(df_calc, function(group) {
    # Sort dataframe based on "date"
    group <- group[order(group$date), ]
    
    # Filter out duplicate dates
    group <- group[!duplicated(group$date), ]
    
    # Calculate time differences between consecutive observations
    diff(group$date)
  })
  
  # Combine time differences from all years into a single vector
  all_time_diffs <- unlist(time_diffs)
  
  # Compute average time difference
  avg_time_diff <- mean(all_time_diffs)
  
  # Calculate smallest and biggest time differences
  smallest_time_diff <- min(all_time_diffs)
  biggest_time_diff <- max(all_time_diffs)
  
  return(list(avg_time_diff = avg_time_diff,
              smallest_time_diff = smallest_time_diff,
              biggest_time_diff = biggest_time_diff))
}

The output is giving me "240" as max time difference, which I know to be unrealistic. My dataframe refers to the revisit dates of three satellites and none of them should be more than at the very most a month apart.

I thought it could have something to do with the way years are being extracted, but this user seems to have successfully used format() just as I did. lapply() should iterate through each split year group in the same way as group_by(). So what could be the problem in my script?


Answer 1

Score: 0

Using dplyr:

data %>%
  distinct(date) %>%                       # remove duplicates
  arrange(date) %>%                        # order by date
  group_by(format(date, "%Y")) %>%         # group by year
  reframe(dateDiff = diff(date)) %>%       # apply 'diff' to every group
  with(list(avg_time_diff = mean(dateDiff),
            smallest_time_diff = min(dateDiff),
            biggest_time_diff = max(dateDiff)))  # create your metrics

Result:

$avg_time_diff
Time difference of 30.02198 days
$smallest_time_diff
Time difference of 12 days
$biggest_time_diff
Time difference of 51 days

Dummy data:

data <- data.frame(date = seq(as.Date("2017-01-01"), by = "month", length.out = 100) + sample(0:20, 100, TRUE))
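
For comparison, here is a minimal base R sketch of the same idea applied to the sample rows from the question. It assumes the date column is stored as dd/mm/yyyy strings, as in the example dataframe. It also points at two possible causes of the unexpected maximum, neither of which is confirmed by the question. First, as.POSIXct() called without a format tries year-first formats, so a string like "23/04/2017" may not be read as 23 April 2017. Second, diff() on date-times picks its units separately for each group and unlist() drops them, so values in different units can end up in one vector. The sketch parses the dates with an explicit format and forces every gap to days.

```r
# Sample dates from the question; the other columns do not affect the result.
df <- data.frame(
  date = c("23/04/2017", "23/04/2017", "05/05/2017", "31/07/2017",
           "31/07/2017", "31/07/2017", "01/04/2018", "03/06/2018",
           "03/06/2018", "04/05/2019", "05/05/2019"),
  stringsAsFactors = FALSE
)

calculate_time_diff <- function(df) {
  # Parse day/month/year explicitly (assumption: dates are dd/mm/yyyy strings),
  # then drop duplicate dates and sort.
  dates <- sort(unique(as.Date(df$date, format = "%d/%m/%Y")))

  # Split the unique dates by calendar year.
  by_year <- split(dates, format(dates, "%Y"))

  # Gaps between consecutive dates within each year, converted to days so
  # that every group uses the same unit before the results are combined.
  gaps <- unlist(lapply(by_year, function(d) {
    as.numeric(diff(d), units = "days")
  }), use.names = FALSE)

  list(avg_time_diff = mean(gaps),
       smallest_time_diff = min(gaps),
       biggest_time_diff = max(gaps))
}

res <- calculate_time_diff(df)
print(res)
```

On these eleven sample rows the within-year gaps are 12, 87, 63 and 1 days, so the function returns an average of 40.75, a minimum of 1 and a maximum of 87. No gap spans a year boundary.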

