Unable to use read.zoo due to the presence of NAs

Question

I have a large dataset of irregular multivariate time series that I want to convert with read.zoo.

Some of the last rows are populated with NAs. When I run read.zoo including the rows with the NAs, I get the following error message: "index has bad entries at data rows: 43 44 ...".

When I check with is.na(), the NA cells return TRUE. I also tried the na.fill solution from here, but it doesn't work.

Below is an extract of the dataset, with two variables Var1 and Var2 and their respective date columns date1 and date2:

    date1 Var1 date2 Var2
    2023-01-13 100.325 2023-01-11 99.748
    2023-01-16 100.378 2023-01-12 99.832
    2023-01-17 100.826 2023-01-13 99.878
    2023-01-18 100.933 2023-01-16 99.762
    2023-01-19 100.641 2023-01-17 99.484
    2023-01-20 100.148 2023-01-18 99.743
    2023-01-23 99.972 2023-01-19 99.419
    2023-01-24 100.256 2023-01-20 99.364
    2023-01-25 100.348 2023-01-23 99.533
    2023-01-26 100.146 2023-01-24 99.711
    2023-01-27 100.063 2023-01-25 99.798
    2023-01-30 99.649 2023-01-26 100.481
    2023-01-31 99.822 2023-01-27 100.708
    2023-02-01 99.885 2023-01-30 100.57
    2023-02-02 101.121 2023-01-31 100.773
    2023-02-03 100.854 2023-02-01 100.999
    2023-02-06 100.5 2023-02-02 102.037
    2023-02-07 100.272 2023-02-03 102.104
    2023-02-08 100.372 2023-02-06 101.85
    2023-02-09 100.659 2023-02-07 101.765
    2023-02-10 100.421 2023-02-08 101.806
    2023-02-13 100.418 2023-02-09 101.905
    2023-02-14 100.202 2023-02-10 101.675
    2023-02-15 99.913 2023-02-13 101.491
    2023-02-16 99.832 2023-02-14 101.304
    2023-02-17 99.911 2023-02-15 101.242
    2023-02-20 99.791 2023-02-16 101.621
    2023-02-21 99.451 2023-02-17 101.581
    2023-02-22 99.467 2023-02-20 101.545
    2023-02-23 99.642 2023-02-21 101.334
    2023-02-24 99.278 2023-02-22 101.246
    2023-02-27 99.114 2023-02-23 101.857
    2023-02-28 98.784 2023-02-24 101.71
    2023-03-01 98.486 2023-02-27 101.759
    2023-03-02 98.396 2023-02-28 101.649
    2023-03-03 98.467 2023-03-01 101.583
    2023-03-06 98.276 2023-03-02 101.426
    2023-03-07 98.495 2023-03-03 101.666
    2023-03-08 98.572 2023-03-06 101.919
    2023-03-09 98.747 2023-03-07 102.048
    2023-03-10 99.489 2023-03-08 101.915
    NA NA 2023-03-09 101.927
    NA NA 2023-03-10 101.775
    NA NA NA NA
    NA NA NA NA
    NA NA NA NA
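
For reference, the error can be reproduced on this extract along the following lines (a sketch, since the exact call is not shown; it assumes the extract has been read into a data frame DF, e.g. with read.table(header = TRUE)):

    library(zoo)

    # Sketch of the failing call: read.zoo() takes the first column (date1) as
    # the index, and the NA entries in date1 make the index invalid.
    z <- read.zoo(DF, format = "%Y-%m-%d")
    ## gives an error along the lines of:
    ## Error: index has bad entries at data rows: 43 44 ...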

Answer 1

Score: 1

The solution was provided by @G. Grothendieck in another post here:

Replace as.data.frame(x) with na.omit(as.data.frame(x))
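
The linked post is not shown here, but a minimal sketch of how that replacement applies to this dataset, assuming the extract has been read into a data frame DF with columns date1, Var1, date2, Var2, could look as follows: each date/value pair is converted separately, with na.omit(as.data.frame(x)) dropping its NA rows before read.zoo sees them, and the two series are merged afterwards.

    library(zoo)

    # A sketch, not the original poster's exact code: DF is assumed to hold the
    # question's extract. Build one zoo series per date/value pair, removing NA
    # rows before read.zoo so the NA dates never reach the index.
    z1 <- read.zoo(na.omit(as.data.frame(DF[c("date1", "Var1")])), format = "%Y-%m-%d")
    z2 <- read.zoo(na.omit(as.data.frame(DF[c("date2", "Var2")])), format = "%Y-%m-%d")
    z  <- merge(z1, z2)  # multivariate zoo with columns z1 and z2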

Answer 2

Score: 0

First, let me create a data frame from your data:

    lines <- "date1 Var1 date2 Var2
    2023-01-13 100.325 2023-01-11 99.748
    2023-01-16 100.378 2023-01-12 99.832
    2023-01-17 100.826 2023-01-13 99.878
    2023-01-18 100.933 2023-01-16 99.762
    2023-01-19 100.641 2023-01-17 99.484
    2023-01-20 100.148 2023-01-18 99.743
    2023-01-23 99.972 2023-01-19 99.419
    2023-01-24 100.256 2023-01-20 99.364
    2023-01-25 100.348 2023-01-23 99.533
    2023-01-26 100.146 2023-01-24 99.711
    2023-01-27 100.063 2023-01-25 99.798
    2023-01-30 99.649 2023-01-26 100.481
    2023-01-31 99.822 2023-01-27 100.708
    2023-02-01 99.885 2023-01-30 100.57
    2023-02-02 101.121 2023-01-31 100.773
    2023-02-03 100.854 2023-02-01 100.999
    2023-02-06 100.5 2023-02-02 102.037
    2023-02-07 100.272 2023-02-03 102.104
    2023-02-08 100.372 2023-02-06 101.85
    2023-02-09 100.659 2023-02-07 101.765
    2023-02-10 100.421 2023-02-08 101.806
    2023-02-13 100.418 2023-02-09 101.905
    2023-02-14 100.202 2023-02-10 101.675
    2023-02-15 99.913 2023-02-13 101.491
    2023-02-16 99.832 2023-02-14 101.304
    2023-02-17 99.911 2023-02-15 101.242
    2023-02-20 99.791 2023-02-16 101.621
    2023-02-21 99.451 2023-02-17 101.581
    2023-02-22 99.467 2023-02-20 101.545
    2023-02-23 99.642 2023-02-21 101.334
    2023-02-24 99.278 2023-02-22 101.246
    2023-02-27 99.114 2023-02-23 101.857
    2023-02-28 98.784 2023-02-24 101.71
    2023-03-01 98.486 2023-02-27 101.759
    2023-03-02 98.396 2023-02-28 101.649
    2023-03-03 98.467 2023-03-01 101.583
    2023-03-06 98.276 2023-03-02 101.426
    2023-03-07 98.495 2023-03-03 101.666
    2023-03-08 98.572 2023-03-06 101.919
    2023-03-09 98.747 2023-03-07 102.048
    2023-03-10 99.489 2023-03-08 101.915
    NA NA 2023-03-09 101.927
    NA NA 2023-03-10 101.775
    NA NA NA NA
    NA NA NA NA
    NA NA NA NA"
    library(tidyverse)
    library(dplyr)
    DF <- read.table(text = lines, header = TRUE)

Then, let me convert the date columns to a proper date-time class:

    library(zoo)
    # convert the date columns to POSIXct
    DF$date1 <- as.POSIXct(DF$date1)
    DF$date2 <- as.POSIXct(DF$date2)

One way, given your requirement, is to create two separate datasets:

    df1 <- DF %>% select(date1, Var1) %>% na.omit() %>% set_names(c("Date", "Var"))
    df2 <- DF %>% select(date2, Var2) %>% na.omit() %>% set_names(c("Date", "Var"))

Then create separate zoo objects from these:

    zoo1 <- zoo(df1$Var, order.by = df1$Date)
    zoo2 <- zoo(df2$Var, order.by = df2$Date)

Or if you want to merge these variables, you could do:

    # merge both data frames created above (inner join on Date)
    mergedDf <- merge(df1, df2, by = "Date")
    # create a multivariate zoo object from both value columns
    zooObject <- zoo(mergedDf[, c("Var.x", "Var.y")], order.by = mergedDf$Date)
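
Note that merge(df1, df2, by = "Date") keeps only the dates that appear in both series. If you want the union of dates instead, one option (a sketch reusing the zoo objects built above) is to merge at the zoo level:

    # Outer-join alternative: dates present in only one series get NA
    # in the other column.
    zooBoth <- merge(zoo1, zoo2, all = TRUE)
    colnames(zooBoth) <- c("Var1", "Var2")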

Let me know if this helps.

Answer 3

Score: 0

In the question, the rows containing NA always start with NA, so, using Lines from the Note at the end, define N as the comment character; those rows are then treated as comments and skipped.

    library(zoo)
    z <- read.zoo(text = Lines, header = TRUE, comment.char = "N")

Note

    Lines <- "date1 Var1 date2 Var2
    2023-01-13 100.325 2023-01-11 99.748
    2023-01-16 100.378 2023-01-12 99.832
    2023-01-17 100.826 2023-01-13 99.878
    2023-01-18 100.933 2023-01-16 99.762
    2023-01-19 100.641 2023-01-17 99.484
    2023-01-20 100.148 2023-01-18 99.743
    2023-01-23 99.972 2023-01-19 99.419
    2023-01-24 100.256 2023-01-20 99.364
    2023-01-25 100.348 2023-01-23 99.533
    2023-01-26 100.146 2023-01-24 99.711
    2023-01-27 100.063 2023-01-25 99.798
    2023-01-30 99.649 2023-01-26 100.481
    2023-01-31 99.822 2023-01-27 100.708
    2023-02-01 99.885 2023-01-30 100.57
    2023-02-02 101.121 2023-01-31 100.773
    2023-02-03 100.854 2023-02-01 100.999
    2023-02-06 100.5 2023-02-02 102.037
    2023-02-07 100.272 2023-02-03 102.104
    2023-02-08 100.372 2023-02-06 101.85
    2023-02-09 100.659 2023-02-07 101.765
    2023-02-10 100.421 2023-02-08 101.806
    2023-02-13 100.418 2023-02-09 101.905
    2023-02-14 100.202 2023-02-10 101.675
    2023-02-15 99.913 2023-02-13 101.491
    2023-02-16 99.832 2023-02-14 101.304
    2023-02-17 99.911 2023-02-15 101.242
    2023-02-20 99.791 2023-02-16 101.621
    2023-02-21 99.451 2023-02-17 101.581
    2023-02-22 99.467 2023-02-20 101.545
    2023-02-23 99.642 2023-02-21 101.334
    2023-02-24 99.278 2023-02-22 101.246
    2023-02-27 99.114 2023-02-23 101.857
    2023-02-28 98.784 2023-02-24 101.71
    2023-03-01 98.486 2023-02-27 101.759
    2023-03-02 98.396 2023-02-28 101.649
    2023-03-03 98.467 2023-03-01 101.583
    2023-03-06 98.276 2023-03-02 101.426
    2023-03-07 98.495 2023-03-03 101.666
    2023-03-08 98.572 2023-03-06 101.919
    2023-03-09 98.747 2023-03-07 102.048
    2023-03-10 99.489 2023-03-08 101.915
    NA NA 2023-03-09 101.927
    NA NA 2023-03-10 101.775
    NA NA NA NA
    NA NA NA NA
    NA NA NA NA"
