Unable to use read.zoo due to the presence of NAs
Question
I have a large dataset of irregular multivariate time series that I want to convert with read.zoo.
Some of the last rows are populated with NAs. When I run read.zoo on the data including those rows, I get the following error message: "index has bad entries at data rows: 43 44 ...".
When I check with is.na(), the NA cells return TRUE. I also tried the na.fill solution from another post, but it doesn't work.
Below is an extract of the dataset with two variables, Var1 and Var2, and their respective date columns, date1 and date2:
date1 Var1 date2 Var2
2023-01-13 100.325 2023-01-11 99.748
2023-01-16 100.378 2023-01-12 99.832
2023-01-17 100.826 2023-01-13 99.878
2023-01-18 100.933 2023-01-16 99.762
2023-01-19 100.641 2023-01-17 99.484
2023-01-20 100.148 2023-01-18 99.743
2023-01-23 99.972 2023-01-19 99.419
2023-01-24 100.256 2023-01-20 99.364
2023-01-25 100.348 2023-01-23 99.533
2023-01-26 100.146 2023-01-24 99.711
2023-01-27 100.063 2023-01-25 99.798
2023-01-30 99.649 2023-01-26 100.481
2023-01-31 99.822 2023-01-27 100.708
2023-02-01 99.885 2023-01-30 100.57
2023-02-02 101.121 2023-01-31 100.773
2023-02-03 100.854 2023-02-01 100.999
2023-02-06 100.5 2023-02-02 102.037
2023-02-07 100.272 2023-02-03 102.104
2023-02-08 100.372 2023-02-06 101.85
2023-02-09 100.659 2023-02-07 101.765
2023-02-10 100.421 2023-02-08 101.806
2023-02-13 100.418 2023-02-09 101.905
2023-02-14 100.202 2023-02-10 101.675
2023-02-15 99.913 2023-02-13 101.491
2023-02-16 99.832 2023-02-14 101.304
2023-02-17 99.911 2023-02-15 101.242
2023-02-20 99.791 2023-02-16 101.621
2023-02-21 99.451 2023-02-17 101.581
2023-02-22 99.467 2023-02-20 101.545
2023-02-23 99.642 2023-02-21 101.334
2023-02-24 99.278 2023-02-22 101.246
2023-02-27 99.114 2023-02-23 101.857
2023-02-28 98.784 2023-02-24 101.71
2023-03-01 98.486 2023-02-27 101.759
2023-03-02 98.396 2023-02-28 101.649
2023-03-03 98.467 2023-03-01 101.583
2023-03-06 98.276 2023-03-02 101.426
2023-03-07 98.495 2023-03-03 101.666
2023-03-08 98.572 2023-03-06 101.919
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA 2023-03-10 101.775
NA NA NA NA
NA NA NA NA
NA NA NA NA
Answer 1
Score: 1
The solution was provided by @G. Grothendieck in another post:
Replace as.data.frame(x) with na.omit(as.data.frame(x))
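The referenced post is not reproduced here, so the following is only a sketch of how that replacement might fit into a complete pipeline. It assumes x is one (date, value) column pair; the small sample data and the split.default grouping by column position are illustrative assumptions, not taken from the original post.

```r
library(zoo)

# Small hypothetical sample in the same layout as the question
Lines <- "date1 Var1 date2 Var2
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA NA NA"
DF <- read.table(text = Lines, header = TRUE)

# Group the columns into (date, value) pairs: columns 1-2, 3-4, ...
pair_id <- (seq_along(DF) - 1) %/% 2
pairs <- split.default(DF, pair_id)

# Drop incomplete rows within each pair, then convert each pair to zoo
series <- lapply(pairs, function(x) read.zoo(na.omit(as.data.frame(x))))

# Name each series after its value column (Var1, Var2)
names(series) <- names(DF)[c(FALSE, TRUE)]

# Combine the series on the union of their dates
z <- do.call("merge", series)
```

Because na.omit is applied per pair rather than to the whole table, a row that is NA only for date1/Var1 still contributes its date2/Var2 observation.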
Answer 2
Score: 0
First, let me create a data frame from your data:
lines <- "date1 Var1 date2 Var2
2023-01-13 100.325 2023-01-11 99.748
2023-01-16 100.378 2023-01-12 99.832
2023-01-17 100.826 2023-01-13 99.878
2023-01-18 100.933 2023-01-16 99.762
2023-01-19 100.641 2023-01-17 99.484
2023-01-20 100.148 2023-01-18 99.743
2023-01-23 99.972 2023-01-19 99.419
2023-01-24 100.256 2023-01-20 99.364
2023-01-25 100.348 2023-01-23 99.533
2023-01-26 100.146 2023-01-24 99.711
2023-01-27 100.063 2023-01-25 99.798
2023-01-30 99.649 2023-01-26 100.481
2023-01-31 99.822 2023-01-27 100.708
2023-02-01 99.885 2023-01-30 100.57
2023-02-02 101.121 2023-01-31 100.773
2023-02-03 100.854 2023-02-01 100.999
2023-02-06 100.5 2023-02-02 102.037
2023-02-07 100.272 2023-02-03 102.104
2023-02-08 100.372 2023-02-06 101.85
2023-02-09 100.659 2023-02-07 101.765
2023-02-10 100.421 2023-02-08 101.806
2023-02-13 100.418 2023-02-09 101.905
2023-02-14 100.202 2023-02-10 101.675
2023-02-15 99.913 2023-02-13 101.491
2023-02-16 99.832 2023-02-14 101.304
2023-02-17 99.911 2023-02-15 101.242
2023-02-20 99.791 2023-02-16 101.621
2023-02-21 99.451 2023-02-17 101.581
2023-02-22 99.467 2023-02-20 101.545
2023-02-23 99.642 2023-02-21 101.334
2023-02-24 99.278 2023-02-22 101.246
2023-02-27 99.114 2023-02-23 101.857
2023-02-28 98.784 2023-02-24 101.71
2023-03-01 98.486 2023-02-27 101.759
2023-03-02 98.396 2023-02-28 101.649
2023-03-03 98.467 2023-03-01 101.583
2023-03-06 98.276 2023-03-02 101.426
2023-03-07 98.495 2023-03-03 101.666
2023-03-08 98.572 2023-03-06 101.919
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA 2023-03-10 101.775
NA NA NA NA
NA NA NA NA
NA NA NA NA"
library(tidyverse)  # attaches dplyr (select, %>%) and purrr (set_names)
DF <- read.table(text = lines, header = TRUE)
Then convert the date columns to date-times:
library(zoo)
# convert the date strings to POSIXct (NA entries stay NA)
DF$date1 <- as.POSIXct(DF$date1)
DF$date2 <- as.POSIXct(DF$date2)
One option is to create two separate data sets, one per series, dropping the NA rows from each:
df1 <- DF %>% select(date1, Var1) %>% na.omit() %>% set_names(c("Date", "Var"))
df2 <- DF %>% select(date2, Var2) %>% na.omit() %>% set_names(c("Date", "Var"))
Then create a separate zoo object from each:
zoo1 <- zoo(df1$Var, order.by = df1$Date)
zoo2 <- zoo(df2$Var, order.by = df2$Date)
Or, if you want both variables in one object, you could do:
# merge the two data frames created above; merge() keeps only dates present in both
mergedDf <- merge(df1, df2, by = "Date")
# create a two-column zoo object (Var.x is Var1, Var.y is Var2)
zooObject <- zoo(as.matrix(mergedDf[c("Var.x", "Var.y")]), order.by = mergedDf$Date)
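Since the two series do not share all of their dates, it may be preferable to merge the zoo objects themselves rather than the data frames. The sketch below uses zoo's own merge method, which by default keeps the union of the dates and fills the gaps with NA. The two short series are hypothetical stand-ins for zoo1 and zoo2, and Date is used for the index instead of POSIXct purely to keep the example short.

```r
library(zoo)

# Two short, hypothetical series with partly different dates
zoo1 <- zoo(c(100.325, 100.378, 100.826),
            order.by = as.Date(c("2023-01-13", "2023-01-16", "2023-01-17")))
zoo2 <- zoo(c(99.748, 99.832, 99.878),
            order.by = as.Date(c("2023-01-11", "2023-01-12", "2023-01-13")))

# merge() on zoo objects keeps the union of both indexes by default (all = TRUE)
both <- merge(zoo1, zoo2)
colnames(both) <- c("Var1", "Var2")

# Restrict to dates present in both series, if that is what is wanted
common <- merge(zoo1, zoo2, all = FALSE)
```

Here both covers every date that appears in either series, while common keeps only the dates the two series share.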
Let me know if this helps.
Answer 3
Score: 0
In the question, NA always appears at the beginning of the affected rows, so,
using Lines from the Note at the end, define N as the comment character.
library(zoo)
z <- read.zoo(text = Lines, header = TRUE, comment.char = "N")
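To see what the comment character does to the input, here is a small sketch using base read.table, which is what read.zoo uses to read text by default. The sample data is a hypothetical, shortened version of the question's table.

```r
# Small hypothetical sample in the same layout as the question
sample_lines <- "date1 Var1 date2 Var2
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA NA NA"

# With comment.char = "N", everything from the first "N" on a line is ignored,
# so rows that start with NA become blank lines and are skipped
kept <- read.table(text = sample_lines, header = TRUE, comment.char = "N")
print(kept)
```

Only the rows with a valid date1 survive. Note that a row such as "NA NA 2023-03-09 101.927" is dropped entirely, so its date2/Var2 observation is lost as well; this approach fits best when the first series is the one that matters.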
Note
Lines <- "date1 Var1 date2 Var2
2023-01-13 100.325 2023-01-11 99.748
2023-01-16 100.378 2023-01-12 99.832
2023-01-17 100.826 2023-01-13 99.878
2023-01-18 100.933 2023-01-16 99.762
2023-01-19 100.641 2023-01-17 99.484
2023-01-20 100.148 2023-01-18 99.743
2023-01-23 99.972 2023-01-19 99.419
2023-01-24 100.256 2023-01-20 99.364
2023-01-25 100.348 2023-01-23 99.533
2023-01-26 100.146 2023-01-24 99.711
2023-01-27 100.063 2023-01-25 99.798
2023-01-30 99.649 2023-01-26 100.481
2023-01-31 99.822 2023-01-27 100.708
2023-02-01 99.885 2023-01-30 100.57
2023-02-02 101.121 2023-01-31 100.773
2023-02-03 100.854 2023-02-01 100.999
2023-02-06 100.5 2023-02-02 102.037
2023-02-07 100.272 2023-02-03 102.104
2023-02-08 100.372 2023-02-06 101.85
2023-02-09 100.659 2023-02-07 101.765
2023-02-10 100.421 2023-02-08 101.806
2023-02-13 100.418 2023-02-09 101.905
2023-02-14 100.202 2023-02-10 101.675
2023-02-15 99.913 2023-02-13 101.491
2023-02-16 99.832 2023-02-14 101.304
2023-02-17 99.911 2023-02-15 101.242
2023-02-20 99.791 2023-02-16 101.621
2023-02-21 99.451 2023-02-17 101.581
2023-02-22 99.467 2023-02-20 101.545
2023-02-23 99.642 2023-02-21 101.334
2023-02-24 99.278 2023-02-22 101.246
2023-02-27 99.114 2023-02-23 101.857
2023-02-28 98.784 2023-02-24 101.71
2023-03-01 98.486 2023-02-27 101.759
2023-03-02 98.396 2023-02-28 101.649
2023-03-03 98.467 2023-03-01 101.583
2023-03-06 98.276 2023-03-02 101.426
2023-03-07 98.495 2023-03-03 101.666
2023-03-08 98.572 2023-03-06 101.919
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA 2023-03-10 101.775
NA NA NA NA
NA NA NA NA
NA NA NA NA"