Missing data using plot() in R: should I use na.omit(), !is.na(), approx()? If so then how?

The problem

I have three variables recorded over time. The first (black) is recorded at every time period, the second (blue) every other time period, the third (red) at every time period except one. I try to plot these in R:

test <- data.frame(time=c(1:5), black=c(3, 3, 3, 3, 3), blue=c(1, NA, 3, NA, 5), red=c(5, 4, NA, 2, 1))

plot(test$time, test$black, type="l", col="black")
lines(test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")

The result is a plot in which 'black' is the only continuous line, 'blue' is completely absent, and 'red' is absent between time 2 and time 4. I would like all three lines to be continuous.
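A minimal sketch of why this happens, assuming lines() behaves as documented and breaks a type = "l" line at every NA: a segment is drawn only between two consecutive points that are both observed. The helper drawable_segments() is hypothetical, written only to count those segments for each series.

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# Hypothetical helper: count the segments lines() can draw, i.e. pairs of
# consecutive values in which neither value is NA.
drawable_segments <- function(y) {
  n <- length(y)
  sum(!is.na(y[-n]) & !is.na(y[-1]))
}

segments_per_series <- sapply(test[-1], drawable_segments)
print(segments_per_series)  # black 4, blue 0, red 2
```

Blue never has two consecutive observations, so nothing is drawn for it; red loses the two segments that touch time 3.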

Attempted solutions

From https://stackoverflow.com/questions/15533212/how-to-connect-dots-where-there-are-missing-values

plot(na.omit(test), test$time, test$black, type="l", col="black")

Returns "Error in match.fun(panel) : 'test$black' is not a function, character or symbol".

na.omit(test)
plot(test$time, test$black, type="l", col="black")
lines(test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")

The plot is the same as in my original problem. In addition, na.omit(test) drops every time period in which any one of the variables is missing, so actual data (in this example, for black) is discarded along with every period in which another variable is missing.
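Two things seem to be going on here. na.omit(test) returns a new data frame rather than changing test, so unless the result is assigned, the plot still uses the original data. And applied to the whole data frame, it drops any row with an NA in any column. A sketch contrasting whole-frame and per-series removal (the variable names are illustrative):

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# Whole data frame: an NA anywhere in a row removes that row for every series.
complete_rows <- na.omit(test)

# Per series: keep only the time column and one variable, then drop NAs,
# so each series loses only its own missing periods.
blue_obs <- na.omit(test[c("time", "blue")])
red_obs  <- na.omit(test[c("time", "red")])

nrow(test)           # still 5: na.omit() did not modify test
complete_rows$time   # only times 1 and 5 survive
blue_obs$time        # times 1, 3 and 5
red_obs$time         # times 1, 2, 4 and 5
```

Each per-series result can then be drawn on an existing plot with, for example, lines(blue_obs$time, blue_obs$blue, col = "blue").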

From https://stackoverflow.com/questions/64054445/how-to-i-draw-a-line-plot-and-ignore-missing-values-in-r

plot(type="l", test$time, test$black, col="black")
lines(which(!is.na(test$blue)), na.omit(test$blue), test$time, test$blue, col="blue")
lines(test$time, test$red, col="red")

Returns "Error in plot.xy(xy.coords(x, y), type = type, ...) : invalid plot type". Even amending the first line to plot(test$time, test$black, col="black") does not resolve this error.
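The error appears to come from the second line rather than the first. The default method is lines.default(x, y = NULL, type = "l", ...), so the third positional argument, test$time, is taken as type, which is not a valid plot type. The !is.na() idea works if the same logical index subsets both x and y for each series. A sketch under that reading; keep_observed() is a hypothetical helper, and ylim is set from all series because plot() otherwise scales the y-axis to black alone:

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# Hypothetical helper: keep only the (x, y) pairs where y is observed.
# Subsetting x by the same logical index keeps times and values aligned;
# which(!is.na(y)) gives positions, which match times here only because
# time is 1:5.
keep_observed <- function(x, y) {
  ok <- !is.na(y)
  list(x = x[ok], y = y[ok])
}

# Scale the y-axis to every series, not just black.
ylim <- range(test[-1], na.rm = TRUE)
plot(test$time, test$black, type = "l", col = "black", ylim = ylim,
     xlab = "time", ylab = "value")

blue <- keep_observed(test$time, test$blue)
lines(blue$x, blue$y, col = "blue")  # joins times 1, 3 and 5
red <- keep_observed(test$time, test$red)
lines(red$x, red$y, col = "red")     # joins times 1, 2, 4 and 5
```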

From https://stackoverflow.com/questions/15533212/how-to-connect-dots-where-there-are-missing-values

plot(approx(test, xout=seq_along(test))$y, type="l", test$time, test$black, col="black")

Returns "Error in xy.coords(x, y, xlabel, ylabel, log) :  'x' and 'y' lengths differ".
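Two details seem to cause this error: seq_along(test) counts the columns of the data frame (four), not its rows, and the interpolated vector is then paired with test$time as the y argument of plot(), giving vectors of different lengths. approx() is easier to use one series at a time, asking for values at the original time points via xout. A sketch; fill_linear() is a hypothetical helper name:

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# Hypothetical helper: linearly interpolate y at the original x positions.
# approx() ignores NA values by default; with the default rule = 1, a
# leading or trailing NA would stay NA because it lies outside the data.
fill_linear <- function(x, y) approx(x, y, xout = x)$y

filled <- test
filled$blue <- fill_linear(test$time, test$blue)
filled$red  <- fill_linear(test$time, test$red)

ylim <- range(filled[-1])
plot(filled$time, filled$black, type = "l", col = "black", ylim = ylim,
     xlab = "time", ylab = "value")
lines(filled$time, filled$blue, col = "blue")
lines(filled$time, filled$red, col = "red")
```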

From https://stackoverflow.com/questions/42590545/r-plotting-a-line-with-missing-na-values

There, a commenter notes that na.omit() or na.approx() "seem to work only if I would plot 'A' separately in a stand-alone plot, they do not seem to work in conjunction with 'Time' and 'B' and 'C' all in the same plot", and describes this as a "super weird bug". They suggest:

plot(test$time[!is.na(test$black)],test$black[!is.na(test$black)],type="l")
lines(test$time,test$blue, type="l",col="blue")
lines(test$time, test$red, type="l", col="red")

The plot is the same as in my original problem. If I change the coding for 'blue' to (test$time, test$blue, type="p", col="blue") then I get a single point at time point 3, but not the line that I would expect.
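The single point at time 3 is consistent with the y-axis being scaled to the first series only. plot() sets the axis from test$black, which is 3 at every time, so blue's values of 1 and 5 appear to fall outside the plotted region and only the point at 3 is visible. A sketch that computes the y-range from all three series before plotting:

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

range(test$black)                      # 3 3: the range plot() uses by default
ylim <- range(test[-1], na.rm = TRUE)  # 1 5: covers every series
plot(test$time, test$black, type = "l", col = "black", ylim = ylim,
     xlab = "time", ylab = "value")
# points() simply skips NA values, so all three observed blue values
# (times 1, 3 and 5) are now visible.
points(test$time, test$blue, col = "blue")
```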

Also from https://stackoverflow.com/questions/42590545/r-plotting-a-line-with-missing-na-values

xlim <- range(test$time)
ylim <- range(subset[-1], na.rm = TRUE)

Quickly returns "Error in subset[-1] : object of type 'closure' is not subsettable".

ok <- ! is.na(test$black)
plot(black ~ time, time, time = ok, type = "l", xlim = xlim, ylim = ylim)

Quickly returns "Error in FUN(X[[i]], ...) : invalid 'envir' argument of type 'closure'". I also cannot see how 'blue' or 'red' data would enter into this plot, even if it didn't return an error.
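Both errors look like name substitutions. In that snippet, subset[-1] presumably referred to a data frame in the original answer, but here subset is R's built-in subset() function (a closure), hence "not subsettable"; and in the plot() call, the data and subset arguments have been replaced by time. The formula interface of plot() and lines() does accept data and subset arguments, so blue and red can be added the same way. A sketch, assuming test is the data frame from the question:

```r
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

xlim <- range(test$time)
ylim <- range(test[-1], na.rm = TRUE)  # the data frame's columns, not subset()

# subset is evaluated inside `data`, so each call keeps only the rows
# where that particular series is observed.
plot(black ~ time, data = test, subset = !is.na(black), type = "l",
     xlim = xlim, ylim = ylim)
lines(blue ~ time, data = test, subset = !is.na(blue), col = "blue")
lines(red ~ time, data = test, subset = !is.na(red), col = "red")
```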

So is there any way to use plot() for plotting multiple variables over time when one of them has missing data?

Answer 1

Score: 2

If you want to connect the dots when you have missing data, you can use na.approx() from the zoo package:

install.packages('zoo')
library(zoo)

# Create the data frame
test <- data.frame(time = 1:5, black = c(3, 3, 3, 3, 3), blue = c(1, NA, 3, NA, 5), red = c(5, 4, NA, 2, 1))

# Interpolate missing values
test$blue <- na.approx(test$blue)
test$red <- na.approx(test$red)

# Plot the data
plot(test$time, test$black, type = "l", col = "black", ylim = range(na.omit(test[-1])))
lines(test$time, test$blue, col = "blue")
lines(test$time, test$red, col = "red")
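One caveat, based on my reading of the zoo documentation: na.approx() interpolates against index(object), which for a plain vector is just the positions 1, 2, 3 and so on, unless an x argument is supplied. That matches the time column here only because the times are 1:5. The base-R sketch below uses hypothetical, unevenly spaced times to show how the two choices can differ; with zoo, passing x = test$time to na.approx() should give the time-based result.

```r
# Hypothetical, unevenly spaced sampling times (the question uses 1:5).
time <- c(0, 1, 4)
y <- c(1, NA, 3)

# Interpolate against position: the NA sits halfway between positions 1 and 3.
by_index <- approx(seq_along(y), y, xout = seq_along(y))$y

# Interpolate against time: t = 1 is a quarter of the way from t = 0 to t = 4.
by_time <- approx(time, y, xout = time)$y

by_index  # 1, 2, 3
by_time   # 1, 1.5, 3
```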

Answer 2

Score: 1

Use approx() to interpolate each y-axis vector. To plot the interpolated results, start by opening an empty plot, then use an mapply() loop to pass each column and its line color to the interpolating and plotting code.

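test <- data.frame(time = 1:5,
                   black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5),
                   red = c(5, 4, NA, 2, 1))

clrs <- names(test)[-1]                 # column names double as line colors
xlim <- range(test$time)
ylim <- range(test[-1], na.rm = TRUE)   # y-range across all series
plot(NA, NA, type = "n", xlim = xlim, ylim = ylim)  # empty canvas
# \(y, col) is shorthand for function(y, col) and needs R >= 4.1.0.
# approx() drops the NA values and interpolates linearly between the rest.
mapply(\(y, col) {
  dat <- approx(x = test$time, y)
  lines(y ~ x, data = dat, col = col)
}, test[-1], clrs)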

#> $black
#> NULL
#> 
#> $blue
#> NULL
#> 
#> $red
#> NULL

Created on 2023-06-15 with reprex v2.0.2
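Two small notes on this approach. Called without xout, approx() returns n = 50 evenly spaced points spanning the range of x, which draws the same straight segments but does not give values at the original time points. The $black/$blue/$red NULL listing is just mapply() printing the return values of lines(), which invisible() can hide. A variant sketch under those assumptions, interpolating at test$time and using Map() (base R's mapply() with SIMPLIFY = FALSE):

```r
test <- data.frame(time = 1:5,
                   black = c(3, 3, 3, 3, 3),
                   blue = c(1, NA, 3, NA, 5),
                   red = c(5, 4, NA, 2, 1))

# Interpolate each series at the original time points only.
interp <- lapply(test[-1], function(y) approx(test$time, y, xout = test$time)$y)

plot(NA, NA, type = "n", xlim = range(test$time),
     ylim = range(test[-1], na.rm = TRUE), xlab = "time", ylab = "value")
# invisible() suppresses the printed list of NULLs returned by lines().
invisible(Map(function(y, col) lines(test$time, y, col = col),
              interp, names(interp)))
```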

huangapple
  • Posted on 15 June 2023 at 15:57:30
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76480297.html