Linear regression model incorrectly calculated in R


# Question


<details>
<summary>Question (original English text)</summary>

I have 3 different datasets that I have been plotting this way:

[![Scatter plot of likes (red) and comments (blue) against views](https://i.stack.imgur.com/HCP4r.png)](https://i.stack.imgur.com/HCP4r.png)

Each dataset was imported from a file into a data frame (called `vues`, `likes` and `commentaires`, respectively) and contains the date and the corresponding data (views, likes or comments) for each date.
Now, I'd like to plot both linear models on my graph (`likes ~ views` and `comments ~ views`).

Starting with the red one, I entered the following code:

    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=c(likes,vues)),col="red")

And this is what RStudio plots:

[![Resulting plot with the red regression line added by abline()](https://i.stack.imgur.com/Nnapc.png)](https://i.stack.imgur.com/Nnapc.png)

Now I don't understand whether the problem comes from the dataset or from somewhere else: if I remove the `data` parameter, or pass only one of the two data frames, I get exactly the same result with each of the following:

    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=likes),col="red")
    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=vues),col="red")
    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30),col="red")
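
As an aside: because the formula names the vectors `likes$X2022.12.30` and `vues$X2022.12.30` directly, `lm()` most likely looks `likes` and `vues` up outside `data`, so the `data` argument has no effect here, which would be consistent with all three calls giving the same line. A minimal sketch of the more conventional pattern, one data frame with plain column names passed through `data`, using hypothetical toy values in place of the real data frames:

```r
# Hypothetical stand-ins for the imported data frames (one column per date)
vues  <- data.frame(X2022.12.30 = c(1000, 5000, 20000, 80000, 150000))
likes <- data.frame(X2022.12.30 = c(60, 280, 1600, 5900, 12100))

# Call as written in the question: the vectors are found without `data`
fit_dollar <- lm(likes$X2022.12.30 ~ vues$X2022.12.30)

# Conventional pattern: one data frame, plain column names, formula uses them
df <- data.frame(views = vues$X2022.12.30, likes = likes$X2022.12.30)
fit_df <- lm(likes ~ views, data = df)

print(coef(fit_df))
```

Both fits estimate the same intercept and slope; only the coefficient names differ. This does not change the plotting problem, which is about axis scales.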

Here is my data:

    > vues$X2022.12.30
      [1]   15900    8245    4531  546800    7149   10600    7774   45600  157100
     [10]  348300   15000    7363   24000    6073    6469    5848   13100  185600
     [19]   18700    7622  483800    6373   12000    7839   17100   10800    9846
     [28]    5671   10100    8330    9031  183000   17600    5153  117700   39600
     [37]   10300   27900   11200   29500  387800   15000    8968  465800   72500
     [46]    9501    5816    9761    5814   16200  269700    8905   16300   14700
     [55]  149600    7547  422600   40700   71100   18900  942000   12100   13400
     [64]  551900   16500   12000    8648  131900   10700   18400  183700   13500
     [73]   21500 1203000   14300   14700  108400    5233  388800  368400 1411000
     [82]  286400   17900  261500 1049000   13500   11200   74300 1312000    6044
     [91]   22200    9467    5975  143200    4552  502700    3971    9755   32000
    [100]   46800    8844   31600    3671   60700    8249   20100   14500    3475
    [109]    5745    2420  193700    2305   13500   90200    5746    5520   29200
    [118]    7803    2502    4559    2120    3233  242100    5616    1371    1109
    [127]    2123    2097    4019    1444    1515    2350   34600    2642  148000
    [136]    2139  541400   13700   52600  421700    9876    3671   33600    6388
    [145]   12300    3014   50200    2033   45900    5878    2221    1479

    > likes$X2022.12.30
      [1]   1572    935    229  39000    471    944    472   2149  15400  42000   1346
     [12]    517   1977    488    569    462   1940  17200   2121    588  84800    587
     [23]    987    618   1229    862    947    278   1048    628    795  19200   1529
     [34]    319   9050   3119    868   2840    780   1912  40100   1130    759  47800
     [45]   4197    815    470    786    502   1068  33200    698   1145   1442  11200
     [56]    534  41600   3740   5119   2376  91700    904    983  20800    812    869
     [67]    571   6653    807   1356   7332   1005   1597 104700   1171    982  14300
     [78]    367  14900  29800 103500  11900   1073  22700  67700    872    894   3673
     [89] 116800    251   2229    593    392  20400    267  29200    449    569   1933
    [100]   2260   1031   3035    311   6370   1014    812    956    241    641    116
    [111]   6543    113    503   5505    450    410   2067    494     76    350    155
    [122]    122  11400    350     51     42    109     96    200     62     53     98
    [133]   1207    153  15500    101  56900    718   4498  23600    619    248   1803
    [144]    437    983    234   4188    147   2623    591    176    138


And here is the code I used to plot the graph, in case it is relevant:

    plot.new()
    par(mar=c(4,4,4,4))
    par(new=TRUE)
    par(bg="#FFECDE")
    # Shade the plotting region (par("usr") holds its x1, x2, y1, y2 limits)
    rect(par("usr")[1], par("usr")[3],
         par("usr")[2], par("usr")[4],
         col = c("#E1DEFF"))
    # Likes vs. views (red), with the likes axis on the left (side 2)
    par(new=TRUE)
    plot(vues$X2022.12.30,likes$X2022.12.30,col="red",axes=FALSE,xlab="",ylab="",
         main="Nombre de j'aime et de commentaires en fonction du nombre de vues",
         pch=-0x2022,bg="red")
    axis(2,ylim=c(0,120000),col="red", col.axis="red",at=seq(0, 120000, by=20000))
    mtext("Nombre de j'aime",side=2,line=2.5,col="red")
    box()
    # Comments vs. views (blue) drawn over the same figure; this plot() call
    # sets up a new y scale from ylim=c(0,1500), with its axis on the right (side 4)
    par(new=TRUE)
    plot(vues$X2022.12.30,commentaires$X2022.12.30,col="blue",axes=FALSE,xlab="",
         ylab="",ylim=c(0,1500),pch=-0x2022,bg="blue")
    axis(4,col="blue",col.axis="blue",at=seq(0, 1500, by=250))
    mtext("Nombre de commentaires",side=4,line=2.5,col="blue")
    # Shared x axis (views) and legend
    axis(1,xlim=c(0,1500000),ylim=c(0,145000),col="black",col.axis="black",
         at=seq(0, 1400000, by=100000))
    mtext("Nombre de vues",side=1,line=2.5,col="black")
    legend(x="topleft",legend=c("J'aime","Commentaires"),
           text.col=c("black","black"),pch=c(-0x2022,-0x2022),col=c("red","blue"),
           bg=c("#C9FFF1"),inset=0.02)
</details>
# Answer 1
**Score**: 2
Using `par(new=TRUE)` and overplotting the comments data changes the y-axis scale; `abline()` still assumes the old scale is in effect.

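One way to see this scale change directly is to compare `par("usr")`, the current user coordinates `c(x1, x2, y1, y2)`, before and after the overlay. A minimal sketch, assuming hypothetical toy vectors in place of the real data; `pdf(NULL)` only opens an off-screen device so it runs non-interactively:

```r
# Hypothetical toy data standing in for views, likes and comments
views    <- c(1000, 5000, 20000, 80000, 150000)
likes    <- c(60, 280, 1600, 5900, 12100)
comments <- c(3, 9, 45, 140, 310)

pdf(NULL)                        # off-screen graphics device

plot(views, likes)
usr_likes <- par("usr")          # coordinates while the likes are drawn

par(new = TRUE)                  # next plot() draws over the current one...
plot(views, comments, axes = FALSE, xlab = "", ylab = "")
usr_comments <- par("usr")       # ...but sets up its own coordinates

invisible(dev.off())

print(usr_likes[3:4])            # y range of the likes plot
print(usr_comments[3:4])         # y range any later abline() will use
```

The x range stays the same because both plots use the same views, while the y range switches to the comments scale, so a `likes ~ views` line added at this point is drawn against the wrong y axis.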
The simple solution would be to use `abline()` to add the regression line before you add the comments data.

Example:
```r
dd <- data.frame(vues=
  c(15900,8245,4531,546800,7149,10600,7774,45600,157100,
  348300,15000,7363,24000,6073,6469,5848,13100,185600,
  18700,7622,483800,6373,12000,7839,17100,10800,9846,
  5671,10100,8330,9031,183000,17600,5153,117700,39600,
  10300,27900,11200,29500,387800,15000,8968,465800,72500,
  9501,5816,9761,5814,16200,269700,8905,16300,14700,
  149600,7547,422600,40700,71100,18900,942000,12100,13400,
  551900,16500,12000,8648,131900,10700,18400,183700,13500,
  21500,1203000,14300,14700,108400,5233,388800,368400,1411000,
  286400,17900,261500,1049000,13500,11200,74300,1312000,6044,
  22200,9467,5975,143200,4552,502700,3971,9755,32000,
  46800,8844,31600,3671,60700,8249,20100,14500,3475,
  5745,2420,193700,2305,13500,90200,5746,5520,29200,
  7803,2502,4559,2120,3233,242100,5616,1371,1109,
  2123,2097,4019,1444,1515,2350,34600,2642,148000,
  2139,541400,13700,52600,421700,9876,3671,33600,6388,
  12300,3014,50200,2033,45900,5878,2221,1479),
  likes = c(1572,935,229,39000,471,944,472,2149,15400,42000,1346,
  517,1977,488,569,462,1940,17200,2121,588,84800,587,
  987,618,1229,862,947,278,1048,628,795,19200,1529,
  319,9050,3119,868,2840,780,1912,40100,1130,759,47800,
  4197,815,470,786,502,1068,33200,698,1145,1442,11200,
  534,41600,3740,5119,2376,91700,904,983,20800,812,869,
  571,6653,807,1356,7332,1005,1597,104700,1171,982,14300,
  367,14900,29800,103500,11900,1073,22700,67700,872,894,3673,
  116800,251,2229,593,392,20400,267,29200,449,569,1933,
  2260,1031,3035,311,6370,1014,812,956,241,641,116,
  6543,113,503,5505,450,410,2067,494,76,350,155,
  122,11400,350,51,42,109,96,200,62,53,98,
  1207,153,15500,101,56900,718,4498,23600,619,248,1803,
  437,983,234,4188,147,2623,591,176,138))
# Stand-in for the comments data: random values on a much smaller scale
set.seed(101)
dd$other <- runif(nrow(dd), min=0, max = 1500)
# Likes vs. views, then its regression line (black): drawn on the likes scale
plot(likes~vues, data =dd)
abline(lm(likes~vues, data =dd))
# Overlay the second series: this plot() sets up a new y scale (about 0 to 1500)
par(new=TRUE)
plot(other~vues, data = dd, axes=FALSE, col = 2)
# The same likes ~ views line added now (thick blue) uses the new y scale,
# so it is drawn in the wrong place, as in the question
abline(lm(likes~vues, data =dd), col =4, lwd =2)
```
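
Applied to both series from the question, the suggested ordering might look like the following sketch: each regression line is drawn right after its own scatter plot, while that plot's coordinate system is still active. The vectors are hypothetical toy values standing in for `vues$X2022.12.30`, `likes$X2022.12.30` and `commentaires$X2022.12.30`, and `pdf(NULL)` is only there so the sketch runs without a screen device:

```r
# Hypothetical toy values standing in for the question's three columns
views    <- c(1000, 5000, 20000, 80000, 150000)
likes    <- c(60, 280, 1600, 5900, 12100)
comments <- c(3, 9, 45, 140, 310)

pdf(NULL)                        # off-screen device so the sketch runs anywhere
par(mar = c(4, 4, 4, 4))         # leave room for a right-hand axis

# 1. Likes vs. views; draw its line while the likes y scale is active
plot(views, likes, col = "red", pch = 19)
fit_likes <- lm(likes ~ views)
abline(fit_likes, col = "red")

# 2. Overlay comments vs. views; this plot() sets up the comments y scale,
#    so the comments line is drawn after it, in that scale
par(new = TRUE)
plot(views, comments, col = "blue", pch = 19, axes = FALSE, xlab = "", ylab = "")
axis(4, col = "blue", col.axis = "blue")
fit_comments <- lm(comments ~ views)
abline(fit_comments, col = "blue")

invisible(dev.off())

print(coef(fit_likes))
print(coef(fit_comments))
```

The design choice is simply ordering: `abline()` always uses whatever coordinates the most recent high-level plot set up, so each line is added before the next `par(new = TRUE)` overlay.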


  • This article was posted by huangapple on January 9, 2023 at 00:36:52.
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75049537.html