Linear regression model incorrectly calculated in R

Question

I have 3 different datasets that I have been plotting this way:

[![Aucune description](https://i.stack.imgur.com/HCP4r.png)](https://i.stack.imgur.com/HCP4r.png)

Each dataset was imported from a file into a data frame (respectively called `vues`, `likes` and `commentaires`), and contains the date and the corresponding data (either views, likes or comments) for each date.

Now, I'd like to plot both linear models onto my graph (likes ~ views and comments ~ views).

Starting with the red one, I entered the following code:

    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=c(likes,vues)),col="red")

And this is what RStudio plots:

[![Aucune description](https://i.stack.imgur.com/Nnapc.png)](https://i.stack.imgur.com/Nnapc.png)

Now I don't understand if the problem comes from the dataset or somewhere else, but if I remove the `data` parameter, or just choose one of the two datasets, it still does the exact same thing, i.e. the following:

    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=likes),col="red")
    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30,data=vues),col="red")
    abline(lm(likes$X2022.12.30 ~ vues$X2022.12.30),col="red")

Here is my data:

    > vues$X2022.12.30
    [1] 15900 8245 4531 546800 7149 10600 7774 45600 157100
    [10] 348300 15000 7363 24000 6073 6469 5848 13100 185600
    [19] 18700 7622 483800 6373 12000 7839 17100 10800 9846
    [28] 5671 10100 8330 9031 183000 17600 5153 117700 39600
    [37] 10300 27900 11200 29500 387800 15000 8968 465800 72500
    [46] 9501 5816 9761 5814 16200 269700 8905 16300 14700
    [55] 149600 7547 422600 40700 71100 18900 942000 12100 13400
    [64] 551900 16500 12000 8648 131900 10700 18400 183700 13500
    [73] 21500 1203000 14300 14700 108400 5233 388800 368400 1411000
    [82] 286400 17900 261500 1049000 13500 11200 74300 1312000 6044
    [91] 22200 9467 5975 143200 4552 502700 3971 9755 32000
    [100] 46800 8844 31600 3671 60700 8249 20100 14500 3475
    [109] 5745 2420 193700 2305 13500 90200 5746 5520 29200
    [118] 7803 2502 4559 2120 3233 242100 5616 1371 1109
    [127] 2123 2097 4019 1444 1515 2350 34600 2642 148000
    [136] 2139 541400 13700 52600 421700 9876 3671 33600 6388
    [145] 12300 3014 50200 2033 45900 5878 2221 1479

    > likes$X2022.12.30
    [1] 1572 935 229 39000 471 944 472 2149 15400 42000 1346
    [12] 517 1977 488 569 462 1940 17200 2121 588 84800 587
    [23] 987 618 1229 862 947 278 1048 628 795 19200 1529
    [34] 319 9050 3119 868 2840 780 1912 40100 1130 759 47800
    [45] 4197 815 470 786 502 1068 33200 698 1145 1442 11200
    [56] 534 41600 3740 5119 2376 91700 904 983 20800 812 869
    [67] 571 6653 807 1356 7332 1005 1597 104700 1171 982 14300
    [78] 367 14900 29800 103500 11900 1073 22700 67700 872 894 3673
    [89] 116800 251 2229 593 392 20400 267 29200 449 569 1933
    [100] 2260 1031 3035 311 6370 1014 812 956 241 641 116
    [111] 6543 113 503 5505 450 410 2067 494 76 350 155
    [122] 122 11400 350 51 42 109 96 200 62 53 98
    [133] 1207 153 15500 101 56900 718 4498 23600 619 248 1803
    [144] 437 983 234 4188 147 2623 591 176 138

And here is the code I used for plotting the graph if that is relevant:

    plot.new()
    par(mar=c(4,4,4,4))
    par(new=TRUE)
    par(bg="#FFECDE")
    rect(par("usr")[1], par("usr")[3],
         par("usr")[2], par("usr")[4],
         col = c("#E1DEFF"))
    par(new=TRUE)
    plot(vues$X2022.12.30,likes$X2022.12.30,col="red",axes=FALSE,xlab="",ylab="",
         main="Nombre de j'aime et de commentaires en fonction du nombre de vues",
         pch=-0x2022,bg="red")
    axis(2,ylim=c(0,120000),col="red", col.axis="red",at=seq(0, 120000, by=20000))
    mtext("Nombre de j'aime",side=2,line=2.5,col="red")
    box()
    par(new=TRUE)
    plot(vues$X2022.12.30,commentaires$X2022.12.30,col="blue",axes=FALSE,xlab="",
         ylab="",ylim=c(0,1500),pch=-0x2022,bg="blue")
    axis(4,col="blue",col.axis="blue",at=seq(0, 1500, by=250))
    mtext("Nombre de commentaires",side=4,line=2.5,col="blue")
    axis(1,xlim=c(0,1500000),ylim=c(0,145000),col="black",col.axis="black",
         at=seq(0, 1400000, by=100000))
    mtext("Nombre de vues",side=1,line=2.5,col="black")
    legend(x="topleft",legend=c("J'aime","Commentaires"),
           text.col=c("black","black"),pch=c(-0x2022,-0x2022),col=c("red","blue"),
           bg=c("#C9FFF1"),inset=0.02)
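# Answer 1

**Score**: 2

Using `par(new=TRUE)` and overplotting the comments data (`commentaires`) changes the y-axis scale. The coefficients of the likes regression are still on the old (likes) scale, but `abline()` draws them in whatever coordinate system is current, so a line added after the second `plot()` call lands in the wrong place.

The simple solution is to use `abline()` to add the regression line before you add the comments data.

Example: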
```r
dd <- data.frame(vues=
  c(15900,8245,4531,546800,7149,10600,7774,45600,157100,
    348300,15000,7363,24000,6073,6469,5848,13100,185600,
    18700,7622,483800,6373,12000,7839,17100,10800,9846,
    5671,10100,8330,9031,183000,17600,5153,117700,39600,
    10300,27900,11200,29500,387800,15000,8968,465800,72500,
    9501,5816,9761,5814,16200,269700,8905,16300,14700,
    149600,7547,422600,40700,71100,18900,942000,12100,13400,
    551900,16500,12000,8648,131900,10700,18400,183700,13500,
    21500,1203000,14300,14700,108400,5233,388800,368400,1411000,
    286400,17900,261500,1049000,13500,11200,74300,1312000,6044,
    22200,9467,5975,143200,4552,502700,3971,9755,32000,
    46800,8844,31600,3671,60700,8249,20100,14500,3475,
    5745,2420,193700,2305,13500,90200,5746,5520,29200,
    7803,2502,4559,2120,3233,242100,5616,1371,1109,
    2123,2097,4019,1444,1515,2350,34600,2642,148000,
    2139,541400,13700,52600,421700,9876,3671,33600,6388,
    12300,3014,50200,2033,45900,5878,2221,1479),
  likes = c(1572,935,229,39000,471,944,472,2149,15400,42000,1346,
    517,1977,488,569,462,1940,17200,2121,588,84800,587,
    987,618,1229,862,947,278,1048,628,795,19200,1529,
    319,9050,3119,868,2840,780,1912,40100,1130,759,47800,
    4197,815,470,786,502,1068,33200,698,1145,1442,11200,
    534,41600,3740,5119,2376,91700,904,983,20800,812,869,
    571,6653,807,1356,7332,1005,1597,104700,1171,982,14300,
    367,14900,29800,103500,11900,1073,22700,67700,872,894,3673,
    116800,251,2229,593,392,20400,267,29200,449,569,1933,
    2260,1031,3035,311,6370,1014,812,956,241,641,116,
    6543,113,503,5505,450,410,2067,494,76,350,155,
    122,11400,350,51,42,109,96,200,62,53,98,
    1207,153,15500,101,56900,718,4498,23600,619,248,1803,
    437,983,234,4188,147,2623,591,176,138))
set.seed(101)
## random stand-in for the comments data, on a much smaller scale than likes
dd$other <- runif(nrow(dd), min=0, max = 1500)
plot(likes~vues, data =dd)
## line added right after the likes plot: drawn on the likes scale (correct)
abline(lm(likes~vues, data =dd))
par(new=TRUE)
## overplotting resets the y-axis scale to the range of 'other'
plot(other~vues, data = dd, axes=FALSE, col = 2)
## same fit added afterwards: drawn against the new scale, so it is misplaced
abline(lm(likes~vues, data =dd), col =4, lwd =2)
```
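
The scale change described in this answer can be checked directly by reading `par("usr")`, which holds the current user coordinates as `c(x1, x2, y1, y2)`. The following is only a minimal sketch on simulated stand-in data (the vectors `x`, `y1` and `y2` are hypothetical, not the asker's values); it records the coordinates after each `plot()` call so the two can be compared.

```r
# Simulated stand-in data (hypothetical, not the data from the question)
set.seed(1)
x  <- runif(50, 0, 1e6)               # plays the role of views
y1 <- 0.08 * x + rnorm(50, sd = 5e3)  # plays the role of likes (large scale)
y2 <- runif(50, 0, 1500)              # plays the role of comments (small scale)

plot(x, y1)
usr_first <- par("usr")   # user coordinates set up by the first plot

par(new = TRUE)
plot(x, y2, axes = FALSE, xlab = "", ylab = "")
usr_second <- par("usr")  # user coordinates after overplotting

# Same x limits, very different y limits
print(rbind(first = usr_first, second = usr_second))
```

Any `abline()` call issued after the second `plot()` is drawn against the second set of limits, which is why a line fitted on the likes scale ends up far from the likes points.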

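
To get both regression lines from the question (likes ~ views and comments ~ views), one option is to add each line immediately after the `plot()` call that sets up its scale. The sketch below assumes the three columns have been combined into a single data frame; the data here are simulated placeholders, and the names `dd`, `fit_likes` and `fit_comments` are illustrative only. It also refers to columns by name together with `data = dd`. In the question the formula used `likes$...` and `vues$...` directly, which is probably why changing the `data` argument made no difference there.

```r
# Hypothetical stand-in data. With the real data, the data frame could be
# built from the three imported columns instead, e.g.
# dd <- data.frame(vues = vues$X2022.12.30, likes = likes$X2022.12.30,
#                  commentaires = commentaires$X2022.12.30)
set.seed(101)
dd <- data.frame(vues = runif(100, 0, 1.4e6))
dd$likes        <- 0.08  * dd$vues + rnorm(100, sd = 5000)
dd$commentaires <- 0.001 * dd$vues + rnorm(100, sd = 100)

# Fit both models once, using column names resolved through `data`
fit_likes    <- lm(likes ~ vues, data = dd)
fit_comments <- lm(commentaires ~ vues, data = dd)

par(mar = c(4, 4, 4, 4))

# Layer 1: likes on the left axis, with its line added right away
plot(likes ~ vues, data = dd, col = "red", xlab = "Nombre de vues", ylab = "")
abline(fit_likes, col = "red")

# Layer 2: comments on the right axis, with its own line on the new scale
par(new = TRUE)
plot(commentaires ~ vues, data = dd, col = "blue",
     axes = FALSE, xlab = "", ylab = "")
abline(fit_comments, col = "blue")
axis(4, col = "blue", col.axis = "blue")
```

Each `abline()` call here runs while the coordinate system of its own layer is still current, so neither line depends on the scale of the other series.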

huangapple
  • Posted on January 9, 2023 at 00:36:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75049537.html