{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Basics of Simple Linear Regression\n", "\n", "Packages required for this lesson:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-output" ], "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading required package: s20x\n", "\n" ] } ], "source": [ "require(s20x)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Analysing the data\n", "\n", "### Reading the data\n", "\n", "Read the data table with `read.table`: `header = TRUE` means the first row holds the column names, and `sep = \"\\t\"` means the fields are separated by tabs." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
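<table>
<caption>A data.frame: 6 × 15</caption>
<thead>
<tr><th></th><th>Grade</th><th>Pass</th><th>Exam</th><th>Degree</th><th>Gender</th><th>Attend</th><th>Assign</th><th>Test</th><th>B</th><th>C</th><th>MC</th><th>Colour</th><th>Stage1</th><th>Years.Since</th><th>Repeat</th></tr>
<tr><th></th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;int&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;int&gt;</th><th>&lt;int&gt;</th><th>&lt;int&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;dbl&gt;</th><th>&lt;chr&gt;</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>C</td><td>Yes</td><td>42</td><td>BSc</td><td>Male</td><td>Yes</td><td>17.2</td><td>9.1</td><td>5</td><td>13</td><td>12</td><td>Blue</td><td>C</td><td>2.5</td><td>Yes</td></tr>
<tr><th>2</th><td>B</td><td>Yes</td><td>58</td><td>BCom</td><td>Female</td><td>Yes</td><td>17.2</td><td>13.6</td><td>12</td><td>12</td><td>17</td><td>Yellow</td><td>A</td><td>2.0</td><td>No</td></tr>
<tr><th>3</th><td>A</td><td>Yes</td><td>81</td><td>Other</td><td>Female</td><td>Yes</td><td>17.2</td><td>14.5</td><td>14</td><td>17</td><td>25</td><td>Blue</td><td>A</td><td>3.0</td><td>No</td></tr>
<tr><th>4</th><td>A</td><td>Yes</td><td>86</td><td>Other</td><td>Female</td><td>Yes</td><td>19.6</td><td>19.1</td><td>15</td><td>17</td><td>27</td><td>Yellow</td><td>A</td><td>0.0</td><td>No</td></tr>
<tr><th>5</th><td>D</td><td>No</td><td>35</td><td>Other</td><td>Male</td><td>No</td><td>8.0</td><td>8.2</td><td>4</td><td>1</td><td>15</td><td>Blue</td><td>C</td><td>3.0</td><td>No</td></tr>
<tr><th>6</th><td>A</td><td>Yes</td><td>72</td><td>BCom</td><td>Female</td><td>Yes</td><td>18.4</td><td>12.7</td><td>15</td><td>17</td><td>20</td><td>Blue</td><td>A</td><td>1.5</td><td>No</td></tr>
</tbody>
</table>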
\n" ], "text/latex": [ "A data.frame: 6 × 15\n", "\\begin{tabular}{r|lllllllllllllll}\n", " & Grade & Pass & Exam & Degree & Gender & Attend & Assign & Test & B & C & MC & Colour & Stage1 & Years.Since & Repeat\\\\\n", " & & & & & & & & & & & & & & & \\\\\n", "\\hline\n", "\t1 & C & Yes & 42 & BSc & Male & Yes & 17.2 & 9.1 & 5 & 13 & 12 & Blue & C & 2.5 & Yes\\\\\n", "\t2 & B & Yes & 58 & BCom & Female & Yes & 17.2 & 13.6 & 12 & 12 & 17 & Yellow & A & 2.0 & No \\\\\n", "\t3 & A & Yes & 81 & Other & Female & Yes & 17.2 & 14.5 & 14 & 17 & 25 & Blue & A & 3.0 & No \\\\\n", "\t4 & A & Yes & 86 & Other & Female & Yes & 19.6 & 19.1 & 15 & 17 & 27 & Yellow & A & 0.0 & No \\\\\n", "\t5 & D & No & 35 & Other & Male & No & 8.0 & 8.2 & 4 & 1 & 15 & Blue & C & 3.0 & No \\\\\n", "\t6 & A & Yes & 72 & BCom & Female & Yes & 18.4 & 12.7 & 15 & 17 & 20 & Blue & A & 1.5 & No \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 15\n", "\n", "| | Grade <chr> | Pass <chr> | Exam <int> | Degree <chr> | Gender <chr> | Attend <chr> | Assign <dbl> | Test <dbl> | B <int> | C <int> | MC <int> | Colour <chr> | Stage1 <chr> | Years.Since <dbl> | Repeat <chr> |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 1 | C | Yes | 42 | BSc | Male | Yes | 17.2 | 9.1 | 5 | 13 | 12 | Blue | C | 2.5 | Yes |\n", "| 2 | B | Yes | 58 | BCom | Female | Yes | 17.2 | 13.6 | 12 | 12 | 17 | Yellow | A | 2.0 | No |\n", "| 3 | A | Yes | 81 | Other | Female | Yes | 17.2 | 14.5 | 14 | 17 | 25 | Blue | A | 3.0 | No |\n", "| 4 | A | Yes | 86 | Other | Female | Yes | 19.6 | 19.1 | 15 | 17 | 27 | Yellow | A | 0.0 | No |\n", "| 5 | D | No | 35 | Other | Male | No | 8.0 | 8.2 | 4 | 1 | 15 | Blue | C | 3.0 | No |\n", "| 6 | A | Yes | 72 | BCom | Female | Yes | 18.4 | 12.7 | 15 | 17 | 20 | Blue | A | 1.5 | No |\n", "\n" ], "text/plain": [ " Grade Pass Exam Degree Gender Attend Assign Test B C MC Colour Stage1\n", "1 C Yes 42 BSc Male Yes 17.2 9.1 5 13 12 Blue C \n", "2 B Yes 
58 BCom Female Yes 17.2 13.6 12 12 17 Yellow A \n", "3 A Yes 81 Other Female Yes 17.2 14.5 14 17 25 Blue A \n", "4 A Yes 86 Other Female Yes 19.6 19.1 15 17 27 Yellow A \n", "5 D No 35 Other Male No 8.0 8.2 4 1 15 Blue C \n", "6 A Yes 72 BCom Female Yes 18.4 12.7 15 17 20 Blue A \n", " Years.Since Repeat\n", "1 2.5 Yes \n", "2 2.0 No \n", "3 3.0 No \n", "4 0.0 No \n", "5 3.0 No \n", "6 1.5 No " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
<ol><li>146</li><li>15</li></ol>
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 146\n", "\\item 15\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 146\n", "2. 15\n", "\n", "\n" ], "text/plain": [ "[1] 146 15" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
<ol><li>42</li><li>58</li><li>81</li><li>86</li><li>35</li><li>72</li><li>42</li><li>25</li><li>36</li><li>48</li><li>29</li><li>54</li><li>49</li><li>52</li><li>28</li><li>34</li><li>51</li><li>81</li><li>80</li><li>41</li></ol>
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 42\n", "\\item 58\n", "\\item 81\n", "\\item 86\n", "\\item 35\n", "\\item 72\n", "\\item 42\n", "\\item 25\n", "\\item 36\n", "\\item 48\n", "\\item 29\n", "\\item 54\n", "\\item 49\n", "\\item 52\n", "\\item 28\n", "\\item 34\n", "\\item 51\n", "\\item 81\n", "\\item 80\n", "\\item 41\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 42\n", "2. 58\n", "3. 81\n", "4. 86\n", "5. 35\n", "6. 72\n", "7. 42\n", "8. 25\n", "9. 36\n", "10. 48\n", "11. 29\n", "12. 54\n", "13. 49\n", "14. 52\n", "15. 28\n", "16. 34\n", "17. 51\n", "18. 81\n", "19. 80\n", "20. 41\n", "\n", "\n" ], "text/plain": [ " [1] 42 58 81 86 35 72 42 25 36 48 29 54 49 52 28 34 51 81 80 41" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "course.df <- read.table(\"../data/STATS20x.txt\", header = TRUE, sep = \"\\t\")\n", "head(course.df) # show the first 6 rows (the default for head)\n", "dim(course.df) # number of rows and columns\n", "course.df$Exam[1:20] # first 20 values of the Exam column" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the data\n", "\n", "Plot the data, focusing on the relationship between `Exam` and `Test`.\n", "\n", "Start with a rough look at the shape of the relationship: linear, quadratic, some other curve, sinusoidal, and so on." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "Plot with title \"Plot of Exam vs. 
Test (lowess+/-sd)\"" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" } ], "source": [ "library(s20x)\n", "trendscatter(Exam ~ Test, data = course.df)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 进行初步拟合\n", "\n", "可以看到整体大致呈线性关系,故我们采用线性回归模型。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAANICAMAAADKOT/pAAAANlBMVEUAAAAAAP9NTU1oaGh8fHyMjIyampqnp6eysrK9vb3Hx8fQ0NDZ2dnh4eHp6enw8PD/AAD////xw1/KAAAACXBIWXMAABJ0AAASdAHeZh94AAAfJElEQVR4nO3d22LiuBJGYbWBkIRwev+XnXCMkwEs27+kKml9FzM9eyfYBFZjlwUJRwCzhdI7ANSAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJEAgQ0gBcGbCs1wfToFNAEqEBAgQEiBASIAAIQEChAQIEBIgQEiAACEBAoQECBASIEBIgAAhAQKEBAgQEiBASIAAIQEChAQIEBJwNunt4j/fneVbDG4C6DtXNCMlQgKOt6ccIQFzhD//nnwDab/F4CaAHkICBAgJUOAcCRBgagdIcB0JKI6QAAFCAgQICRAgJECAkIBYLwZ7hATEeXmpiZCAOC8XPxASEOX1cjxCAqIQEiBASIAC50iAAFM7QILrSEBahAQIEBIgQEiAACEBAoQE2+Z9lIL2Vl5tIMu3GNwEXJj74T7KW3m9iSzfYnATcGHux80pbyViE6m/xeAm4MHsD0AV3krcNtJ+i8FNwANCmoeQcEZI8xASLjhHmoWQcMHUbhZCwg3XkWYgJDhDSIAAIQEChAQIEBIgQEgtST67MoupHWQyXE0xiutIEMpwfd+oDPf8MP5bCMmnHCvObEp/zw8HXpGaQUjJ7vmBQ7uGEFKqe36YdtuE5BTnSMk3kfpbDG6iQUztBr5oyo/mNmQgpJZwHenFFxwn/C1zuA/rCAk4Tjz66828CQmYOI/oXzsiJEAw2CMkgJAAjbHnSH+XBBEScBw7IT/8b2kdIQFnERPyf//+nb/owQpVQgLi/Lt4vNKbkIAo4RrS4ycnIQFR/v27vyQ9QEhAFEICBAgJlcu0FvdVR4QE77K9O4SQULOM71d8mhEhwbs876D//1KGJ7sxAiHBkBwhDWZESPAuQ0gxn1lHSHDOxme6EBKcy/iZLi/m7IQE91JeR+p/LMOrYgkJeOrw/49lICRgpF9ThtdTDUICnvg9rSMkQICQAAXOkYCRHq1lYGoHjPJsSRDXkYB4E36NJSEBCoQECBAS0
DPlsO6EkIC7iDcePUFISCFyHalmuals0erkjDKH9PW+Cier9VeqTcCCyHc2aN4AoXsbxYyOcoa0X4QfyySbgA2R77XTvCWvuTf2rUP3uT3/abfpwjrFJmBC5Lu/NW8SF73VfM6r0cTNT93jLmzvf96GLsUmYIK/kKYPGf7uRtpvuXxfePYfsk3ABHchzc6IVySk4OwcSdBR5nOkze78J86RKhc9tTvLs7HEco6/l72p3WKfZBMwIqoPTUhpP/wkeh+yfMvV1/p8HalbvXMdCTYG14rDuhNWNqAU0eB6jvnTuhs7IYW+NJuAKeVDkmWUN6TdW+jej8ePRehejhp4RWpD8ZCEHWVdItSdXms+3lkihAsL50gqWcff369D6y687Y/7NeNvFB1cK1+NTrJekD1/dzgPvrkgi2O5wbVuyHCTfYnQ9SfHEiGUI8+oyCvS6Z97XpFQTIKOSpwjrffXP+s3ARTC1A4Q4DoSWqKfMlzZWdmQeRNIyubqlGQZERJSsPHOhv9JlxEhIYWalixEIiTIFV9EVwAhQc5iSCkP604ICXL2Qko4ZbgiJOhZO0dKnhEh4S/NhygcLU3tDhnG8YSEPlUCtq4jZQibkNBn7aBMI8O9IiT02BsTzHSeMuS4V4SEnspCug7rCAmZ1RXSbVhHSMiNc6RZm0j9LQY3UR/JoMzl1G5wY0ztEEv2XHF3Henxxg5/v4jrSIhh6pgs68482lj6JUGP9yL1txjcRGVMTQmy7syjjWXPiJBqQUi9jRXoiJAqQUiF7zkhVYJzpIsSr0b97af9FoObqI2pBdfROyP6fX333wSUf8hw34ss32JwE/UxteA68ldfHgXx90IqlhEhoSDNEeDPrRTsiJBQjGZMwLCh7CZQHCElR0gtkIZ0IKQym0B5unOkQ+mOCAlJ6KZ2MUu7D+UH/4QEPd11pKhbOhTPiJCQgm5lg6kFG68QEuR0g7ThWyp57aiPkCCXL6RyS4L+IiTIZQvJTEaEhBQynSMZ6oiQmpJrXWt/QfbcW7r9wzhCake+J6UuJGOL2p8jpHbkGyWn35KdKcMVITUj3+rO5FsylxEhNaSekOxlREgNqSckiwipHTWdI5lDSO2Im9qpP49k7sb+fInFw7oTQmrJ8PNW/nkkMzf250sMThmuCAl9uvfaRdxKxJf9/hKzGRESftGMCSJvJeLLfn+J4Y4ICX22Q7KMkNBDSFMREvqsniPZnTJcERL6VFO7qFuJGO5dbsl8RoSEv1Rvf4i7ijQ8Jf/+AvsZERIKqmkBBCGhFD+ThAiEhFKiQvJwWHdCSCglIiQHU4YrQkIxg+dIbjIiJEwwfs32sy+6/eMxRx0REsYav2b71RfW8lATEkYavWZ7Ek+vRieEhHGyrJDzM2S4ISSMkyMkdxkREsbKEJLDjggJY+U5R/KGkPCb4pfoufnEbh1CQp/m17rOmGt7PKw7IST0FT4o8zetuyEk9BRekO02I0LCL2VDctwRIaGvqrcIZUVI6Ct2juT51eiEkNBXaHDtd8hwQ0j4rcSCbPcZERIMqKAjQgIUCAkQICSU5H/KcEVIKKeajAgJaUTN/urJiJCQAm+jSPQtBjeBhHhjX6JvMbgJpBOzYq+mw7oTQoLccEgVTRmuCAlygyFVlxEhIYWBc6QKOyIkJMDULtG3GNwEkqrnQ70jERJyqm/KcEVIyKfajAgJGdWbESEBEoQECBBSNVJ+kPB8NR/WnRBSJWx/tH3FU4YrQqpExILrcmuyq8+IkGoRseC63KeoNtARIVXCdEgtIKQ6mA2phVejE0KqhM1zpPqHDDeEVAljU7vLoL2ZjAipIoauI12TbagjQkICfPhJom8xuAmk0+J8kJAgdzs9aulhJCTIheu0rqWHkZCgd5kyNPUoElJLRFO7oZs58OEnib7F4CYaJHp6R90MH36S5FsMbqJBoqH065tp6dpRHyE1QzSUfnkz7SwJ+ouQmpEhpGYzIqSGpA+p4Y4IqSFZz
pFaRUjtCBdGbqYyhNSOlCG1O2W4IqR2pDu0az4jQmpIumEDGRFSQ3JcR2oXITWDkFIipHYkOUfisO6CkHxQzJsTTO2YMtxkDenrfXV+DFbrr1SbqJNm3bY+JDK6yxjSfhF+LJNsolaagzL5oR0d/cgY0jp0n9vzn3abLqxTbKJSmvN7hg0pZQypC9v7n7ehS7GJShGSfRlD+nV0PvCZoBM3USmDIR0OPEa/8IrkgbVzpMOBh+iPvOdIm935T5wj/TY4Sss5tYvYmQOrv/8n5/h72ZvaLfZJNuFRtg8TiQkp6rP4eRvF/+W9jrQ+X0fqVu9cR/ohOuLSbCjia7LtryusbCgt2xQsZkPDX3PItr++2Akp9KXZhEmuQupNGVp6jCLkDGm/Po3q3hchLD8TbcIhTyEdhr+kVRlD2nXfrzT7jiVCf/g5RzoMf0mzMob0Flb773+87b6bemP8fZftg7JjNmTsN2j6kXVlw/76j++jPC7I9mQ7KYzZ0JOvOQx/SdNyLxHqQu8/5JtAIrzvaEjWQ7vt8fh+WSe0f32SREi2kNGgjCFtQ7feHlfdd0mbRdik2ASSoKNhOcffm+7nQtF7mk0AZeS9IPv5dn6X7Op9l2wTQAl2VjZk3gTiPDqs003t6pn/ERJeeDSt011HqumKFCHhuYdTBt3KhprWSBASnnrRkeIxqmrVHiFhHEJ6iJDw0NNrR4T0ECHhgVdLgjhHeoSQqiEcJb9cyqD7jAmmdqnV8bPNSvmkHFoSpPvUI64jpVXLTzcjU4dJpnYmD0Kqg6kTd1M7kwkh1UH13JW88YiQUn2LwU1URvPcFb1/j5BSfYvBTdRGcVoie98R50iJvsXgJrLRDKYyfTy4iKmdyYOQ0lJ9/H3ErZgaJZvamRwIKS3NQU6GQyXeTj4PISWlOe1Of/LOpwTNRUhJOQmJjGYjpKR8hERH8xFSWm7OkTAPIaWVcWqXT3MjuQhzQ1rfP6tOtUf/24Rzma4jTTRhymCsaiNmhrRO87vBeJTymDSs4zjzkZkhhfAh25Unm0Ayk4YMLS6kizA7JNmePNsEbCGkh2Yf2u1lu/JkE7CFkB6aO2xYLgc+xnsSHqTkpl874hzpkbkhbRg2WDHmMZizJCjq8W5uQj4zpHemdkaMGkrPWsoQ8Xg3OCGfGVLH1M6IMQdc85YERWypwaM/pnZ1yDcCiNhSi/OI2Yd2TO1MiH7uzl6gSkgPzR02vC+/VLvybBOIEPncFbzviJAemn1ox7DBhqjTEsn7JThHeoSQnnM1wo15FA6SexQxkmNql+hbDG5ikLMnQ86hdESPrv4SUiCkZ5wdnnDAVZYqpK/V3D0Z3ERezk6YB3f34O0eOTM3pHWt50jOnnYDu3ua1jm7R87MXv19s5Ht0tHEY+3safd6dw/DX4J5Zi8R+jwuw263DNLLSRYe63A+YbawJ1FenQAdhr+kkHpmEoIlQu/fr0bbsJTt0tHGg+1uancc2l1z98jcDs0gCGlzWrha3TmSt1ekZ3+7H4a/pBiDL5GTzQxp9X1otwuL41d1IdVxRmH6o4jr+BFfzQxpcwpoeRo2vMl26WjiR1vFo2w5o0p+xDezF62e/usthLVofx5soowaHmXbHVXxI75jZcMzNR3AG1XTj5iQnrE1Uso3Jsg4kLD1I55nZkhvt/f17aobf1uacY1/xk2dMmR+btv5Ec81d/zdfZ7//VHd1M6UscdA04d1NR1tZTUzpK8urHbfL0ehq25lgyFjz8rnfmYdD8B4s8+R3kNYh/Au2p2Hm2hevqc3IU01f9jwfVQn/0wuHsdfCMk+0SuS9jISj+MfI85cZl474hxpovnnSMvvc6QV50hJRc/SZi8JqmkindXsRauXo7rPjqldUnFzYsVShnom0lnNDOn+uyj2ta2188j4kqCqsbIBEJgRUvh9jKfYm7+bAJyYHdK1IEIqy/Qbj1pASDUgo+IIKbUMU7BD7IaYyCVDSGlluy7j8cNPa
kJIaWVbKRCxIVYtJERISaVfu/brM+teboh1dCkRUlKpn7z3KQMhFTYrpF8K75VNiZ+8P8M6QiqMkNJKel7SH3pzjlQWS4TSivpbRvHXkOwXjVkbkVvbnycIKa1kv0jv7yVYUbHWRuTW9ucpQkor0RHX/5cyiI7brB3+WdufpwgpqUQzgP+vCBJNEqwNJKztz3OElFSakB6srCOkwggpqWxTaUIqjJDSyjaV5hypLEJKK2ot6Zh527M3TIjGW9amZNb25ylCSi2mkeEJ+b9//05f8uqNR6ILLtau21jbnycIqbiIo5d/F3y6iV2EVFrMPOIaEh3ZRUilRYT07ybLDmEKQiqNkKpASMUNnyMRkn2EVFzEhFfVUcwEjM9QmYSQDBh6Yh40IcVck+EzVCYiJA/C6TUpy+KHbCsxakNIDuRbjpdtbWB1CMm281IGQrKPkCy7rggiJPsIybD7SoZ8C8Q5R5qIkJ6zM+XVDMqY2iVESM/Yer5oouY6UjKE9EzhIxgWqPpCSAO7UGZX+IVH3hDSwC4U2RUycoeQBnahxK7QkT+E9AxTXoxASM9ETe2yDdMYlBlHSM9FfGzJcbi1iM30buXxlMHWKB4PENIM8gUHz4Z1HGaaR0jTyZfAPf3MOsmGkBIhTZdtLSkh2UdI0xES7ghpBuU50strR5wjmUdIM+imdofDwIJr+a/phRYhzSK6jvQ6I0JygJAMGFwSxKGdeYTkAMMG+wipsJgFqoRkHyGVFbXQm5DsI6SiIt8wwTmSeYT0XPpF2YfIWxn32zHnYSn6JIT0TLbP09H8mtl8v0OWpeiPENIzEYdTs464xnxoXep9KbGlyhDSwC4kWgJ3+NPR3A2J5hEs/JuKkAZ2Ic1TqjdkIKQqENLALiR5SvWHdYRUBUJ6JtvZAudINagvJNVs1t3UTrKulandRFlD+npfnR/t1for1SaUj3KSKyqPPpZBcR1JtkCc60iTZAxpvwg/lkk2Yf64Y+JHEXPAZV7GkNah+9ye/7TbdGGdYhPWz4QnfoQqIwD7MobUhe39z9vQpdhEpc8nQrIvY0i/DqwHDvgnbqLS5xMh2VfZK5LhU4U5n4zPOZJ5ec+RNrvzn9KdI+We2kWb9wuPNBNyJJRz/L3sTe0W+ySbOGa9jhRv9i9q0bzTAsnkvY60Pl9H6lbvya4j6SiPlTL8wiMO7cqqb2WDeBcM7EoMZ7tbHzshhb40mxi1O3/+bZyz3a1PkZAGQzHwdFA9MzP9WmVCKoyQnpGcdOT77eScI5WV9YJs9NGbheeDYmqX8bcqy4aMJg6s/ckY0lfnKiR3TynR2u+jpsfW5Dy026/C8nxF1sWhXZs4Qpwo7znSZwifx0ZCynhYp8PMYqrMw4bdMqz2LYSUb8ogRUhTZZ/avYduU39IPjMipOnyj7+3i+GzYu+Po9eOOEearMR1pDcLIbU4kovbjo2VJd7YWSKUdRMJp7wpXo3yDaUJaaJWQ0q1mTRDhnwHXBzaTdRmSMnOqdOcHOUbATBsmIqQlBINGQjJPkJygJDsazMkb6cCnCOZ12pI+jFYymtHOad2ubZUmUZDkl+YSb0kKN9ImuH3JM2GpOV3KQM0CEmBjppHSIAAIc3FqxGOhDSX0/cdQY2QZonIKNsUTLQhfmPfJIQ0x3BH2a7LiDbE75CdiJDSyrZSQLShiJth8cMjhJRUtrVrog1F3AzL8R4ipInipgyE1ApCmiR2WEdIrSCkKeJn3pwjNaK+kHSzWcUtZfsMBKZ2ZdUWku5R1txSxg8T4TpSSdWFNPP7h29p3FIGjoMaUVlIujPhJ7c0ckkQZ+atIKRRtzR2ZR0htYKQxtzS6BWqhNSKykLKcI5U4lZgXnUh5Z3aDc6vmBU3oraQkl1HejRl0LSGGtQXUhKPh3Uct+GGkGI8HjIwScAdIU1HSLgjpOkICXeENOTFtSPOkXDTbEiRw7SXS4KYbeOm0ZBiExhaysBsGxethhS3GT60DpHaD
IkxAcQI6RlejTACIT3GRxFjlDZDGjxHIiOM02pIr6d20R15+2BvDVM7Y0SjIWmeDN4+2FvD1M6Y0WxICtlWNphaQmFqZ8wgpL/GfvhjtjM6I09eUztjByH9NmZaR0i4I6RfRk3rCAl3hNQ3curNORJu6gsp42yWqR1uagtp+qM86Ros15FwUV1IE7+fJUGYpbKQpp4JkxHmIaQTOsJMhAQIVBYSs1mUUV1IY6d2L6YMloZTlvYFD9QW0sin3MuMjuOSTMjSvuCh+kIa4+VnbfX+WZqlfcFDbYcUsQsGdsXUvuAxQhrYBQO7Ympf8Fi7IQ1cO7L05LW0L3is1ZCGlwRZOi+xtC94qNGQIpYyZJyU8Qs0/WszpLglQZmu3fALNGvQZkimcNxWA0IqjUlCFZoLydwbjwipCo2FZC4jQqpEWyHZy+jIOVId2grJpIxTO2Z/yRCSAZmuI3E1KqF2QjJ5WBdJc/THMWRCrYRkcMoQTzOPYKqRUiMhec6IkDxoIyTfHRGSA22E5B3nSObVH9KTVyNX82SmdubVHtKTIYO7Z6ar7ltUeUjPTo44VoJW3SG97oizd8jUHdLQ7RMSRAip9K2gCvWG9PLaEedI0Ko1pIElQe6mdjCu0pCGlzIwT4ZSnSE5XxIEf+oMCcisvpB4NUIBtYXk+n1H8KuykMgIZdQV0oiOIuZtzkZyzna3MnWFFL+B4StAzi4SOdvd6rQa0vBmnC1bcLa71akmpFFThohVcs4W0jnb3fpUEtLIYR0hQayOkMYO6wgJYnWENHEDnCNBpdWQTE3tFINrY1O75mbx/kOaeA3WznUkVQKGnrvGqs7Be0gVLAmq8KCswrs0xHlI/jOqcUxQ4V0a5DukCjqq8VlX4V0a5DukGlT4rKvwLg0ipOIqPKGo8C4NcRuSmylDpl/HZ0qFd2mI05AcZXQcfkoZGlyrVHiXXvMZkpeMmjzIaZPPkNxo8bS7TYSUFCG1wl9Ifg7rjoTUDm8huZkyXHGO1AhnITnLqMlBcJt8heSuo2ODg+A2+QoJMMpPSB5fjdAMLyF5GzKgMU5CIiPY5iMkOoJxPkICjMsa0tf7Kpys1l+pNgEUkTGk/SL8WEZvgsM6OJAxpHXoPrfnP+02XVjHbYJpHVzIGFIXtvc/b0MXtQkygg8ZQ/q1VGbgU06v/6YjOGH8FQnwIe850mZ3/lPUOdKjV6O8C0BZbopoOcffy97UbrF/vYlHQ4a8b0ngDRAYIe91pPX5OlK3eh+6jvTw5Cjvm+R4Sx5GsLmy4UVHuZ7bvEkcY9gJKfS92i4hwZ6cIe3fQlhurjcSNf5++D8SEuzJuUSouyy0u9zI+JA4R4JdWcffH981fXTnZXaTQmJqB6uyXpA9/2vXLXbTQuI6EswqsERov1xODQkwKmNIi3C7CLtYEhLqkjGkj/B2/dMuLAkJVck5/l7f69kMnH4QEpzJekF2u7r9afdGSKiJnZUNMZtgjgajPIXElR2Y5SqkXJsHxnIUEqvfYBchAQKEBAg4ColzJNjlKiSmdrDKU0hcR4JZvkICjCIkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgSaDSkEaoVOoyGdKyIlyLQaUp7NoBlthhT+/BuYiZAAAUICBNoMiXMkiLUaElM7SDUaEteRoNVsSIASIQEChAQIEBIgQEiAACEBAoQECBASIEBIgAAhAQKEBAgQEiBASIAAIQEChAQIEBIgQEiAACEBAkZDApyZ8CzXh5Oaw13+i7tggfQuOPx5ONzlv7gLFhCSe9wFCwjJPe6CBYTkHnfBAkJyj7tgASG5x12wgJDc4y5YQEjucRcsICT3uAsWEJJ73AULCMk97oIFrYcE2
ENIgAAhAQKEBAgQEiBASIAAIQEChAQIEBIgQEiAACEBAoQECBASIEBIgAAhAQKEBAg4C2nyZ5xb8XHb93UXuvW+6L5MdLsLbh+Lj8X9R697FHz9HLZuH7yr7W3fl+f7sSi7N5Pc7oLbx2J93u3ulI/wUfD1c9iGVeldmGXbXZ94X6Hbnv7rq/AOjXe/C14fi214259eVt+0j4KvkD7Ce+ldmOMjLK/PwnXYfP/z09/d+bkLXh+L1WX3T/dC+Sh4C+mj9C7MEdbH67NwFXZHl3+p/9wF749F0D4KvkJahc3b99lh6d2Yanu8PQt//8uRn7vg+7HYh6X2UfD1SK4u57fL0vsxnfeQjr2QPD8WH6ejunZDCuHz+y+TteODinpCcv1Y7LrT4Vy7IV3sXY6NL+oJ6cLnY7Hvzi+krYfk8/l3cd31rpaQfN6F5aV+5aPg8cfg88G7+DW12/mb2h0rCGm3WO7Of1A+Cr5+DF04XY/2+fy7uD7v3s9XMDbB49Dr/qLq9LHY3AckykfBV0jr033eX66j+eR+ZcP9Lnh9LHY/g8Z2Vzbsu/PI1ePf41e3I6GF39nx9S54fSzews8iQeGj4Cuk778Bu7BwOXC9uoW0P687LrsvE/XvgsPHIvRCEj4KzkICbCIkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICQfQs+TL/H2y1zrQkg+DIe04KEsiZ++I09fjAb/T6TGT98RQrKLn74jP618LEJ3+YXim2UIy8312K/YnoGfvSP3UlbnbJbff/q4nDZ9EFJh/OwduZWyCcv9cb8M3y9EXdgej59hwaFdYfz0Hbm1sgr773/uw+r0P23+/J8ogp++I7dWeoPwdQir7bb/f6IIfvqOPAjp+N59/7vbEVJh/PQd+Qmp/79u1gvOkYrjp+/IzznS5v//ByEVxU/fkVsrn6Hbnibfq9PCoM/71G5Xdu/aRkiO3F90luczpNOZ0eflZOnrlFToiu5d2wjJkV8rG8Lb+RXovLLhu6Pj14KQCiIkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIECAkQICRAgJAAAUICBAgJECAkQICQAAFCAgQICRAgJECAkAABQgIE/gP70C8VR3cOPgAAAABJRU5ErkJggg==", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" } ], "source": [ "plot(Exam ~ Test, data = course.df)\n", "# 绘制回归直线\n", "examtest.fit <- lm(Exam ~ Test, data = course.df)\n", "# lty = 2 表示虚线,col = \"red\" 表示红色\n", "abline(examtest.fit, lty = 2, col = \"red\")\n", "\n", "points(\n", " 0,\n", " predict(examtest.fit, newdata = data.frame(Test = 0)),\n", " col = \"blue\",\n", " pch = 19\n", ")\n", "points(10, predict(examtest.fit,\n", " newdata = data.frame(Test = 10)\n", "), col = \"blue\", pch = 19)\n", "points(20, predict(examtest.fit,\n", " newdata = data.frame(Test = 20)\n", "), col = \"blue\", pch = 19)" ] 
}, { "cell_type": "code", "execution_count": 5, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/plain": [ "\n", "Call:\n", "lm(formula = Exam ~ Test, data = course.df)\n", "\n", "Residuals:\n", " Min 1Q Median 3Q Max \n", "-39.980 -6.471 0.826 8.575 33.242 \n", "\n", "Coefficients:\n", " Estimate Std. Error t value Pr(>|t|) \n", "(Intercept) 9.0845 3.2204 2.821 0.00547 ** \n", "Test 3.7859 0.2647 14.301 < 2e-16 ***\n", "---\n", "Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n", "\n", "Residual standard error: 12.05 on 144 degrees of freedom\n", "Multiple R-squared: 0.5868,\tAdjusted R-squared: 0.5839 \n", "F-statistic: 204.5 on 1 and 144 DF, p-value: < 2.2e-16\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "summary(examtest.fit)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "notebookRunGroups": { "groupValue": "" } }, "source": [ "In this output:\n", "\n", "- Call: the model formula, identifying the response (`Exam`) and the explanatory variable (`Test`)\n", "- Residuals: a summary of the distribution of the residuals (minimum, quartiles, median, maximum)\n", "- Coefficients: the estimated intercept $\\hat\\beta_0$ and slope $\\hat\\beta_1$, each with its standard error, t value and p-value\n", "- Residual standard error: the estimated standard deviation of the residuals\n", "- Multiple R-squared: the $R^2$ value\n", "- Adjusted R-squared: the $R^2$ value adjusted for the number of explanatory variables\n", "- F-statistic: the ratio of the regression mean square (the regression sum of squares divided by its degrees of freedom) to the residual mean square (the residual sum of squares divided by its degrees of freedom). A larger F means the regression explains more variation relative to the residual noise, so the model fits better; a smaller F means it fits worse. The p-value moves the other way: a large F gives a small p-value." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Checking whether the model is acceptable\n", "\n", "### Examining the residuals\n", "\n", "Look at the fitted value and the residual for a single row:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
<table>
<caption>A data.frame: 1 × 2</caption>
<thead>
<tr><th>course.df.Test.1.</th><th>course.df.Exam.1.</th></tr>
<tr><th>&lt;dbl&gt;</th><th>&lt;int&gt;</th></tr>
</thead>
<tbody>
<tr><td>9.1</td><td>42</td></tr>
</tbody>
</table>
\n" ], "text/latex": [ "A data.frame: 1 × 2\n", "\\begin{tabular}{ll}\n", " course.df.Test.1. & course.df.Exam.1.\\\\\n", " & \\\\\n", "\\hline\n", "\t 9.1 & 42\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 2\n", "\n", "| course.df.Test.1. <dbl> | course.df.Exam.1. <int> |\n", "|---|---|\n", "| 9.1 | 42 |\n", "\n" ], "text/plain": [ " course.df.Test.1. course.df.Exam.1.\n", "1 9.1 42 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "1: 43.5363712056028" ], "text/latex": [ "\\textbf{1:} 43.5363712056028" ], "text/markdown": [ "**1:** 43.5363712056028" ], "text/plain": [ " 1 \n", "43.53637 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "1: -1.53637120560281" ], "text/latex": [ "\\textbf{1:} -1.53637120560281" ], "text/markdown": [ "**1:** -1.53637120560281" ], "text/plain": [ " 1 \n", "-1.536371 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data.frame(course.df$Test[1], course.df$Exam[1]) # the original first row\n", "# In tidyverse style, the select function from dplyr can pick the columns instead\n", "# dplyr::select(course.df[1, ], Exam, Test)\n", "fitted(examtest.fit)[1] # fitted value\n", "resid(examtest.fit)[1] # residual" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As a check, the residuals of a well-fitting model should satisfy:\n", "\n", "1. The residuals have mean close to 0\n", "2. The residuals are normally distributed\n", "3. There are no outliers, or the outliers have been dealt with\n", "\n", "#### Residual mean close to 0\n", "\n", "Examine the residuals to see whether they average out to 0:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "Plot with title \"\"" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" } ], "source": [ "# which = 1 draws the Residuals vs Fitted plot,\n", "# which = 2 draws the normal Q-Q plot of the residuals,\n", "# which = 3 draws the Scale-Location plot (square root of the absolute standardized residuals vs fitted values)\n", "plot(examtest.fit, which = 1)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Normally distributed residuals\n", "\n", "The residuals should be independent and identically normally distributed (iid). Independence is justified here because students sit the exam independently of one another. The residuals should also have roughly constant spread; this is the Equality Of Variance (EOV) assumption.\n", "\n", "Check whether the residuals are normally distributed:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "Plot with title \"\"" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" } ], "source": [ "normcheck(examtest.fit)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/plain": [ "\n", "Call:\n", "lm(formula = Exam ~ Test, data = course2.df)\n", "\n", "Residuals:\n", " Min 1Q Median 3Q Max \n", "-90.251 -6.846 2.638 9.456 33.996 \n", "\n", "Coefficients:\n", " Estimate Std. Error t value Pr(>|t|) \n", "(Intercept) 15.2374 3.7172 4.099 6.88e-05 ***\n", "Test 3.2006 0.3023 10.588 < 2e-16 ***\n", "---\n", "Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n", "\n", "Residual standard error: 14.34 on 145 degrees of freedom\n", "Multiple R-squared: 0.436,\tAdjusted R-squared: 0.4322 \n", "F-statistic: 112.1 on 1 and 145 DF, p-value: < 2.2e-16\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" } ], "source": [ "# Build a data set containing an outlier and check its effect on the regression line\n", "n <- nrow(course.df)\n", "# Duplicate the last row of the data set\n", "course2.df <- course.df[c(1:n, n), ]\n", "# Overwrite Test and Exam in the new last row to create a deliberately extreme observation\n", "course2.df[n + 1, c(\"Test\", \"Exam\")] <- c(25, 5)\n", "# Draw the scatter plot\n", "plot(Exam ~ Test, data = course2.df)\n", "## and mark the new observation\n", "points(25, 5, pch = 19, col = \"red\")\n", "\n", "# An outlying observation pulls the regression line towards itself\n", "examtest2.fit <- lm(Exam ~ Test, data = course2.df)\n", "summary(examtest2.fit)\n", "\n", "# Or check the effect of the point directly on the plot\n", "# (blue = original fit, red = fit with the outlier)\n", "abline(examtest.fit, lty = 2, lwd = 2, col = \"blue\")\n", "abline(examtest2.fit, lty = 2, lwd = 2, col = \"red\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Examine how influential each observation is, using Cook's distance:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "Plot with title \"Cook's Distance plot\"" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "Plot with title \"Cook's Distance plot\"" ] }, "metadata": { "image/png": { "height": 420, "width": 420 } }, "output_type": "display_data" } ], "source": [ "# Cook's distances with the outlier included\n", "cooks20x(examtest2.fit)\n", "# Compare with the original fit\n", "cooks20x(examtest.fit)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Examining R squared\n", "\n", "R squared ($R^2$) is the ratio of the regression sum of squares to the total sum of squares, $R^2 = \\frac{SSR}{SST}$, where SSR is the regression sum of squares and SST is the total sum of squares. The larger $R^2$ is, the larger the share of the total variation that the regression explains, and the better the model fits; the smaller $R^2$ is, the worse the fit.\n", "\n", 
"SSR, the regression sum of squares, is the sum of squared differences between the fitted values and the mean of the response: $SSR = \\sum_{i=1}^n (\\hat{y}_i - \\bar{y})^2$, where $\\hat{y}_i$ is the fitted value for observation $i$ and $\\bar{y}$ is the mean of the response. The total sum of squares is $SST = \\sum_{i=1}^n (y_i - \\bar{y})^2$, where $y_i$ is the $i$-th observed value. The next cell fits the intercept-only (null) model for comparison, which is the starting point for computing these sums of squares." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/plain": [ "\n", "Call:\n", "lm(formula = Exam ~ 1, data = course.df)\n", "\n", "Residuals:\n", " Min 1Q Median 3Q Max \n", "-41.877 -12.877 -1.377 15.623 40.123 \n", "\n", "Coefficients:\n", " Estimate Std. Error t value Pr(>|t|) \n", "(Intercept) 52.877 1.546 34.21 <2e-16 ***\n", "---\n", "Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n", "\n", "Residual standard error: 18.68 on 145 degrees of freedom\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "\n", "Call:\n", "lm(formula = Exam ~ Test, data = course.df)\n", "\n", "Residuals:\n", " Min 1Q Median 3Q Max \n", "-39.980 -6.471 0.826 8.575 33.242 \n", "\n", "Coefficients:\n", " Estimate Std. Error t value Pr(>|t|) \n", "(Intercept) 9.0845 3.2204 2.821 0.00547 ** \n", "Test 3.7859 0.2647 14.301 < 2e-16 ***\n", "---\n", "Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
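0.1 ' ' 1\n", "\n", "Residual standard error: 12.05 on 144 degrees of freedom\n", "Multiple R-squared: 0.5868,\tAdjusted R-squared: 0.5839 \n", "F-statistic: 204.5 on 1 and 144 DF, p-value: < 2.2e-16\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Drop the linear term: fit the intercept-only (null) model\n", "examnull.fit <- lm(Exam ~ 1, data = course.df)\n", "summary(examnull.fit)\n", "# Compare with the earlier summary\n", "summary(examtest.fit)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The two summaries report residual standard errors of 18.68 (on 145 degrees of freedom) for the null model and 12.05 (on 144 degrees of freedom) for the model with `Test`. Squaring each one and multiplying by its degrees of freedom recovers the residual sums of squares: SS(Null) $\\approx 18.68^2 \\times 145 \\approx 50597$ and SS(Test) $\\approx 12.05^2 \\times 144 \\approx 20909$.\n", "\n", "$R^2$ is then $1 - SS(Test)/SS(Null) \\approx 1 - 20909/50597 \\approx 0.587$, which agrees with the reported 0.5868 up to rounding." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Approximate 95% confidence interval for a coefficient $\\hat\\beta_i$: $[\\hat\\beta_i - 2SE(\\hat\\beta_i), \\hat\\beta_i + 2SE(\\hat\\beta_i)]$, that is $[\\hat\\beta_i - 2\\sqrt{Var(\\hat\\beta_i)}, \\hat\\beta_i + 2\\sqrt{Var(\\hat\\beta_i)}]$, where $Var(\\hat\\beta_i)$ is the variance of the estimate $\\hat\\beta_i$." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### t-test for each coefficient\n", "\n", "> Know what to look at, what it means, and how to read it." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/plain": [ "\n", "Call:\n", "lm(formula = Exam ~ Test, data = course.df)\n", "\n", "Residuals:\n", " Min 1Q Median 3Q Max \n", "-39.980 -6.471 0.826 8.575 33.242 \n", "\n", "Coefficients:\n", " Estimate Std. Error t value Pr(>|t|) \n", "(Intercept) 9.0845 3.2204 2.821 0.00547 ** \n", "Test 3.7859 0.2647 14.301 < 2e-16 ***\n", "---\n", "Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 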
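0.1 ' ' 1\n", "\n", "Residual standard error: 12.05 on 144 degrees of freedom\n", "Multiple R-squared: 0.5868,\tAdjusted R-squared: 0.5839 \n", "F-statistic: 204.5 on 1 and 144 DF, p-value: < 2.2e-16\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "summary(examtest.fit)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In the `Test` row, the p-value in the `Pr(>|t|)` column is below $2 \\times 10^{-16}$, far smaller than 0.05, so we reject the null hypothesis and conclude that the slope is not zero. (The three asterisks beside it mark a p-value below 0.001, i.e. very strong evidence of a linear relationship.)\n", "\n", "- Null hypothesis $H_0$: the slope of the linear relationship between Test and Exam is 0 (no linear relationship), i.e. $\\beta_1 = 0$\n", "- Alternative hypothesis $H_1$: the slope of the linear relationship between Test and Exam is not 0 (there is a linear relationship), i.e. $\\beta_1 \\neq 0$\n", "\n", "How confident we can be in the slope is governed by its standard error $SE(\\hat\\beta_1)$. Let $SSE = \\sum_{i=1}^n (y_i - \\hat{y}_i)^2$ be the residual sum of squares, where $\\hat{y}_i = \\hat\\beta_0 + \\hat\\beta_1 x_i$ is the fitted value for observation $i$ and $x_i$ is its value of the explanatory variable. The residual standard error is $s = \\sqrt{\\frac{SSE}{n-2}}$, and $SE(\\hat\\beta_1) = s / \\sqrt{\\sum_{i=1}^n (x_i - \\bar{x})^2}$. Here $SE(\\hat\\beta_1)$ is 0.2647, so we have:\n", "\n", "$$\n", "\\frac{3.7859 - 0}{0.2647} \\approx 14.30\n", "$$\n", "\n", "This t value says that the estimated slope lies about 14.3 standard errors away from the hypothesised value 0. The larger this number, the stronger the evidence that the slope is not zero." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Using the fitted model for prediction\n", "\n", "### Confidence intervals for the coefficients" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "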
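<table>
<caption>A matrix: 2 × 2 of type dbl</caption>
<thead>
<tr><th></th><th>2.5 %</th><th>97.5 %</th></tr>
</thead>
<tbody>
<tr><th>(Intercept)</th><td>2.719020</td><td>15.449907</td></tr>
<tr><th>Test</th><td>3.262659</td><td>4.309189</td></tr>
</tbody>
</table>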
\n" ], "text/latex": [ "A matrix: 2 × 2 of type dbl\n", "\\begin{tabular}{r|ll}\n", " & 2.5 \\% & 97.5 \\%\\\\\n", "\\hline\n", "\t(Intercept) & 2.719020 & 15.449907\\\\\n", "\tTest & 3.262659 & 4.309189\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 2 × 2 of type dbl\n", "\n", "| | 2.5 % | 97.5 % |\n", "|---|---|---|\n", "| (Intercept) | 2.719020 | 15.449907 |\n", "| Test | 3.262659 | 4.309189 |\n", "\n" ], "text/plain": [ " 2.5 % 97.5 % \n", "(Intercept) 2.719020 15.449907\n", "Test 3.262659 4.309189" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\n", "
<table>
<caption>A matrix: 2 × 2 of type dbl</caption>
<thead>
<tr><th></th><th>0.5 %</th><th>99.5 %</th></tr>
</thead>
<tbody>
<tr><th>(Intercept)</th><td>0.6778171</td><td>17.491110</td></tr>
<tr><th>Test</th><td>3.0948635</td><td>4.476984</td></tr>
</tbody>
</table>
\n" ], "text/latex": [ "A matrix: 2 × 2 of type dbl\n", "\\begin{tabular}{r|ll}\n", " & 0.5 \\% & 99.5 \\%\\\\\n", "\\hline\n", "\t(Intercept) & 0.6778171 & 17.491110\\\\\n", "\tTest & 3.0948635 & 4.476984\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 2 × 2 of type dbl\n", "\n", "| | 0.5 % | 99.5 % |\n", "|---|---|---|\n", "| (Intercept) | 0.6778171 | 17.491110 |\n", "| Test | 3.0948635 | 4.476984 |\n", "\n" ], "text/plain": [ " 0.5 % 99.5 % \n", "(Intercept) 0.6778171 17.491110\n", "Test 3.0948635 4.476984" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "confint(examtest.fit)\n", "# (Intercept) is the intercept; Test is the slope\n", "# The confidence level can also be changed\n", "confint(examtest.fit, level = 0.99)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Prediction\n", "\n", "1. The point prediction (fitted value)\n", "2. A range for the mean response\n", "3. A range for the value of each individual\n", "\n", "Interval estimates versus point estimates:\n", "\n", "- Interval estimate: a range of plausible values for the quantity of interest\n", "- Point estimate: a single best-guess value for the quantity of interest" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
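<table>
<caption>A matrix: 3 × 3 of type dbl</caption>
<thead>
<tr><th></th><th>fit</th><th>lwr</th><th>upr</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>9.084463</td><td>2.71902</td><td>15.44991</td></tr>
<tr><th>2</th><td>46.943703</td><td>44.80912</td><td>49.07828</td></tr>
<tr><th>3</th><td>84.802942</td><td>79.97021</td><td>89.63568</td></tr>
</tbody>
</table>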
\n" ], "text/latex": [ "A matrix: 3 × 3 of type dbl\n", "\\begin{tabular}{r|lll}\n", " & fit & lwr & upr\\\\\n", "\\hline\n", "\t1 & 9.084463 & 2.71902 & 15.44991\\\\\n", "\t2 & 46.943703 & 44.80912 & 49.07828\\\\\n", "\t3 & 84.802942 & 79.97021 & 89.63568\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 3 × 3 of type dbl\n", "\n", "| | fit | lwr | upr |\n", "|---|---|---|---|\n", "| 1 | 9.084463 | 2.71902 | 15.44991 |\n", "| 2 | 46.943703 | 44.80912 | 49.07828 |\n", "| 3 | 84.802942 | 79.97021 | 89.63568 |\n", "\n" ], "text/plain": [ " fit lwr upr \n", "1 9.084463 2.71902 15.44991\n", "2 46.943703 44.80912 49.07828\n", "3 84.802942 79.97021 89.63568" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
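<table>
<caption>A matrix: 3 × 3 of type dbl</caption>
<thead>
<tr><th></th><th>fit</th><th>lwr</th><th>upr</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>9.084463</td><td>-15.56475</td><td>33.73368</td></tr>
<tr><th>2</th><td>46.943703</td><td>23.03510</td><td>70.85231</td></tr>
<tr><th>3</th><td>84.802942</td><td>60.50438</td><td>109.10151</td></tr>
</tbody>
</table>
\n" ], "text/latex": [ "A matrix: 3 × 3 of type dbl\n", "\\begin{tabular}{r|lll}\n", " & fit & lwr & upr\\\\\n", "\\hline\n", "\t1 & 9.084463 & -15.56475 & 33.73368\\\\\n", "\t2 & 46.943703 & 23.03510 & 70.85231\\\\\n", "\t3 & 84.802942 & 60.50438 & 109.10151\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 3 × 3 of type dbl\n", "\n", "| | fit | lwr | upr |\n", "|---|---|---|---|\n", "| 1 | 9.084463 | -15.56475 | 33.73368 |\n", "| 2 | 46.943703 | 23.03510 | 70.85231 |\n", "| 3 | 84.802942 | 60.50438 | 109.10151 |\n", "\n" ], "text/plain": [ " fit lwr upr \n", "1 9.084463 -15.56475 33.73368\n", "2 46.943703 23.03510 70.85231\n", "3 84.802942 60.50438 109.10151" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Confidence interval for the mean Exam score at each Test value\n", "preds.df <- data.frame(Test = seq(0, 20, by = 10))\n", "predict(examtest.fit, newdata = preds.df, interval = \"confidence\")\n", "# Prediction interval for an individual student's Exam score at each Test value\n", "predict(examtest.fit, newdata = preds.df, interval = \"prediction\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In these tables:\n", "\n", "- In the confidence-interval table (the first one), `[2, 2:3]` gives the range for the mean final-exam score of all students who scored 10 on the mid-term test\n", "- In the prediction-interval table (the second one), `[2, 2:3]` gives the range for the final-exam score of an individual student who scored 10 on the mid-term test; a score inside this range counts as typical" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "A general workflow for this kind of problem (studying the relationship between two variables x and y):\n", "\n", "- Plot the data as a scatter plot and see what kind of relationship, if any, holds between the explanatory variable and the response, ideally with the help of a plotting tool (a statement of the research aim may also guide this). Propose an appropriate model. In the example above we chose the linear model:\n", "\n", "  $$\n", "  y_i = \\beta_0 + \\beta_1 x_i + \\varepsilon_i, \\quad \\varepsilon_i \\sim N(0, \\sigma^2) \\quad (\\text{where } \\beta_1 > 0)\n", "  $$\n", "\n", "- Fit the model with the `lm` function.\n", "- Check the model assumptions with appropriate diagnostics.\n", "  - Independence OK? (how were the data collected?)\n", "  - EOV okay? Use `plot(examtest.fit, which = 1)`.\n", "  - Normality okay? Use `normcheck`.\n", "  \n", "  If these are okay, go on to the next step.\n", "- Where appropriate, try removing any explanatory variables that are not significant (covered later). If one can be removed, re-check the new model.\n", "- Make sure no individual points have undue influence, and try to remove or correct them. Use `cooks20x`.\n", "- State conclusions and predictions, discuss limitations, and answer the research question.\n", "\n", "Note: do not rush on to the next step until you are satisfied with the current one." ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.2.2" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }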