class: center, middle, inverse, title-slide # Linear Mixed Effects Models in R ## An introduction for linguistics students ### Chenzi Xu ### University of Oxford ### 2021/12/12 (updated: 2022-02-18)
chenzi.xu@rbind.io
--- class: right, middle ## Find me at... ![:scale 20%](me.png) [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @ChenziAmy](http://twitter.com/ChenziAmy) [<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @chenchenzi](https://github.com/chenchenzi) [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 
12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> chenzixu.rbind.io](https://chenzixu.rbind.io) --- ## Did you do the **pre-workshop setup**? <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [https://chenzixu.rbind.io/talk/lme/](https://chenzixu.rbind.io/talk/lme/) ## We assume: <svg viewBox="0 0 581 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"></path></svg> You have basic knowledge about R and R studio. 
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M332.8 320h38.4c6.4 0 12.8-6.4 12.8-12.8V172.8c0-6.4-6.4-12.8-12.8-12.8h-38.4c-6.4 0-12.8 6.4-12.8 12.8v134.4c0 6.4 6.4 12.8 12.8 12.8zm96 0h38.4c6.4 0 12.8-6.4 12.8-12.8V76.8c0-6.4-6.4-12.8-12.8-12.8h-38.4c-6.4 0-12.8 6.4-12.8 12.8v230.4c0 6.4 6.4 12.8 12.8 12.8zm-288 0h38.4c6.4 0 12.8-6.4 12.8-12.8v-70.4c0-6.4-6.4-12.8-12.8-12.8h-38.4c-6.4 0-12.8 6.4-12.8 12.8v70.4c0 6.4 6.4 12.8 12.8 12.8zm96 0h38.4c6.4 0 12.8-6.4 12.8-12.8V108.8c0-6.4-6.4-12.8-12.8-12.8h-38.4c-6.4 0-12.8 6.4-12.8 12.8v198.4c0 6.4 6.4 12.8 12.8 12.8zM496 384H64V80c0-8.84-7.16-16-16-16H16C7.16 64 0 71.16 0 80v336c0 17.67 14.33 32 32 32h464c8.84 0 16-7.16 16-16v-32c0-8.84-7.16-16-16-16z"></path></svg> You have basic inferential statistical knowledge. ###You want to know more about linear mixed models ??? high school arithmetic and algebra. Elementary concepts of probability theory are assumed; the sum and the product rules, and the fact that the probabilities of all possible events must sum to 1, are the extent of the knowledge assumed. --- # Outline 1. Inferential Stats Recap 1. Linear Regression 1. Linear Mixed Effects Models 1. Shrinkage in linear mixed models 1. Report Writing 1. Why LME model? --- class: inverse, center, middle # Inferential Stats Recap -- Central limit theorem (~~magic~~) Hypothesis testing: the (humble) t-test --- # Central limit theorem (~~magic~~) .left-column[ ###Definition and Significance .footnote[The picture is from Wikipedia.] ] .right-column[The **sampling distribution of means** is **normal**, provided two conditions are met: (a) the sample size should be **large** enough (b) `\(\mu\)` and `\(\sigma\)` are defined for the probability density or mass function that generated the data. We can derive an estimate of the distribution of (hypothetical) sample means from a single sample of n independent data points CLT is NOT the distribution that generated the data, but the sampling distribution of the sample means under **repeated sampling**. ![:scale 65%](ct.png) ] ??? SE estimates how much error there is, across repeated sampling, in our estimate of the population mean using sample mean. Our estimation will be more precise for larger sample size or lower variability. --- # Hypothesis testing: the (humble) t-test .left-column[ ### NHST procedure ] .right-column[1. Define the null hypothesis `\(\mu_0\)` 1. Given data of size `\(n\)`, estimate `\(\bar{y}\)`, standard deviation `\(s\)`, estimate the standard error `\(SE = s/{\sqrt{n}}\)`. 1. The distance between the sample mean and the hypothesized mean can be written in SE units as follows.$$ t \times SE = \bar{y} -\mu$$ Thus the sample mean is **t** standard errors away from the hypothesized mean: `$$t= \dfrac{\bar{y}-\mu_0}{s/\sqrt{n}}$$` 1. Reject null hypothesis if the observed t-value is “large”. 1. Fail to reject the null hypothesis, if the observed t-value is “small”. ] ??? One of the simplest statistical tests that one can do with continuous data: the t-test. the logic of these more complex tests will essentially be more of the same uncertainty in the estimates of the values as the scale on which distances are measured. --- # Hypothesis testing: the (humble) t-test .pull-left[Under repeated sampling, the observed t-value is seen as an instance from a random variable T. $$ T \sim t(n-1)$$ The PDF of T, the t-distribution, is defined in terms of of n with n-1 degrees of freedom. 
![:scale 70%](t-dis.png) **Critical t-value**: the t-value such that `\(P(T>|t|)=0.05\)` ] .pull-right[ **Type I error**: Reject the null when it’s true **Type II error**: Fail to reject the null when it’s false ![:scale 70%](h0.png) **p-value**: the probability of obtaining the observed t-value, or some value more extreme than that, under the assumption that the null is true. The p-value is always interpreted with reference to the pre-defined Type I error. Conventionally, we reject the null if `\(p < 0.05\)` ] ??? Two possible decisions: **reject the null** or **fail to reject the null**. Two possible worlds: the null is **true** or **false**. The first picture is from Vasishth's book; the second picture is from https://vwo.com/blog/errors-in-ab-testing/ The t-test is just a special, limited example of a general linear model. ANOVA is also a special, limited version of the linear model (categorical predictors with more than two levels). general linear model: a mathematical model of how one or more independent variables affect a continuous dependent variable. df measures the number of pieces of **independent** information that go into estimating some parameters. p-values are NOT the probability that the null hypothesis is true, and they do not measure the strength of an effect. --- class: inverse, center, middle # Linear Regression -- Formal forms A (too) simple case study Understanding output Assumptions check --- # Formal forms A simple example of a linear model is `$$Y_i = \beta_0 + \beta_1X_i + e_i$$` where `\(Y_i\)` is the measured value of the dependent variable for observation i; `\(X_i\)` is the explanatory variable or predictor; `\(Y_i\)` is modelled in terms of an intercept `\(\beta_0\)`, plus the effect of predictor `\(X_i\)` weighted by the coefficient `\(\beta_1\)`, plus an error term `\(e_i\)`. .left-col[ <img src="lmer_files/figure-html/unnamed-chunk-1-1.png" width="90%" /> ] .right-col[ Simulated data illustrating the linear model `\(Y_i = 3+2X_i+e_i\)`. The most common method for fitting a regression line is **least-squares**. A linear model is a model that is **linear in the coefficients**; each coefficient appears only raised to the power of one. Non-linear relationships can be analysed using a linear model.] ??? LME is an extension of the General Linear Model that underlies simpler approaches such as ANOVA, the t-test, and classical regression. Many people assume that 'linear models' can only capture linear relationships, i.e., relationships that can be described by a straight line (or a plane). This is false. A linear model is a weighted sum of various terms, and each term has a single predictor (or a constant) that is multiplied by a coefficient. --- # A simple case study .pull-left[#### Research question Assume you were interested in whether the voice pitch of males and females differs, and if so, by how much. #### Data ```r pitch = c(233,204,242,130,112,142) sex = c(rep("female",3), rep("male",3)) my.df = data.frame(sex,pitch) my.df ``` ``` # sex pitch # 1 female 233 # 2 female 204 # 3 female 242 # 4 male 130 # 5 male 112 # 6 male 142 ``` ] .pull-right[#### Formula Relationship of interest `\(pitch \sim sex\)` "Pitch predicted by / as a function of sex" `$$pitch \sim sex + \epsilon$$` `\(\epsilon\)` stands for all other things that affect pitch, random or uncontrolled in our experiment.
**Structural or systematic** + **random or probabilistic** ```r *model = lm(pitch ~ sex, my.df) contrasts(as.factor(my.df$sex)) ``` ``` ## male ## female 0 ## male 1 ``` **Dummy coding** for categorical factors .footnote[This example is taken from Bodo Winter's tutorial.] ] ??? Dummy coding for categorical factors contrasts(as.factor(my.df$sex)) --- # Understanding output .pull-left[ ```r summary(model) ``` ``` # # Call: # lm(formula = pitch ~ sex, data = my.df) # # Residuals: # 1 2 3 4 5 6 # 6.667 -22.333 15.667 2.000 -16.000 14.000 # # Coefficients: # Estimate Std. Error t value Pr(>|t|) # (Intercept) 226.33 10.18 22.224 2.43e-05 *** # sexmale -98.33 14.40 -6.827 0.00241 ** # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # # Residual standard error: 17.64 on 4 degrees of freedom *# Multiple R-squared: 0.921, Adjusted R-squared: 0.9012 *# F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407 ``` ] .pull-right[ `\(R^2\)`: "variance accounted for or explained by" Multiple R-Squared: how much variance in our data is accounted for by our model Adjusted R-Squared: how much variance is “explained” and how many fixed effects are used **p-value**: The probability of our data, given that our null hypothesis "sex has no effect on pitch" is true. #### Reporting: We constructed a linear model of pitch as a function of sex. This model was significant `\((F(1,4)=46.61, p<0.01)\)`. ] --- # Understanding output .pull-left[ ```r summary(model) ``` ``` # # Call: # lm(formula = pitch ~ sex, data = my.df) # # Residuals: # 1 2 3 4 5 6 # 6.667 -22.333 15.667 2.000 -16.000 14.000 # # Coefficients: # Estimate Std. Error t value Pr(>|t|) # (Intercept) 226.33 10.18 22.224 2.43e-05 *** *# sexmale -98.33 14.40 -6.827 0.00241 ** # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # # Residual standard error: 17.64 on 4 degrees of freedom # Multiple R-squared: 0.921, Adjusted R-squared: 0.9012 # F-statistic: 46.61 on 1 and 4 DF, p-value: 0.002407 ``` ] .pull-right[ ```r mean(my.df[my.df$sex=="female",]$pitch) ``` ``` # [1] 226.3333 ``` ```r mean(my.df[my.df$sex=="male",]$pitch) ``` ``` # [1] 128 ``` ![:scale 80%](lm.png) ] --- # Assumptions check ### (1) Linearity The fitted values (line) are the predicted means, and residuals are the **vertical deviations** from this line. When the residual plot shows a **curvy** or non-linear patterns, it indicates a violation of the linearity assumption. .pull-left[ ```r plot(fitted(model),residuals(model)) ``` ![:scale 75%](curve.png) ] .pull-right[ What to do if your residual plot indicates nonlinearity? - Reconsider the fixed effects - Perform non-linear transformation of your variables - Turn to different class of models such as logistic models ] ??? conditions that have to be satisfied in order for the linear model to be meaningful. --- # Assumptions check .pull-left[### (2) Absence of collinearity When two predictors are **correlated** with each other, they are **collinear**. - Preempt the problem in the design stage - Use dimension reduction techniques such as Principal Component Analysis that transform correlated variables into a smaller set of variables ] .pull-right[### (3) Homoskedasticity The variance of your data should be approximately equal across the range of the predicted values. 
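A quick visual check, sketched below, reuses the residuals-vs-fitted plot from the linearity check; the spread of the residuals should look roughly constant across the range of fitted values:

```r
# Residuals should scatter with roughly constant spread across fitted values;
# a funnel (fanning-out) shape suggests heteroskedasticity.
plot(fitted(model), residuals(model))
abline(h = 0, lty = 2)
```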
- Transform your data ![:scale 65%](hetero.png) ] --- # Assumptions check .pull-left[### (4) Absence of influential data points ```r dfbeta(model) ``` ``` # (Intercept) sexmale # 1 3.333333e+00 -3.333333 # 2 -1.116667e+01 11.166667 # 3 7.833333e+00 -7.833333 # 4 -1.359740e-16 1.000000 # 5 1.087792e-15 -8.000000 # 6 -9.518180e-16 7.000000 ``` ] .pull-right[ **DFbeta** values: the coefficients to be adjusted if a particular data point is excluded (“leave-one-out diagnostics”). - Eyeball the DFbeta values: any sign changed? different by half of the absolute value? - Run the analysis **with** influential data and then again **without** them. Compare and discuss the results. `dfbeta()` for linear models doesn’t work for mixed models. You can check out the package **influence.ME** (Nieuwenhuis, te Grotenhuis, & Pelzer, 2012), or program a for loop that does the leave-one-out diagnostics by hand. ] --- # Assumptions check ### (5) Normality of residuals - a histogram or a Q-Q plot of the residuals ```r hist(residuals(model)) qqnorm(residuals(model)) ``` ### (6) Independence - Preempt the problem in the design stage - Resolve non-independencies at the analysis stage. `\(\rightarrow\)` **Mixed models** --- class: inverse, center, middle # Linear Mixed Effects Model -- Data Model Assumption Understanding the output Model selection --- # Data #### Research Question We're interested in the relationship between pitch and politeness (Winter & Grawunder, 2012). - Politeness: formal/polite and informal register (categorical factor) - multiple measures per subject (**inter-dependent!**) ```r data = read.csv("http://www.bodowinter.com/tutorial/politeness_data.csv") head(data) ``` ``` # subject gender scenario attitude frequency # 1 F1 F 1 pol 213.3 # 2 F1 F 1 inf 204.5 # 3 F1 F 2 pol 285.1 # 4 F1 F 2 inf 259.7 # 5 F1 F 3 pol 203.9 # 6 F1 F 3 inf 286.9 ``` ```r which(!complete.cases(data)) ``` ``` # [1] 39 ``` ??? a few missing values provide no problems for our mixed model analyses --- # Model Linear regression `$$pitch \sim politeness + \epsilon$$` Multiple regression (fixed effects only) `$$pitch \sim politeness + sex + \epsilon$$` Add a **random** effect for subject: assume a different "baseline" pitch value for each subject. We can assume different random intercepts for each subject. `$$pitch \sim politeness + sex + (1|subject) + \epsilon$$` ```r boxplot(frequency~subject, data) ``` **Crossed random effects** for subjects and for items. Different scenarios for formal and informal speech (by-item variation) `$$pitch \sim politeness + sex + (1|subject) + (1|item) + \epsilon$$` --- # Model in R ```r library(lme4) data = data %>% mutate(attitude=as.factor(attitude), gender=as.factor(gender), subject=as.factor(subject)) *politeness.model0 = lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=data) *politeness.model = lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data=data) summary(politeness.model) ``` ``` # Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest'] *# Formula: frequency ~ attitude + gender + (1 | subject) + (1 | scenario) # Data: data # # REML criterion at convergence: 775.5 # # Scaled residuals: # Min 1Q Median 3Q Max # -2.2591 -0.6236 -0.0772 0.5388 3.4795 # # Random effects: # Groups Name Variance Std.Dev. # scenario (Intercept) 219.5 14.81 # subject (Intercept) 615.6 24.81 # Residual 645.9 25.41 # Number of obs: 83, groups: scenario, 7; subject, 6 # # Fixed effects: # Estimate Std. 
Error df t value Pr(>|t|) # (Intercept) 256.846 16.116 5.432 15.938 9.06e-06 *** # attitudepol -19.721 5.584 70.054 -3.532 0.000735 *** # genderM -108.516 21.013 4.007 -5.164 0.006647 ** # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # # Correlation of Fixed Effects: # (Intr) atttdp # attitudepol -0.173 # genderM -0.652 0.004 ``` --- # Understanding the output .left-column[#### politeness.model0 (top) politeness.model (bottom) Scenario (item) has much less variability than subject. Residual: The “random” deviations from the predicted values that are not due to subjects and items, `\(\epsilon\)`. The variation that’s associated with the random effect “subject” dropped considerably `\(\rightarrow\)` gender was confounded with the variation that is due to subject. ] .right-column[#### Random effects ```r Random effects: Groups Name Variance Std.Dev. scenario (Intercept) 219 14.80 * subject (Intercept) 4015 63.36 * Residual 646 25.42 Number of obs: 83, groups: scenario, 7; subject, 6 ``` ```r Random effects: Groups Name Variance Std.Dev. scenario (Intercept) 219.5 14.81 * subject (Intercept) 615.6 24.81 * Residual 645.9 25.41 Number of obs: 83, groups: scenario, 7; subject, 6 ``` ] ??? shifted a considerable amount of the variance that was previously in the random effects component(differences between male and female individuals) to the fixed effects component --- # Understanding the output .left-column[#### politeness.model0 (top) politeness.model (bottom) `attitudepol` is the slope for the categorical effect of politeness. Pitch is lower in polite speech than in informal speech, by about 20 Hz. `lmer()` took whatever comes first in the alphabet to be the reference level. The second `Intercept` represents the female for the informal condition. ] .right-column[#### Fixed effects ```r Fixed effects: Estimate Std. Error t value * (Intercept) 202.588 26.754 7.572 attitudepol -19.695 5.585 -3.527 ``` ```r Fixed effects: Estimate Std. Error t value * (Intercept) 256.846 16.116 15.938 attitudepol -19.721 5.584 -3.532 genderM -108.516 21.013 -5.164 ``` ] --- # Understanding the output #### Coding schemes A categorical variable of `\(n\)` levels is entered into a regression analysis as a sequence of `\(n-1\)` variables. Generally or by default in R, we use dummy coding for them. ```r contrasts(data$attitude) ``` ``` # pol # inf 0 # pol 1 ``` ```r # change the reference level contrasts(data$attitude) = contr.treatment(2, base = 2) contrasts(data$attitude) ``` ``` # 1 # inf 1 # pol 0 ``` There are also other coding schemes such as **effects coding** including **contrast coding**, **sum coding** and **Helmert coding**. Different coding system may help answer different research questions. It also greatly changes the way you interpret your output. --- # Understanding the output #### Statistical significance Unfortunately, p-values for mixed models aren’t as straightforward as they are for the linear model. There are multiple approaches, and there’s a discussion surrounding these, with sometimes wildly differing opinions about which approach is the best. There is a package `lmerTest` that provides `\(p\)` values in type I, II or III anova and summery tables for `lmer` model fits via Satterthwaite's degrees of freedom method or Kenward-Roger method. 
```r library(lmerTest) politeness.model = lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data=data) anova(politeness.model) ``` ``` # Type III Analysis of Variance Table with Satterthwaite's method # Sum Sq Mean Sq NumDF DenDF F value Pr(>F) *# attitude 8056.2 8056.2 1 70.054 12.473 0.0007351 *** # gender 17225.2 17225.2 1 4.007 26.669 0.0066467 ** # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ??? The F-value is actually the quotient of the ratio: effect variance/error variance; between-group variance/within-group variance. All of this means that the larger an F-value, the better for you to find a “significant” effect, a consistent pattern that is unlikely due to chance. Now, basically, what we’re doing with an ANOVA is the following: we look at how **unexpected** an F-value that we obtained in our study is. A very large F-value means that the between-group variance (the effect variance) exceeds the within-group variance (the error variance) by a substantial amount. The p-value then just gives a number to how likely a particular F-value is going to occur, with lower p-values indicating that the probability of obtaining that particular F-value is pretty low. --- # Understanding the output #### Statistical significance Likelihood Ratio Test as a means to attain `\(p\)` values. Likelihood is the probability of seeing the data you collected given your model. ```r *politeness.null = lmer(frequency ~ gender + (1|subject) + (1|scenario), data=data, REML=FALSE) *politeness.full = lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data=data, REML=FALSE) anova(politeness.null, politeness.full) ``` ``` # Data: data # Models: # politeness.null: frequency ~ gender + (1 | subject) + (1 | scenario) # politeness.full: frequency ~ attitude + gender + (1 | subject) + (1 | scenario) # npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) # politeness.null 5 816.72 828.81 -403.36 806.72 *# politeness.full 6 807.10 821.61 -397.55 795.10 11.618 1 0.0006532 *** # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` “… politeness affected pitch ( `\(\chi^2(1)=11.62, p=0.00065\)` ), lowering it by about `\(19.7 Hz ± 5.6\)` (standard errors)...” --- # Model selection .pull-left[ ```r coef(politeness.model) ``` ``` # $scenario # (Intercept) attitude1 genderM # 1 223.6187 19.72111 -108.5163 # 2 243.7081 19.72111 -108.5163 # 3 248.5330 19.72111 -108.5163 # 4 257.7546 19.72111 -108.5163 # 5 235.1891 19.72111 -108.5163 # 6 224.9513 19.72111 -108.5163 # 7 226.1215 19.72111 -108.5163 # # $subject # (Intercept) attitude1 genderM # F1 223.2175 19.72111 -108.5163 # F2 247.5443 19.72111 -108.5163 # F3 240.6137 19.72111 -108.5163 # M3 265.5072 19.72111 -108.5163 # M4 242.5036 19.72111 -108.5163 # M7 203.3646 19.72111 -108.5163 # # attr(,"class") # [1] "coef.mer" ``` ] .pull-right[ #### Random intercept model We accounted for baseline-differences in pitch, but the effect of politeness is the same for all subjects and items. What if the **effect of politeness** might be different for different subjects? 
] --- # Model selection #### Random slope model ```r politeness.model1 = lmer(frequency ~ attitude + gender + (1+attitude|subject) + (1+attitude|scenario), data = data) coef(politeness.model1) ``` .pull-left[Random intercept model ```r $subject (Intercept) attitude2 genderM F1 242.9386 -19.72111 -108.5163 F2 267.2654 -19.72111 -108.5163 F3 260.3348 -19.72111 -108.5163 M3 285.2283 -19.72111 -108.5163 M4 262.2248 -19.72111 -108.5163 M7 223.0858 -19.72111 -108.5163 ``` ] .pull-right[Random slope model (correlated) ```r $subject (Intercept) attitude2 genderM F1 243.2804 -20.49940 -111.1058 F2 267.1173 -19.30447 -111.1058 F3 260.2849 -19.64697 -111.1058 M3 287.1024 -18.30263 -111.1058 M4 264.6698 -19.42716 -111.1058 M7 226.3911 -21.34605 -111.1058 ``` ] Despite **individual variation**, there is also **consistency** in how politeness affects the voice: for all of our speakers, the voice tends to go down when speaking politely, but for some people it goes down slightly more so than for others. ??? The effect of politeness (`attitudepol`) is different for each subject and item. It’s always negative and many of the values are quite similar to each other. --- # Model selection #### Interaction terms An interaction, **`attitude:sex`**, occurs when one variable changes how another variable affects the response variable. Alternative notation: **`attitude*sex`** `= attitude + sex + attitude:sex` #### Correlation between intercepts and slopes Using **`||`**, as in `(1 + attitude || subject)`, forces the correlation between the random slopes and the random intercepts to be zero. #### Overfitting Overfitting occurs when your model has too small a sample size and too many predictor variables. You may get a warning / error that your model wouldn’t converge. --- # Model selection ### How to choose your model? **Depends on your research question and data structure** .left-col[ - Exploratory VS confirmatory context - Design variables (theoretical interests) VS control variables - Data-driven (e.g. improve the model fit) VS Design-driven - Top-down VS Bottom-up ] .right-col[ - Some prefer to proceed incrementally, taking time to examine data. - One should always be cautious of creating a model that is too complex. - Some researchers have shown via simulations that mixed models without random slopes have a relatively high Type I error rate (Schielzeth & Forstmeier, 2009; Barr, Levy, Scheepers, & Tily, 2013) - Barr et al. (2013) recommend that you should “keep it maximal” with respect to your random effects structure, at least for controlled experiments. ] ??? We were not interested in gender differences, but they are well worth controlling for. This is why we had random slopes for the effect of attitude (by subjects and items) but not gender. If you have predictors that represent control variables, over which you do not intend to perform statistical tests, it is unlikely that random slopes are needed. --- # Model selection ### Limitations #### "Singular fit" warning or "not converging" error - We need to simplify our model and remove the random slopes or insignificant terms (`step()` does it automatically for you). But what if we believe there might be some merit to having subject-specific slopes? #### Bayes to the rescue **Bayesian mixed effects methods** are powerful alternatives to more commonly used Frequentist approaches. ??? There are two major approaches to fitting the model parameters. The principle of maximum likelihood: pick parameter values that maximize the probability of your data Y.
Bayesian inference: put a probability distribution on the model parameters and update it on the basis of what parameters best explain the data. --- class: inverse, middle, center # Shrinkage in linear mixed models -- Data Complete pooling No pooling Partial pooling --- # Data #### Research question The effects of **sleep deprivation** on psychomotor performance / reaction time (Belenky et al., 2003) .pull-left[#### Multilevel data with continuous predictor
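A minimal sketch for inspecting the multilevel structure of the data (assuming the `lme4` package, which provides `sleepstudy`, and `dplyr` are available): each subject contributes repeated reaction-time measurements across days, so observations are nested within subjects.

```r
library(lme4)    # provides the sleepstudy dataset
library(dplyr)

head(sleepstudy)           # columns: Reaction, Days, Subject

sleepstudy %>%
  count(Subject)           # repeated observations per subject
```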
] .pull-right[#### Plot the data ```r sleep2 <- sleepstudy %>% filter(Days >= 2) %>% mutate(days_deprived = Days - 2) ggplot(sleep2, aes(x = days_deprived, y = Reaction)) + geom_point() + scale_x_continuous(breaks = 0:7) + facet_wrap(~Subject) + labs(y = "Reaction Time", x = "Days deprived of sleep (0 = baseline)") ``` .footnote[Documentation for the dataset `?sleepstudy`] ] --- # Complete pooling .left-column[ **One-size-fit-all**: this approach estimates a single intercept and slope for the entire dataset, ignoring the fact that different subjects might vary in their intercepts or slopes. It assumes that all observations are independent. $$ RT = \beta_0 + \beta_1 Day + e$$ ] .right-column[ ```r *cp_model <- lm(Reaction ~ days_deprived, sleep2) summary(cp_model) ``` ```r ggplot(sleep2, aes(x = days_deprived, y = Reaction)) + geom_abline(intercept = coef(cp_model)[1], slope = coef(cp_model)[2], color = '#f4cae2', size = 1.5) + geom_point() + scale_x_continuous(breaks = 0:7) + facet_wrap(~Subject, nrow = 3) + labs(y = "Reaction Time", x = "Days deprived of sleep (0 = baseline)") ``` ] --- # Complete pooling .left-column[ **One-size-fit-all**: this approach estimates a single intercept and slope for the entire dataset, ignoring the fact that different subjects might vary in their intercepts or slopes. It assumes that all observations are independent. $$ RT = \beta_0 + \beta_1 Day + e$$ ] .right-column[ ![](cp-plot.png) ] ??? ![](lmer_files/figure-html/cp-plot-1.png) --- # No pooling .left-column[ This approach fits separate lines for each participant. The estimate for each participant will be completely uninformed by the estimates for the other participants. There is no overall population intercept and slope that is being estimated. (1) Run separate regressions for each participant (2) Run **multiple regression** ] .right-column[ ```r sleep2 %>% pull(Subject) %>% is.factor() *np_model <- lm(Reaction ~ days_deprived + Subject + days_deprived:Subject, * data = sleep2) summary(np_model) all_intercepts <- c(coef(np_model)["(Intercept)"], coef(np_model)[3:19] + coef(np_model)["(Intercept)"]) all_slopes <- c(coef(np_model)["days_deprived"], coef(np_model)[20:36] + coef(np_model)["days_deprived"]) ids <- sleep2 %>% pull(Subject) %>% levels() %>% factor() np_coef <- tibble(Subject = ids, intercept = all_intercepts, slope = all_slopes) ``` ] --- # No pooling .left-column[ **multiple regression** This approach fits separate lines for each participant. The estimate for each participant will be completely uninformed by the estimates for the other participants. There is no overall population intercept and slope that is being estimated. ] .right-column[ ```r ggplot(sleep2, aes(x = days_deprived, y = Reaction)) + geom_abline(data = np_coef, mapping = aes(intercept = intercept, slope = slope), color = '#f4cae2', size = 1.5) + geom_point() + theme_bw() + scale_x_continuous(breaks = 0:7) + facet_wrap(~Subject, nrow=3) + labs(y = "Reaction Time", x = "Days deprived of sleep (0 = baseline)") ``` ] --- # No pooling .left-column[ **multiple regression** This approach fits separate lines for each participant. The estimate for each participant will be completely uninformed by the estimates for the other participants. There is no overall population intercept and slope that is being estimated. ] .right-column[ ![](np-plot.png) ] --- # No pooling .left-column[ #### multiple regression This approach fits separate lines for each participant. 
The estimate for each participant will be completely uninformed by the estimates for the other participants. There is no overall population intercept and slope that is being estimated. ] .right-column[ **A one-sample t-test:** ```r np_coef %>% pull(slope) %>% t.test() ``` ``` # # One Sample t-test # # data: . *# t = 6.1971, df = 17, p-value = 9.749e-06 *# alternative hypothesis: true mean is not equal to 0 # 95 percent confidence interval: # 7.542244 15.328613 # sample estimates: # mean of x # 11.43543 ``` The mean slope of 11.435 is significantly different from zero, `\(t(17) = 6.197, p <.001\)`. ] --- # Partial pooling .left-column[It would be desirable to improve our estimates for individual participants by taking advantage of what we know about the other participants. It would improve generalization to the population. We are interested in the 18 subjects as examples drawn from a larger population of potential subjects. ] .right-column[LME model estimates values for the population, and pulls the estimates for individual subjects toward those values, a statistical phenomenon known as `shrinkage`. #### Level 1 (for subject `\(i\)`): `$$RT_i = \beta_{0i} + \beta_{1i} Day_i +e_i$$` #### Level 2 (for subject `\(i\)`): `$$\beta_{0i} = \gamma_0 + Subject_{0i}$$` `$$\beta_{1i} = \gamma_1 + Subject_{1i}$$` ] ??? Partial pooling happens when you treat a factor as a random instead of fixed in your analysis. --- # Partial pooling .left-column[It would be desirable to improve our estimates for individual participants by taking advantage of what we know about the other participants. It would improve generalization to the population. We are interested in the 18 subjects as examples drawn from a larger population of potential subjects. ] .right-column[ ```r *pp_mod <- lmer(Reaction ~ days_deprived + (days_deprived | Subject), sleep2) summary(pp_mod) newdata <- crossing( Subject = sleep2 %>% pull(Subject) %>% levels() %>% factor(), days_deprived = 0:7) newdata2 <- newdata %>% mutate(Reaction = predict(pp_mod, newdata)) ``` ```r ggplot(sleep2, aes(x = days_deprived, y = Reaction)) + geom_line(data = newdata2, color = '#f4cae2', size = 1.5) + geom_point() + theme_bw() + scale_x_continuous(breaks = 0:7) + facet_wrap(~Subject, nrow = 3) + labs(y = "Reaction Time", x = "Days deprived of sleep (0 = baseline)") ``` ] --- # Partial pooling .left-column[ It would be desirable to improve our estimates for individual participants by taking advantage of what we know about the other participants. It would improve generalization to the population. We are interested in the 18 subjects as examples drawn from a larger population of potential subjects. ] .right-column[ ![](pp-plot.png) ] --- # Partial pooling .pull-left[**When partial-pooling goes wrong...** 1. Vanishing confidence intervals (the model is too complex for the data to support). 1. Unintended Shrinkage.<sup>1</sup> - In 2016, George et al. showed that partial pooling can lead to artificially low predicted mortality rates for smaller hospitals. 1. The number of random effect levels has a strong impact on the uncertainty of the estimate variance.<sup>2</sup> ] .pull-right[The estimates are pulled or "**shrunk**" closer to the overall population average. <img src="animated.gif" style="display: block; margin: auto 0 auto auto;" /> The animated GIF is from Isabella R. Ghement. 
Twitter: @IsabellaGhement ] .footnote[<sup>1</sup> https://towardsdatascience.com/when-mixed-effects-hierarchical-models-fail-pooling-and-uncertainty-77e667823ae8 <sup>2</sup> https://www.muscardinus.be/2018/09/number-random-effect-levels/] --- # Report writing ### Goal: Reproducibility 1. Describe and **justify** your model (summarize the model fit). - Specify all fixed and random effects, interactions, intercepts, and slopes - Specify **sample size**: number of observations, and number of levels for each grouping factor 1. Mention that you checked assumptions, and that the **assumptions are satisfied**. 1. Fixed effects: Report **significance** (test statistic, degrees of freedom, `\(p\)`-value) and **actual estimates** for coefficients (effect size: magnitude and direction, standard errors, and confidence intervals). Report post-hoc tests (`emmeans()`). 1. Random effects: Report the variances, standard deviations, and any correlations for each component. 1. **Cite** the relevant packages. - (recommended) Basic descriptive statistics (e.g. mean and standard deviation of each group) - (recommended) Plot showing the distribution of the data ??? Post-hoc tests test for differences between (combinations of) levels of a factor x, after fitting the model. --- class: inverse # Why LME model? ### Multilevel / Hierarchical / Nested data - Within-subject factor? Repeated measures? - Pseudoreplications (multiple measurements within the same condition)? - Multiple stimulus items? ### Science (causal) model before statistical model - Data types: numerical continuous data or categorical data? - Operational hypothesis? ??? In the sleepstudy data, you had the within-subject factor of Day (which is more a numeric variable, actually, than a factor; but it has multiple values varying within each participant). generalized linear model (GLM): η determines the predicted mean μ of Y (link function). In linear regression, the predicted mean is just the linear predictor. take a step back (McElreath, 2022) --- # References Winter, B. (2013). Linear models and linear mixed effects models in R with linguistic applications. arXiv:1308.5499. [http://arxiv.org/pdf/1308.5499.pdf] Vasishth, S., Schad, D., Bürki, A., & Kliegl, R. (ongoing). Linear Mixed Models in Linguistics and Psychology: A Comprehensive Introduction. [https://vasishth.github.io/Freq_CogSci/] Barr, Dale J. (2021). Learning statistical models through simulation in R: An interactive textbook. Version 1.0.0. Retrieved from https://psyteachr.github.io/stat-models-v1. McElreath, R. (2020). Statistical Rethinking: A Bayesian course with examples in R and STAN. CRC Press. #### Further Readings: Casella, George, and Roger L Berger. 2002. Statistical Inference. Second Edition. Cengage Learning. Miller, I., and M. Miller. 2004. John E. Freund’s Mathematical Statistics with Applications. Prentice Hall. Barr, Dale J, Roger Levy, Christoph Scheepers, and Harry J Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. Bates, Douglas M., Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” --- class: center, middle # Thanks! ![](feedback.png) Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr/), and [R Markdown](https://rmarkdown.rstudio.com).