Created: January 23, 2022
Modified: January 23, 2022
mixed effects
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
- Otter notes: Can I explain what a mixed effects model is from a graphical model standpoint?
- On the inference side, I think it's just hierarchical regression in which you integrate out some variables, called the random effects, and put a point-mass posterior on others, called the fixed effects. I don't understand why you'd make this distinction, or why you wouldn't just integrate out everything. So I want to think it through from the other perspective.
- If I were channeling a statistician or an economist, I would say that a fixed effects model assumes you have a bunch of inputs (features), and the effect of each one is fixed. There's some effect of being Black on your income, and it's the same everywhere. So this is already an interesting distinction: it's about being part of the hierarchy versus not. If you think the effect of being Black is the same everywhere, that makes it not part of the hierarchy. Whereas if you think the effect of being Black on your income is 'random', meaning it is subject to unmodeled local variation (across different parts of the world, or depending on other indicators, etc.), you would call it a random effect. From a modeling point of view, I guess that means you put a hierarchy on it.
- Technically, you say that the effect of being Black in Tennessee is a random variable drawn from some distribution, and the effect of being Black in California is also a random variable drawn (by assumption, I guess) from the same distribution. They will take different values; they're random effects. We let them differ, but there is also a model-based constraint that they are related: ultimately they're drawn from the same distribution, and we have to estimate what that distribution is. That is what pools our information. Okay, so that's my understanding; I should check that it's correct.
- If I go with this understanding for a moment, why would you ever build a fixed effects model? When would you not want to recognize some amount of variation? I guess the answer is that you only have a certain capacity to validate and fit your model, but that's a very abstract answer.
- Let me think this through from a different perspective. Say that for every person I have two features, their race and the state they live in, and I want to predict their income. Obviously this model doesn't have many parameters: you could just put a number in each of the (at most a few hundred) buckets. But let's pretend for the moment that it's a hard problem and we actually need to think about models.
- Of course, you could just learn a fixed effect for each of those two things, treating them as independent predictors. Technically there would be 50 fixed effects for the different states, plus however many for the different races. Say there are only two races, white and Black; then there would be 52 effects to estimate. But the other thing we could do is take what we'd call a feature cross.
In other terminology, this is an interaction: we look at all combinations of these two, i.e., we construct the product of the two as features. Now we have 100 different features. If we do this naively, we'll need more data, because now we have 100 parameters instead of 52. How does this relate to thinking of race as having an effect that's random depending on the state? It seems like that's a feature cross: you have a different effect in every state, and the difference is just that we're going to use the machinery of modeling and of random variables. And I guess we commit to not fitting point values? Is that even true?
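One way to write the two models from the notes above, assuming a Gaussian linear model. The notation is mine, not standard: $y_i$ is person $i$'s income, $b_i \in \{0, 1\}$ indicates whether they are Black, and $s[i] \in \{1, \dots, 50\}$ is their state. Only the race effect is shown; state main effects are left out to keep it short.

$$
\text{fixed:}\quad y_i = \alpha + \beta\, b_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

$$
\text{random:}\quad y_i = \alpha + \beta_{s[i]}\, b_i + \varepsilon_i, \qquad \beta_s \sim \mathcal{N}(\mu_\beta, \tau^2), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

As $\tau^2 \to 0$ every $\beta_s$ collapses to $\mu_\beta$, which recovers the fixed-effect model. As $\tau^2 \to \infty$ the prior stops constraining the $\beta_s$, and each state's slope is estimated from that state's data alone, which is the unpooled feature cross. Estimating $\mu_\beta$ and $\tau^2$ from all states together is where the pooling comes in.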
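A minimal numpy sketch of the inference described in the "inference side" bullet, assuming a Gaussian linear mixed model $y = X\beta + Zu + \varepsilon$ with $u \sim \mathcal{N}(0, \tau^2 I)$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$, and assuming the variance components $\tau^2, \sigma^2$ are known. Integrating out the random effects $u$ leaves $y \sim \mathcal{N}(X\beta, \tau^2 ZZ^\top + \sigma^2 I)$; the fixed effects $\beta$ then get a point estimate by generalized least squares, which maximizes that marginal likelihood. The function name and the simulated data are hypothetical.

```python
import numpy as np


def fit_fixed_effects_given_variances(X, Z, y, tau2, sigma2):
    """Hypothetical helper for y = X @ beta + Z @ u + eps,
    with u ~ N(0, tau2 * I) (random effects) and eps ~ N(0, sigma2 * I).
    u is integrated out analytically; beta gets a point estimate (GLS).
    tau2 and sigma2 are assumed known."""
    n = len(y)
    # Marginal covariance of y after integrating out the random effects.
    V = tau2 * Z @ Z.T + sigma2 * np.eye(n)
    V_inv = np.linalg.inv(V)
    # GLS estimate of the fixed effects = MLE of beta under the marginal likelihood.
    beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    resid = y - X @ beta_hat
    # Posterior mean of the random effects given beta_hat (the "BLUP").
    u_hat = tau2 * Z.T @ V_inv @ resid
    # Marginal log-likelihood log N(y; X @ beta_hat, V).
    _, logdet = np.linalg.slogdet(V)
    loglik = -0.5 * (n * np.log(2 * np.pi) + logdet + resid @ V_inv @ resid)
    return beta_hat, u_hat, loglik


# Simulated example: 5 groups of 20, a global intercept as the fixed effect,
# and a random intercept per group.
rng = np.random.default_rng(0)
n_groups, per_group = 5, 20
group = np.repeat(np.arange(n_groups), per_group)
X = np.ones((len(group), 1))              # fixed effect: global intercept
Z = np.eye(n_groups)[group]               # one-hot group membership
u_true = rng.normal(0.0, 2.0, n_groups)   # true random intercepts
y = 10.0 + u_true[group] + rng.normal(0.0, 1.0, len(group))

beta_hat, u_hat, loglik = fit_fixed_effects_given_variances(X, Z, y, tau2=4.0, sigma2=1.0)
print(beta_hat, u_hat, loglik)
```

With a balanced design like this one, each group's `u_hat` is its mean residual shrunk by the factor $n\tau^2 / (\sigma^2 + n\tau^2)$. Mixed-model software such as R's lme4 usually also point-estimates $\tau^2$ and $\sigma^2$ (by maximum likelihood or REML) rather than treating them as known.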
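To make the counting and the pooling from the race-and-state example concrete, here is a numpy sketch under the note's simplifying assumptions (50 states, two races). All function names are hypothetical. `additive_design` and `crossed_design` build the 52-column main-effects encoding and the 100-column feature cross; `partially_pooled_effects` shows what treating the per-state Black effect as a random effect does to a noisy per-state estimate, assuming a Gaussian prior $\beta_s \sim \mathcal{N}(\mu, \tau^2)$ with $\mu$ and $\tau^2$ known.

```python
import numpy as np

# Assumed sizes from the note: 50 states, two races.
N_STATES, N_RACES = 50, 2


def additive_design(state, race):
    """Hypothetical helper: one-hot state columns next to one-hot race columns
    (main effects only), giving N_STATES + N_RACES = 52 columns."""
    return np.hstack([np.eye(N_STATES)[state], np.eye(N_RACES)[race]])


def crossed_design(state, race):
    """Hypothetical helper: one indicator per (state, race) cell, i.e. the
    feature cross, giving N_STATES * N_RACES = 100 columns."""
    return np.eye(N_STATES * N_RACES)[state * N_RACES + race]


def partially_pooled_effects(raw, se2, mu, tau2):
    """Hypothetical helper: posterior mean of per-state effects under
    beta_s ~ N(mu, tau2), given a noisy estimate raw[s] with sampling
    variance se2[s]. It is a precision-weighted average of each state's
    own estimate and the shared mean mu (mu and tau2 assumed known)."""
    w_data = 1.0 / se2
    w_prior = 1.0 / tau2
    return (w_data * raw + w_prior * mu) / (w_data + w_prior)


state = np.array([0, 42, 49])
race = np.array([1, 0, 1])
print(additive_design(state, race).shape)  # (3, 52)
print(crossed_design(state, race).shape)   # (3, 100)

# Three states: a precise estimate, a very noisy one, and a middling one.
raw = np.array([5.0, -3.0, 1.0])
se2 = np.array([0.1, 10.0, 1.0])
pooled = partially_pooled_effects(raw, se2, mu=1.0, tau2=1.0)
print(pooled)  # the noisy state is pulled hardest toward mu = 1.0
```

Letting `tau2` go to zero recovers the single fixed effect; letting it grow recovers the unpooled feature cross, where each state's estimate stands alone. In an actual mixed model `mu` and `tau2` would be estimated from all states at once, which is how data from well-measured states informs the noisy ones.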