Don't. It sucks. I could get better results running Mistral locally. Apparently the weights for Grok are huge and you'd never run it locally, while Mistral fits on my 12GB graphics card. Heck, you can access it for free on Hugging Face.
I still need to try Claude premium. Everyone on YouTube is raving about it. But I'd been hearing good things about Grok too, and it's horrible. Basically, YouTubers don't know how to review AIs, and it's better to trust word of mouth from actual programmers.
It does make me hopeful about making my own chat AI, since there seems to be surprisingly little correlation between model size and quality. Obviously trimming a model down from a larger one is going to give you worse results, but these hundred-billion-parameter models don't seem to have a huge leg up on 7-billion-parameter models that fit in 12GB.
I think there's more to gain from finding better ways to train within a 12GB limit. And with networked multi-agent AI coming along, if there really is a benefit to going beyond 12GB, the answer is multiple 12GB models talking to each other. Then you can still run mostly locally if your friends each run one agent and you run another.
Long story short: don't bother with Grok unless you feel like donating $8 to the project once, and don't expect anything out of it.
Each () is a 512-float vector, and each word position holds the strength with which that word pulls on the token's final position in the 512-dimensional vector space. The impact across positions is basically an exponential decay (really a softmax). But each pass helps the token gain more context and a better understanding of its true meaning.
This is pretty much what humans do when they read: they use context to re-understand the meaning of a word, and that re-understanding can in turn inform the meaning of other words. But the number of passes the model runs caps how much a word's meaning can be informed by context, and they could reduce that number when they get a lot of requests.
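For anyone curious, here's a toy numpy sketch of the idea. Real transformers add learned query/key/value projections, residual connections, and layer norms on top of this, so take it as an illustration, not the actual architecture; the 8-token sentence and 6 passes are numbers I made up for the example, only the 512 dimension comes from the comment above.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pass(tokens, d=512):
    """One self-attention pass: each token's vector is re-expressed as a
    softmax-weighted mix of every token's vector, so context reshapes meaning."""
    # similarity of every token to every other token (scaled dot product)
    scores = tokens @ tokens.T / np.sqrt(d)
    # each row sums to 1; strong matches dominate, weak ones decay fast
    weights = softmax(scores, axis=-1)
    # new vector per token = context-weighted blend of all token vectors
    return weights @ tokens

rng = np.random.default_rng(0)
sentence = rng.normal(size=(8, 512))  # 8 tokens, each a 512-float vector

x = sentence
for _ in range(6):  # more passes = more chances for context to refine meaning
    x = attention_pass(x)
```

The loop at the bottom is the "number of passes" part: each iteration lets every token re-read the others, which is why capping the pass count caps how much context can inform a word.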