Don't. It sucks. I could get better results running Mistral locally. Apparently the weights for Grok are huge and you'd never run it locally, while Mistral fits on my 12GB graphics card. Heck, you can access it for free on Hugging Face.
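For anyone who wants to try the local route, here's roughly what that looks like. This is a minimal sketch, assuming the Hugging Face transformers, accelerate, and bitsandbytes packages and the mistralai/Mistral-7B-Instruct-v0.2 checkpoint (you may need to accept the model's terms on the Hub first):

```python
# Rough sketch: load a quantized Mistral 7B on a ~12GB consumer GPU.
# Assumes: pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit quantization keeps the weights comfortably under 12GB of VRAM.
quant_config = BitsAndBytesConfig(load_in_4bit=True,
                                  bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",   # place layers on the GPU automatically
)

prompt = "[INST] Write a Python function that reverses a string. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```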
I still need to try Claude premium; everyone on YouTube is raving about it. But I'd been hearing good things about Grok too, and it's horrible. Basically, YouTubers don't know how to review AIs, and it's better to trust word of mouth from actual programmers.
It does make me hopeful about building my own chat AI, since there seems to be surprisingly little correlation between model size and quality. Obviously trimming a model down from a larger one is going to give you worse results, but these hundred-billion-parameter models don't seem to have a huge leg up on 7-billion-parameter models that fit in 12GB.
I think there is more to gain from coming up with better ways to train within a 12GB limit. And with networked multi-agent AI coming along, if there really is a benefit to going beyond 12GB, the answer is multiple 12GB models talking to each other. Then you can still run mostly locally if you get your friends to run one agent while you run another.
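To make that multi-agent idea concrete, here's a rough sketch of two sub-12GB models bouncing a conversation back and forth over the network. It assumes each peer exposes an OpenAI-compatible /v1/chat/completions endpoint (the kind llama.cpp's server or Ollama can serve); the hostnames and model name are placeholders, not anything real:

```python
# Toy sketch: two locally hosted ~7B models relaying messages to each other.
# Assumes each peer serves an OpenAI-compatible chat endpoint.
# Hostnames and the model name below are placeholders.
import requests

AGENTS = [
    {"name": "me",     "url": "http://localhost:8080/v1/chat/completions"},
    {"name": "friend", "url": "http://friend.lan:8080/v1/chat/completions"},
]

def ask(agent, history):
    """Send the shared conversation to one agent and return its reply text."""
    resp = requests.post(agent["url"], json={
        "model": "local-7b",       # whatever checkpoint that server has loaded
        "messages": history,
        "max_tokens": 256,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Plan a small CLI tool together, step by step."}]

# Bounce the conversation back and forth for a few rounds.
for turn in range(4):
    agent = AGENTS[turn % 2]
    reply = ask(agent, history)
    print(f"--- {agent['name']} ---\n{reply}\n")
    # Each agent's answer becomes the next prompt for the other agent.
    history.append({"role": "user", "content": reply})
```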
Long story short: don't bother with Grok unless you feel like donating $8 to the project once, and don't expect anything out of it.
[-]x0x71(+1|0)
I hope Elon wins his lawsuit and we get the first open source AI Elon paid for.
GPT oscillates so frequently that it astounds me how people who haven't been using it for a year can deal with it. One day it can write flawless shader code; the next it has a one-message context window. I just woke up, so I would say more, but I'm sleepy. At first, when this technology was popularized because it was giving real results, I feared that our keepers would hush it up. Now I see that they really did slip the lid off Pandora's DNN box: you can't stop the signal, Mal.
[-]x0x70(+0|0)
I've noticed that. I've written some amazing code with it at times, but other times it struggles. I wonder if they tune the parameters based on system load.
Transformers are a layered, iterative process. It's kind of like Stable Diffusion, except instead of pixels you have tokens that are each, say, 512 float values. Basically it uses a repeated process to translate an English token stream into an internal-representation token stream. The advantage of the internal token stream is that its vocabulary has no ambiguity in the meaning of a word (green as in color, green as in envy, green as in inexperienced, green as in a golf landscape, green as in a street name, green as in a town). And each word gets the most related other words embedded into it. So "the red fox" becomes "(the,fox,red) (red,fox,the) (fox,red,the)", where within each token the sub-words color it with decreasing strength. So any one word in the sentence can pretty much give you the gist of the whole sentence, since it contains all the words at different strengths. On a second pass it would become "((the,fox,red),(fox,red,the),(red,fox,the)), ((red,fox,the),(fox,red,the),(the,fox,red)), ((fox,red,the),(red,fox,the),(the,fox,red))".
Each () is a 512-float vector, and each word's position is the strength with which that word pulls the token's final position in the 512-dimensional vector space. The influence falls off roughly exponentially with position (really it's a softmax). But each pass helps a token gain more context and a better sense of its true meaning.
This is pretty much what humans do when they read: they use context to re-understand the meaning of a word, and that re-understanding can inform the meaning of other words. But the number of passes sets how much a word's meaning can be informed by context, and they could reduce that when they're getting a lot of requests.
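If it helps, here's a toy version of that mixing step in plain numpy. It's not the real architecture (no learned weights, no multiple heads, tiny vectors instead of 512 floats), just the "each token becomes a softmax-weighted blend of every token, repeated over several passes" idea described above:

```python
# Toy illustration: each token vector is replaced by a softmax-weighted mix
# of all token vectors, and the process repeats so context soaks in a little
# more each pass. This is the mixing/decay intuition, not real attention.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mix_pass(tokens):
    scores = tokens @ tokens.T        # similarity of every token to every other token
    weights = softmax(scores)         # rows sum to 1, with a softmax falloff
    return weights @ tokens           # each row: weighted blend of all tokens

# Three pretend word vectors for "the red fox" (real models use 512+ dims).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 8))

for p in range(3):                    # "number of passes" = layers
    tokens = mix_pass(tokens)
    print(f"pass {p + 1}: each vector now carries context from the whole phrase")
```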
Nice write-up, very concise! I've been meaning to really, REALLY look into the workings of seq2seq boxes, so thank you for it. I've been joking with myself that when it gets the big dumb, China must have woken up and started using their services. Probably not that far off, honestly. I'm curious whether they have regional restrictions, like self-censoring in China if asked about Tank Man and such. If so, does that extend to code generation? Does Uganda get better access to a code box than the USA, for example? Not that they would be able to do anything with it, of course.