I can't instinctively process how many R's are in STRAWBERRY. With my vision, though, I get it almost immediately.
I feel that simple transformers just don't have access to the modalities a human would use. I can't use my "talking" centers to count letters in words either.
If you pay attention, you'll notice that you don't use your language skills to count letters in words.
The author creates art using their own custom library, which uses CSS-like syntax to render HTML, SVG, and more recently shaders. The point isn't that this is the best way to do it. It's simply a trick the author used to get something done with their own bespoke library.
To get a single knowledge cutoff they spent 16.5 wall-clock hours on a cluster of 128 NVIDIA GH200 GPUs (about 2100 GPU-hours), plus some minor amount of time for finetuning. The prerelease_notes.md in the repo is a great description of how one would achieve that.
I know there are going to be a lot of complications here, but from a quick search these GPUs seem to go for ~$2/hr, so $4000-4500 if you don't already have access to a cluster. I don't know how important the cluster is here: whether you need some minimum number of GPUs for the training (so that on a single machine it would take more than 128x longer, or not be possible at all), or whether a cluster of 128 GPUs is a fair bit less efficient but faster. A 4B model feels like it'd be fine on one or two of those GPUs?
Also, of course, this is for one training run. If you need to experiment, you'd need to do that more than once.
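For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic behind that estimate. The $2/hr rate is only the quick-search figure from above, the function name is made up for illustration, and it ignores finetuning time, storage, and failed runs.

```python
def training_cost_usd(wall_clock_hours, num_gpus, price_per_gpu_hour, runs=1):
    """Back-of-the-envelope rental cost for `runs` identical training runs."""
    gpu_hours = wall_clock_hours * num_gpus
    return gpu_hours * price_per_gpu_hour * runs


# Figures quoted above: 16.5 h on 128 GPUs.
# Assumption: ~$2 per GPU-hour, taken from a quick search, not a real quote.
gpu_hours = 16.5 * 128
one_run = training_cost_usd(16.5, 128, 2.0)
five_runs = training_cost_usd(16.5, 128, 2.0, runs=5)

print(f"{gpu_hours:.0f} GPU-hours per run")
print(f"~${one_run:,.0f} for one run, ~${five_runs:,.0f} for five")
```

That gives 2112 GPU-hours and about $4,224 per run, which is where the $4000-4500 range comes from. Experimenting multiplies it linearly.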
There should be a countervailing law that the more bullshit is produced, the more skeptical the populace becomes. The number of conspiracy theorists has remained constant; even the advent of the Internet hasn't changed that.
Coffee, the brew, is not a significant source of quercetin. The grounds, the part you throw away, may be. But capers have roughly 200 mg per 100 g, which outpaces all the other common sources, so if you were really big on quercetin, you'd be looking at anything but coffee.