
Probably high.

That's not the same thing as adding rules by yourself based on your experiences with Claude.

Would something like automated array bounds checking prevent the memory issues?

You might be interested in AddressSanitizer.
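For anyone unfamiliar: ASan instruments memory accesses at compile time and aborts with a report when a bad access actually executes, so it's a runtime check rather than static bounds checking. A toy illustration (not from the thread):

    /* toy.c: heap buffer overflow that AddressSanitizer catches */
    #include <stdlib.h>

    int main(void) {
        int *a = malloc(8 * sizeof(int));
        a[8] = 42;  /* writes one element past the end of the buffer */
        free(a);
        return 0;
    }

Build with gcc -fsanitize=address -g toy.c and run it: instead of silently corrupting the heap, the program aborts with a heap-buffer-overflow report and a stack trace pointing at the bad write.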

Then you switch to another name.


Yes, when you discover it. But the reason I said "just wondering" was that I was trying to think of unexpected ways it could affect things; that was the top one I could think of (and I'm not really sure it's even a possibility).


They analyse human perception too, in the form of videos.


Without any of the spatial and physical object perception you train from right after birth (watch toddlers playing), or the underlying wired infrastructure we are born with for understanding the physical world (there was an HN submission about that not long ago). Edit, found it: https://news.ucsc.edu/2025/11/sharf-preconfigured-brain/

They don't have a physical model of the world like humans do. Ours is based on deep interaction with space and objects (one reason touching things is important for babies), plus the preexisting wiring mentioned above.


Multimodal models have perception.


If a multimodal model were considered human, it would be diagnosed with multiple severe disabilities in its sensory systems.


You should probably ask the AI to write a script to do the task. Any procedure that needs to be perfect should be done by writing deterministic code.


The task is renaming scans of checks (with the barcode/bank account info obscured by a pen): the checks are of varying sizes, placement is not exact, the placement and formatting of the information on each check is essentially random, and many of them are (poorly) handwritten.

The LLM is working well enough for my needs (and I'm using a locked-down computer on which installing/running development environments or scripts is awkward). It's a marked improvement over the previous technique: opening 50 files at a time, noting each Invoice ID, closing the file, typing the Invoice ID as a name, then quitting Adobe Acrobat and re-launching it for the next 50 (if that was not done, Acrobat would eventually reach a state where it would close a file and, despite the name having been typed, not save it), then using a .bat file made by concatenation in an Excel column.

It would be nice if it were perfect, but each check has to be manually entered, and the filename updated to match the entry by hand.


If you use Cursor, you can just attach the documentation. Same thing, different method.


Which agent do you use it with?


I use K2 non-thinking in OpenCode for coding typically, and I haven't found a satisfactory chat interface yet, so I use K2 Thinking in the default synthetic.new (my AI subscription) chat UI, which is pretty barebones. I'm gonna start trying K2T in OpenCode as well, but I'm actually not a huge fan of thinking models as coding agents — I prefer faster feedback.


I'm also a synthetic.new user, as a backup (and for larger contexts) for my Cerebras Coder subscription (zai-glm-4.6). I've been using the free Chatbox client [1] for ~6 months and it works really well as a daily driver. I just tested the Romanian football player question with 3 different models (K2 Instruct, Deepseek Terminus, GLM 4.6); they all went straight to my Brave MCP tool to query, and all replied correctly with the same answer.

The issue with OP and GPT-5.1 is that the model may decide to trust its own knowledge and not search the web, and that's a prelude to hallucinations. Requesting links to background information in the system prompt helps make the model more "responsible" about invoking tool calls before settling on an answer. You can also start your prompt with "search for what Romanian player..."

Here's my Chatbox system prompt:

        You are a helpful assistant be concise and to the point, you are writing for smart pragmatic people, stop and ask if you need more info. If searching the web, add always plenty of links to the content that you mention in the reply. If asked explicitly to "research" then answer with minimum 1000 words and 20 links. Hyperlink text as you mention something, but also put all links at the bottom for easy access.
[1] https://chatboxai.app


I checked out Chatbox and it looks close to what I've been looking for, although, of course, I'd prefer a self-hostable web app or something so that I could set up MCP servers that even the phone app could use. One issue I did run into is that it doesn't know how to handle K2 Thinking's interleaved thinking and tool calls.


I don't use it much, but I tried it out with okara.ai and loved their interface. No other connection to the company.


The Chinese are doing it because they don't have access to enough of the latest GPUs to run their own models. Americans aren't doing it, because they need to recoup the cost of their massive GPU investments.


I must be missing something important here. How do the Chinese train these models if they don't have access to the GPUs to train them?


I believe they mean distribution (inference). The Chinese model is currently B.Y.O. GPU; the American model is GPUaaS.


Why is inference less attainable when it technically requires less GPU processing to run? Kimi has a chat app on their page using K2, so they must have figured out inference to some extent.


That entirely depends on the number of users.

Inference is usually less GPU-compute heavy, but much more GPU-VRAM heavy pound for pound compared to training. The general rule of thumb is that you need ~20x more VRAM to train a model with X params than to run inference on that same model. So assuming batch size b, serving more than 20*b users would tilt VRAM use to the side of inference.

This isn't really accurate; it's an extremely rough rule of thumb that ignores a lot of things. But it's important to point out that inference is quickly adding to costs for all AI companies. DeepSeek claims they spent $5.6M training DeepSeek-V3; that's about 10-20 trillion tokens at their current pricing, or 1 million users sending just 100 requests each at full context size.
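For a rough sense of where a number like 20x even comes from, here's an illustrative per-parameter tally (assuming bf16 inference and standard mixed-precision Adam training, ignoring the KV cache and activations):

    inference: 2 bytes/param  (bf16 weights)
    training:  2 (bf16 weights) + 2 (bf16 gradients)
             + 4 (fp32 master weights) + 8 (fp32 Adam moments)
             = 16 bytes/param, i.e. ~8x inference

Activation memory during training adds several more bytes per parameter depending on batch size and sequence length, which is how the multiple can stretch toward the 10-20x range.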


> it technically requires less GPU processing to run

Not when you have to scale. There's a reason why every LLM SaaS aggressively rate limits and even then still experiences regular outages.


tl;dr: the person you originally responded to is wrong.


That's super wrong. A lot of the reason people flipped out about DeepSeek V3 is how cheap and how fast their GPUaaS model is.

There is so much misinformation, both on HN and in this very thread, about LLMs and GPUs and cloud, and it's exhausting trying to call it out all the time, especially when it's coming from folks who are considered "respected" in the field.


> How do the Chinese train these models if they don't have access to the GPUs to train them?

They may be taking some Western models (Llama, gpt-oss, Gemma, Mistral, etc.) and doing post-training on them, which requires far fewer resources.


If they were doing that, I'd expect someone to have found evidence of it. Everything I've seen so far has led me to believe that these Chinese AI labs are training their own models from scratch.


Not sure what kind of evidence that could be.


Just one example: if you know the training data used for a model you can prompt it in a way that can expose whether or not that training data was used.

The NYT used tricks like this as part of their lawsuit against OpenAI: page 30 onwards of https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...


You either don't know which training data was used (for, say, gpt-oss), or the training data may be included in some open dataset like The Pile. I think this test is very unreliable, and even if someone came to such a conclusion, it's not clear what the value of that conclusion would be, or whether that someone could be trusted.


My intuition tells me it is vanishingly unlikely that any of the major AI labs - including the Chinese ones - have fine-tuned someone else's model and claimed that they trained it from scratch and got away with it.

Maybe I'm wrong about that, but I've never heard any of the AI training experts (and they're a talkative bunch) raise that as a suspicion.

There have been allegations of distillation - where models are partially trained on output from other models, eg using OpenAI models to generate training data for DeepSeek. That's not the same as starting with open model weights and training on those - until recently (gpt-oss) OpenAI didn't release their model weights.

I don't think OpenAI ever released evidence that DeepSeek had distilled from their models, that story seemed to fizzle out. It got a mention in a congressional investigation though: https://cyberscoop.com/deepseek-house-ccp-committee-report-n...

> An unnamed OpenAI executive is quoted in a letter to the committee, claiming that an internal review found that “DeepSeek employees circumvented guardrails in OpenAI’s models to extract reasoning outputs, which can be used in a technique known as ‘distillation’ to accelerate the development of advanced model reasoning capabilities at a lower cost.”


Additionally, it would be interesting to know whether there are dynamics in the opposite direction: US corps (OpenAI, xAI) could now incorporate Chinese models into their core models as one or several expert towers.


> That's not the same as starting with open model weights and training on those - until recently (gpt-oss) OpenAI didn't release their model weights.

There was obviously Llama.


What 1T parameter base model have you seen from any of those labs?


It's MoE; each expert tower could be branched from some smaller model.


That's not how MoE works; you need to train the expert FFNs and the gate together, or else the gate would have no clue how to route tokens to the experts.
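To make that concrete, here's a minimal sketch of a top-1 MoE gate (toy sizes and made-up weights, not any real model's code). The router is just a learned linear layer whose softmax output picks the expert, so its weights only mean anything if they were trained jointly with the experts; bolt in independently trained "experts" and the routing is noise:

    /* toy top-1 MoE gate: logits = Wg * x, softmax over experts */
    #include <stdio.h>
    #include <math.h>

    #define DIM 4      /* toy hidden size */
    #define EXPERTS 3  /* toy expert count */

    void gate(const float Wg[EXPERTS][DIM], const float x[DIM],
              float probs[EXPERTS]) {
        float logits[EXPERTS], maxl = -1e30f, sum = 0.0f;
        for (int e = 0; e < EXPERTS; e++) {
            logits[e] = 0.0f;
            for (int d = 0; d < DIM; d++)
                logits[e] += Wg[e][d] * x[d];
            if (logits[e] > maxl) maxl = logits[e];
        }
        for (int e = 0; e < EXPERTS; e++) {
            probs[e] = expf(logits[e] - maxl);  /* stable softmax */
            sum += probs[e];
        }
        for (int e = 0; e < EXPERTS; e++) probs[e] /= sum;
    }

    int main(void) {
        /* untrained/arbitrary gate weights: routing is meaningless
           no matter how good each grafted-in expert might be */
        float Wg[EXPERTS][DIM] = {{0.1f, -0.2f, 0.3f, 0.0f},
                                  {0.0f,  0.5f, -0.1f, 0.2f},
                                  {-0.3f, 0.1f, 0.0f, 0.4f}};
        float x[DIM] = {1.0f, 0.5f, -1.0f, 2.0f};
        float p[EXPERTS];
        gate(Wg, x, p);
        for (int e = 0; e < EXPERTS; e++)
            printf("expert %d: %.3f\n", e, p[e]);
        return 0;
    }

(Compile with cc moe.c -lm.) During training, gradients flow through those probabilities, which is what teaches the gate which expert handles which tokens.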


This is false. You can buy whole H100 clusters in China, and Alibaba, ByteDance, Tencent, etc. have enough cards for training and inference.

Shenzhen 2025 https://imgur.com/a/r6tBkN3


And Europeans aren't doing it because, quite frankly, we're not really doing anything particularly impressive with AI, sadly.


At ECAI conference last week there was a panel discussion and someone had a great quote, "in Europe we are in the golden age of AI regulation, while the US and China are in the actual golden age of AI".


To misquote the French president, "Who could have predicted?".

https://fr.wikipedia.org/wiki/Qui_aurait_pu_pr%C3%A9dire


He didn't coin that expression, did he? I'm 99% sure I heard people say it before 2022, but now you've made me unsure.


"Who could've predicted?" as a sarcastic response to someone's stupid actions leading to entirely predictable consequences is probably as old as sarcasm itself.


People said it before, but he said it without sarcasm about things that many people could in fact predict.


We could add cookie warnings to AI; everybody loves those.


Europe should act and make its own, literal, Moonshot:

https://ifiwaspolitical.substack.com/p/euroai-europes-path-t...


> Moonshot 1: GPT-4 Parity (2027)

> Objective: 100B parameter model matching GPT-4 benchmarks, proving European technical viability

This feels like a joke... Parity with a 2023 model in 2027? The Chinese didn't wait; they just did it.

The timeline for a #1 LLM is also so far into the future that it's entirely plausible that by 2031 nobody will be using transformer-based LLMs as we know them today. For reference: the attention paper is only 8 years old. Some wild new architecture could come out in that time that makes catching up meaningless.


Note that the EU Moonshot project is based on its own silicon / compute sovereignty.

GPT-4 parity with an indigenous model trained on its own silicon is just an early goal.

Indeed, the ultimate goal is EU LLM supremacy - which means under democratic control.


Europe gave us cookie popups on every single website.


Only the ones with invasive tracking cookies. Cookies essential to site function do not require a consent banner.


Actually, Mistral is pretty good and catching up as the other leading models stagnate; the coding and OCR models are particularly good.


> we're not really doing anything particularly impressive with AI sadly.

Well, that's true... but also nobody else is. Making something popular isn't particularly impressive.


Honestly, do we need to? If the Chinese release SOTA open source models, why should we invest a ton just to have another one? We can just use theirs, that's the beauty of open source.


For the vast majority, they're not "open source"; they're "open weights". They don't release the training data or the training code/configs.

It's kind of like releasing a 3d scene rendered to a JPG vs actually providing someone with the assets.

You can still use it, and it's possible to fine-tune it, but it's not really the same. There's tremendous soft power in deciding LLM alignment and material emphasis. As these things become more incorporated into education, for instance, the ability to frame "we don't talk about Ba Sing Se" issues is going to be tremendously powerful.


[flagged]


What a load of tripe.


I'm tired of this ol' propaganda trope.

* We're leading the world in fusion research. https://www.pppl.gov/news/2025/wendelstein-7-x-sets-new-perf...

* Our satellites are giving us by far the best understanding of our universe, capturing one third of the visible sky in incredible detail - just check out this mission update video if you want your mind blown: https://www.youtube.com/watch?v=rXCBFlIpvfQ

* Not only that, the Copernicus mission is the world's leading source of open Earth-observation data: https://dataspace.copernicus.eu/

* We've given the world mRNA vaccines to address the Covid crisis and GLP-1 agonists to address the obesity crisis.

* CERN is figuring out questions about the fundamental nature of the universe, with the LHC being by far the largest particle accelerator in the world, a precision engineering feat that couldn't have been accomplished anywhere else.

Pioneering, innovating, and driving forward isn't just about the latest tech fad. It's about fundamental research into how our universe works. Everyone else is downstream of us.


Don't worry, we in the US are hot on your heels in the own-goal game ( https://www.space.com/space-exploration/nasa-is-sinking-its-... ).

All you have to do is sit by the Trump River and wait for our body to come floating by.


I'm confused. Who is this "we"? Do you realize how far behind most of Europe is in many respects? How it's been parceled up and destroyed by the EU? Science projects led by a few countries don't cut it.

It’s not propaganda at all. The standards of living there are shit. But enjoy the particle collider, I guess?


We is Europe. Like everywhere else, we are behind in some aspects and ahead in others.

> The standards of living there are shit.

Now you're just trolling. I've lived in both the US and in multiple EU countries. Let me tell you, the standard of living in the US does not hold a candle to the one in the EU.


That was before they started acting to fix the problem. Please check the date.

