> For ML accelerators to be effective in space, they must withstand the environment of low-Earth orbit. We tested Trillium, Google’s v6e Cloud TPU, in a 67MeV proton beam to test for impact from total ionizing dose (TID) and single event effects (SEEs).
>
> The results were promising. While the High Bandwidth Memory (HBM) subsystems were the most sensitive component, they only began showing irregularities after a cumulative dose of 2 krad(Si) — nearly three times the expected (shielded) five year mission dose of 750 rad(Si). No hard failures were attributable to TID up to the maximum tested dose of 15 krad(Si) on a single chip, indicating that Trillium TPUs are surprisingly radiation-hard for space applications.
> Project Suncatcher is a moonshot exploring a new frontier: equipping solar-powered satellite constellations with TPUs and free-space optical links to one day scale machine learning compute in space.
You should read the linked article; they discuss this there. You radiate the heat into space, which takes less surface area than the solar panels, so you can just mount them back to back.
In general I don't understand this line of thinking. This would be such a basic problem to miss that my first instinct would be to look up what solutions other people propose. It's very easy to find online.
Taking a system that was conceptualized about a quarter of a century ago and serves very different needs than a datacenter in space (e.g. a very strict thermal band, versus an acceptable temperature range of roughly 20 to 80 °C) isn't ideal.
The physics is quite simple and you can definitely make it work out. The Stefan-Boltzmann law works in your favor the higher you can push your temperatures.
If anything, an orbital datacenter could be a slightly easier case. Ideally it will be in an orbit that always sees the sun. Most other satellites have to spend time in Earth's shadow, which makes heaters as well as radiators necessary.
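For reference, a quick sketch of that T^4 scaling, assuming an ideal black-body surface radiating to ~0 K surroundings (emissivity and view factors are ignored here):

```python
# Back-of-envelope Stefan-Boltzmann scaling: P/A = sigma * T^4
# (ideal black body, emissivity = 1, radiating to ~0 K surroundings)
SIGMA = 5.670e-8  # W / (m^2 K^4)

for celsius in (20, 80, 150, 300):
    t = celsius + 273.15
    print(f"{celsius:>4} C ({t:6.1f} K): {SIGMA * t**4:7.1f} W/m^2")

# Roughly 420 W/m^2 at 20 C, 880 W/m^2 at 80 C, 6100 W/m^2 at 300 C:
# running the radiator hotter cuts the required area dramatically.
```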
These data centers are solar powered, right? So if they're absorbing 100% of the energy on their sun side, by default they'll heat up about as much as any object left in the sun, which I assume isn't very hot compared to what they're taking in. How do they crank their temperature up to get the Stefan-Boltzmann law working in their favor?
I suppose one could get some sub part of the whole satellite to a higher temperature so as to radiate heat efficiently, but that would itself take power, the power required to concentrate heat which naturally/thermodynamically prefers to stay spread out. How much power does that take? I have no idea.
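For a rough baseline on the "object left in the sun" question, here is a minimal sketch, assuming a flat black plate facing the sun with no internal power and no reflection:

```python
# Equilibrium temperature of a passive flat plate in full sunlight:
# it absorbs everything on the sunlit face and radiates as a black body.
SIGMA = 5.670e-8     # W / (m^2 K^4)
SOLAR_FLUX = 1361.0  # W/m^2, solar constant near Earth

for faces in (1, 2):  # radiate from the sunlit face only, or from both faces
    t_eq = (SOLAR_FLUX / (faces * SIGMA)) ** 0.25
    print(f"radiating from {faces} face(s): {t_eq:5.1f} K ({t_eq - 273.15:5.1f} C)")

# ~394 K (~120 C) one-sided, ~331 K (~58 C) two-sided, so a passive sunlit
# plate already sits around the radiator temperatures discussed in this thread.
```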
σ is such a small number in the Stefan-Boltzmann law that it makes no difference at all until your radiators get hot enough to start melting.
You not only need absolutely huge radiators for a space data centre, you also need an active cooling/pumping system to make sure the heat is evenly distributed across them.
I'm fairly sure no one has built a kilometer-sized fridge radiator before, especially not in space.
You can't just stick some big metal fins on a box and call it a day.
Out of curiosity, I plugged in the numbers. I have solar at home, and a 2 m² panel makes about 500 W; I assume the one in orbit will be a bit more efficient without the atmosphere and a bit fancier, so say it generates 750 W.
If we run the radiators at 80 °C (a reasonable temperature for silicon), that's about 350 K. Assuming the surroundings are at 0 K, the radiator can radiate away about 1500 W, so roughly double the panel's output.
Depending on what percentage of the time we spend in sunlight (it depends on the orbit, but it's between 50% and 100%, with 66% a good estimate for LEO), we can reduce the radiator surface area by that factor.
So a LEO satellite in a decaying orbit (designed to fall back to Earth after 3 years, or one GPU generation) could technically work with 33% of the solar panel area dedicated to cooling.
Realistically, I'd say solar panels are so cheap that it would make more sense to build a huge solar park in Africa and accept the much lower capacity factor (8 hours of sunlight, roughly a 33% duty cycle, versus ~66% in LEO), since the rest of the infrastructure is vastly simpler.
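A minimal sketch of the estimate above, reusing its assumptions (750 W of electrical output from 2 m², a ~350 K radiator, a 66% sunlit fraction, and only the converted power counted as heat), which the replies below push back on:

```python
# Redo the back-of-envelope: can a 2 m^2 radiator at ~350 K (80 C) dump
# the electrical power a 2 m^2 space-rated panel produces?
SIGMA = 5.670e-8        # W / (m^2 K^4)
PANEL_AREA = 2.0        # m^2
ELECTRIC_POWER = 750.0  # W, assumed space-rated output
T_RADIATOR = 350.0      # K, roughly 80 C
SUNLIT_FRACTION = 0.66  # rough LEO duty cycle

flux = SIGMA * T_RADIATOR**4  # ~850 W/m^2 rejected per square meter (one side)
print(f"2 m^2 at 350 K radiates {flux * PANEL_AREA:.0f} W")

# Orbit-averaged load, following the duty-cycle argument above:
area_needed = ELECTRIC_POWER * SUNLIT_FRACTION / flux
print(f"radiator area needed: {area_needed:.2f} m^2 "
      f"({area_needed / PANEL_AREA:.0%} of the panel area)")
```

With emissivity 1 this gives ~1700 W of rejection and roughly 30% of the panel area, in the same ballpark as the ~1500 W and ~33% quoted above.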
This argument assumes you only need to radiate away the energy the panel actively turns into electricity, but you also need to dissipate all the excess heat that wasn't converted. The solar bolometric flux at Earth is about 1300 W/m², or 2600 W for 2 m². That works out to an efficiency of ~20% for your home solar, and your assumed 750 W implies ~30%, which is reasonable for space-rated cells. But assuming an overall albedo of ~5%, that means you were only accounting for about a third of the total energy that needs to be radiated.
Put another way, 2 m² intercepts 2600 W of solar power but only radiates ~1700 W at 350 K, which means it has to run at nearly 125 °C to reach equilibrium.
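A sketch of that energy balance, using the reply's numbers (1300 W/m² incident on 2 m², ~5% reflected); the emissivity values below are my assumption, not from the comment:

```python
# Everything absorbed (including the 750 W that briefly becomes electricity)
# ultimately has to leave the satellite as thermal radiation.
SIGMA = 5.670e-8  # W / (m^2 K^4)
AREA = 2.0        # m^2
absorbed = 1300.0 * AREA * 0.95  # ~2470 W after ~5% albedo

print(f"radiated at 350 K: {SIGMA * AREA * 350.0**4:.0f} W "
      f"vs {absorbed:.0f} W absorbed")

# Equilibrium temperature if the same 2 m^2 must shed all of it
# (emissivity values assumed for illustration):
for emissivity in (1.0, 0.9):
    t_eq = (absorbed / (emissivity * SIGMA * AREA)) ** 0.25
    print(f"emissivity {emissivity}: ~{t_eq:.0f} K ({t_eq - 273.15:.0f} C)")
```

An emissivity a bit below 0.9 lands near the ~125 °C figure quoted above.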
It receives around 2.5 kW[0] of energy (in orbit), of which it converts 500 W to electricity; a small amount is reflected and the rest ends up as heat, so use 1 kW/m² as your input value.
> If we run the radiators at 80 °C (a reasonable temperature for silicon), that's about 350 K. Assuming the surroundings are at 0 K, the radiator can radiate away about 1500 W, so roughly double the panel's output.
1500 W from 2 m² is less than 2000 W, so your panel will heat up.
> Depending on what percentage of the time we spend in sunlight (it depends on the orbit, but it's between 50% and 100%, with 66% a good estimate for LEO), we can reduce the radiator surface area by that factor.
You need enough radiator capacity for the peak load, not just the average. It's analogous to how you can't put a smaller heat sink in your home PC just because you only run it 66% of the time.
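A sketch of that sizing point, reusing the ~1 kW/m² waste-heat figure and the ~350 K radiator assumed earlier in the thread:

```python
# Size the radiator for the peak heat load (full sun), not the orbit average.
SIGMA = 5.670e-8
T_RADIATOR = 350.0                # K, ~80 C
flux_out = SIGMA * T_RADIATOR**4  # ~850 W/m^2 of rejection capacity

peak_heat = 1000.0 * 2.0     # W, waste heat over 2 m^2 while in full sun
avg_heat = peak_heat * 0.66  # orbit average, shown only for contrast

print(f"area sized for peak:    {peak_heat / flux_out:.2f} m^2")
print(f"area sized for average: {avg_heat / flux_out:.2f} m^2 (undersized at full sun)")
```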
Yes, it's fun. One small note: for the outside temperature you can use 3 K, the cosmic microwave background temperature. Not that it would meaningfully change your conclusion.
It's definitely a solvable problem. But it is a major cost factor that is commonly handwaved away. It also restricts the size of each individual satellite: moving electricity through wires is much easier than pumping cooling fluid to radiators, so radiators are harder to scale. Not a big deal at ISS scale, but some proposals had square kilometers of solar arrays per satellite.
Exactly that. It's not that it's impossible. It's that it's heavy to efficiently transport heat to the radiators, or it requires a lot of tiny sats, which have their own problems.
I think the main draw is its elegance. You get very efficient power from the sun, put it directly into your compute, and radiate the heat out. Energy is ~free, no heavy infrastructure required, just a closed circuit for computing.
Elegance compared to a PV/Storage facility built next door to a data centre?
It doesn't make sense right now, and won't for at least 5-10 years.
By which time this current round of hype will have burned up ~$1T, if it doesn't fall apart first from its internal contradictions and lack of market/customers/uses.
We're still on the uphill ride of the Gartner hype cycle, not even at the "Peak of Inflated Expectations" yet.
I'm guessing they were waiting to figure out more efficient serving before a release, and have decided to eat the inference cost temporarily to stay at the frontier.
The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, TTFT) are better with large GPU clusters. Latency is usually hidden by prefill cost.
The comment you were replying to mentioned this. Yes, you can't remove heat via convection, but you can use radiators to emit heat as radiation into space.
Depending on how hot/dense/clocked you run your compute, the radiators take _less_ surface area than the solar panels, so you can mount them back to back without adding any footprint.
Obviously there are some unanswered questions but there is clearly a path forward.
It's the other way around too: HLE questions were selected adversarially to reduce the scores. I'd guess that even if the questions were never released and new training data was introduced, the scores would still improve.
I think something with more uniform training and inference setups, while being otherwise equally hardware friendly, just as easily trainable, and equally expressive, could replace transformers.