Don't confuse this with universal approximation - yes, shallow ReLU networks are dense in function space (e.g. uniformly on compact sets), so in the limit of arbitrarily many neurons you can approximate any continuous function to any accuracy - but they are talking about exact representation with finitely many neurons here, which is a much stronger requirement.
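A minimal numpy sketch of the distinction, using x² as an illustrative target (my choice, not from the original discussion): a width-n shallow ReLU net can realize the piecewise-linear interpolant of x² exactly, so its error shrinks like 1/(4n²) - density in action - but since any finite ReLU net of this form is piecewise linear and x² isn't, the error never hits zero, i.e. no exact representation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def shallow_relu_interpolant(f, n):
    """Width-n shallow ReLU net that linearly interpolates f
    at n+1 uniform knots on [0, 1]. Returns a callable."""
    knots = np.linspace(0.0, 1.0, n + 1)
    vals = f(knots)
    slopes = np.diff(vals) / np.diff(knots)  # slope on each linear piece
    # Piecewise-linear interpolant as a sum of ReLUs:
    #   f_hat(x) = f(0) + s_0*relu(x - t_0) + sum_k (s_k - s_{k-1})*relu(x - t_k)
    coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))
    def f_hat(x):
        return vals[0] + relu(x[:, None] - knots[None, :-1]) @ coeffs
    return f_hat

f = lambda x: x**2  # smooth target: no finite piecewise-linear net equals it
xs = np.linspace(0.0, 1.0, 100_001)
for n in [2, 4, 8, 16, 32]:
    err = np.max(np.abs(shallow_relu_interpolant(f, n)(xs) - f(xs)))
    print(f"width {n:3d}: max error {err:.2e}")  # ~ 1/(4 n^2), never exactly 0
```

Each doubling of the width cuts the max error by ~4x, which is exactly what universal approximation promises - and exactly what "dense" means: arbitrarily close, not equal.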