It just establishes a different set of model assumptions. In the linear regression case, fitting boxcox(y) = Xβ + ε and predicting with boxcox^(-1)(yhat) produces different predictions than fitting y = Xβ + ε directly: minimizing Σ(boxcox(y) - boxcox(yhat))^2 does not, in general, minimize Σ(y - yhat)^2. Either model is an approximation, of course, and either one can produce more accurate predictions than the other for a given set of observations.
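For a concrete illustration, here's a minimal sketch with numpy, scipy, and sklearn (the simulated data and variable names are mine, nothing Prophet-specific):

    # Sketch: fitting in Box-Cox space vs. the original space gives
    # genuinely different predictions. Simulated data for illustration only.
    import numpy as np
    from scipy.stats import boxcox
    from scipy.special import inv_boxcox
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(1, 10, size=(200, 1))
    # Strictly positive, right-skewed response (boxcox requires y > 0)
    y = np.exp(0.4 * X.ravel() + rng.normal(0, 0.3, size=200))

    # Model A: fit y = Xb + e directly (minimizes SSE in the original space)
    yhat_raw = LinearRegression().fit(X, y).predict(X)

    # Model B: fit boxcox(y) = Xb + e, then invert the transform
    y_t, lam = boxcox(y)
    yhat_bc = inv_boxcox(LinearRegression().fit(X, y_t).predict(X), lam)

    # The two objectives disagree, so the predictions differ; which SSE is
    # lower depends on the data-generating process (here it's multiplicative,
    # which tends to favor the transformed fit).
    print("SSE, raw fit:   ", np.sum((y - yhat_raw) ** 2))
    print("SSE, boxcox fit:", np.sum((y - yhat_bc) ** 2))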
I'm not familiar enough with Prophet to know whether the same logic applies here, though I'd hazard a guess that it does.
Right. In particular, using least squares is justified by the assumption that errors are normally distributed, in which case least squares yields the maximum-likelihood estimate. Because boxcox is intended to transform a random variable into one that is approximately normally distributed, the model assumptions are actually more likely to be satisfied if you run the regression on the transformed values. (Which was probably the reason boxcox was invented in the first place.)
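As a quick check on that point, a small sketch (simulated right-skewed data; scipy's boxcox picks lambda by maximum likelihood):

    # Sketch: boxcox pulls a skewed sample toward normality, so the
    # Gaussian-error assumption behind least squares becomes more plausible.
    import numpy as np
    from scipy.stats import boxcox, normaltest

    rng = np.random.default_rng(1)
    y = rng.lognormal(mean=0.0, sigma=0.8, size=500)  # strongly right-skewed

    y_t, lam = boxcox(y)  # lambda chosen by maximizing the profile likelihood

    # D'Agostino-Pearson test: small p-value = evidence against normality
    print("p(normal), raw:        ", normaltest(y).pvalue)    # tiny
    print("p(normal), transformed:", normaltest(y_t).pvalue)  # much larger
    print("fitted lambda:", lam)  # near 0 here, i.e. roughly a log transform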