Hacker News

> I am not an expert and am still reading through the article, but why is it such a strong dichotomy? Don't all predictive algorithms also assume a data model? For example, don't hidden Markov models, by assuming constant transition probabilities, make a data assumption?

I'm going to try my best to answer your question from my experience and background. I come from the statistics school of thought, so please keep that bias in mind.

I'll first give an example of how the mentality differs between applied math and statistics with regard to modeling, then try to extend it to machine learning.

Take univariate time series data as an example. Applied math people will use probability to model the process that creates the time series. A statistician doesn't care about modeling the process that creates the data; he/she only cares about using all the information in the data to build a model. A very clear example of this in time series modeling is residual analysis, where you check whether the statistical model has used all the information in the data, i.e. whether any structure (such as autocorrelation) is left in the residuals.
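To make the residual-analysis idea concrete, here is a minimal sketch using only the Python standard library. It assumes a simulated AR(1) series and compares two candidate models: a mean-only model that ignores the time structure, and an AR(1) fitted by least squares. The helpers `fit_ar1`, `acf` and `ljung_box` are hypothetical names written for illustration; `ljung_box` computes the standard Ljung-Box Q statistic. If a model has used all the information in the data, its residual autocorrelations should be near zero and Q should be small.

```python
import random

def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t]; returns (c, phi, residuals)."""
    y, z = x[1:], x[:-1]
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    sxy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    sxx = sum((zi - mz) ** 2 for zi in z)
    phi = sxy / sxx
    c = my - phi * mz
    residuals = [yi - (c + phi * zi) for zi, yi in zip(z, y)]
    return c, phi, residuals

def acf(r, max_lag):
    """Sample autocorrelations of r at lags 1..max_lag."""
    n = len(r)
    m = sum(r) / n
    denom = sum((v - m) ** 2 for v in r)
    return [sum((r[t] - m) * (r[t - k] - m) for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

def ljung_box(r, max_lag):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n-k); a large Q means leftover autocorrelation."""
    n = len(r)
    return n * (n + 2) * sum(rho ** 2 / (n - k)
                             for k, rho in enumerate(acf(r, max_lag), start=1))

# Simulated data (an assumption for the demo): AR(1) with phi = 0.7.
random.seed(0)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + random.gauss(0, 1))

# Model A: mean only, ignores the time dependence.
mean_x = sum(x) / len(x)
res_mean = [v - mean_x for v in x]
# Model B: AR(1) fitted by least squares.
c, phi, res_ar = fit_ar1(x)

print("fitted phi:", round(phi, 3))
print("lag-1 residual ACF, mean-only vs AR(1):",
      round(acf(res_mean, 1)[0], 3), round(acf(res_ar, 1)[0], 3))
print("Ljung-Box Q (10 lags), mean-only vs AR(1):",
      round(ljung_box(res_mean, 10), 1), round(ljung_box(res_ar, 10), 1))
```

The mean-only residuals keep a strong lag-1 autocorrelation, which says the model has left information on the table; the AR(1) residuals look close to white noise. A real analysis would use a library such as statsmodels to get proper p-values for the test.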

I know it sounds superficial, but it also drives how each field invents and researches different models.

Now for statistics vs. machine learning thinking. If you lurk in /r/statistics, you will see that many statisticians separate ML models from statistical models ("data models" in Dr. Breiman's terms) by the confidence interval. A statistical model gives a confidence interval for its prediction, on top of inference on the parameters and such. ML models do not give a CI. The downside is that a prediction without a CI is worthless to a statistician, because it doesn't tell us how good that prediction is.
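A minimal Python sketch of what "a prediction plus an interval" looks like for simple linear regression, assuming i.i.d. Gaussian errors and simulated data. In textbook terms, the interval for a single new observation is usually called a prediction interval, while a confidence interval covers the mean response; both are computed here. For brevity it uses a normal quantile instead of the exact Student t quantile with n - 2 degrees of freedom, so it is a large-sample approximation. `ols_with_intervals` is a hypothetical helper, not a library function.

```python
import math
import random
from statistics import NormalDist

def ols_with_intervals(xs, ys, x0, level=0.95):
    """Fit y = b0 + b1*x by least squares; return the prediction at x0,
    a CI for the mean response, and a prediction interval (normal approximation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                  # residual standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)     # ~1.96 for 95%
    y0 = b0 + b1 * x0
    se_mean = s * math.sqrt(1 / n + (x0 - mx) ** 2 / sxx)      # uncertainty of E[y | x0]
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)  # adds noise of a new y
    conf_int = (y0 - z * se_mean, y0 + z * se_mean)
    pred_int = (y0 - z * se_pred, y0 + z * se_pred)
    return y0, conf_int, pred_int

# Simulated data (an assumption for the demo): y = 2 + 3x + N(0, 1) noise.
random.seed(1)
xs = [i / 10 for i in range(200)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1) for x in xs]
y0, conf_int, pred_int = ols_with_intervals(xs, ys, x0=5.0)
print(round(y0, 2), [round(v, 2) for v in conf_int], [round(v, 2) for v in pred_int])
```

The point prediction `y0` alone is what a typical ML model hands back; the two intervals are the extra information the comment says a statistician expects.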

Let's take linear regression as an example. Many non-statistics books give you the equations to solve it by minimizing a least-squares cost function. Statistics books give you that plus the maximum likelihood (MLE) way of doing it. We also see that every prediction is the expected value of a distribution (see here https://stats.stackexchange.com/questions/148803/how-does-li...). Most non-statistics books aren't going to give you that point of view.
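The least-squares/MLE equivalence can be written out in a few lines. This assumes the usual model with i.i.d. Gaussian errors (an assumption the least-squares recipe by itself does not need); under it, the log-likelihood differs from the least-squares cost only by terms that do not depend on beta, and the prediction is an estimate of a conditional expectation:

```latex
\begin{aligned}
&y_i = x_i^\top \beta + \varepsilon_i, \quad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)
\;\Longrightarrow\; y_i \mid x_i \sim \mathcal{N}(x_i^\top \beta, \sigma^2), \quad
\mathbb{E}[y_i \mid x_i] = x_i^\top \beta \\
&\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 \\
&\hat\beta_{\text{MLE}} = \arg\max_\beta \ell(\beta, \sigma^2)
  = \arg\min_\beta \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
  = \hat\beta_{\text{OLS}} = (X^\top X)^{-1} X^\top y \\
&\hat y_0 = x_0^\top \hat\beta \quad \text{estimates} \quad \mathbb{E}[y \mid x_0]
\end{aligned}
```

The closed form (X^T X)^{-1} X^T y assumes X has full column rank.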

Another example is deep learning vs. PGMs (probabilistic graphical models) or Bayesian networks (hierarchical modeling). In my experience, ML is more empirically driven than statistics.



Thanks. I think you are furthering the point of prediction vs. modeling, right? You can, after all, get a confidence rating for a model.


> I think you are furthering the point of prediction vs. modeling, right?

Kinda. I just wanted to point out, in general, why models are categorized as statistical vs. non-statistical. Certain areas of statistics do prediction, aka forecasting, too (time series). It's just that statistical models give a CI for their predictions and ML models usually don't; most of the time they're built empirically and have no theory for getting a CI.
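For the forecasting case, a minimal Python sketch of interval forecasts from an AR(1) model x[t] = c + phi*x[t-1] + e[t] with Gaussian errors. It treats c, phi and sigma as known; in practice they are estimated, and ignoring that uncertainty makes the intervals somewhat too narrow. `ar1_forecast` is a hypothetical helper. The h-step forecast reverts toward the long-run mean c / (1 - phi), and the forecast error variance is sigma^2 (1 - phi^(2h)) / (1 - phi^2), so the interval widens with the horizon.

```python
import math
from statistics import NormalDist

def ar1_forecast(last_value, c, phi, sigma, horizon, level=0.95):
    """h-step forecasts (mean, lower, upper) for x[t] = c + phi*x[t-1] + e[t], e ~ N(0, sigma^2).
    Assumes |phi| < 1 and known parameters (estimation error is ignored)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    mu = c / (1 - phi)  # long-run mean of the stationary process
    out = []
    for h in range(1, horizon + 1):
        mean = mu + phi ** h * (last_value - mu)
        var = sigma ** 2 * (1 - phi ** (2 * h)) / (1 - phi ** 2)  # sum of sigma^2 * phi^(2j), j < h
        half = z * math.sqrt(var)
        out.append((mean, mean - half, mean + half))
    return out

# Example parameters (assumptions for the demo).
forecasts = ar1_forecast(last_value=3.0, c=0.0, phi=0.7, sigma=1.0, horizon=5)
for h, (mean, lo, hi) in enumerate(forecasts, start=1):
    print(f"h={h}: {mean:.2f}  [{lo:.2f}, {hi:.2f}]")
```

The one-step interval is just the point forecast plus or minus about 1.96 sigma; further out, the interval approaches the unconditional spread of the process, which is the "how good is this prediction" information the comment refers to.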

As for "confidence rating," I have no idea what it is; I've tried Google but couldn't find much. What field is this from?
