
It's interesting to me that two of the top three comments right now are talking about gaining performance benefits by switching from Python to C, when the actual article in the link claims he gained a speedup by pulling things out of pandas, which is written in C, and using normal Python list operations.

I would like to see all of the code he omitted, because I am skeptical of how that would happen. It's been a while since I've used pandas for anything, but it should be pretty fast. The only explanations I can think of are that he was running an apply on a column with a function that does Python string processing, or that the groupby is on something that isn't a categorical variable and needs to be converted on the fly.
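To make those two guesses concrete, here is a minimal sketch. The DataFrame, its column names, and the lambda are made up for illustration, since the article's real data and code aren't shown. It contrasts a groupby that calls a Python function once per group with pandas' built-in aggregation, and shows grouping on a key converted to a categorical up front.

```python
import pandas as pd

# Hypothetical data; the article's actual columns are not provided.
df = pd.DataFrame({
    "key": ["a", "b", "a", "c", "b", "a"],
    "value": [1, 2, 3, 4, 5, 6],
})

# Python-level path: the lambda runs once per group and iterates
# over the values in the interpreter.
slow = df.groupby("key")["value"].apply(lambda s: sum(x for x in s))

# Built-in aggregation: the same result through pandas' own sum.
fast = df.groupby("key")["value"].sum()

# Converting the key to a categorical once, instead of grouping on raw strings.
df["key_cat"] = df["key"].astype("category")
by_cat = df.groupby("key_cat", observed=True)["value"].sum()

print(slow.tolist(), fast.tolist(), by_cat.tolist())
```

All three give the same totals. Whether the difference matters in practice depends on the data size and on what the applied function actually does, which is exactly the part the article leaves out.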



> the actual article in the link claims he gained a speedup by pulling things out of pandas, which is written in C, and using normal Python list operations.

Well, he claims he did three things:

(1) avoid repeating a shared step every time the aggregate function was called,

(2) unspecified algorithmic optimizations, and

(3) use Python lists instead of pandas dataframes.

(1) is a win that doesn't have anything to do with pandas vs. Python list ops, and (2) is skipped over without any detail but appears to be the meat of the change. Visually, it looks like most of what the pandas code tries to do just isn't done in the revised code (it's hard to tell, because some of it is hidden behind a function whose purpose and implementation are not provided). It's not at all clear that the move out of pandas was necessary or particularly relevant.
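For what it's worth, (1) can be illustrated without pandas at all. This is only a sketch under assumptions: `expensive_lookup`, the row layout, and both aggregate functions are hypothetical stand-ins, not the article's code.

```python
from collections import defaultdict

def expensive_lookup(rows):
    """Hypothetical shared step: build a mapping every group needs."""
    return {r["id"]: r["weight"] for r in rows}

def aggregate_repeating(group, all_rows):
    # Recomputes the shared mapping on every call (once per group).
    weights = expensive_lookup(all_rows)
    return sum(weights[r["id"]] * r["value"] for r in group)

def aggregate_hoisted(group, weights):
    # Receives the mapping that was computed once by the caller.
    return sum(weights[r["id"]] * r["value"] for r in group)

rows = [
    {"id": 1, "key": "a", "value": 2, "weight": 10},
    {"id": 2, "key": "b", "value": 3, "weight": 20},
    {"id": 3, "key": "a", "value": 4, "weight": 30},
]

# Group rows by key using plain Python containers.
groups = defaultdict(list)
for r in rows:
    groups[r["key"]].append(r)

repeated = {k: aggregate_repeating(g, rows) for k, g in groups.items()}

weights = expensive_lookup(rows)  # shared step done once, outside the loop
hoisted = {k: aggregate_hoisted(g, weights) for k, g in groups.items()}

print(repeated == hoisted)
```

The same hoisting works if the groups come from a pandas groupby, which is the point: the saving comes from not repeating the shared step, not from which container holds the rows.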



