Wednesday 4 December 2013

Python: The Need For Speed

Python is a great language; I use it practically every day. But Python has a dirty little secret: it is hideously slow.

I have first-hand experience of Python simply being too slow. In a real-world project, the best Python implementation I could write ran in 4 seconds, while an unoptimised Java version ran in 128ms: roughly a 30-fold improvement.

That comparison is a little unfair; Java is, after all, a compiled, statically typed language. Unfortunately, even compared to JavaScript on the V8 engine, Python's speed still sucks. The median result was that Python programmes took 12 times longer to run, and the worst case was 50 times slower.

Now some other Pythonistas have mounted the following arguments:

  1. Most of the time it doesn't matter
  2. It is fast enough for scientific calculations using SciPy and NumPy
  3. Computation-intensive tasks should be rewritten as a C extension
  4. Use PyPy

The first argument is usually correct. Does it really matter if a programme runs in 1s instead of 1ms? No. But there is a minority of cases where performance does matter, and in those cases Python is not acceptable.

Points 2 and 3 are really the same. The heavy lifting in SciPy and NumPy is done in C, not in native Python. So the answer to "how do you write a fast Python programme?" seems to be "write it in C". This reminds me of the old joke about the tourist who asks the way to Dublin and is told, "If I were you, I wouldn't start from here." If I have to write a module in C for a performance-critical app, why not just write the entire app in a faster language such as Java or C++?
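To make the "write it in C" point concrete, here is a minimal sketch (not taken from the project or benchmarks above) that computes the same sum twice: once with an explicit Python loop, and once with the built-in sum(), whose loop in CPython runs in C. The function names are hypothetical, and timings depend on the machine and interpreter, so no figures are quoted here.

```python
import timeit


def sum_loop(n):
    """Sum 0..n-1 with an explicit loop; each iteration is executed by the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_builtin(n):
    """Sum 0..n-1 with the built-in sum(); in CPython the iteration happens in C."""
    return sum(range(n))


if __name__ == "__main__":
    n = 100_000
    # Both versions must agree before comparing their speed.
    assert sum_loop(n) == sum_builtin(n)
    for fn in (sum_loop, sum_builtin):
        seconds = timeit.timeit(lambda: fn(n), number=10)
        print(f"{fn.__name__}: {seconds:.4f}s for 10 runs")
```

Running the same file under PyPy is one way to see how much of the gap a JIT can close without any C code.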

Point 4 is cited by a number of Pythonistas, but most C modules are incompatible with PyPy, and PyPy currently has poor Python 3 support. So responses 2 and 3 tell us to write C modules, and that in turn paints us into a corner when it comes to using PyPy.

Having multiple interpreters causes another problem: Python has limited development resources, and they are spread across CPython, PyPy, Jython and IronPython.

Python needs to merge CPython and PyPy, providing a reference implementation with a JIT compiler and preventing fragmentation in the Python community. This would significantly reduce the number of cases where a C module needs to be written.

I love Python, and it has only once been too slow for my needs. But Guido van Rossum's statement that "It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++..." is a cop-out. Python can do better; PyPy is already doing better.

Python should adopt PyPy as the default implementation, so that Python can achieve good speeds without having to call out to C.