Wednesday 4 December 2013

Python: The Need For Speed

Python is a great language; I use it practically every day. But Python has a dirty little secret: it is hideously slow.

I have first-hand experience of Python simply being too slow. In a real-world project I found that the best Python implementation ran in 4 seconds, while an unoptimised Java version ran in 128ms: roughly a 30-fold improvement.

That comparison is a little unfair; Java is a compiled, statically typed language after all. Unfortunately, even compared to JavaScript on the V8 engine, Python's speed still sucks. Across the benchmarks, the median Python programme took 12 times longer to run than its JavaScript counterpart, and the worst case was 50 times slower.

Now some other Pythonistas have mounted the following arguments:

  1. Most of the time it doesn't matter
  2. It is fast enough for scientific calculations using SciPy and NumPy
  3. Computation-intensive tasks should be rewritten as a C plugin
  4. Use PyPy
The first argument is usually correct. Does it really matter if a programme runs in 1s instead of 1ms? No. But there is a minority of cases where performance does matter, and in those cases Python is not acceptable.

Points 2 and 3 are really the same argument. The heavy lifting in SciPy and NumPy is done in C (and, for SciPy, Fortran), not native Python. So the answer to "how do you write a fast Python programme?" seems to be "write it in C". This sounds like the old joke about the tourist asking the way to Dublin and being told "If I were you I wouldn't start from here". If I have to write a module in C for a performance-critical app, why not just write the entire app in a faster language such as Java or C++?
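To make the "write it in C" point concrete, here is a minimal sketch that times a pure-Python loop against the built-in `sum`, which CPython implements in C. The names `py_sum` and `compare` are illustrative, and the ratio you see will depend on the machine and the interpreter (under PyPy's JIT the gap should be much smaller).

```python
import timeit


def py_sum(values):
    """Sum a sequence with an explicit loop, so every addition runs as interpreted bytecode."""
    total = 0
    for v in values:
        total += v
    return total


def compare(n=1_000_000, repeat=5):
    """Return (loop_seconds, builtin_seconds): the best of `repeat` single runs of each."""
    data = list(range(n))
    # Both paths compute the same result; only the language the loop runs in differs.
    assert py_sum(data) == sum(data)
    loop = min(timeit.repeat(lambda: py_sum(data), number=1, repeat=repeat))
    builtin = min(timeit.repeat(lambda: sum(data), number=1, repeat=repeat))
    return loop, builtin


if __name__ == "__main__":
    loop, builtin = compare()
    print(f"pure-Python loop: {loop:.4f}s, built-in sum (C): {builtin:.4f}s, "
          f"ratio {loop / builtin:.1f}x")
```

Running the file prints both timings; the two code paths do identical work, so the ratio is a rough measure of interpreter overhead on a tight loop.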

Point 4 is cited by a number of Pythonistas, but most C modules are incompatible with PyPy, and PyPy currently has poor Python 3 support. So responses 2 and 3 tell us to write C modules, which then paints us into a corner if we want to use PyPy.

There is another issue with having multiple interpreters: the Python community has limited resources, and they are spread across CPython, PyPy, Jython and IronPython.

Python needs to merge CPython and PyPy, providing a JIT-compiled reference implementation and preventing fragmentation in the Python community. This will significantly reduce the number of cases where a C module needs to be written.

I love Python and it has only once been too slow for my needs. But Guido van Rossum's statement that "It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++..." is a cop-out. Python can do better; PyPy is already doing better.

Python should adopt PyPy as the default implementation, so Python can achieve good speeds without having to call out to C.

Friday 28 June 2013

ABC iview Isn't Working on Android

On my HTC Sensation running Android 4.0 I had to set Menu->Settings->Advanced->Enable Flash and plug-ins->On demand ("Always on" would probably be OK too).

I also manually updated to the latest Flash Player, but I don't know whether that helped.

Update: a project has been released that gets iview working on Android: https://play.google.com/store/apps/details?id=com.github.aview.app&hl=en

Friday 31 May 2013

Qt4 and Boost On Windows


Instructions

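The prebuilt Qt4 binaries for Windows conflict with the BoostPro binaries. The key problem is the -Zc:wchar_t compiler option: Qt4's win32-msvc2010 mkspec passes -Zc:wchar_t-, which makes wchar_t a typedef for unsigned short, whereas the Boost binaries treat wchar_t as a native type. Any function that takes a wchar_t then has a different mangled name in each set of libraries, so they fail to link together.

I used Microsoft Visual C++ 2010 Express.

Download the Qt4 source code.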


In "mkspecs\win32-msvc2010\qmake.conf" change:
QMAKE_CFLAGS            = -nologo -Zm200 -Zc:wchar_t-
to
QMAKE_CFLAGS            = -nologo -Zm200 -Zc:wchar_t

Then run configure (it is an .exe), followed by nmake release. Done! Then add the include directories and libraries to your own application project.
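If you rebuild Qt more than once, the qmake.conf edit can be scripted. This is a hypothetical helper, not part of Qt: a Python sketch that swaps -Zc:wchar_t- for -Zc:wchar_t on QMAKE_CFLAGS lines and leaves every other line untouched. It assumes the file is plain text laid out as shown above.

```python
import re
from pathlib import Path

# Matches the flag "-Zc:wchar_t-" only when it ends at whitespace or end of line,
# so an already-fixed "-Zc:wchar_t" is left alone.
_TYPEDEF_WCHAR = re.compile(r"-Zc:wchar_t-(?=\s|$)")


def use_native_wchar(conf_text):
    """Return conf_text with -Zc:wchar_t- replaced by -Zc:wchar_t on QMAKE_CFLAGS lines."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.lstrip().startswith("QMAKE_CFLAGS"):
            line = _TYPEDEF_WCHAR.sub("-Zc:wchar_t", line)
        out.append(line)
    return "".join(out)


def patch_qmake_conf(path):
    """Rewrite the file in place; return True if anything changed."""
    p = Path(path)
    original = p.read_text()
    patched = use_native_wchar(original)
    if patched != original:
        p.write_text(patched)
    return patched != original
```

From the root of the Qt source tree, `patch_qmake_conf(r"mkspecs\win32-msvc2010\qmake.conf")` would be run once before configure.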


Monday 22 April 2013

What I Have Learned About Unit Testing

I am just an ordinary programmer trying to do his job. I do not sacrifice at the altar of Test Driven Development or consider myself an uber-programmer. I am just trying to avoid being called at 3am.

I took over the software from another programmer. His code was quite nice, with only one or two minor oddities that I am happy to overlook. However, the existing unit tests were sparse and undocumented (which is fair enough).

The software has had about 20 minutes of downtime in 10 years, so the quality expectations were very high. I assumed I would make mistakes and so took multiple steps to try to reduce the risk. This article concentrates on the testing step.

Use the Tools You Get For Free

I used to work on an Apache module written in C++, called the clickserver, which at its peak handled about $800 million a year in advertising clicks.

One of the most amazing tools for C++ is valgrind, specifically memcheck. It is vital that C++ software be run through valgrind memcheck with representative data. Yes, valgrind does vomit up thousands of warnings referencing the STL string library, which is extremely annoying. In my experience the most important thing to pay attention to is uninitialised variables: they will almost certainly cause logic errors in your code. Memory leaks are serious, but not as serious as logic errors.

Play It Again Sam

One of the most successful strategies I used when developing the clickserver was to replay real-world interactions from real visitors. This made it easy to find any regression issues with the new software.

It frequently caught one or two major bugs, both regressions and bugs in new features.

The previous programmer had written a script to replay the log, but the comparison step just used diff, which ignored the subtleties of the log format: it would always find a difference (e.g. in timestamps). I wrote a more sophisticated tool which would diff the log files and allow false positives to be eliminated.
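The core idea of a field-aware log diff fits in a few lines of Python. The original tool was written in Java, so this is only an illustration, and it assumes a hypothetical tab-separated log format whose columns are named by `fields`. Differences are reported per field, and fields that always change, such as timestamps, can be ignored.

```python
from itertools import zip_longest


def diff_logs(old_lines, new_lines, fields, ignore=frozenset()):
    """Compare two logs record by record.

    Each line is assumed to be tab-separated with columns named by `fields`.
    Returns (line_number, field, old_value, new_value) for every differing
    field not listed in `ignore`; a record present in only one log is
    reported with field "<record>".
    """
    diffs = []
    for n, (old, new) in enumerate(zip_longest(old_lines, new_lines), start=1):
        if old is None or new is None:
            diffs.append((n, "<record>", old, new))
            continue
        old_rec = dict(zip(fields, old.rstrip("\n").split("\t")))
        new_rec = dict(zip(fields, new.rstrip("\n").split("\t")))
        for field in fields:
            if field in ignore:
                continue
            if old_rec.get(field) != new_rec.get(field):
                diffs.append((n, field, old_rec.get(field), new_rec.get(field)))
    return diffs
```

Plain diff would flag every line here because the timestamps differ; ignoring that one field leaves only the differences worth a human's attention.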

There were a few challenges with the replay scenario: environment and false positives.

The output of the clickserver (redirect location, log entry and outgoing cookies) was determined by the input URL, incoming cookies, cached data from the database and configuration. All four inputs must be identical to the original.

The input URL and incoming cookies can be reconstructed from the Apache log, taking care to adjust any timestamps contained in the inputs.
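Adjusting a timestamp carried in a replayed request might look like the following sketch. It assumes, hypothetically, that the click URL carries a Unix timestamp in a query parameter named `ts`; the clickserver's real parameters are not described here. Shifting by the offset between the original request time and the replay time keeps the request as fresh to the new build as it was to the old one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def shift_timestamp(url, offset_seconds, param="ts"):
    """Return url with the integer query parameter `param` shifted by offset_seconds.

    `param` is a hypothetical name for a Unix timestamp in the click URL.
    Other parameters keep their order; URLs without `param` are returned
    with only their query string re-encoded.
    """
    parts = urlsplit(url)
    query = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == param and value.lstrip("-").isdigit():
            value = str(int(value) + offset_seconds)
        query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because `urlencode` re-encodes the query string, some characters may be escaped differently from the original (a space becomes `+`, for example), so comparisons should be made on decoded values.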

Fortunately the database caches were stored as files and archived. Files were used so that the clickserver could keep working if the database went offline, and so that the caches could easily be rolled back to an archived copy if things were messed up badly enough. Thankfully that never happened.

The false positives were another challenge. Bugs are fixed and new features are added in the new software, so some differences are expected. Initially I added an option to my log difference tool to ignore differences in specific fields.

The problem is that I only wanted to ignore a field if the difference was definitely caused by a bug fix or new feature. To achieve that I embedded the Rhino JavaScript engine that ships with Java (through ScriptEngine) to allow a more nuanced elimination of false positives. Performance was a real challenge using ScriptEngine. I eventually split the ignore script into two parts: one function returned a list of fields to ignore (for example completely new fields or those that changed every time) and another analysed each remaining positive match one by one. I did not design the script system to be able to override false negatives.
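The two-part ignore script can be mirrored in a Python sketch. The original ran JavaScript through Rhino; here the field list and the per-difference predicate are plain Python, and the `(line, field, old, new)` tuples are an assumed format. The cheap set lookup runs first so that the expensive one-by-one check only sees what is left, and the filter can only suppress differences, never add them.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Assumed shape of one difference: (line_number, field, old_value, new_value).
Difference = Tuple[int, str, str, str]


def filter_differences(diffs: Iterable[Difference],
                       ignored_fields: Set[str],
                       is_expected: Callable[[Difference], bool]) -> List[Difference]:
    """Return the differences that survive both stages of false-positive elimination."""
    remaining = []
    for diff in diffs:
        field = diff[1]
        # Stage 1: cheap set lookup drops whole fields (new fields, ones that change every run).
        if field in ignored_fields:
            continue
        # Stage 2: per-difference check for changes explained by a bug fix or new feature.
        if is_expected(diff):
            continue
        remaining.append(diff)
    # Nothing is ever added back, so a difference the diff missed stays missed.
    return remaining
```

A predicate such as `lambda d: d[1] == "redirect" and d[2] == "http://b.example/"` would express "this redirect changed because of a known fix" while still flagging every other redirect change.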

My Eyes Are Glazing Over...

As anyone who has done any form of in-depth testing can attest, having too much information to check can send one's brain into neutral, particularly when looking for an error among a thousand or a hundred thousand entries.

Having a closed-loop unit test is critical in that regard. An automated unit test should require no human intervention. It sounds obvious, but many fall into the trap of creating a bunch of stimuli whose results a human has to go back and check.

A human should only be involved in filtering out false positives and debugging problems. Of course nothing can be perfect.
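A closed-loop test states its expected result, so the machine decides pass or fail. Here is a minimal unittest sketch; `redirect_for` is a hypothetical stand-in for the system under test (the real tests drove the clickserver over HTTP and read its log), and the URLs are made up.

```python
import unittest


def redirect_for(url):
    """Hypothetical stand-in for the system under test: maps a click URL to a redirect target."""
    targets = {"/c?id=1": "http://advertiser.example/landing"}
    return targets.get(url, "http://default.example/")


class ClosedLoopRedirectTest(unittest.TestCase):
    # Each test asserts the expected outcome, so nobody has to read printed
    # output to decide whether it passed.

    def test_known_campaign_redirects_to_advertiser(self):
        self.assertEqual(redirect_for("/c?id=1"), "http://advertiser.example/landing")

    def test_unknown_campaign_falls_back_to_default(self):
        self.assertEqual(redirect_for("/c?id=999"), "http://default.example/")
```

Run it with `python -m unittest` followed by the module name. A failure names the test and shows the mismatched values, and a passing run needs no human inspection at all.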

Closed Loop Automated Testing

In addition to the log difference utility I also wrote a suite of unit tests. They only used the interfaces presented by the software (HTTP and the log file) and did not test each class directly, so they could be considered "functional testing"; however, they took a low-level white box approach, which should qualify as "unit testing". I used my knowledge of the internals of the software to create tests that exercised its difficult parts.

The tests were written in pyunit (unittest2). The tests were built up from a number of sources:
  1. the (undocumented) unit tests that had been present before
  2. use cases for visitors, including what the visitor should do and what the visitor can do
  3. protocol tests: following our protocols, following them in edge cases, and breaking them
  4. bugs (and their fixes)
  5. new features
When I initially started the test and documentation project I made a huge spreadsheet of test ideas from all these sources. I did the testing and documentation in tandem because I kept finding that a number of "bugs" were not bugs at all: they were undocumented compromises or misunderstandings.

Furthermore writing documentation shook out a number of inconsistencies and misunderstandings. I will go into further detail later.

I wrote a number of support classes, including a Python implementation of tail. The end result ran in a few seconds, which was important because I wanted to be able to run it frequently; when the tests were slower, I ran them less often.
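A tail helper of this kind might look like the following minimal sketch (not the original class). It remembers a byte offset and returns only the complete lines appended since the last call, which lets a test make a request and then check exactly the log entries it caused. It assumes an append-only UTF-8 log and does not handle log rotation or truncation.

```python
import os


class LogTail:
    """Minimal 'tail'-style reader that returns only lines appended since the last read."""

    def __init__(self, path, from_start=False):
        self.path = path
        # Start at the current end of file so earlier entries are ignored.
        self.offset = 0 if from_start else os.path.getsize(path)
        self._partial = b""  # bytes of a line that has not been terminated yet

    def read_new(self):
        """Return new complete lines without their newlines; keep any trailing partial line."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
            self.offset = f.tell()
        data = self._partial + chunk
        # The last piece is empty if data ended with a newline, else an unfinished line.
        *complete, self._partial = data.split(b"\n")
        return [line.decode("utf-8") for line in complete]
```

Reading in binary mode keeps the offset arithmetic exact and avoids splitting a multi-byte character between two reads.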

Testing And Documentation?

Fortunately my former employer was forward-thinking enough to allow me to do a testing project. I also included documentation in it, for two good reasons. Firstly, it was something else that needed to be done.

More importantly, documentation and testing are intertwined. What is the correct behaviour? What if what makes sense as "correct" behaviour in one instance conflicts with another? I once heard that each software feature is generally trivial; it is the interactions between features that generate complexity.

Furthermore, in my experience good documentation improves a design. Requirements, design, technical or user documentation with condition upon condition is usually a code smell for bad design: it requires users, programmers and application support to hold more information in their minds.

Some people argue that the code is the documentation. If that is the case, then by definition the code cannot have any bugs, because it is being tested against itself. Furthermore, less technical readers cannot read code, and even technical documentation may be read by application support, testers or sysadmins.

Documentation is required to describe a "contract" that the software will adhere to. The protocols and business requirements also form a "contract" that must be included. These contracts are then used in testing. These contracts are not enough to completely describe the software. I am not an advocate of design by contract (I will write about this in another post) but I am happy to pinch an idea here and there.

So documentation gives you something to test against.

The Future: Statistical Methods

Just before I left my last job I was investigating statistical methods for monitoring newly deployed software.

I started using Student's t-test but found that our data was too noisy to get a decent separation.
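For reference, the two-sample Student's t statistic compares the difference in means with the pooled standard error. This is a standard-library sketch, not the code that was used; turning t into a p-value needs the t distribution (for example `scipy.stats.ttest_ind`), which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def students_t(a, b):
    """Two-sample Student's t statistic assuming equal variances.

    Uses the pooled sample variance
        s_p^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
    and returns (t, degrees_of_freedom), where
        t = (mean(a) - mean(b)) / sqrt(s_p^2 * (1/n1 + 1/n2)).
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df
```

With noisy data the pooled variance is large, so t stays close to zero even when the means differ, which is one way a lack of separation shows up.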

We knew anecdotally that our data was affected by a number of factors: local time of day, day of the week, day of the month, day of the year, client expenditure and many others.

A multivariate analysis would therefore probably have helped, but I didn't have a chance to try it.

Conclusion

Using these steps, along with other quality assurance measures, helped keep the clickserver reliable with only a small number of bugs making it into production.

Sunday 21 April 2013

I Will Fix Your Computer If...

Most professions have a concept of doing pro bono work. Programming is an awesome job where we get to sit in an office all day, so we should give something back to the community. However, we need some ground rules to make sure our time is not drained by a single job.

Computer

  1. Licence:
    1. Does it have a valid Windows licence?
    2. Will you buy a valid Windows licence?
    3. Can I install Ubuntu?
    4. Sorry can't help!
  2. Is it a hardware problem?
    1. A disk drive, RAM or RTC battery?
    2. Are you willing to donate it to be refurbished and given to someone?
    3. Sorry, can't help!
  3. Reformatting - the Windows disc
    1. I will take a backup
    2. Do you have the OS discs?
    3. Do you have a recovery partition?
    4. Have you not deleted the recovery partition?
    5. Will you buy a Windows disc?
    6. Sorry can't help!
  4. Re-installation - other software
    1. Do you have a valid licence and disc / legit download?
    2. Will you buy it?
    3. Sorry can't help! (Warez is a PITA)
  5. Crapware / Warez
    1. Is this the second time?
    2. Can I restrict administrator rights?
    3. Can I install Ubuntu?
    4. Sorry can't help!

Network

  1. Are basic settings correct?
  2. Is it a wireless problem?
    1. Is the modem / router Wireless-N?
    2. Replace the modem / router
  3. Does replacing the modem help?

Sunday 31 March 2013

Policy Positions - Part 2


Economics

My position is that a market economy is the best known way to run an economy, as imperfect as a market economy can be. The government has a role to play in the case of market failure and where it is more efficient to do something collectively, for example roads.

The Australian economy should be run to benefit Australian society. Wealthier Australians should not be punished, but they should contribute based on their capacity.

The government should target lower income earners directly, rather than relying on a "trickle down" effect. Middle class welfare should be replaced with tax cuts, while still retaining subsidies for lower income earners - in order to break poverty traps.

Having stated that we should end this middle class welfare - should Medicare and education be means tested? The case against education being means tested is very strong - education gives an enormous benefit to the economy by minimising structural unemployment, as well as an enormous benefit to the individual.

Health is another area that requires government intervention due to market failure. What would you pay for the medical care to prevent your death? A person's life has infinite utility to that person - so surely it is rational that a person would pay whatever they could for medical care, even if the cost price was much lower.

One only needs to look at healthcare in the US, which is not like private healthcare in Australia. Without a public health system competing across the country, prices are exorbitant and service is worse.

In terms of economic management I support mainstream evidence based quantitative economics. I support saving during the expanding portion of the business cycle and stimulus during the contracting portion of the business cycle. Economists must be consulted in the running of the economy, they are the experts, but they should present options from which the Australian people may choose.

Welfare

Welfare assists the most vulnerable in our society: the elderly, the unemployed, single mothers and the disabled. In the 2012-2013 budget the largest share of welfare spending went to the elderly.

The most important component of welfare is not welfare itself. It is important to remove structural impediments to people working. For example education, childcare and low tax rates for lower income earners.

Welfare reform should change the structure of the rules to encourage work. For example, a number of welfare recipients would lose money by working once reduced benefits, tax, childcare and transport costs are taken into account.

A number of people eligible for disability pensions have told me that it is very hard to work after receiving the disability pension. Even if a recipient is well enough to work for a short period, it is then very difficult to get back onto the disability pension, and they are treated as liars for the period when they were unable to work.

There is a need to ensure that welfare payments are sustainable which may include a small punitive element.

Industrial Relations

Industrial relations should be concerned with matching the needs of workers with the needs of employers as efficiently as possible, through policies which maximise the value of employees' pay while minimising the cost to employers. Improving productivity and increasing mobility should be encouraged.

War

In my opinion there are no good wars. However there are wars that we need to fight. I am against sending Australians to fight unless absolutely necessary.

I support the Australian troops - they are simply doing their job.

The nature of war is that there will be collateral damage. If a soldier rounds the corner and needs to make a split second decision whether a person is a threat - mistakes are inevitable.

Some people say that we should "stay the course" or "complete the mission" for the sake of those soldiers that Australia has already lost. The problem is that the loss is ongoing.

Furthermore, the scope of the mission creeps over time. Iraq's mission was to find weapons of mass destruction (none were there), then to remove Saddam Hussein, then to establish democracy, then to rebuild the country, and so on ad nauseam. Afghanistan's mission was to find Osama Bin Laden, who was eventually found in Pakistan. Again, democracy and rebuilding crept into the mission. All the while, a continuing cost was paid with the lives of our soldiers.

Friday 22 February 2013

Policy Positions - Part 1

General Approach

Asking the right questions and asking many of them is my approach to difficult problems. No-one can know everything so it is vital to consult experts and stakeholders.

I believe in personal freedom and economic freedom, with government intervening where it is sensible to do so. I believe our country and economy should be run to benefit all Australians, rich and poor. I do not believe in "punishing" the rich, but some Australians have a better capacity to contribute.

Education

Education is vital to having a well functioning economy and for preventing poverty traps.

Education helps to minimise "structural unemployment", where workers are qualified in one field but the available jobs are in another: workers can't find jobs they are qualified for, and employers can't find qualified workers.

Workers can go to TAFE to learn a skill that will qualify them for a job - benefiting employers and workers.

The United States is having problems with structural unemployment - there are jobs but those people who are unemployed are not qualified for them.

Education also helps people escape from poverty traps, because it can help someone find a job or a higher paying job. But if a person has no job, or a low paying job, how are they going to afford an education? So affordable education is also vital.

Education is particularly important for people who have been out of the workforce for a period of time - such as parents who have cared for pre-school children or workers who have been injured.

Furthermore a large proportion of inmates in gaol are illiterate - ensuring a high literacy rate is important for reducing crime.

Health

Affordable quality health care is vitally important. If a person is too sick to work, how would they earn the money for treatment?

I have talked to people from countries that do not have public healthcare systems - they will stockpile at least A$100,000 for medical treatment. This is money that is not being productively used in the economy. Furthermore without a competing public health system it often costs hundreds of dollars just to see a GP.

NBN

The NBN is a critical piece of infrastructure. The isolation of Australia makes telecommunications more valuable. The Nationals even suggested a plan similar to the NBN because it is the most isolated locations in the bush that will obtain the most benefit.

A great number of pie-in-the-sky uses for the NBN have been mentioned, but there are a great many everyday uses for the NBN:
  • backups will take minutes instead of days - critical business data, personal photos or personal videos won't be lost
  • better reliability means everything you do today - internet banking, ordering online, communicating - is less likely to be interrupted by an internet outage
  • better reliability means you can rely on cheaper services such as VoIP or Skype
  • cheaper access for businesses that already pay around $3000 a month for a fibre connection that is inferior to the NBN
  • cheaper multi-line phone lines for businesses using VoIP - the NBN has both the capacity and the reliability
  • businesses can easily send large files to customers or other businesses, for example:
    • sending product photos for catalogues
    • sending scanned plans or documents with high detail
  • businesses can export overseas in fields that are data intensive such as software engineering, graphic design, music and filmmaking
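The backup claim is simple arithmetic: transfer time is data size divided by upload speed. The sketch below uses assumed figures, not measurements: a 10 GB backup, an assumed 1 Mbps ADSL upload, and a 40 Mbps upload as on a 100/40 fibre plan. It ignores protocol overhead and congestion, so real transfers take longer.

```python
def transfer_time_hours(size_gigabytes, link_megabits_per_second):
    """Idealised transfer time in hours: size / rate, with no overhead or contention."""
    bits = size_gigabytes * 8_000_000_000          # decimal gigabytes to bits
    seconds = bits / (link_megabits_per_second * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    backup_gb = 10  # assumed backup size
    for label, upload_mbps in [("ADSL (assumed 1 Mbps up)", 1), ("fibre (40 Mbps up)", 40)]:
        hours = transfer_time_hours(backup_gb, upload_mbps)
        print(f"{label}: {hours:.1f} hours ({hours * 60:.0f} minutes)")
```

Under those assumptions the ADSL upload takes about 22 hours and the fibre upload about 33 minutes.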
Unfortunately mobile cannot deliver the speed, reliability and capacity that the NBN will. Fast mobile internet would require a mobile tower on every street corner. The technology already exists to deliver up to 1Gbps over fibre economically. However there is currently no economically viable mobile technology that can provide the same speed.

Fibre to the Node (FTTN) will still rely on the aging copper telephone lines which were never designed to carry high speed internet. The telephone lines are becoming unreliable as they become older. The experimental VDSL2 can achieve speeds of up to 100Mbps over 300m but there is no current technology which can achieve 1Gbps over telephone lines.

Note that there is some confusion - mobile is not wifi. Some people use "wireless" to mean either. Mobile cannot currently get anywhere close to the NBN. Wifi can reach those speeds - but it needs to be connected to your home internet connection.

Friday 15 February 2013

Book Review - Forward the Foundation

Forward the Foundation

Forward the Foundation is an interesting look at a future where "psychohistorians" are able to make vague predictions about the future through mathematical equations. There is also a group of people who are able to "push" ideas into other people's minds, and the Foundation is the organisation eventually founded to organise them into "pushing" the people of the Galactic Empire into the behaviour required by psychohistory.

Psychohistory seems to be a science which parallels economics, mirroring the current hysteria surrounding economics: at some times hysterically positive and at other times hysterically negative.

I felt the book ended somewhat weakly, with the psychics ensuring the populace followed the tenets of psychohistory. A few other elements were somewhat unbelievable, such as how millions of planets, each with billions of people, were administered.

So: a book with some interesting ideas, and engaging enough to see through to the end.

Title: Forward the Foundation
Author: Isaac Asimov
ISBN: 9780385269421
Rating: 3/5