Make Sure Your Programs Crash
- Type: Talk
- Audience level: Intermediate
- Category: Best Practices/Patterns
March 9th 3:20 p.m. – 3:50 p.m.
Description
With Python, segmentation faults and the like simply don't happen -- programs do not crash. However, the world is a messy, chaotic place. What happens when your programs crash? I will talk about how to make sure that your application survives crashes, reboots and other nasty problems.
Abstract
Handling crashes is divided into two parts -- resilience (making sure that your software maintains correctness in the face of crashes) and speed of recovery (optimizing the time it takes to get back to full working condition). I will talk about techniques to allow for resilience -- separating master data from cache data, minimizing the amount of master data, using atomic file operations, using databases and persisting structures in the right order. Then I will talk about speedy recovery techniques, among them process separation, working while restarting and more. I will conclude by surveying the options for testing all of these things, so that the crashes are made to happen in the development/testing environment.
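As an illustration of the "atomic file operations" technique mentioned in the abstract, here is a minimal sketch of one common approach: write to a temporary file, then rename it over the target. It is not taken from the talk itself. The function name `atomic_write_json` and the choice of JSON as the storage format are assumptions made for this example.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Hypothetical helper: replace `path` with `data` in one atomic step.

    A reader (or a process restarting after a crash) sees either the old
    file or the new one, never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A crash before `os.replace` leaves the previous file untouched, at worst with a stray `.tmp` file next to it. A crash after it leaves the new file in place. Stricter durability requirements may also call for syncing the containing directory, which this sketch omits.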
Outline:
- Ways Python programs can crash
- Infinite loops
- Getting stuck
- Memory leaks
- Exceptions
- Catching exceptions considered scary
- Thread deadlocks
- Minimizing effects of a crash
- Atomic file operations
- Databases
- Vertical process splitting
- Horizontal process splitting
- Limiting process lifetime
- Detecting crashes
- Process death
- Process unresponsiveness
- Test communication
- Helper checker processes
- Restarting processes
- Minimize master data
- Boot-up speed
- Order of start-up and communication
- Testing by killing processes
- Testing by pausing processes
- Conclusions
- Python processes can still crash
- Plan for crashes
- Test your plan for crashes
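
The outline items on detecting process death, restarting processes and testing by killing processes can be sketched together in a few lines. This is only a rough illustration under stated assumptions, not the approach presented in the talk. The worker is a stand-in child process that just sleeps, and the names `WORKER_CMD`, `start_worker`, `ensure_running` and `kill_and_recover` are hypothetical.

```python
import subprocess
import sys

# Hypothetical worker: a child process that only sleeps, standing in for
# a real long-running service.
WORKER_CMD = [sys.executable, "-c", "import time; time.sleep(60)"]


def start_worker():
    """Start the worker as a separate process."""
    return subprocess.Popen(WORKER_CMD)


def ensure_running(worker):
    """Return (live_worker, restarted), restarting the worker if it died."""
    if worker.poll() is None:  # poll() returns None while the child is alive
        return worker, False
    return start_worker(), True


def kill_and_recover():
    """Test by killing: kill the worker abruptly, then check the restart."""
    worker = start_worker()
    worker.kill()   # SIGKILL on POSIX, so the child gets no chance to clean up
    worker.wait()   # reap the dead child so that poll() reports its exit
    new_worker, restarted = ensure_running(worker)
    alive = new_worker.poll() is None
    # Clean up the replacement worker so the test leaves nothing behind.
    new_worker.kill()
    new_worker.wait()
    return restarted, alive
```

Calling `kill_and_recover()` in a test suite makes the crash happen in the development environment instead of in production. Detecting an unresponsive (rather than dead) process needs a different check, such as a test request with a timeout, which this sketch does not cover.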