March 7th 9 a.m. – 12:20 p.m.
Relational databases are often the bread-and-butter of large-scale data storage, yet they are often poorly understood by Python programmers. Organizations even split programmers into SQL and front-end teams, each of which jealously guards its turf. These tutorials will take what you already know about Python programming, and advance into a new realm: SQL programming and database design.
The class will consist of six 25-minute lessons, each of which features a 10-minute lecture, 10 minutes of interesting exercises, and a 5-minute wrap-up in which the instructor recaps the exercises by giving his own answers. The focus will be on keeping things simple so that each building block is grasped clearly. The six lessons will be laid out something like this:
1. Tables, INSERT, and SELECT.
- Create a simple sqlite3 table with the DB-API interface provided by the Python Standard Library.
- Use INSERT to fill the table with data.
- Concatenate INSERT statements to increase the speed and reduce the number of database round-trips required during a bulk data load.
- Read back table rows with SELECT.
- Add dynamic expressions to the rows returned by SELECT.
- Quote values correct to avoid SQL injection attacks.
- Avoid “gotchya” differences between Python and SQL data types, with particular attention to Unicode, date-times, and the behavior of NULL verses None.
2. WHERE and the importance of being indexed.
- Use WHERE to select specific rows from a table.
- Run quick performance checks that demonstrate that WHERE usually requires the entire table to be read into memory and scanned.
- Add a simple index to shortcut specific WHERE clauses and return their results more quickly.
- Check whether an index is being used, and learn several reasons why apparently useful indexes get ignored by the database.
- Add aggregate indexes that yield performance increases for very specific WHERE clauses.
- Investigate how our data distrubtion — for example, whether a particular column has thousands of different values, or merely thousands of instances of a handful of values — can impact the wisdom and performance of various query plans.
3. FOREIGN KEY and JOIN
- Use a foreign key to relate rows in one table with rows in another.
- Add JOIN clauses to a SELECT statement to assemble query-result rows that are built from pieces of several tables.
- Diagnose performance problems with JOIN by observing the cost of full N×M scans that compare every row from one table with every row from another.
- Think about the indexes that a query plan could take advantage of behind the scenes.
- Create indexes that let the database take shortcuts when doing common JOINs.
4. Post-processing.
- Use ORDER BY to control the rows which are returned first by a given query.
- Combine OFFSET and LIMIT to return "paged" results suitable for displaying on a limited display, like a web page or GUI window
- Observe how indexes affect the performance of ORDER BY / LIMIT.
- Use GROUP BY to support aggregate operations such as sums, averages, maxima, and minima.
- Filter aggregate results with the HAVING clause.
- The exercises will present small Python scripts that post-process data, and ask students to write the equivalent GROUP BY / HAVING expressions to remove the need for the Python post-processing.
5. Modifying tables.
- Write WHERE clauses for UPDATE and DELETE using the same patterns already learned for SELECT.
- Use transactions in combinations with UPDATE and DELETE to prevent inconsistent database states from becoming visible to other clients.
6. ORMs, Objects, and Tables.
- Create tables of objects using the SQLAlchemy declarative schema in combination with classes.
- Understand the main differences between SQLAlchemy and the Django ORM, including the idea of explicit saves versus a unit-of-work pattern.
- See how ORM query syntaxes mix down to SQL statements.
- Determine when an ORM will be helpful, versus when straight SQL migth be a better solution for a particular problem.
Of course, mastery of these topics cannot be conveyed in a single three-hour course! The tutorial will have succeeded if students learn the main moving parts that are involved in a relationally-backed Python application, if they have gotten some practice with SQL and the kind of tasks that it seeks to simplify, and if they have a foundation upon which to build when they are next faced with writing or modifying Python code that interfaces with a SQL database.