Graph Analysis from the Ground Up

Type:
Tutorial
Audience level:
Novice
Category:
Science
March 7th 1:20 p.m. – 4:40 p.m.

Description

Graphs are a fundamental datatype - but typical developers don't get as much exposure to using and working with graphs as with other datatypes like tables and queues. This is a from-the-ground up working session; by the end, attendees should have the tools and experience to model and analyze problems with graphs.

Abstract

Summary

This tutorial is intended to bring somebody with Python experience but limited or no experience using graph-based algorithms to a place where they:

  1. Understand the basics of graph theory and why it can be helpful;
  2. Are familiar with the available tools for dealing with graphs;
  3. Recognize how to model a problem in terms of a graph; and
  4. Have a first hands-on experience applying the theory and the tools to solve an interesting real-world problem.

To do this, the tutorial is divided into four sections, each corresponding to one of the objectives above. Each portion will have a hands-on exercise pertaining to the exact subject, with part 4 as a crowning workshop bringing together various skills and points raised throughout the session; after having a few minutes to work on their own code and ask questions, the class as a whole will walk through a solution.

Part 1: Graph Theory (50 mins)

  • Theoretical background (15 mins): What are graphs? Where do they show up? Why do we care?
  • Reasoning about graphs (35 mins): Nodes, edges, cycles, and traversals; operations on graphs.
  • Practical exercise: Path operations and the filesystem tree

Part 2: Graph Tools (35 mins)

  • Representing graphs in Python (15 mins)
  • Dict-based representation
  • NetworkX, iGraph, etc.
  • Reading and writing graphs (10 mins)
  • In-memory representations
  • On-disk representations
  • Graph databases
  • Visualizing graphs (10 mins)
  • Matplotlib, Gephi, etc.
  • Practical exercises:
  • Converting the filesystem tree to use a graph representation
  • Using the same representation to model a map

10 min break

Part 3: Graphs in Practice (35 mins)

  • Recognizing word problems that suggest a graph-based solution (5 mins)
  • Problem modeling (15 mins): What are your nodes? What are your edges?
  • Performing computations on graphs (15 mins): Graph algorithms in code; Gremlin; flow-based programming
  • Practical exercises:
  • Build a graph representing a module dependency graph
  • Answering questions by traversing the graph
  • High-level visualization

Part 4: Hands-on with Graphs (50 mins)

  • Problem set: Analysis of Python-Dev
  • Data cleaning (10 mins)
  • Graph construction (10 mins)
  • Centrality (10 mins)
  • Clique analysis (10 mins)
  • Visualization (10 mins)