Pragmatic Unicode, or, How do I stop the pain?

Type: Talk
Audience level: Novice
Category: Best Practices/Patterns
March 10th 11:45 a.m. – 12:15 p.m.

Description

Python has great Unicode support, but it's still your responsibility to handle it properly. I'll do a quick overview of what Unicode is, but only enough to get your program working properly. I'll describe strategies to make your code work, and keep it working, without getting too far afield in Unicode la-la-land.

Abstract

Python has great Unicode support, but it's still your responsibility to handle it properly. Even expert programmers get tripped up by the encoding and decoding that can happen implicitly and throw errors in unexpected places.

This talk will present a quick overview of what Unicode is, why it exists, and how it works, but only enough to get your program working properly. Unicode can be intricate and fascinating, but really, who cares? You just want your code to work without throwing a UnicodeEncodeError every time an accented character sneaks in somehow.
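
As a rough illustration of the kind of failure described above, here is a minimal Python 3 sketch. The sample string and the variable names are assumptions chosen for the example, not material from the talk. It shows an accented character failing to encode as ASCII, and the same text encoding cleanly as UTF-8.

```python
# A string containing one non-ASCII character (U+00E9, "é").
text = "caf\u00e9"

# ASCII only covers code points 0-127, so encoding this text fails.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; "exc" is cleared when the block ends
    print("could not encode character at index", exc.start)

# UTF-8 can represent every Unicode code point, so this succeeds.
encoded = text.encode("utf-8")
print(encoded)
```

The error only appears when the accented character does, which is why this class of bug tends to survive testing with plain English input.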

I'll describe strategies to make your code work, and keep it working, without getting too far afield in Unicode la-la-land.

How Unicode is handled is one of the biggest changes in Python 3. I'll touch on what those changes are, and how you can use them to keep even your Python 2 code running smoothly.
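
One visible part of that change can be sketched in a few lines of Python 3. The sample data and names below are illustrative assumptions. Python 2 would attempt an implicit ASCII decode when a byte string and a unicode string were combined, while Python 3 refuses to mix the two types at all, so the mistake surfaces immediately rather than only when non-ASCII data shows up.

```python
# Bytes as they might arrive from a file or socket (UTF-8 encoded "café").
data = b"caf\xc3\xa9"

# Text is obtained by decoding explicitly with a known encoding.
text = data.decode("utf-8")

# Python 3 does not convert between bytes and str implicitly.
try:
    combined = text + data
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True

print("implicit mixing allowed:", mixing_allowed)
```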

Outline

  • Bytes vs. text
  • ASCII, 8859-1, etc.
  • Unicode
  • Encodings
  • Python 2: str vs unicode
  • encode and decode
  • implicit conversions!!
  • Python 3: bytes vs str
  • Everybody's happy!
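
The bytes-versus-text and encode/decode items in the outline can be made concrete with a short Python 3 sketch. The sample word and the choice of codecs are assumptions made for illustration. The same four-character text becomes a different byte sequence under each encoding, and decoding with the wrong codec silently produces garbage instead of raising an error.

```python
# Four characters of text: c, a, f, é.
text = "caf\u00e9"

# Encoding turns text into bytes; the result depends on the codec.
utf8_bytes = text.encode("utf-8")      # é becomes two bytes
latin1_bytes = text.encode("latin-1")  # é becomes one byte (ISO 8859-1)

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding with the codec that produced the bytes recovers the text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with a different codec does not fail here; it just yields
# the wrong characters.
garbled = utf8_bytes.decode("latin-1")
print(repr(round_trip), repr(garbled))
```

Bytes carry no record of the encoding that produced them, so the correct codec has to be known from somewhere else, such as a file format, a protocol header, or a convention.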