Integration of SociaLite queries into Python programs
Jiwon Seo
- Audience level: Intermediate
- Category: Other
Description
PySociaLite is a Python-integrated query language for data processing. The query language (SociaLite) supports simple annotations for partitioning data across distributed machines, so distributed data can be processed without explicit communication code. Python integration makes SociaLite more powerful by making a large body of existing Python code accessible from SociaLite queries.
Abstract
We will demonstrate PySociaLite, which integrates SociaLite queries into Python. SociaLite is a query language for distributed data processing. Like SQL, it is declarative, so it can express data processing logic succinctly. Furthermore, the SociaLite compiler automatically compiles queries into efficient parallel code that runs on distributed multi-core machines.
With its Python integration, PySociaLite allows SociaLite queries to be embedded directly in Python (Jython) programs. This integration lets SociaLite queries access Python functions and variables, and lets parts of a query be implemented in Python code.
A SociaLite query embedded in Python code is delimited by a pair of backticks (`). The embedded queries are preprocessed (using PyParsing) and rewritten into a function call that takes the query as a string parameter. Python-level functions and variables are prefixed with a dollar sign ($); the preprocessor recognizes them as well and passes them as parameters to the function call.
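The rewriting step described above can be sketched roughly as follows. This is a minimal illustration, not PySociaLite's actual preprocessor: it uses the standard `re` module instead of PyParsing, the runtime function name `run_query` is hypothetical, passing the `$`-prefixed names as keyword arguments is an assumption, and the query text is only illustrative and is not checked against SociaLite's grammar.

```python
import re

# Hypothetical name for the runtime entry point; the real PySociaLite
# function name is not given in the text.
RUNTIME_CALL = "run_query"

# A backtick-delimited span, and a $-prefixed Python identifier inside it.
_QUERY = re.compile(r"`([^`]*)`")
_PY_NAME = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def rewrite_embedded_queries(source):
    """Replace each `query` span with a call that passes the query as a string."""

    def to_call(match):
        query = match.group(1)
        # Collect $-prefixed names once each, in order of first appearance.
        names = list(dict.fromkeys(_PY_NAME.findall(query)))
        # Assumption: Python names are forwarded as keyword arguments.
        args = [repr(query)] + ["{0}={0}".format(n) for n in names]
        return "{}({})".format(RUNTIME_CALL, ", ".join(args))

    return _QUERY.sub(to_call, source)


# Illustrative input line; `src` and `is_near` stand for Python-level names.
line = "`Close(n) :- Edge($src, n, d), $is_near(d).`"
rewritten = rewrite_embedded_queries(line)
print(rewritten)
```

A regular expression is enough to show the idea, but it would also rewrite backticks that appear inside ordinary Python string literals or comments; a real preprocessor needs a proper parser to avoid that.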
This interoperability, that is, the ability to invoke Python functions inside SociaLite queries and to access SociaLite tables from Python, makes PySociaLite a powerful data processing language.