DIY Command Line Tool in Python
Josh Shields
- Audience level: Novice
- Category: Python Core (language, stdlib, etc.)
Description
Abstract
The current progress of py_sitemapper can be seen at: https://github.com/jshields/py_sitemapper
This tool is meant to take a URL as input and output an XML sitemap. py_sitemapper was written with a focus on leveraging the standard library to minimize development time and maximize usefulness. It serves as an example of how command line tools can be written in Python. One goal of py_sitemapper is to use established patterns and best practices in a way that can be repeated when writing other CLI tools.
Standard library modules used in this project:
- HTMLParser (html.parser in Python 3)
- argparse
- logging
- re
- sys
Third-party packages used in this project:
- requests
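To illustrate how argparse, logging, and sys can fit together in a command line entry point, here is a minimal sketch. It is not taken from py_sitemapper: the names build_arg_parser and main, the program name sitemapper_sketch, and the --output and --verbose flags are all assumptions made for the example.

```python
import argparse
import logging
import sys

logger = logging.getLogger("sitemapper_sketch")


def build_arg_parser():
    """Describe the command line: one required URL plus two optional flags."""
    parser = argparse.ArgumentParser(
        prog="sitemapper_sketch",
        description="Crawl a URL and write an XML sitemap (illustrative sketch).",
    )
    parser.add_argument("url", help="starting URL to crawl")
    parser.add_argument(
        "-o", "--output", default="sitemap.xml",
        help="file to write the sitemap to (hypothetical flag)",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="log debug messages (hypothetical flag)",
    )
    return parser


def main(argv=None):
    """Parse arguments, configure logging, and return a process exit code."""
    args = build_arg_parser().parse_args(argv)
    logging.basicConfig(
        stream=sys.stderr,
        level=logging.DEBUG if args.verbose else logging.INFO,
    )
    logger.info("Would crawl %s and write %s", args.url, args.output)
    # Crawling and sitemap generation would be called from here.
    return 0

# In a real script, finish with the usual guard:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Accepting argv as a parameter lets main be called from tests with an explicit argument list, while the default of None makes argparse fall back to sys.argv when the script is run from a shell.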
Modules written for this project:
- cli: Command line interface wrapper that drives functionality based on user input.
- parse: Parser for web pages. In this project, it manages a session and combs HTML for links.
- sitemap: Contains a Sitemap class that holds XML content and can be exported.
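The parse and sitemap roles described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the LinkParser class, the methods on Sitemap, and the use of urllib.parse and xml.etree.ElementTree are choices made for this example, and it uses the Python 3 module name html.parser. Fetching pages over HTTP is left out so the sketch runs on a literal HTML string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class LinkParser(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags (hypothetical class)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


class Sitemap:
    """Hold a list of unique URLs and export them as sitemap XML (assumed interface)."""

    def __init__(self):
        self.urls = []

    def add(self, url):
        # Skip duplicates so each page appears once in the sitemap.
        if url not in self.urls:
            self.urls.append(url)

    def to_xml(self):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in self.urls:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
        return ET.tostring(urlset, encoding="unicode")


html = '<p><a href="/about">About</a> <a href="https://example.com/">Home</a></p>'
parser = LinkParser("https://example.com/")
parser.feed(html)

sitemap = Sitemap()
for link in parser.links:
    sitemap.add(link)
print(sitemap.to_xml())
```

Keeping link extraction and XML export in separate classes mirrors the parse and sitemap split described above, so either half could be reused in another tool.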
Discussion points:
- What tools have you written?
- How does writing scripts and programs in Bash compare to writing them in Python?
- The example tool here, py_sitemapper, extracts information about hyperlinks from a website's hypertext. How can a web crawler harvest useful information from sites that rely on frameworks rather than plain HTML?