We've covered a lot of ground, and written a lot of code, in this tutorial. The code has been written to be easy to understand rather than production-quality, so we'll finish by tightening a few things up.
Improving the command-line interface and output
We start off by adding a proper command-line interface to the scripts, and colorizing the output.
Run each script with --help to get information about the arguments it accepts.
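The updated scripts aren't reproduced here, but the standard-library argparse module is the usual way to build this kind of interface in Python, and it generates the --help output automatically. The sketch below is illustrative only: the option names (--verify, --no-color), the description text and the colorize() helper are hypothetical, and the colors are plain ANSI escape sequences.

```python
import argparse
from typing import Optional, Sequence

# ANSI escape sequences for colored terminal output (an illustrative choice).
GREEN = "\033[32m"
RESET = "\033[0m"


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser; argparse adds --help automatically."""
    parser = argparse.ArgumentParser(
        description="Dump the objects in a git repository.")
    parser.add_argument("repo", help="path to the repository")
    # Hypothetical option names, for illustration only.
    parser.add_argument("--verify", action="store_true",
                        help="compare each object with the output of git cat-file")
    parser.add_argument("--no-color", action="store_true",
                        help="disable colorized output")
    return parser


def colorize(text: str, color: str, enabled: bool = True) -> str:
    """Wrap text in an ANSI color code, or return it unchanged if disabled."""
    return f"{color}{text}{RESET}" if enabled else text


def main(argv: Optional[Sequence[str]] = None) -> None:
    # argv=None makes argparse read the real command line (sys.argv[1:]).
    args = build_parser().parse_args(argv)
    print(colorize(f"Repository: {args.repo}", GREEN, enabled=not args.no_color))
```

A script built this way would end with `if __name__ == "__main__": main()`. A common refinement is to turn color off automatically when sys.stdout.isatty() is false, so that piped output doesn't contain escape codes.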
Trace messages are now also emitted using Python's logging module.
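The tutorial doesn't show its logging setup. Here is a minimal sketch, assuming trace messages are logged at DEBUG level and are only shown when a verbose option is given. configure_logging() and read_object() are hypothetical names, and read_object() is just a stand-in that emits a trace message.

```python
import logging

# One logger per module, named after the module (a common convention).
log = logging.getLogger(__name__)


def configure_logging(verbose: bool) -> None:
    """Show DEBUG-level trace messages only when verbose output is requested."""
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.WARNING,
        format="%(levelname)s: %(message)s",
        force=True,  # replace any existing root handlers (Python 3.8+)
    )


def read_object(sha: str) -> None:
    # Trace message: the %s argument is only formatted if the record is emitted.
    log.debug("Reading object %s", sha)
```

Compared with print() calls, this lets trace output be switched on or off in one place, and it goes to stderr by default, so it doesn't mix with the results written to stdout.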
Previously, the code loaded everything into memory before printing what it had found, which doesn't work well for very large repos, so the scripts now output their results progressively.
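The tutorial doesn't show how the scripts were restructured. One idiomatic way to produce output progressively in Python is to turn the code that finds things into a generator, so the caller can print each result as soon as it is produced. The sketch below applies this to loose objects, which git stores under .git/objects/ in two-hex-digit fan-out directories. The function name iter_loose_objects is hypothetical, and packed objects are not handled.

```python
import os
from typing import Iterator


def iter_loose_objects(git_dir: str) -> Iterator[str]:
    """Yield the SHA-1 of each loose object under git_dir/objects, one at a time.

    Because this is a generator, nothing is accumulated in memory: each
    name is handed to the caller as soon as it has been found.
    """
    objects_dir = os.path.join(git_dir, "objects")
    for entry in sorted(os.scandir(objects_dir), key=lambda e: e.name):
        # Fan-out directories have two-character names (e.g. "ab"); this skips
        # the "info" and "pack" subdirectories.
        if len(entry.name) == 2 and entry.is_dir():
            for obj in sorted(os.scandir(entry.path), key=lambda e: e.name):
                # The directory name is the first two hex digits of the SHA-1.
                yield entry.name + obj.name
```

A caller would then write something like `for sha in iter_loose_objects(".git"): print(sha)`, and the output starts appearing immediately instead of after the whole repository has been scanned.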
Decompressing zlib streams has also been optimized.
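The tutorial doesn't say what the optimization was, so the following only illustrates one efficient approach. A pack file doesn't record how many compressed bytes each object occupies, so the decompressor has to find the end of the zlib stream itself. zlib.decompressobj can be fed the data in fixed-size chunks until its eof flag is set. Any bytes past the end of the stream are left in unused_data, which gives the offset of the next object. Slicing a memoryview avoids copying the rest of the pack. The function name inflate_at and the chunk size are assumptions.

```python
import zlib
from typing import Tuple

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size (an assumption; tune as needed)


def inflate_at(data: bytes, offset: int) -> Tuple[bytes, int]:
    """Decompress the zlib stream starting at data[offset].

    Returns the decompressed bytes and the offset just past the end of the
    compressed stream, i.e. where the next object in a pack would start.
    """
    view = memoryview(data)  # slicing a memoryview doesn't copy the bytes
    d = zlib.decompressobj()
    parts = []
    pos = offset
    while not d.eof:
        chunk = view[pos:pos + CHUNK_SIZE]
        if not chunk:
            raise ValueError("truncated zlib stream")
        parts.append(d.decompress(chunk))
        pos += len(chunk)
    # Bytes fed in after the end of the stream are returned in unused_data.
    end = pos - len(d.unused_data)
    return b"".join(parts), end
```

Feeding fixed-size chunks keeps unused_data to at most one chunk per object. Handing the whole remainder of the pack to decompress() would instead copy everything after the stream into unused_data every time. A caller would normally also check the result's length against the uncompressed size given in the object's pack header.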
Finally, an option has been added to objects.py to verify that objects are being retrieved correctly, by comparing each one with the output of git cat-file. This can take a while on large repos, since every revision of every file is retrieved and git is then run several times to verify each result, which is especially slow if there are large files. It is, however, a good validation of the code we've written.
A similar option has also been added to logs.py.
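The verification code itself isn't shown. Here is a minimal sketch, assuming the scripts have already decoded an object's type and raw content. It compares them with what git reports: `git cat-file -t <object>` prints an object's type, and `git cat-file <type> <object>` prints its raw content. git_cat_file and verify_object are hypothetical helpers, and git must be on the PATH.

```python
import subprocess


def git_cat_file(repo: str, *args: str) -> bytes:
    """Run `git cat-file` inside repo and return its raw stdout."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", *args],
        capture_output=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return result.stdout


def verify_object(repo: str, sha: str, obj_type: str, content: bytes) -> bool:
    """Check an object we decoded ourselves against what git reports for it."""
    git_type = git_cat_file(repo, "-t", sha).decode().strip()
    if git_type != obj_type:
        return False
    # `git cat-file <type> <object>` prints the object's raw (binary) content.
    return git_cat_file(repo, obj_type, sha) == content
```

This starts up to two git processes per object, which is one reason verification is slow on large repos. `git cat-file --batch` could reduce that overhead by serving many objects from a single process.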