We've covered a lot of ground so far, taking a detailed look at how git stores objects in a repo. The hard stuff is behind us, and it's all plain sailing from here.
To summarize, git stores stuff in its repo as "objects" of various types:
- commit: represents a commit in the repo history, and has an associated tree object
- tree: describes the working tree at the time of a commit, and can have child tree objects (for sub-directories) and/or blob objects (for files)
- blob: used to store the actual file content being tracked
- tag: used to store annotated tags
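Whatever its type, an object is wrapped the same way before it's stored loose: a "<type> <size>" header, a NUL byte, then the content, with the object's name being the hash of the whole thing. The following is only a minimal sketch of that idea, assuming a repo that uses git's default SHA-1 object format; make_loose_object is a made-up helper name for illustration, not part of git or of the script discussed here.

```python
import hashlib
import zlib


def make_loose_object( obj_type, content ):
    """Return (object name, compressed bytes) for a loose object.

    Hypothetical helper; assumes git's default SHA-1 object format.
    """
    # every object type uses the same header: "<type> <size>\0"
    header = "{} {}".format( obj_type, len(content) ).encode( "ascii" ) + b"\0"
    raw = header + content
    # the object name is the SHA-1 of the header plus the content
    obj_name = hashlib.sha1( raw ).hexdigest()
    # loose objects are stored zlib-compressed
    return obj_name, zlib.compress( raw )


if __name__ == "__main__":
    name, data = make_loose_object( "blob", b"hello\n" )
    print( name )
    print( zlib.decompress( data ) )
```

Only the type word in the header changes between commits, trees, blobs and tags, which is why a single lookup routine can handle all four.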
So, to retrieve an object from a repo, we need to:
- first check if it's there as a loose object (the file path is derived from the object's name)
- otherwise, check all the packs to see if it's in one of them
```python
import glob
import os
import zlib

def find_object( obj_name ):
    """Find the specified object in the git repo."""

    # check if the object is loose
    fname = ".git/objects/{}/{}".format( obj_name[:2], obj_name[2:] )
    if os.path.isfile( fname ):
        # yup - get it directly from there
        with open( fname, "rb" ) as fp:
            obj_data = fp.read()
        obj_data = zlib.decompress( obj_data )
        # extract the object type and size
        pos = obj_data.find( b"\0" )
        obj_type, obj_size = obj_data[:pos].split()
        obj_type = obj_type.decode( "ascii" )
        obj_size = int( obj_size )
        # get the object data
        obj_data = obj_data[ pos+1 : ]
        assert len(obj_data) == int( obj_size )
        return obj_type, obj_data, fname, None

    # check all the packs
    obj_type, obj_data, pack_fname, offset = find_pack_object( obj_name )
    if obj_type:
        # found it
        return obj_type, obj_data, pack_fname, offset

    return None, None, None, None

def find_pack_object( obj_name ):
    """Find the specified pack object in a repo."""

    # search all packs in the repo
    fspec = ".git/objects/pack/*.pack"
    for pack_fname in glob.glob( fspec ):
        offset = _find_obj_in_pack( pack_fname, obj_name )
        if offset is not None:
            # found it - read the object
            obj_type, obj_data = _read_obj_in_pack_data( pack_fname, offset )
            return obj_type, obj_data, pack_fname, offset

    return None, None, None, None
```
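To try out the loose-object branch without touching a real repo, one option is a small round trip in a temporary directory. This is just a sketch under a few assumptions: the repo uses SHA-1 object names, and write_loose_object, read_loose_object and the repo_dir parameter are hypothetical names introduced here (find_object itself works relative to the current directory).

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object( repo_dir, obj_type, content ):
    """Store content as a loose object under repo_dir and return its name."""
    raw = "{} {}".format( obj_type, len(content) ).encode( "ascii" ) + b"\0" + content
    obj_name = hashlib.sha1( raw ).hexdigest()
    # the first 2 hex digits name the directory, the remaining 38 name the file
    dname = os.path.join( repo_dir, ".git", "objects", obj_name[:2] )
    os.makedirs( dname, exist_ok=True )
    with open( os.path.join( dname, obj_name[2:] ), "wb" ) as fp:
        fp.write( zlib.compress( raw ) )
    return obj_name


def read_loose_object( repo_dir, obj_name ):
    """Read a loose object back, or return (None, None) if it isn't there."""
    fname = os.path.join( repo_dir, ".git", "objects", obj_name[:2], obj_name[2:] )
    if not os.path.isfile( fname ):
        return None, None
    with open( fname, "rb" ) as fp:
        raw = zlib.decompress( fp.read() )
    # split the "<type> <size>" header from the content
    pos = raw.find( b"\0" )
    obj_type, obj_size = raw[:pos].split()
    obj_data = raw[ pos+1 : ]
    assert len(obj_data) == int( obj_size )
    return obj_type.decode( "ascii" ), obj_data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo_dir:
        name = write_loose_object( repo_dir, "blob", b"hello\n" )
        print( name, read_loose_object( repo_dir, name ) )
```

Passing the repo directory in explicitly, rather than assuming the current directory, makes the lookup easier to test in isolation. The pack branch is left out here because it depends on the pack-reading helpers.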
Source code
The objects.py script can now be run from the command line. Pass in an object name as an argument, and the object will be retrieved from the repo and dumped.