Awasu » Git Guts: Retrieving objects from a repo
Friday 7th January 2022 8:41 PM

We've covered a lot of ground so far, taking a detailed look at how git stores objects in a repo, but the hard stuff is behind us, and it's all plain sailing from here :-)

To summarize, git stores stuff in its repo as "objects" of various types:

  • commit: represents a commit in the repo history, and has an associated tree object
  • tree: describes the working tree at the time of a commit, and can have child tree objects (for sub-directories) and/or blob objects (for files)
  • blob: used to store the actual file content being tracked
  • tag: used to store annotated tags

So, to retrieve an object from a repo, we need to:

  • first check if it's there as a loose object (the file path is derived from the object's name)
  • otherwise, check all the packs to see if it's in one of them

    import glob
    import os
    import zlib

    def find_object( obj_name ):
        """Find the specified object in the git repo."""

        # check if the object is loose
        fname = ".git/objects/{}/{}".format( obj_name[:2], obj_name[2:] )
        if os.path.isfile( fname ):

            # yup - get it directly from there
            with open( fname, "rb" ) as fp:
                obj_data = fp.read()
            obj_data = zlib.decompress( obj_data )

            # extract the object type and size
            pos = obj_data.find( b"\0" )
            obj_type, obj_size = obj_data[:pos].split()
            obj_type = obj_type.decode( "ascii" )
            obj_size = int( obj_size )

            # get the object data
            obj_data = obj_data[ pos+1 : ]
            assert len(obj_data) == obj_size
            return obj_type, obj_data, fname, None

        # check all the packs
        obj_type, obj_data, pack_fname, offset = find_pack_object( obj_name )
        if obj_type:
            # found it
            return obj_type, obj_data, pack_fname, offset

        return None, None, None, None

    def find_pack_object( obj_name ):
        """Find the specified pack object in a repo."""

        # search all packs in the repo
        # (_find_obj_in_pack() and _read_obj_in_pack_data() are not shown here)
        fspec = ".git/objects/pack/*.pack"
        for pack_fname in glob.glob( fspec ):
            offset = _find_obj_in_pack( pack_fname, obj_name )
            if offset is not None:
                # found it - read the object
                obj_type, obj_data = _read_obj_in_pack_data( pack_fname, offset )
                return obj_type, obj_data, pack_fname, offset

        return None, None, None, None
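
To see the loose-object half of find_object() in isolation, here's a minimal sketch that writes a loose object into a temporary directory and then reads it back using the same header-parsing logic. The helper names write_loose_object() and read_loose_object(), and the repo_dir parameter, are made up for this example and are not part of the code above. It relies on the object name being the SHA-1 of the uncompressed "type size\0data" bytes.

```python
import hashlib
import os
import tempfile
import zlib

def write_loose_object( repo_dir, obj_type, obj_data ):
    """Write a loose object under repo_dir and return its name (hypothetical helper)."""
    # a loose object is "<type> <size>\0<data>", zlib-compressed
    header = "{} {}".format( obj_type, len(obj_data) ).encode( "ascii" )
    raw = header + b"\0" + obj_data
    # the object name is the SHA-1 of the uncompressed header + data
    obj_name = hashlib.sha1( raw ).hexdigest()
    # the file path is derived from the object's name
    dname = os.path.join( repo_dir, "objects", obj_name[:2] )
    os.makedirs( dname, exist_ok=True )
    with open( os.path.join( dname, obj_name[2:] ), "wb" ) as fp:
        fp.write( zlib.compress( raw ) )
    return obj_name

def read_loose_object( repo_dir, obj_name ):
    """Read a loose object back, or return (None, None) if it isn't there (hypothetical helper)."""
    fname = os.path.join( repo_dir, "objects", obj_name[:2], obj_name[2:] )
    if not os.path.isfile( fname ):
        return None, None
    with open( fname, "rb" ) as fp:
        raw = zlib.decompress( fp.read() )
    # extract the object type and size, then the data that follows the NUL
    pos = raw.find( b"\0" )
    obj_type, obj_size = raw[:pos].split()
    obj_data = raw[ pos+1 : ]
    assert len(obj_data) == int( obj_size )
    return obj_type.decode( "ascii" ), obj_data

# round-trip a small blob through a throw-away directory
with tempfile.TemporaryDirectory() as repo_dir:
    name = write_loose_object( repo_dir, "blob", b"hello\n" )
    print( name )
    print( read_loose_object( repo_dir, name ) )
```

The name printed should match what git hash-object reports for the same content, which is a handy way to check that the header format is right.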

Source code

The objects.py script can now be run from the command-line. Pass in an object name as an argument, and it will be retrieved from the repo and dumped.


