References ("refs") are just aliases for a commit. For example, HEAD means "the commit I am currently on", and master means "the last commit on the master branch."
Branch, tag and remote refs are stored in files under .git/refs/ (HEAD itself lives in .git/HEAD):
- the tip of each branch is stored under heads/
- tags are stored under tags/
- the tip of each remote branch that is being tracked is stored under remotes/
Thankfully, they're trivial to parse, since each file just contains the commit hash:
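    import os

    def find_loose_refs( ref_type ):
        """Find loose references of the specified type."""
        refs_dir = os.path.join( ".git/refs/", ref_type )
        if not os.path.isdir( refs_dir ):
            return []
        refs = []
        # walk the whole tree, since ref names can contain slashes
        # (e.g. heads/feature/x or remotes/origin/master)
        for dname, _, fnames in os.walk( refs_dir ):
            for fname in fnames:
                fname2 = os.path.join( dname, fname )
                with open( fname2, "r", encoding="utf-8" ) as fp:
                    ref = fp.read().rstrip()
                # record the ref name (relative to refs_dir), its contents, and the file's mtime
                name = os.path.relpath( fname2, refs_dir ).replace( os.sep, "/" )
                refs.append( ( name, ref, os.stat(fname2).st_mtime ) )
        return refs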
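One wrinkle is that not every ref file holds a hash directly. HEAD normally contains a line of the form "ref: refs/heads/master", which points at another ref, and only holds a bare hash when it is detached. The following is a minimal sketch of following such pointers. The function name resolve_ref and the max_depth guard are my own assumptions for illustration, and it only looks at loose files, so a ref that exists only in packed-refs comes back as None.

```python
import os

def resolve_ref(git_dir, name="HEAD", max_depth=5):
    """Follow symbolic refs until a commit hash is reached (loose refs only)."""
    for _ in range(max_depth):  # hypothetical guard against reference loops
        path = os.path.join(git_dir, name)
        if not os.path.isfile(path):
            return None  # missing, or only present in packed-refs
        with open(path, "r", encoding="utf-8") as fp:
            value = fp.read().strip()
        if not value.startswith("ref: "):
            return value  # a plain commit hash
        name = value[len("ref: "):]  # symbolic ref: follow it
    return None
```

Calling resolve_ref(".git") from the top of a working tree should give the hash of the commit that is currently checked out, provided the branch tip has not been packed.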
When a repo is packed, all the refs are gathered up and combined into a single file, at .git/packed-refs.
This is also pretty easy to parse. The only peculiarity is a line that starts with a caret (^), which means that the line is specifying the target commit for an annotated tag in the previous line:
    import os
    import re

    def find_packed_refs():
        """Find packed references."""
        # parse the packed refs
        fname = ".git/packed-refs"
        if not os.path.isfile( fname ):
            return []
        refs = []
        # ref names are not guaranteed to be ASCII, so read the file as UTF-8
        with open( fname, "r", encoding="utf-8" ) as fp:
            for line in fp:
                line = line.strip()
                # skip blank lines and the header comment
                if not line or line[0] == "#":
                    continue
                if line[0] == "^":
                    # the previous line is an annotated tag, this line is the target commit
                    refs[-1] = ( *refs[-1], line[1:] )
                else:
                    # every other line is "<40-char hash> <ref name>"; ignore anything else
                    mo = re.search( r"^([0-9a-f]{40}) (.*)$", line )
                    if mo:
                        refs.append( ( mo.group(1), mo.group(2) ) )
        return refs
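A packed repo can still have loose refs, because any ref that is updated after packing is written back out as a loose file, and Git prefers the loose file over the entry in packed-refs. Here is a rough sketch of combining the two sources with that precedence. The merge_refs helper is hypothetical, and it assumes the loose refs have already been collected into a dict keyed by full ref name (e.g. "refs/heads/master"), which is not the shape the loose-ref scan above returns.

```python
def merge_refs(packed, loose):
    """Combine packed and loose refs; a loose ref overrides a packed one.

    packed: list of (hash, name) or (hash, name, tag_target) tuples
    loose:  dict mapping full ref name -> hash (assumed shape, see lead-in)
    """
    merged = {}
    for entry in packed:
        sha, name = entry[0], entry[1]  # a third element, if present, is the tag's target
        merged[name] = sha
    merged.update(loose)  # loose refs take precedence
    return merged

packed = [("a" * 40, "refs/heads/master"), ("b" * 40, "refs/tags/v1", "c" * 40)]
loose = {"refs/heads/master": "d" * 40}
refs = merge_refs(packed, loose)
```

In this example refs["refs/heads/master"] ends up as the loose value, while the tag keeps the hash from packed-refs.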
Source code
A new script refs.py will find and dump refs in a repo.