Elliot Speck Disk Space Invader

Ripping DVCS Ledgers from Web Servers

Distributed Version Control

One trend that has been on a constant uptick since the release of Git in 2005 is the use of distributed version control systems (DVCS) for updating web applications. DVCS drastically modernised the development workflow for many types of software because it allowed developers to take the full repository, history included, anywhere - even offline. There is no need to check in and check out every file you work on, and the need for locking was reduced essentially to binary assets, as everything else can be merged as patches.

While DVCS themselves aren’t the cause of the issue, they drastically popularised version control even among solo developers, who could use the system essentially as a journal for their software, committing and reverting changes as they see fit. For many languages it also sped up deployment, as pushing code to the server became a simple task:

# Add a 'remote' - a version of the repository hosted elsewhere
git remote add staging <user>@<staging-host>:<path/to/repo.git>

# Commit a change to your local repository.
git commit -am "Modify the login process." -m "Move the password hashing process to the beginning of the routine to prevent timing attacks."

# Push the version of your repository to the 'staging' remote repository.
git push staging master 

Plenty of magic can happen behind the scenes in the form of server-side hooks such as post-receive, which let you rebuild a binary or restart an application server whenever the server successfully receives a push. This allows one to codify the entire deployment pipeline into a simple shell script.
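As a rough illustration of such a hook, here is a minimal sketch of a post-receive script, the hook Git runs on the receiving side after a push. Hooks can be written in any executable language; Python is used here. Git feeds the hook one line per updated ref on standard input in the form "old-sha new-sha ref-name". The paths and the names GIT_DIR, WORK_TREE, DEPLOY_REF and deploy are assumptions made for this example, not anything from a real deployment.

```python
#!/usr/bin/env python3
"""Sketch of a post-receive hook. All paths below are hypothetical."""
import subprocess

# Assumed locations; adjust to your own server layout.
GIT_DIR = "/srv/git/site.git"      # bare repository that receives pushes
WORK_TREE = "/var/www/site"        # directory the web server serves
DEPLOY_REF = "refs/heads/master"   # branch whose update triggers a deploy


def pushed_refs(lines):
    """Yield (old_sha, new_sha, ref_name) for each well-formed stdin line."""
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            yield tuple(parts)


def checkout_command(ref_name):
    """Build the git command that checks the branch out into WORK_TREE."""
    branch = ref_name[len("refs/heads/"):]
    return ["git", f"--git-dir={GIT_DIR}", f"--work-tree={WORK_TREE}",
            "checkout", "-f", branch]


def deploy(lines, run=subprocess.run):
    """Check out DEPLOY_REF whenever it is updated; return the commands run."""
    commands = []
    for _old, new, ref in pushed_refs(lines):
        # An all-zero new SHA means the branch was deleted: nothing to deploy.
        if ref == DEPLOY_REF and set(new) != {"0"}:
            command = checkout_command(ref)
            run(command, check=True)
            commands.append(command)
            # Rebuild or restart steps for the application would follow here.
    return commands
```

Saved as hooks/post-receive in the bare repository and made executable, the script would end with `import sys; deploy(sys.stdin)`. Because the repository lives in GIT_DIR and only the checked-out files land in WORK_TREE, this layout also keeps .git out of the served directory.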

Storms Ahead

What is increasingly common, however, is the unfortunate exposure of the directory in which Git stores its repository files: .git. While robots.txt in a bespoke web application may hint at things the developers don’t want the wider world to see, .git - if present - skips that entire process and contains the entirety of the source code for the application, along with its history. Finding a .git directory within the root directory of a web application is a big deal, as you can pull that codebase down and review it for further issues. When the exposed repository includes the info/refs file that git update-server-info generates, a simple git clone is enough:

# The example.com website has accidentally exposed its .git repository at https://example.com/.git
git clone https://example.com/.git site-rip
cd site-rip
vi index.php

Finding these doesn’t even need to be a case of running git clone against every site you come across - this can be done with your friendly neighbourhood web browser by checking for the presence of /.git/config. There are plenty of interesting files in the repository directory, including /.git/logs/HEAD, which records every update to HEAD (commits, checkouts, merges and so on) in the working copy on the server, along with the name and email address of whoever made each change.
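To show how much a single log file gives away, here is a minimal sketch that parses lines from a downloaded logs/HEAD file. Each line holds the old and new commit hashes, the committer's name and email, a Unix timestamp with timezone, and, after a tab, a message. The names ReflogEntry, parse_reflog_line and committer_emails are made up for this example.

```python
import re
from typing import Iterable, NamedTuple, Optional, Set


class ReflogEntry(NamedTuple):
    old_sha: str
    new_sha: str
    name: str
    email: str
    timestamp: int
    tz: str
    message: str


# 40 hex digits for SHA-1 repositories, 64 for SHA-256 ones.
_REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40,64}) (?P<new>[0-9a-f]{40,64}) "
    r"(?P<name>.*?) <(?P<email>[^>]*)> "
    r"(?P<timestamp>\d+) (?P<tz>[+-]\d{4})"
    r"(?:\t(?P<message>.*))?$"
)


def parse_reflog_line(line: str) -> Optional[ReflogEntry]:
    """Parse one reflog line; return None if it does not match the format."""
    match = _REFLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return ReflogEntry(
        old_sha=match["old"],
        new_sha=match["new"],
        name=match["name"],
        email=match["email"],
        timestamp=int(match["timestamp"]),
        tz=match["tz"],
        message=match["message"] or "",
    )


def committer_emails(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct email addresses found in a reflog."""
    entries = (parse_reflog_line(line) for line in lines)
    return {entry.email for entry in entries if entry is not None}
```

Calling `committer_emails(open("HEAD"))` on a saved copy of the file would list every address that has touched the server's working copy.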

Prevention

Ideally, the .git directory is simply not stored in the web root. This is by far the best solution, as it removes all the “what if” possibilities that any other solution may have, while still allowing you to use a Git pipeline like the above. If for some reason (development time, maintenance time, etc.) this isn’t possible, access should at least be restricted in your HTTP server configuration file, preventing remote users from reaching the directory at all. If your web server is a ‘remote’ as it is in the pipeline above, there’s simply no reason that anyone should be able to access it over HTTP.

nginx:

server {
    ...
    location /.git {
        return 404; # Some would argue a 403 would be better here, but pretending it doesn't exist is my preferred method.
                    # There's no harm in lying about it here.
    }
}

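Apache (server or virtual host configuration; <DirectoryMatch> is not permitted in .htaccess files):

<DirectoryMatch "^/.*/\.git/">
    # Apache 2.4 and later. On Apache 2.2, use "Order deny,allow" and "Deny from all" instead.
    Require all denied
</DirectoryMatch>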
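Once a rule like these is in place, it is worth confirming that it works. Here is a minimal sketch of a check to run against your own site: it requests /.git/config and reports whether the response looks like a Git configuration file. The function names are made up for this example, and the "[core]" test is only a heuristic based on the section a Git config file normally contains.

```python
import urllib.error
import urllib.request


def looks_like_git_config(body: str) -> bool:
    """Heuristic: a Git config file is INI-like and normally has a [core] section."""
    return any(line.strip().lower() == "[core]" for line in body.splitlines())


def git_config_exposed(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if base_url serves something that looks like .git/config.

    A False result covers both "blocked or missing" (403/404) and
    "could not connect", so treat it as "not observed", not as proof.
    """
    url = base_url.rstrip("/") + "/.git/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # The first few kilobytes are plenty to spot the [core] section.
            body = response.read(4096).decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return False
    return looks_like_git_config(body)
```

For example, `git_config_exposed("https://example.com")` should return False once the nginx or Apache rule is active.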