During my first stint at Indeed, I cloned every repository to my machine. This approach worked while the number of repositories was small. As the organization grew, it quickly became unmanageable. While many people do not work across every repository, many are familiar with the pain of setting up a new machine. I wrote gitfs for two reasons: first, to reduce the time spent setting up a new development environment; second, to remove the need to figure out where all my projects should be cloned. In this post, I discuss some of the challenges I faced and the lessons I learned while writing my first file system.
gitfs in Action
gitfs is a FUSE file system that reduces the overhead of managing git repositories. It works by connecting to well-defined APIs (GitHub, Bitbucket, and GitLab) and fetching the repository URLs associated with the user. These URLs are parsed into a virtual directory structure that can be navigated from the terminal on Linux or macOS.
[mjpitz@mjpitz ~/Development/code 1/1]
$ ls
github.com
[mjpitz@mjpitz ~/Development/code 1/1]
$ cd github.com/
[mjpitz@mjpitz ~/Development/code/github.com 1/1]
$ ls
indeedeng indeedeng-alpha mjpitz
[mjpitz@mjpitz ~/Development/code/github.com 1/1]
$ cd mjpitz/
[mjpitz@mjpitz ~/Development/code/github.com/mjpitz 1/1]
$ ls
OpenGrok gitfs jgrapht proto2-3
consul-api grpc-java laas rpi
docker-clickhouse grpc.github.io mya.sh seo-portal
docker-utils grpcsh mp serverless-plugin-simulate
dotfiles hbase-docker okhttp simple-daemon-node
envoy idea-framework proctor spring-config-repo
generator-idea java-gitlab-api proctorjs
[mjpitz@mjpitz ~/Development/code/github.com/mjpitz 1/1]
$ cd mya.sh/
[mjpitz@mjpitz ~/Development/code/github.com/mjpitz/mya.sh master 1/1]
$ ls
Gemfile _drafts _posts go statics
Gemfile.lock _includes _site index.html
_config.yml _layouts docker-compose.yml pages
_data _plugins error.html s3_website.yml
[mjpitz@mjpitz ~/Development/code/github.com/mjpitz/mya.sh master 1/1]
$
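The mapping from repository URL to directory path can be sketched roughly as follows. This is a minimal illustration, not the actual gitfs code: the function name repoPath is hypothetical, and it assumes clone URLs arrive either in https:// or ssh:// form or in the scp-like form git@host:owner/repo.git.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// repoPath is a hypothetical helper that converts a clone URL into a
// relative directory path of the form host/owner/repo.
func repoPath(raw string) (string, error) {
	// Rewrite scp-like syntax (user@host:owner/repo.git) into an ssh:// URL
	// so that net/url can parse it.
	if !strings.Contains(raw, "://") {
		if i := strings.Index(raw, ":"); i > 0 {
			raw = "ssh://" + raw[:i] + "/" + raw[i+1:]
		}
	}

	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}

	// Drop surrounding slashes and the conventional .git suffix.
	repo := strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	if repo == "" {
		return "", fmt.Errorf("no repository path in %q", raw)
	}
	return path.Join(u.Hostname(), repo), nil
}

func main() {
	for _, raw := range []string{
		"https://github.com/mjpitz/gitfs.git",
		"git@github.com:mjpitz/mya.sh.git",
	} {
		p, err := repoPath(raw)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println(p)
	}
}
```

Each resulting path splits naturally into the host, owner, and repository levels seen in the session above.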
Challenge 1 - Finding a complete example
The first big challenge I encountered was finding a complete working example. I chose the bazil/fuse library because it provides a clean, low-level implementation. Using a few basic tutorials, I was able to implement a read-only version of the file system. Unfortunately, the tutorials often implemented only a couple of interfaces from the library, and finding a complete example proved to be very difficult. Eventually, I stumbled across cockroachdb/examples-go, which provides a good example to work from.
Using this reference, I implemented two structures: one that represented a file and one that represented a directory. As the project progressed, having the logic split across two separate files became difficult to manage. Eventually, these implementations collapsed into a single INode structure, which made it easy to keep the business logic in one place. For portability, I added an interface that serves as a quick reference for which methods need to be implemented.
package filesystem

import "bazil.org/fuse/fs"

type INode interface {
	// common node functions
	fs.Node
	fs.NodeSetattrer

	// directory functions
	fs.NodeStringLookuper
	fs.HandleReadDirAller
	fs.NodeMkdirer
	fs.NodeCreater
	fs.NodeRemover
	fs.NodeRenamer
	fs.NodeSymlinker

	// handle functions
	fs.NodeOpener
	fs.HandleWriter
	fs.HandleReader
	fs.NodeFsyncer
	fs.HandleFlusher
	fs.HandleReleaser

	// symlink functions
	fs.NodeReadlinker
}
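To illustrate the idea of one structure serving as both file and directory, here is a minimal sketch that uses only the standard library. It is not the gitfs implementation and does not use the bazil/fuse API: the node type, its fields, and the error values are hypothetical stand-ins. The mode bits decide which operations are valid on a given node.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sort"
)

// Hypothetical errors for this sketch; a real FUSE implementation would
// return errno values defined by the FUSE library instead.
var (
	errNotDir   = errors.New("not a directory")
	errNotExist = errors.New("no such entry")
)

// node is a hypothetical single structure that represents either a file or
// a directory, depending on its mode bits.
type node struct {
	mode     os.FileMode
	data     []byte           // file contents; unused for directories
	children map[string]*node // directory entries; nil for files
}

func newDir() *node {
	return &node{mode: os.ModeDir | 0o755, children: map[string]*node{}}
}

func newFile(data []byte) *node {
	return &node{mode: 0o644, data: data}
}

// Lookup is a directory operation; it fails when called on a file.
func (n *node) Lookup(name string) (*node, error) {
	if !n.mode.IsDir() {
		return nil, errNotDir
	}
	child, ok := n.children[name]
	if !ok {
		return nil, errNotExist
	}
	return child, nil
}

// ReadDirAll lists the entry names of a directory in sorted order.
func (n *node) ReadDirAll() ([]string, error) {
	if !n.mode.IsDir() {
		return nil, errNotDir
	}
	names := make([]string, 0, len(n.children))
	for name := range n.children {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	root := newDir()
	root.children["README.md"] = newFile([]byte("hello"))
	root.children["src"] = newDir()

	names, _ := root.ReadDirAll()
	fmt.Println(names)

	// Directory operations on a file are rejected by the same type.
	_, err := root.children["README.md"].Lookup("anything")
	fmt.Println(err)
}
```

The trade-off of a single type is that every method has to check what kind of node it is operating on, but the shared bookkeeping lives in one place.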
Challenge 2 - Debugging
Debugging a file system can be intense. Because so many operations happen in such a short period of time, a full set of logs can quickly fill your disk. I started by logging only errors, but that proved insufficient: in many cases, context from the request and the wrapping structure would have helped debug the issue. After iterating on the log a few times, I wound up adding an info log at the start of each method. It included details about the request, details about the structure, and the name of the method being invoked. From this, I was able to see the full sequence of operations on the file system. But it was a lot.
Still, in many cases the error logs were enough to understand what went wrong. To reduce the volume in the typical case, I implemented a DEBUG mode. By default, the info log is suppressed. When DEBUG is set to true, the info log and the additional details are written to stdout. Since debugging now requires restarting the file system, I need to understand the reproduction steps before restarting. Knowing the steps well lets me reproduce the issue quickly, which keeps the debug log short and easy to read.