Filesystem over S3 : s3backer


25 Jul, 2020 Aaruni Kaushik

I have recently been experimenting with S3 compatible storage, in an effort to find the most efficient way to expand the storage my server has on offer.

After scouring the web for options, and briefly settling on S3FS, I discovered s3backer.

The Problem

Storage in S3 happens via “objects” in “buckets”. Each object is at least 4 kB in size, with no upper limit as far as I know. But objects cannot be updated in place. Suppose you store myFile.odt as an object on your S3 cloud. If you then edit the file and want to store the changes, S3 first discards the copy it already has and requires you to push the entire file again. While this is not a problem for read-only data, it quickly adds a lot of overhead for “hot data”: data that is read from and written to often. Not only is this method wasteful for repeated rewrites of a file; in my experiments, writing big files directly to S3 (over S3FS) was also painfully slow.

The Solution

Presenting s3backer. This neat little program can connect to S3 compatible storage, and treat a bucket like a physical storage device. It abstracts S3 away as a disk with a chosen block size (128k is the recommended size to get started with), and presents it as a large disk image on your system. You can then format this disk image with a filesystem of your choice, and mount it as if it were just a regular storage device.

At this point, all of your applications are blind to the fact that this newly mounted storage is backed by S3, and all the usual filesystem operations are supported. Since this is a regular filesystem, all the disk caching the kernel performs for regular drives is also applied here, further increasing performance. You circumvent the problem of having to rewrite entire objects on S3 because the disk image is stored as objects of 128k each, so only the blocks containing the parts of a file modified by your program need to be reuploaded.
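To make the saving concrete, here is a small shell sketch of the arithmetic. It assumes the 128k block size mentioned above and a hypothetical 4 KiB edit somewhere inside a large file; the function name and the numbers are made up for illustration, and filesystem metadata updates (which would touch a few more blocks) are ignored.

```shell
#!/bin/sh
# Illustrative arithmetic only: how many fixed-size blocks does a write touch?
# BLOCK_SIZE mirrors the 128k block size used in this post.
BLOCK_SIZE=$((128 * 1024))

# blocks_touched OFFSET LENGTH   (hypothetical helper, LENGTH must be > 0)
# Prints the number of blocks overlapped by LENGTH bytes starting at OFFSET.
blocks_touched() {
    first=$(( $1 / BLOCK_SIZE ))
    last=$(( ($1 + $2 - 1) / BLOCK_SIZE ))
    echo $(( last - first + 1 ))
}

# Hypothetical edit: 4 KiB written 300 MiB into the disk image.
n=$(blocks_touched $((300 * 1024 * 1024)) 4096)
echo "blocks to re-upload: $n ($(( n * BLOCK_SIZE / 1024 )) KiB)"
```

With a plain one-object-per-file layout, the same edit would mean pushing the whole file again.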

The Drawbacks

This is not a problem in my use case, but it should be mentioned that using a bucket with s3backer makes it incompatible with simultaneous read/write mounts. Moreover, it offers no safeguard against such a situation. It is up to the end user to make sure this does not happen.

Because of the way s3backer works, data in a bucket can only meaningfully be accessed via s3backer, or something similar. You will not be able to access your files via webview, S3FS, or other methods.

Installation and Usage

For Debian, there is no prebuilt package for s3backer. As a result, you must download the latest release code and compile it yourself. The build dependencies are provided as an apt-gettable list:

libcurl4-openssl-dev libfuse-dev libexpat1-dev libssl-dev zlib1g-dev pkg-config autoconf automake
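As a rough sketch, the list above can be handed to apt-get like this. The variable and function names are my own and purely illustrative, and the sketch assumes that make and a C compiler are already present on the machine.

```shell
#!/bin/sh
# Build dependencies, copied from the list above.
S3BACKER_BUILD_DEPS="libcurl4-openssl-dev libfuse-dev libexpat1-dev libssl-dev zlib1g-dev pkg-config autoconf automake"

# install_build_deps: hypothetical helper name; needs root rights via sudo.
# The package list is left unquoted on purpose so that it splits into words.
install_build_deps() {
    sudo apt-get update &&
        sudo apt-get install -y $S3BACKER_BUILD_DEPS
}
```

Call install_build_deps once before running ./configure.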

Then, installation is pretty standard.

$ ./configure
$ make
$ sudo make install

Once s3backer is installed, you will need to create two mountpoints. The first is where s3backer connects to S3 and exposes a bucket of your choosing on your filesystem. The second is where the disk image from the first mountpoint gets mounted as a loop device. Call them s3-b and s3-fs.
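Mountpoints are just empty directories. A minimal sketch, assuming you are happy to create them in the current working directory (the commands in this post use these relative paths):

```shell
#!/bin/sh
# s3-b  : where s3backer exposes the bucket as a single file
# s3-fs : where the filesystem inside that file gets mounted
mkdir -p s3-b s3-fs
```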

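It is then possible to mount a demo bucket provided on the official wiki page, just to see how it works. Once s3backer is running against the demo bucket on s3-b as described there, loop-mount the file it exposes: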

$ sudo mount -o loop s3-b/file s3-fs

Unmount by first unmounting the filesystem, and then the backing store.

$ sudo umount s3-fs && sudo umount s3-b

Once you are convinced that s3backer works as advertised, you can configure it to use your own S3 bucket.

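Put your accessID and accessKey in a conveniently located file (e.g. ~/.passwd-s3b) in the format accessID:accessKey. By default s3backer looks in ~/.s3backer_passwd; for any other location, point it at the file with the --accessFile option.
Fire up s3backer to create a backing store in your bucket: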

$ s3backer --blockSize=128k --size=1t --listBlocks mybucket s3-b

Note: the full list of options is in the s3backer man page.
Now create a filesystem, treating s3-b/file as if it were a block device:

$ mkfs.ext4 s3-b/file

Note that creating an ext4 filesystem might take a long time, since it has to initialize the block device. Make sure you start s3backer with the --listBlocks option, otherwise you will incur a ton of network transfer.
Finally, mount the filesystem you have just created, and then use it as normal!

$ sudo mount -o loop s3-b/file s3-fs
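Since the credentials file from the first step holds a secret key, it is worth keeping it readable by your user only. Here is a minimal sketch of one way to write it; the helper name and the placeholder credentials are hypothetical, while --accessFile is the s3backer option for pointing at a credentials file outside the default location.

```shell
#!/bin/sh
# write_access_file PATH ACCESS_ID ACCESS_KEY   (hypothetical helper name)
# Writes a single "accessID:accessKey" line to PATH with owner-only access.
write_access_file() {
    (
        umask 077                          # a newly created file gets mode 0600
        printf '%s:%s\n' "$2" "$3" > "$1"
    )
    chmod 600 "$1"                         # also tighten a file that already existed
}

# Example usage with placeholder values (left commented out on purpose):
# write_access_file "$HOME/.passwd-s3b" MY_ACCESS_ID MY_ACCESS_KEY
# s3backer --accessFile="$HOME/.passwd-s3b" --blockSize=128k --size=1t --listBlocks mybucket s3-b
```

The subshell keeps the umask change from leaking into the rest of your session.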