Understanding and verifying Syncthing’s untrusted peer encryption • T-shaped

Syncthing is a great piece of open source software that allows you to synchronize folders of files across your devices, without relying on cloud providers such as Dropbox and OneDrive.

One scenario it supports is synchronization between a trusted device (e.g. your own laptop) and an untrusted device (e.g. a cloud server somewhere). In this scenario, the files received by the untrusted device are additionally encrypted using a key that only trusted devices possess. Even if someone were to gain access to the files on the untrusted device, they would not be able to do anything useful with them! This functionality can for instance be used to build your own Dropbox-style service (with the benefit of using an always-on cloud server with good connectivity) or perhaps a back-up system (using Syncthing’s versioning system or by snapshotting the underlying file system to provide historical back-up – obviously if you delete all your files, Syncthing will otherwise happily synchronize and delete the files on all other devices as well!).

The exact encryption scheme used for untrusted peers is documented here. In order to be able to fully understand and trust the encryption scheme used I decided to try and implement it myself. While the syncthing command line utility supports offline decryption of a folder, I thought it would be neat to have a tool completely separate from the syncthing code to do so.

Encrypted folder structure

On the untrusted peer, a synchronized encrypted folder looks like the one below.

On the trusted peer, all filenames (full paths) are encrypted and formatted as Base32 before being sent off to the untrusted peer. For example, the file path wonnx/wonnx/Cargo.lock encrypts to 4ISDQJPKRK0GI2F23V1D4E32VQ8MQQNAN18RA1GU6SFEOAKB9VT93R8OALMM8 using the key ‘test’ and the folder ID ‘tommy’.

As these encrypted file names can become arbitrarily long, Syncthing ‘chops them up’ and creates a folder structure on the untrusted peer to avoid running into the file system’s file name length limits. At the root of the encrypted folder are subdirectories that correspond to the first character of the encrypted file name. Our example file would hence end up in the folder 4.syncthing-enc. In this folder, there are subdirectories corresponding to the next two characters (in our example IS). Our example file is hence stored on the untrusted peer at the path 4.syncthing-enc/IS/DQJPKRK0GI2F23V1D4E32VQ8MQQNAN18RA1GU6SFEOAKB9VT93R8OALMM8 (Syncthing will generate more levels of directories for long file names). To read back a file name, you simply remove the .syncthing-enc string and all slashes from the full file path to regain the original Base32-encoded version of the file name.
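The chopping and un-chopping described above can be sketched in a few lines of JavaScript. The function names are hypothetical, and the 200-character chunk size used for the additional directory levels is an assumption that is not taken from this article; only the first two levels and the reverse operation follow directly from the description.

```javascript
// Sketch of the on-disk layout for encrypted file names.
const ENC_DIR_SUFFIX = ".syncthing-enc";
// ASSUMPTION: long names are split into chunks of this many characters.
const MAX_COMPONENT_LENGTH = 200;

// Turn a Base32-encoded encrypted name into the chopped-up relative path.
function toEncryptedPath(encryptedName) {
  const parts = [
    encryptedName.slice(0, 1) + ENC_DIR_SUFFIX, // first character + suffix
    encryptedName.slice(1, 3), // next two characters
  ];
  for (let i = 3; i < encryptedName.length; i += MAX_COMPONENT_LENGTH) {
    parts.push(encryptedName.slice(i, i + MAX_COMPONENT_LENGTH));
  }
  return parts.join("/");
}

// Reverse operation: drop the suffix and all slashes.
function fromEncryptedPath(path) {
  return path.replace(/(\.syncthing-enc)|\//g, "");
}

const exampleName =
  "4ISDQJPKRK0GI2F23V1D4E32VQ8MQQNAN18RA1GU6SFEOAKB9VT93R8OALMM8";
const examplePath = toEncryptedPath(exampleName);
console.log(examplePath);
console.log(fromEncryptedPath(examplePath) === exampleName);
```

For the example name this reproduces the path given above, and the reverse function does not depend on the chunk size at all, which is convenient when all you want is to read names back.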

Password tokens

When an untrusted peer synchronizes with more than one peer, all peers must use the same encryption key to synchronize with the untrusted peer. The issue is that the untrusted peer does not know the password, and therefore there should be a different way for it to verify that passwords used by peers wishing to synchronize with it are equal. To be able to do this, a ‘password token’ is generated and stored on the untrusted peer. This file contains the string "syncthing" + folderID (known to all peers), encrypted with the encryption key (only known to trusted peers). It is stored as a JSON file at .stfolder/syncthing-encryption_password_token on the untrusted side:

The untrusted peer cannot decrypt the password token, but every peer that knows the key can generate the token itself and send it to the untrusted peer for comparison.

A good first exercise would thus be to try and generate the password token ourselves and see if ours matches the stored one!

I decided to use JavaScript for this exercise as I am quite familiar with it and suspect it has all sorts of libraries available to do the required cryptographic operations. We are going to need the following:

scrypt. This is a method for generating fixed-size (strong) keys (in our case 32 bytes long) from plaintext (possibly weak) passwords. It does so by running a deliberately expensive computation, costly in both CPU time and memory, over the plaintext password and a salt. As the key is costly to compute, this makes it impractical for attackers to generate large dictionaries of scrypted passwords and perform brute force attacks on the key. Support for scrypt is included in NodeJS in the node:crypto package.
Base64. The password token in the file is serialized as base64, so we need a way to decode this. In NodeJS you can simply do this using Buffer.from("<some base64>", "base64").
Base32. This becomes relevant when we want to decode and encode file names, as Syncthing uses base32 for that as discussed above. I use the packages base32-decode and base32-encode for this purpose.
AES-SIV. This is the actual encryption algorithm used by syncthing for the password token and file names. It accepts a ciphertext and a 32 byte key (and a nonce, but it is not used here) and returns the plaintext (or vice versa). AES is well-known and widely supported, but AES-SIV is less known. Judging from the syncthing source code this appears to be an AEAD scheme based on AES and implemented in a Go library called miscreant. AEAD is a way to combine encryption and authentication, which is more secure than using plain AES. We are in luck as the miscreant package is also available for JavaScript. While it is a bit odd that such a (seemingly) obscure algorithm is used, it does seem appropriate to use an AEAD type of scheme here.

AES-SIV is only used for the password token, file name encryption and hashing of file blocks. The actual file contents are encrypted using XChaCha20-Poly1305 with a key derived from the folder key combined with the file name (combined and using HKDF with SHA256 to generate a strong key), and using random nonces. Each file hence has its own encryption key. This would allow for implementing e.g. a file sharing feature in the future where the keys for individual files could be handed out while all other files still remain inaccessible.
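To make the per-file key idea concrete, here is a rough sketch using the HKDF implementation that ships with NodeJS. Only ‘HKDF with SHA256 over the folder key combined with the file name’ comes from the description above. The order of concatenation, the salt value "syncthing", the empty info parameter and the function name deriveFileKey are all assumptions made for illustration, so do not expect the output to match Syncthing’s real file keys without checking them against the source.

```javascript
const { hkdfSync } = require("node:crypto");

// ASSUMPTIONS (not confirmed by the article): the input key material is the
// folder key followed by the file name, the salt is "syncthing", and the
// info parameter is empty.
function deriveFileKey(folderKey, fileName) {
  const ikm = Buffer.concat([folderKey, Buffer.from(fileName, "utf8")]);
  const salt = Buffer.from("syncthing", "utf8");
  const info = Buffer.alloc(0);
  // hkdfSync returns an ArrayBuffer; wrap it for convenient handling.
  return Buffer.from(hkdfSync("sha256", ikm, salt, info, 32));
}

// Placeholder folder key; in practice this would be the scrypt output.
const demoFolderKey = Buffer.alloc(32, 1);
const fileKeyA = deriveFileKey(demoFolderKey, "wonnx/wonnx/Cargo.lock");
const fileKeyB = deriveFileKey(demoFolderKey, "wonnx/wonnx/Cargo.toml");
console.log("File key A:", fileKeyA.toString("hex"));
console.log("File key B:", fileKeyB.toString("hex"));
```

Whatever the exact inputs are, the property that matters is visible here: the same folder key yields unrelated keys for different file names, and a file key cannot be turned back into the folder key.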

After having installed all the required packages, I came up with the following code for generating the password token:

import { scrypt } from "node:crypto";
import base32Decode from "base32-decode";
import base32Encode from "base32-encode";
import * as miscreant from "miscreant";

const folderID = process.env.ST_FOLDER || "tommy";
const folderEncryptionPassword = process.env.ST_PASSWORD || "test";

const prefix = "syncthing";

scrypt(
  folderEncryptionPassword,
  prefix + folderID,
  32,
  { N: 32768, r: 8, p: 1, maxmem: 128 * 32768 * 8 * 2 },
  async (err, key) => {
    if (err) {
      console.error(err);
      return null;
    }
    console.log("Key", key.toString("hex"));

    const aead = await miscreant.AEAD.importKey(
      key,
      "AES-SIV",
      new miscreant.PolyfillCryptoProvider()
    );

    // Generate a password token (should be the same value as in the .stfolder/syncthing-encryption_password_token file)
    const passwordToken = await aead._siv.seal(
      Buffer.from(prefix + folderID, "utf8"),
      [new Uint8Array(0)]
    );
    console.log(
      "Generated password token:",
      Buffer.from(passwordToken).toString("base64")
    );
  }
);

A few things were tricky here:

The scrypt call needs to use the exact same parameters as the syncthing code (these are not in the documentation but can be found in the code. Yay open source!). It took me a while to find out that NodeJS also needs the ‘maxmem’ parameter set (in Go the memory requirement is calculated automatically).
A nonce is not used when performing the encryption, as each peer needs to be able to calculate the same password token. Using random nonces would lead to a different token each time: these would all decrypt to the same value, but the untrusted peer, which cannot decrypt, would not be able to tell whether they were all derived from the same password! The JS miscreant package does not support this directly however, so I had to work around this by using the aead._siv.seal call directly and passing an empty nonce buffer. It is very difficult to debug encryption and decryption when all you see are blobs of random bytes!
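The key derivation step can also be tried in isolation, using nothing but the NodeJS standard library. The sketch below uses the synchronous scryptSync variant with the same parameters as the snippet above; the helper name deriveFolderKey and the second folder ID are made up for the example.

```javascript
const { scryptSync } = require("node:crypto");

// Same parameters as in the snippet above. maxmem is set explicitly because
// the memory needed for N=32768 and r=8 (about 32 MiB) does not fit within
// the default limit in NodeJS.
const SCRYPT_PARAMS = { N: 32768, r: 8, p: 1, maxmem: 128 * 32768 * 8 * 2 };

// Hypothetical helper: derive the 32 byte folder key from folder ID and password.
function deriveFolderKey(folderID, password) {
  const salt = "syncthing" + folderID;
  return scryptSync(password, salt, 32, SCRYPT_PARAMS);
}

const keyTommy = deriveFolderKey("tommy", "test");
const keyOther = deriveFolderKey("some-other-folder", "test");
console.log("Key length in bytes:", keyTommy.length);
console.log("Same key for both folders?", keyTommy.equals(keyOther));
```

Because the folder ID is part of the salt, reusing one password for several folders still gives each folder its own key.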

Decrypting file names

The generated password token matches the one on my disk! This means that our key is correct, AEAD decryption works, and we should also be able to decrypt file names.

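    // This continues inside the scrypt callback above, where `aead` is available.
    // const fileNameToDecode = "A.syncthing-enc/AB/CDEF...";

    // Strip the .syncthing-enc suffix and all slashes to recover the Base32 name
    const cleaned = fileNameToDecode.replace(/(\.syncthing-enc)|\//g, "");
    // Syncthing uses the 'extended hex' Base32 alphabet (0-9, A-V)
    const fnDecoded = Buffer.from(base32Decode(cleaned, "RFC4648-HEX"));
    // Decrypt with the same empty nonce that was used for the password token
    const res = await aead._siv.open(fnDecoded, [new Uint8Array(0)]);
    const fileNameDecoded = Buffer.from(res).toString("utf8");
    console.log("Decrypted file name:", fileNameDecoded);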

This, too, works like a charm!
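If you would rather not depend on a package for the Base32 step, the ‘extended hex’ alphabet variant used in the decoding call above is simple enough to decode by hand. The following is a minimal sketch (the function name is my own, and it does not validate trailing bits the way a strict decoder would):

```javascript
// Base32 'extended hex' alphabet from RFC 4648: digits first, then A-V.
const BASE32HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

// Decode an (optionally padded) base32hex string into a Buffer.
function base32HexDecode(text) {
  const clean = text.replace(/=+$/, "").toUpperCase();
  const bytes = [];
  let bits = 0; // number of bits currently held in `value`
  let value = 0; // bit accumulator
  for (const ch of clean) {
    const index = BASE32HEX_ALPHABET.indexOf(ch);
    if (index === -1) {
      throw new Error("Invalid base32hex character: " + ch);
    }
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the leftover bits
    }
  }
  return Buffer.from(bytes);
}

const encryptedExample = base32HexDecode(
  "4ISDQJPKRK0GI2F23V1D4E32VQ8MQQNAN18RA1GU6SFEOAKB9VT93R8OALMM8"
);
console.log("Encrypted name length in bytes:", encryptedExample.length);
```

The example name from earlier decodes to 38 bytes, which is consistent with a 16 byte SIV tag followed by 22 bytes of ciphertext, exactly the length of wonnx/wonnx/Cargo.lock.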

What would an attacker do?

At this point the security of the encrypted folders looked good to my (mostly untrained) eyes. Browsing through the Go code for syncthing (mostly this file containing all the encryption bits), I did not spot anything suspicious, the documentation for filename encryption mostly matches what is actually happening, and I am able to reproduce it fully independently. This is very comforting!

That said, I started wondering what it would take to actually insert a malicious backdoor in here that would be hard to spot, even for someone doing the same exercise as I did above. One particularly nasty way would be to change the code in such a way that it would generate the same key each time. This would be hard to notice for end users, as encryption would still be happening (just with a key that is not very secret), and the actual encryption key (after scrypt) is never exposed to the user. Even if it were, it would be difficult to notice that the same key is generated each time, unless the user has different passwords protecting different folders and knows that the key should be different for each. (See also the SSH keys that were generated in Debian Linux for a while and turned out to be highly predictable!)

In the syncthing codebase, the function KeyFromPassword is responsible for generating the key (it also performs some caching because as explained the scrypt operation is costly):

// KeyFromPassword uses key derivation to generate a stronger key from a
// probably weak password.
func (g *KeyGenerator) KeyFromPassword(folderID, password string) *[keySize]byte {
	cacheKey := folderKeyCacheKey{folderID, password}
	g.mut.Lock()
	defer g.mut.Unlock()
	if key, ok := g.folderKeys.Get(cacheKey); ok {
		return key
	}
	bs, err := scrypt.Key([]byte(password), knownBytes(folderID), 32768, 8, 1, keySize)
	if err != nil {
		panic("key derivation failure: " + err.Error())
	}
	if len(bs) != keySize {
		panic("key derivation failure: wrong number of bytes")
	}
	var key [keySize]byte
	copy(bs, key[:])
	g.folderKeys.Add(cacheKey, &key)
	return &key
}

Looks straightforward, right? Well, I inserted a backdoor in the code above as an exercise, can you spot it?

If you are not familiar with Go, this can be pretty hard to do, but the issue is in the following line:

 copy(bs, key[:])

Can you spot it now?

The copy function (whose documentation is actually quite hard to find, but it is a Go built-in) copies a buffer into another. The order of arguments is supposed to be (destination, source), but here we are copying from the buffer we just initialized (key) into the buffer that contains the scrypt-generated key (bs)! Go initializes buffers to zero, so this would mean the key buffer (which is returned and also stored in the cache) would always be 32 zero bytes…

The language itself makes it quite difficult to spot or even prevent the error I inserted above. In a language like Swift, this is solved using argument names (e.g. copy(from: someBuffer, to: someOtherBuffer)). Rust solves this by requiring you to specify which references are mutable (e.g. copy(&mut someBuffer, &otherBuffer) makes it immediately clear that otherBuffer is never written to and someBuffer might be). In NodeJS, you would use buffer.copy(target) which, again, may look quite ambiguous if you do not have the copy method’s documentation memorized or handy. A method name such as copyInto would have been a lot better.
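To show how easily the same mistake slips in with the NodeJS API, here is a small sketch. The 16 byte value is just a stand-in for a derived key; the two halves differ only in which buffer the copy method is called on.

```javascript
// Stand-in for the output of the key derivation.
const derived = Buffer.from("00112233445566778899aabbccddeeff", "hex");

// Correct direction: source.copy(target) fills the fresh key buffer.
const keyGood = Buffer.alloc(derived.length);
derived.copy(keyGood);

// Swapped direction: the zero-filled key buffer overwrites the derived bytes,
// and the key that would be returned stays all zeros.
const derivedScratch = Buffer.from(derived); // private copy for the demo
const keyBad = Buffer.alloc(derived.length);
keyBad.copy(derivedScratch);

console.log("Correct key:", keyGood.toString("hex"));
console.log("Swapped key:", keyBad.toString("hex"));
```

Both versions run without any error or warning, which is exactly what makes this kind of bug (or backdoor) so easy to miss in review.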

While the syncthing code contains a test for the KeyFromPassword function (which asks to generate a key, and then verifies it against a hardcoded key), it would be a good idea to add a test that verifies that two calls to KeyFromPassword with different parameters actually generate two different keys. At this point it would of course be very difficult to change the KeyFromPassword behavior surreptitiously as the encrypted peer feature is in widespread use and any change leading to different encryption keys would be immediately noticed by users (not being able to sync or decrypt their files anymore).
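Such a test is cheap to write. The sketch below expresses the idea in JavaScript rather than Go, against a stand-in key derivation function with the scrypt parameters used earlier in this post; all function names are hypothetical, and the backdoored variant mimics the swapped copy from above.

```javascript
const { scryptSync } = require("node:crypto");

const KEY_SIZE = 32;
const SCRYPT_PARAMS = { N: 32768, r: 8, p: 1, maxmem: 128 * 32768 * 8 * 2 };

// Stand-in for a correct key derivation function.
function keyFromPassword(folderID, password) {
  return scryptSync(password, "syncthing" + folderID, KEY_SIZE, SCRYPT_PARAMS);
}

// Stand-in for the backdoored version: the copy goes the wrong way.
function backdooredKeyFromPassword(folderID, password) {
  const bs = scryptSync(password, "syncthing" + folderID, KEY_SIZE, SCRYPT_PARAMS);
  const key = Buffer.alloc(KEY_SIZE);
  key.copy(bs); // should have been bs.copy(key)
  return key;
}

// Returns a list of problems found; an empty list means the checks passed.
function findKeyDerivationProblems(derive) {
  const problems = [];
  const a = derive("folder-a", "password-1");
  const b = derive("folder-b", "password-1");
  const c = derive("folder-a", "password-2");
  if (a.equals(b)) problems.push("different folder IDs produce the same key");
  if (a.equals(c)) problems.push("different passwords produce the same key");
  if (a.every((byte) => byte === 0)) problems.push("key consists of zero bytes only");
  return problems;
}

console.log("Correct version:", findKeyDerivationProblems(keyFromPassword));
console.log("Backdoored version:", findKeyDerivationProblems(backdooredKeyFromPassword));
```

The check does not need a hardcoded expected key, so it keeps working even if the derivation parameters are changed deliberately at some point.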

Is this secure?

To answer the question whether the untrusted peer encryption mechanism of Syncthing is secure, one must first define the threat model. The untrusted peer functionality appears to be intended to maintain confidentiality of your data against anyone who is able to access the encrypted contents. Depending on your use case this can be anyone from an opportunistic hacker to a nation state. The next question is whether the methodology and algorithm choices are sufficient to reach this goal. If so, the final question is whether the implementation is correct.

Not being a crypto expert I cannot confidently say that the cryptographic choices made here are sound. One thing that stood out for me was the fact that the password token, stored on the untrusted peer and with known plaintext, provides a nice target for off-line brute forcing attempts. (In addition to the password token, there may be files whose plaintext names are guessable, e.g. “.DS_Store” or “Thumbs.db”). However, as the algorithm has a key size of at least 128 bits, brute forcing the key as is requires an expected 2^127 guessing attempts. Unless a weakness in AES is discovered that makes key recovery with known plaintext substantially easier, this does not seem very feasible assuming classical computing. Nevertheless, quantum algorithms may make this easier. When the key is guessed, it is game over (note that the attacker does not need to know the password as the file content encryption starts from the same key).
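To get a feeling for what 2^127 guesses means, here is a back-of-the-envelope calculation. The attacker speed of 10^12 key guesses per second is a number picked purely for illustration, not a measurement of any real hardware.

```javascript
// Expected number of guesses to find a 128 bit key by brute force.
const expectedGuesses = 2n ** 127n;

// ASSUMPTION: a hypothetical attacker testing 10^12 keys per second.
const guessesPerSecond = 10n ** 12n;
const secondsPerYear = 31557600n; // 365.25 days

const yearsNeeded = expectedGuesses / (guessesPerSecond * secondsPerYear);
console.log("Expected years needed:", Number(yearsNeeded).toExponential(2));
```

Under this assumption the answer is on the order of 10^18 years, so even an attacker a million times faster would still be looking at trillions of years.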

As for the implementation, it all looks pretty straightforward. Note that absence of proof is not the same as proof of absence: the fact that no obvious vulnerability is found in the code does not mean it is fully secure. For all we know, the binary you use actually does contain the hypothetical copy backdoor, or has its own little weird implementation of copy at runtime, for instance. Additionally there are plenty of opportunities for hiding a backdoor in the underlying libraries such as miscreant, and there could be other places in the code that simply send off your key somewhere.

That said, a hypothetical vulnerability such as this one ideally should not lead to compromise on its own either. In the aviation industry this is called the Swiss cheese model and in cybersecurity defense in depth: when one defense falls, there should be others still in place. It is still a good idea to put other measures in place (such as a hardened server, encrypted file system, et cetera) to prevent anyone from accessing the encrypted files. (Syncthing itself obviously also employs encryption at the protocol level for in-transit data).

Final thoughts

All in all it is very satisfying to actually be able to understand and verify that software that should keep your private files secure is actually doing it, and in the right way. During this exercise I only managed to validate the encryption of the file names – the actual method for encrypting the file contents is a bit more complex, but at least we have verified the generation of the keys involved. An additional benefit of doing such an exercise is that encrypted folders can actually be decrypted in an emergency even without having the syncthing software itself available.