litchie.com
May 18, 2015

A Quest for Syncable Private Online Storage

Apps need to sync data, whether documents or preferences, across our devices. Syncable means that modifications made on one device are transferred to the other devices swiftly. Private means that data is encrypted with a user-provided key before it is uploaded to the server, so that neither the cloud provider nor the app developer can look inside your documents.

These requirements seem to contradict each other. Encrypted data is expensive to sync: with whole-file encryption, changing even one byte can produce entirely different ciphertext, so a full sync has to copy every byte. However, if we design the storage file to be append-only and use a stream cipher, both goals can be met.

An append-only file is opened for reading and writing, except that writes always happen at the end of the file. For C programmers, such a file is opened this way:

fopen("datafile", "a+");

Because the file is append-only, it is easy to sync: compare file sizes and download only the missing data at the end. Another benefit is that data already written is never overwritten, so a later write cannot corrupt it. When things go wrong, we can simply revert to an earlier version. A stream cipher encrypts data on the fly as it is appended, so there is no need to re-encrypt the whole file from the start. Effectively we also get encrypted, version-controlled storage.
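A minimal sketch of the idea in Python, where mode "ab" plays the role of C's append mode. The keystream here is a toy built from SHA-256 in counter mode, chosen only because it is seekable and needs no third-party library; it is an assumption for illustration, not the cipher this design prescribes. The names `keystream`, `xor_at`, `append_encrypted` and `read_decrypted` are hypothetical.

```python
import hashlib
import os
import tempfile

BLOCK = 32  # bytes of keystream produced per SHA-256 call


def keystream(key: bytes, offset: int, length: int) -> bytes:
    """Toy seekable keystream covering file bytes [offset, offset + length)."""
    first = offset // BLOCK
    last = (offset + length + BLOCK - 1) // BLOCK
    stream = b"".join(
        hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for counter in range(first, last)
    )
    start = offset - first * BLOCK
    return stream[start:start + length]


def xor_at(key: bytes, offset: int, data: bytes) -> bytes:
    """Encrypt or decrypt data that lives at the given file offset."""
    ks = keystream(key, offset, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


def append_encrypted(path: str, key: bytes, plaintext: bytes) -> int:
    """Encrypt only the new bytes and append them; return the new file size."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    with open(path, "ab") as f:
        f.write(xor_at(key, offset, plaintext))
    return offset + len(plaintext)


def read_decrypted(path: str, key: bytes, offset: int = 0, length: int = -1) -> bytes:
    """Decrypt a slice of the file without touching the rest."""
    with open(path, "rb") as f:
        f.seek(offset)
        return xor_at(key, offset, f.read(length))


# Demo: the second append leaves the earlier ciphertext untouched.
path = os.path.join(tempfile.mkdtemp(), "datafile")
key = b"user-provided key"
head = append_encrypted(path, key, b"first record\n")
with open(path, "rb") as f:
    before = f.read()
append_encrypted(path, key, b"second record\n")
with open(path, "rb") as f:
    after = f.read()
print(after[:head] == before)
print(read_decrypted(path, key))
```

Because the keystream depends only on the key and the offset, a syncing device only needs the bytes past its own head. A real design would use a vetted stream cipher and a per-file nonce, and would have to avoid two devices encrypting different data at the same offset with the same key.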

The problem with append-only storage is that, unlike a usual database system, we need to build an external index file for fast queries. The index has to be built on first load and updated whenever new data comes in. It can also be encrypted, so even if other people get access to your device, your index file is still safe.
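A rough sketch of such an index, under an assumed record format that the post does not specify: each record is a 4-byte big-endian length followed by a UTF-8 "key=value" payload. The index maps each key to the offset of its latest record, and `update_index` resumes from wherever the previous scan stopped. Encryption is left out to keep the example short, and all names are hypothetical.

```python
import struct


def encode_record(key: str, value: str) -> bytes:
    """Assumed record layout: 4-byte length prefix + b'key=value'."""
    payload = f"{key}={value}".encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def update_index(log: bytes, index: dict, indexed_upto: int) -> int:
    """Scan records from indexed_upto onward; return the new scan position."""
    pos = indexed_upto
    while pos + 4 <= len(log):
        (size,) = struct.unpack_from(">I", log, pos)
        if pos + 4 + size > len(log):
            break  # incomplete record at the tail, wait for more data
        payload = log[pos + 4:pos + 4 + size].decode("utf-8")
        index[payload.split("=", 1)[0]] = pos  # later records win
        pos += 4 + size
    return pos


def lookup(log: bytes, index: dict, key: str) -> str:
    """Read the latest value for key using the index, without scanning."""
    pos = index[key]
    (size,) = struct.unpack_from(">I", log, pos)
    return log[pos + 4:pos + 4 + size].decode("utf-8").split("=", 1)[1]


# Demo: build the index on first load, then update it after new data arrives.
log = encode_record("theme", "dark") + encode_record("font", "mono")
index = {}
upto = update_index(log, index, 0)
log += encode_record("theme", "light")
upto = update_index(log, index, upto)
print(lookup(log, index, "theme"), lookup(log, index, "font"))
```

Older records stay in the file, so an index built from a shorter prefix of the same log gives the earlier state.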

Normally a remote server is required to help devices sync with each other. The server only sees meta information, for example the size of the data file and the timestamps of updates from clients. It can do some basic conflict resolution. When a client tries to push (append) new data, the server requires the client to provide its local head position (the same as its file size) and a checksum. If the client's head position is not equal to the head on the server, the server rejects the update. The client must first catch up with the server's head position by downloading the missing data and resolving conflicts locally. Since the server has no idea of the contents, keeping the content in a valid state is the responsibility of the clients. A badly behaving client could post garbage data to the server. Even in that case, we can still revert the data to an earlier version and, if necessary, revoke access for the bad client.
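The push rule can be sketched as follows. This assumes the server stores the opaque encrypted bytes alongside its metadata and that the checksum is a SHA-256 over everything up to the head; the post fixes neither detail. `SyncServer`, `push`, `fetch` and `checksum` are hypothetical names.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum a client computes over its local copy (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


class SyncServer:
    """Holds opaque bytes; never needs the encryption key."""

    def __init__(self):
        self.blob = bytearray()
        self._running = hashlib.sha256()  # hash of everything stored so far

    @property
    def head(self) -> int:
        return len(self.blob)

    def push(self, client_head: int, client_checksum: str, new_data: bytes) -> bool:
        """Accept an append only from a client that is fully caught up."""
        if client_head != self.head or client_checksum != self._running.hexdigest():
            return False
        self.blob += new_data
        self._running.update(new_data)
        return True

    def fetch(self, offset: int) -> bytes:
        """Return the bytes a client is missing past its own head."""
        return bytes(self.blob[offset:])


# Demo: client B is behind, gets rejected, catches up, then pushes.
server = SyncServer()
a_local, b_local = b"", b""
print(server.push(len(a_local), checksum(a_local), b"update from A;"))
print(server.push(len(b_local), checksum(b_local), b"update from B;"))
b_local += server.fetch(len(b_local))  # download the missing tail
print(server.push(len(b_local), checksum(b_local), b"update from B;"))
```

After catching up, client B would normally resolve any conflict between its pending change and the downloaded data before pushing again; that step is application-specific and omitted here.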

Like git, every client has a full copy of the data, so users can switch to another remote storage provider at will.

Data syncing can also be peer-to-peer. Two local stores can negotiate a common head position by exchanging checksums of different portions of their data, and then merge their differences from that point.
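One way the negotiation could work, sketched under the assumption that peers exchange SHA-256 checksums of file prefixes: if a prefix matches, every shorter prefix matches too, so a binary search finds the longest shared prefix in a logarithmic number of exchanges. The callable `remote_checksum` stands in for a network request to the other peer; both it and `common_head` are hypothetical.

```python
import hashlib
from typing import Callable


def prefix_checksum(data: bytes, length: int) -> str:
    """Checksum of the first `length` bytes of a local store."""
    return hashlib.sha256(data[:length]).hexdigest()


def common_head(local: bytes, remote_size: int,
                remote_checksum: Callable[[int], str]) -> int:
    """Binary-search the longest prefix on which both peers agree."""
    lo, hi = 0, min(len(local), remote_size)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if prefix_checksum(local, mid) == remote_checksum(mid):
            lo = mid       # prefixes agree, the split is further right
        else:
            hi = mid - 1   # prefixes differ, the split is to the left
    return lo


# Demo: two stores that share a history and then diverge.
shared = b"shared-history|"
peer_a = shared + b"AAAA"
peer_b = shared + b"BBBBBB"
head = common_head(peer_a, len(peer_b), lambda n: prefix_checksum(peer_b, n))
print(head == len(shared))
```

In practice the agreed head would be rounded down to a record boundary before merging, and hashing whole prefixes on every round is wasteful for large files; cached checksums per chunk would be a natural refinement.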

Going forward, this is how app developers should protect users' private data, and this is how we can close backdoors to user data while still providing the convenience of fast syncing.

Mar 20, 2015

Trusted Cloud Computing

We all know non-public data on cloud servers should be encrypted. What if the data has to be processed right on the servers? Data processing programs need to know the encryption key, but we must only hand the key over to programs that we can trust. Trusted programs are those we can build from source, which means that we can embed one-time-use secrets in them. Every time we want to run a program on the server, a different executable copy with different secrets is uploaded, and the server should launch it as soon as possible. The running program has to answer questions correctly and quickly (to protect against the secrets being reverse engineered) before we send over the data access key. A trusted program must keep the key only in memory, never write it to disk, and hide or destroy it after use. If the program is restarted, it has to ask for the key again, and then we know something is wrong. Open source programs are easier to reverse engineer, so we must add the secrets in an obfuscated way to make sure an attacker cannot reveal them in a short time. Depending on the security measures, the access key must be invalidated or the data removed after a certain period.
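The question-and-answer step could look something like the sketch below. It assumes the "question" is a random challenge and the "answer" is an HMAC-SHA256 of it under the embedded one-time secret; the post does not say what form the questions take. `KeyHolder`, `answer_challenge` and the two-second default deadline are hypothetical, and the obfuscation of the embedded secret is not modelled.

```python
import hashlib
import hmac
import secrets
import time
from typing import Callable, Optional


def answer_challenge(embedded_secret: bytes, challenge: bytes) -> bytes:
    """Runs inside the uploaded program, using the secret built into it."""
    return hmac.new(embedded_secret, challenge, hashlib.sha256).digest()


class KeyHolder:
    """Runs on our side; releases the data key once, and only to a fast, correct answer."""

    def __init__(self, one_time_secret: bytes, data_key: bytes,
                 max_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._secret: Optional[bytes] = one_time_secret
        self._data_key = data_key
        self._max_seconds = max_seconds
        self._clock = clock
        self._challenge: Optional[bytes] = None
        self._issued_at = 0.0

    def new_challenge(self) -> bytes:
        if self._secret is None:
            raise RuntimeError("one-time secret already used")
        self._challenge = secrets.token_bytes(32)
        self._issued_at = self._clock()
        return self._challenge

    def release_key(self, response: bytes) -> Optional[bytes]:
        challenge, self._challenge = self._challenge, None  # single attempt per challenge
        if challenge is None or self._secret is None:
            return None
        if self._clock() - self._issued_at > self._max_seconds:
            return None  # too slow: the secret may have been extracted
        expected = hmac.new(self._secret, challenge, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, response):
            return None
        self._secret = None  # the embedded secret is now spent
        return self._data_key


# Demo: a freshly built program answers in time and receives the key.
secret = secrets.token_bytes(32)
holder = KeyHolder(secret, data_key=b"data access key")
challenge = holder.new_challenge()
print(holder.release_key(answer_challenge(secret, challenge)) is not None)
```

A second key request for the same build fails because the secret is spent, which matches the point that a restarted program asking for the key again is a warning sign.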

Feb 10, 2015

Raspberry Pi 2 is a game changer

Since the RPi came out, I have bought several RPi B and B+ boards. They are very useful as test servers. I have also developed kiosk-type commercial applications on the RPi, mainly for information display. These Pis are usually attached to a TV and stay online 24 hours a day. They are very robust: I haven't received any complaints since they were shipped and installed. On the other hand, they aren't powerful enough, so I don't think they are much good for consumers.

Now the announcement of RPi 2 just changes everything:

If those numbers don't mean anything to you, then remember that this is a $35 computer that has the same computing power as four iPhone 4s. You can't imagine what this tiny computer is capable of.

The first thing I should do is port Aemula (a 486 emulator) to the RPi 2, and then maybe try out other interesting ideas on it. If you have any thoughts, or if you need a custom application based on the RPi 2, please let me know.

Feb 07, 2015

Introducing BtStamp

BtStamp is an app that timestamps important documents using the bitcoin block chain, and it's now available in the App Store.

How It Works

BtStamp pushes a SHA256 digest of your document into the bitcoin block chain, thereby creating a proof that the document existed at the time it entered the block chain.

Useful in these cases:

Secure and Private

BtStamp calculates the SHA256 hash locally: the actual document never leaves your device. The proof is kept in the block chain permanently as a transaction. Even if the BtStamp service is down or the app is unavailable, you can still search for a registered document's digest on well-known bitcoin websites, locate the transaction and find out when it entered the block chain.

Timestamping is done anonymously. No email address is required.

"Prove It!"

When the time comes to prove that a document existed, you must be able to produce the original document. Calculate its SHA256 digest with a third-party tool and search for the first 40 hexadecimal characters of that digest on a bitcoin website (for example blockchain.info). You will find the transaction, with the complete SHA256 digest in its output script. The search can be skipped if you have kept the transaction id somewhere, but you still need to provide the original document and its digest.
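Python's standard library can serve as the third-party tool. The sketch below computes the digest of a file and the 40-character prefix to search for. The `digest_in_script` check is a simplification that assumes you have copied the output script as a hex string from a block explorer; how BtStamp lays out the script is not described here, so it only looks for the digest as a substring. All function names are hypothetical.

```python
import hashlib
import os
import tempfile


def document_digest(path: str, chunk_size: int = 65536) -> str:
    """SHA256 of a file, read in chunks so large documents fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def search_prefix(digest_hex: str) -> str:
    """The first 40 hexadecimal characters, used as the search term."""
    return digest_hex[:40]


def digest_in_script(digest_hex: str, script_hex: str) -> bool:
    """Assumed check: the full digest appears somewhere in the script hex."""
    return digest_hex.lower() in script_hex.lower()


# Demo with a throwaway file standing in for the original document.
path = os.path.join(tempfile.mkdtemp(), "contract.txt")
with open(path, "wb") as f:
    f.write(b"abc")
digest = document_digest(path)
print(digest)
print(search_prefix(digest))
```

The document itself is only ever read locally; just the digest needs to be compared against what is stored in the transaction.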

Jan 30, 2015

On App Building Solutions

Facebook's answer to cross platform app development:

A Deep Dive on React Native

I dragged myself through the whole presentation; still, some good ideas caught my attention.

  1. Make use of the platform's native toolkit instead of emulating it with HTML5.
  2. Use a custom layout engine to build the view tree.
  3. Incremental tree rebuild.

Happy to see the industry looking for more pleasant ways of building apps. I have also been thinking about the problem for quite a while.

My wish list for an ultimate solution (at least for the next 30 years):

  1. App can be inspected and changed on the fly, with no need to relaunch for non-primitive modifications. Think of how sculptors work, or even better, gardeners.
  2. Built-in revision control. git still feels too heavy.
  3. One source that can be adapted to different platforms. Think of how we use CSS prefixes for different platforms. Not pretty, but useful.
  4. Auto recalculation: the connection between state and display elements should be seamless. Think of Excel.
  5. Text editors are no longer our primary tools.
  6. Stay close to metal.

One more thing: JavaScript won't be fundamental in my solution, because it's just not simple enough.
