Requirements
- Store user files durably and sync them across devices.
- Upload efficiently and avoid resending unchanged data.
- Handle concurrent edits and offline changes.
High-level design
Files are split into chunks, deduplicated, stored in object storage, and tracked by a metadata service that drives sync.
- Chunking: a file is divided into blocks, each identified by a content hash, so identical blocks are stored only once.
- Metadata service: tracks files, versions, and which chunks compose each file.
- Sync: clients compare local and server metadata and transfer only changed chunks.
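The three pieces above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it assumes fixed-size chunks, SHA-256 as the chunk hash, and an in-memory dictionary standing in for object storage. The names `BlockStore`, `upload`, and `chunks_to_fetch` are hypothetical, and the 4-byte chunk size is chosen only to keep the example readable.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny for illustration; real blocks would be far larger


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Divide a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hash(chunk: bytes) -> str:
    """Content hash that identifies a block (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


class BlockStore:
    """Hypothetical in-memory stand-in for object storage, keyed by chunk hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        return digest in self.blocks

    def put(self, digest: str, chunk: bytes) -> None:
        # Identical blocks share a key, so they are stored only once.
        self.blocks.setdefault(digest, chunk)


def upload(data: bytes, store: BlockStore, size: int = CHUNK_SIZE) -> tuple[list[str], int]:
    """Upload only blocks the store lacks.

    Returns the file's manifest (ordered chunk hashes, which the metadata
    service would record) and the number of blocks actually sent.
    """
    manifest: list[str] = []
    sent = 0
    for chunk in split_chunks(data, size):
        digest = chunk_hash(chunk)
        if not store.has(digest):
            store.put(digest, chunk)
            sent += 1
        manifest.append(digest)
    return manifest, sent


def chunks_to_fetch(server_manifest: list[str], local_manifest: list[str]) -> list[str]:
    """Compare manifests and list the chunk hashes a client still needs."""
    have = set(local_manifest)
    # dict.fromkeys removes repeats while preserving order.
    return [d for d in dict.fromkeys(server_manifest) if d not in have]


store = BlockStore()
v1, sent_v1 = upload(b"aaaabbbbcccc", store)  # first upload sends all three blocks
v2, sent_v2 = upload(b"aaaaXXXXcccc", store)  # only the changed middle block is sent
stale_client_needs = chunks_to_fetch(v2, v1)  # a client holding v1 fetches one block
```

With fixed-size blocks, an in-place edit changes only the blocks it touches, which is what lets the second upload send a single block. An insertion that shifts later bytes would change every following block under this scheme, so the chunking strategy is a design choice in its own right.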
Bottlenecks
- Bandwidth: resending whole files is wasteful, so deduplicate by chunk hash and upload only new blocks.
- Sync latency: clients should learn of changes quickly, so a notification service pushes updates to them instead of having them poll.
- Conflicts: two devices editing the same file offline can diverge, so version each change and keep conflicting copies rather than silently overwriting.
A small metadata diff drives sync, while the heavy block data flows directly to and from object storage.
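The conflict rule above can be sketched as a version check at commit time. This is a minimal illustration under stated assumptions: each client sends the version its edit was based on, the metadata service accepts the commit only if that version is still current, and a stale commit is saved under a separate name. `MetadataService`, `commit`, and the "conflicted copy" naming are hypothetical choices for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    version: int = 0                                    # bumped on every accepted commit
    manifest: list[str] = field(default_factory=list)   # ordered chunk hashes


class MetadataService:
    """Hypothetical in-memory metadata service keyed by file path."""

    def __init__(self) -> None:
        self.files: dict[str, FileRecord] = {}

    def commit(self, path: str, base_version: int, manifest: list[str], device: str) -> str:
        """Record a new version, or keep a conflicting copy if the base is stale.

        Returns the path under which the change was stored.
        """
        record = self.files.setdefault(path, FileRecord())
        if base_version == record.version:
            # The edit was made against the latest version: accept it.
            record.version += 1
            record.manifest = list(manifest)
            return path
        # Another device committed first. Keep both versions instead of
        # overwriting, choosing a name that does not clobber an earlier copy.
        n = 1
        conflict_path = f"{path} (conflicted copy, {device})"
        while conflict_path in self.files:
            n += 1
            conflict_path = f"{path} (conflicted copy {n}, {device})"
        self.files[conflict_path] = FileRecord(version=1, manifest=list(manifest))
        return conflict_path


svc = MetadataService()
svc.commit("notes.txt", 0, ["h1"], "laptop")            # creates version 1
# Both devices now edit version 1 while offline, then reconnect.
svc.commit("notes.txt", 1, ["h2"], "laptop")            # accepted as version 2
saved_as = svc.commit("notes.txt", 1, ["h3"], "phone")  # stale base, kept as a copy
```

Only manifests pass through this service. The blocks named by `h2` and `h3` would already be in object storage, so keeping both copies costs only a small metadata record.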
Key idea
A cloud storage system chunks and deduplicates files into a block store, and uses a metadata service to sync only changed chunks and to resolve conflicts without losing either side's edits.