A meeting point for nodes
A barrier is a synchronization point: no node may proceed past it until every participant has reached it. On a single machine, a shared counter and a wait are enough. Across a network, the counter has to live in a shared store that every node can see.
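For the single-machine case, the counter-and-wait idea can be sketched with threads standing in for participants. This is a minimal illustration, not a production primitive: the names `CountingBarrier` and `run_phases` are made up for this example, and the barrier is single-use. Python's standard library already offers `threading.Barrier` for real code.

```python
import threading


class CountingBarrier:
    """Single-use barrier built from a counter and a wait (illustrative only)."""

    def __init__(self, parties):
        self.parties = parties
        self.arrived = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            self.arrived += 1
            if self.arrived == self.parties:
                # Last arrival: wake everyone who is waiting.
                self.cond.notify_all()
            else:
                # wait_for re-checks the predicate, so spurious wakeups are harmless.
                self.cond.wait_for(lambda: self.arrived >= self.parties)


def run_phases(parties):
    """Run `parties` threads through two phases separated by the barrier."""
    barrier = CountingBarrier(parties)
    log = []
    log_lock = threading.Lock()

    def worker():
        with log_lock:
            log.append("phase1")
        barrier.wait()  # nobody logs phase2 until all have logged phase1
        with log_lock:
            log.append("phase2")

    threads = [threading.Thread(target=worker) for _ in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `run_phases(4)` returns a log in which all four `"phase1"` entries come before any `"phase2"` entry, whatever order the threads are scheduled in.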
How it works
- Each node, on arrival, registers itself in a shared location.
- It then waits until the count of arrivals equals the expected total.
- When the last node arrives, the count is met and everyone is released together.
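The three steps above can be sketched as follows. The `SharedStore` class is a hypothetical in-process stand-in for whatever store the nodes actually share, and `enter_barrier`, `simulate`, and the `"phase-1"` barrier name are likewise invented for the example. Waiting is done by simple polling here, which is an assumption of this sketch rather than a requirement of the technique.

```python
import threading
import time


class SharedStore:
    """Hypothetical stand-in for a store that every node can see."""

    def __init__(self):
        self._lock = threading.Lock()
        self._arrivals = {}  # barrier name -> set of node ids

    def register(self, barrier, node_id):
        with self._lock:
            # A set makes registration idempotent if a node retries.
            self._arrivals.setdefault(barrier, set()).add(node_id)

    def count(self, barrier):
        with self._lock:
            return len(self._arrivals.get(barrier, ()))


def enter_barrier(store, barrier, node_id, expected, poll_interval=0.01):
    """Register this node, then block until `expected` nodes have arrived."""
    store.register(barrier, node_id)
    while store.count(barrier) < expected:
        time.sleep(poll_interval)


def simulate(expected):
    """Run `expected` threads as nodes; return the count each saw on release."""
    store = SharedStore()
    seen = []
    seen_lock = threading.Lock()

    def node(node_id):
        enter_barrier(store, "phase-1", node_id, expected)
        with seen_lock:
            seen.append(store.count("phase-1"))

    threads = [threading.Thread(target=node, args=(i,)) for i in range(expected)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Because arrivals are never removed in this sketch, every node that gets past `enter_barrier` observes the full count. Reusing the barrier for a later phase would need a fresh name or an explicit reset.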
This is useful in phased computations. Phase two cannot safely begin until every node has finished phase one; otherwise, some nodes may read stale results.
The release
The tricky part is the release. All waiting nodes must learn, at roughly the same time, that the barrier has opened. A coordination service does this by letting nodes watch the arrival count and notifying them when it reaches the target.
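A watch-and-notify release might look like the sketch below. `CoordinationService` is a hypothetical in-process model, not the API of any real coordination service: each arriving node leaves a callback, and the service fires all callbacks once the arrival count reaches the target. The names `arrive`, `node_wait`, and `run_nodes` are assumptions made for illustration.

```python
import threading


class CoordinationService:
    """Hypothetical in-process model of a service that supports watches."""

    def __init__(self, target):
        self._target = target
        self._count = 0
        self._watchers = []
        self._lock = threading.Lock()

    def arrive(self, on_open):
        """Record one arrival and register a watch for the barrier opening."""
        with self._lock:
            self._count += 1
            self._watchers.append(on_open)
            if self._count < self._target:
                return
            # Target reached: take the pending watchers to notify them.
            to_notify, self._watchers = self._watchers, []
        # Notify outside the lock so callbacks cannot block other arrivals.
        for notify in to_notify:
            notify()


def node_wait(service, timeout=5.0):
    """Arrive at the barrier; return True if released, False on timeout."""
    opened = threading.Event()
    service.arrive(opened.set)
    return opened.wait(timeout)


def run_nodes(n):
    """Run n threads as nodes against one service; collect their results."""
    service = CoordinationService(target=n)
    results = []
    results_lock = threading.Lock()

    def node():
        released = node_wait(service)
        with results_lock:
            results.append(released)

    threads = [threading.Thread(target=node) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Compared with polling, the waiting nodes do no work until the service notifies them, and they are all woken by the same event. The timeout in `node_wait` is a design choice of this sketch so that a missing participant surfaces as `False` instead of an indefinite hang.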
Key idea
A distributed barrier uses a shared count so that no node moves to the next phase until every participant has arrived, then releases them together.