A Distributed Database System (DDBS) stores data across multiple interconnected sites. Since many users and applications may access and modify distributed data at the same time, concurrency control is required to maintain consistency, isolation, and serializability. Without it, problems such as lost updates, dirty reads, inconsistent replicas, and distributed deadlocks may occur.
Concurrency control ensures that even when transactions execute simultaneously, the final result is equivalent to some correct serial execution. This is more difficult in distributed databases than in centralized systems because data may be fragmented or replicated across multiple locations, communication delay may occur, and a single global clock may not be available.
1. Lock-Based Concurrency Control
One of the most common methods is Distributed Two-Phase Locking (D2PL). In this approach, a transaction first acquires all required locks during the growing phase and then releases them during the shrinking phase. Once a transaction starts releasing locks, it cannot acquire new ones.
Distributed two-phase locking guarantees serializability, but it may increase communication overhead because lock requests often involve multiple sites. It can also lead to distributed deadlocks when transactions at different sites wait for one another indefinitely.
2. Timestamp Ordering Protocol
In timestamp ordering, each transaction is assigned a unique timestamp. This timestamp determines the logical order in which transactions should execute. Read and write operations are allowed only if they follow this order. If an operation violates the timestamp order, the transaction is rolled back and restarted.
This approach avoids deadlocks because transactions do not wait for locks. However, it may cause frequent rollbacks in systems with heavy conflict. In distributed systems, logical clocks such as Lamport timestamps are often used when a perfectly synchronized physical clock is not available.
3. Optimistic Concurrency Control
Optimistic Concurrency Control assumes that conflicts among transactions are rare. Transactions are allowed to execute without locking, and conflict checking is performed only before commit. If no conflict is found, the transaction commits successfully. If a conflict is detected, the transaction is rolled back.
This method works well in low-contention environments such as distributed analytical systems. However, in write-heavy applications, repeated rollbacks may reduce performance.
4. Multiversion Concurrency Control
Multiversion Concurrency Control (MVCC) maintains multiple versions of data items. Readers can access older committed versions while writers create newer versions. This allows read and write operations to proceed without blocking each other in many cases.
MVCC is widely used in modern distributed databases because it improves performance for read-heavy workloads. However, it requires additional storage and careful version management.
5. Distributed Deadlock Handling
Deadlocks in distributed systems are harder to detect than in centralized systems because waiting relationships may span multiple sites. A transaction at one site may wait for another transaction at a different site, creating a cycle across the network.
Common deadlock handling techniques include:
- Deadlock prevention: avoids cyclic waiting using methods such as Wait-Die and Wound-Wait.
- Deadlock detection: identifies cycles using local or global wait-for graphs.
- Deadlock resolution: aborts or rolls back one or more transactions to break the cycle.
6. Replication and Quorum-Based Control
In replicated distributed databases, the same data item may exist at several sites. Concurrency control must ensure that all replicas remain consistent. One common method is the quorum-based protocol, where a transaction must obtain a sufficient number of read or write permissions before accessing the replicated item.
For example, if there are N replicas, read quorum R and write quorum W are usually chosen so that:
R + W > N and W > N/2
These conditions ensure that read and write operations overlap on at least one replica, helping maintain consistency.
7. Role of Commit Protocols
Concurrency control is closely related to distributed commit protocols. A distributed transaction may involve several sites, and all of them must either commit or abort together. The most common protocol is Two-Phase Commit (2PC), which has a prepare phase and a commit/abort phase.
Although 2PC ensures atomicity, it may block if the coordinator fails. Three-Phase Commit (3PC) reduces this blocking problem by adding an intermediate pre-commit phase.
Comparison of Major Concurrency Control Techniques
| Mechanism | Main Advantage | Main Limitation | Suitable Environment |
|---|---|---|---|
| Distributed Two-Phase Locking | Ensures serializability | May cause deadlocks | High-conflict transactional systems |
| Timestamp Ordering | Deadlock-free | Higher rollback rate | Systems requiring ordered execution |
| Optimistic Concurrency Control | Low locking overhead | Poor performance under high conflict | Low-contention workloads |
| MVCC | Readers and writers rarely block each other | Extra storage required | Read-heavy distributed databases |
Concurrency control in distributed database systems is essential for maintaining correctness when multiple transactions operate at the same time across different sites. Techniques such as distributed two-phase locking, timestamp ordering, optimistic concurrency control, and multiversion concurrency control each offer different trade-offs between consistency, performance, scalability, and fault tolerance.
Modern distributed databases often use hybrid approaches that combine MVCC, distributed timestamps, and consensus protocols such as Paxos or Raft. The choice of concurrency control mechanism depends on workload characteristics, replication strategy, network latency, and the required level of consistency.

No comments:
Post a Comment
Note: Only a member of this blog may post a comment.