Concurrency Control Mechanism in Distributed DBMS - BunksAllowed

BunksAllowed is an effort to facilitate Self Learning process through the provision of quality tutorials.

Community

Concurrency Control Mechanism in Distributed DBMS

Share This

A Distributed Database System (DDBS) stores data across multiple interconnected sites. Since many users and applications may access and modify distributed data at the same time, concurrency control is required to maintain consistency, isolation, and serializability. Without it, problems such as lost updates, dirty reads, inconsistent replicas, and distributed deadlocks may occur.

Distributed database architecture diagram
Figure 1: Basic architecture of a distributed database system.

Concurrency control ensures that even when transactions execute simultaneously, the final result is equivalent to some correct serial execution. This is more difficult in distributed databases than in centralized systems because data may be fragmented or replicated across multiple locations, communication delay may occur, and a single global clock may not be available.

1. Lock-Based Concurrency Control

One of the most common methods is Distributed Two-Phase Locking (D2PL). In this approach, a transaction first acquires all required locks during the growing phase and then releases them during the shrinking phase. Once a transaction starts releasing locks, it cannot acquire new ones.

Two phase locking protocol diagram
Figure 2: Two-phase locking mechanism.

Distributed two-phase locking guarantees serializability, but it may increase communication overhead because lock requests often involve multiple sites. It can also lead to distributed deadlocks when transactions at different sites wait for one another indefinitely.

2. Timestamp Ordering Protocol

In timestamp ordering, each transaction is assigned a unique timestamp. This timestamp determines the logical order in which transactions should execute. Read and write operations are allowed only if they follow this order. If an operation violates the timestamp order, the transaction is rolled back and restarted.

This approach avoids deadlocks because transactions do not wait for locks. However, it may cause frequent rollbacks in systems with heavy conflict. In distributed systems, logical clocks such as Lamport timestamps are often used when a perfectly synchronized physical clock is not available.

3. Optimistic Concurrency Control

Optimistic Concurrency Control assumes that conflicts among transactions are rare. Transactions are allowed to execute without locking, and conflict checking is performed only before commit. If no conflict is found, the transaction commits successfully. If a conflict is detected, the transaction is rolled back.

This method works well in low-contention environments such as distributed analytical systems. However, in write-heavy applications, repeated rollbacks may reduce performance.

4. Multiversion Concurrency Control

Multiversion Concurrency Control (MVCC) maintains multiple versions of data items. Readers can access older committed versions while writers create newer versions. This allows read and write operations to proceed without blocking each other in many cases.

Multiversion concurrency control diagram
Figure 3: MVCC maintains multiple versions of data items.

MVCC is widely used in modern distributed databases because it improves performance for read-heavy workloads. However, it requires additional storage and careful version management.

5. Distributed Deadlock Handling

Deadlocks in distributed systems are harder to detect than in centralized systems because waiting relationships may span multiple sites. A transaction at one site may wait for another transaction at a different site, creating a cycle across the network.

Distributed deadlock wait-for graph
Figure 4: Distributed deadlock represented using a wait-for graph.

Common deadlock handling techniques include:

  • Deadlock prevention: avoids cyclic waiting using methods such as Wait-Die and Wound-Wait.
  • Deadlock detection: identifies cycles using local or global wait-for graphs.
  • Deadlock resolution: aborts or rolls back one or more transactions to break the cycle.

6. Replication and Quorum-Based Control

In replicated distributed databases, the same data item may exist at several sites. Concurrency control must ensure that all replicas remain consistent. One common method is the quorum-based protocol, where a transaction must obtain a sufficient number of read or write permissions before accessing the replicated item.

For example, if there are N replicas, read quorum R and write quorum W are usually chosen so that:

R + W > N and W > N/2

These conditions ensure that read and write operations overlap on at least one replica, helping maintain consistency.

7. Role of Commit Protocols

Concurrency control is closely related to distributed commit protocols. A distributed transaction may involve several sites, and all of them must either commit or abort together. The most common protocol is Two-Phase Commit (2PC), which has a prepare phase and a commit/abort phase.

Although 2PC ensures atomicity, it may block if the coordinator fails. Three-Phase Commit (3PC) reduces this blocking problem by adding an intermediate pre-commit phase.

Comparison of Major Concurrency Control Techniques

Mechanism Main Advantage Main Limitation Suitable Environment
Distributed Two-Phase Locking Ensures serializability May cause deadlocks High-conflict transactional systems
Timestamp Ordering Deadlock-free Higher rollback rate Systems requiring ordered execution
Optimistic Concurrency Control Low locking overhead Poor performance under high conflict Low-contention workloads
MVCC Readers and writers rarely block each other Extra storage required Read-heavy distributed databases

Concurrency control in distributed database systems is essential for maintaining correctness when multiple transactions operate at the same time across different sites. Techniques such as distributed two-phase locking, timestamp ordering, optimistic concurrency control, and multiversion concurrency control each offer different trade-offs between consistency, performance, scalability, and fault tolerance.

Modern distributed databases often use hybrid approaches that combine MVCC, distributed timestamps, and consensus protocols such as Paxos or Raft. The choice of concurrency control mechanism depends on workload characteristics, replication strategy, network latency, and the required level of consistency.




Happy Exploring!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.