How Distributed Query Processing Differs from Centralized Systems - BunksAllowed

Query processing is the procedure by which a database system interprets and executes an SQL query. While the basic goal remains the same—retrieve correct results efficiently—the approach differs significantly between centralized and distributed database systems.

Key Idea: Centralized systems process queries at one location, while distributed systems must coordinate across multiple locations.

1. What is Query Processing?

Query processing involves:

  • Parsing the SQL query
  • Optimizing the query
  • Executing the query plan

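In centralized systems, all these steps happen within a single server. In distributed systems, the work is divided: a coordinating node typically parses the query and builds a global plan, and the participating nodes execute their parts of it.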


2. Centralized Query Processing

In a centralized database system:

  • All data is stored in one location
  • Query execution happens on a single server
  • No data transfer across the network is required during query execution

Example

SELECT name FROM Customer WHERE city = 'Delhi';

The system scans the local table (or uses an index on city, if one exists) and returns the matching rows.

Advantage: Simpler and faster for small-scale systems.
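
To make the centralized case concrete, here is a minimal sketch using Python's built-in sqlite3 module, where a single in-memory database plays the role of the central server. The sample rows and the id column are assumptions made up for illustration, and ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# One in-memory database stands in for the single central server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Hypothetical sample rows, not taken from the article.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Asha", "Delhi"), (2, "Ravi", "Mumbai"), (3, "Meera", "Delhi")],
)

# Parsing, optimization, and execution all happen inside this one engine.
rows = conn.execute(
    "SELECT name FROM Customer WHERE city = ? ORDER BY id", ("Delhi",)
).fetchall()
delhi_names = [name for (name,) in rows]
print(delhi_names)
```

No row ever leaves the process here, which is why CPU and disk I/O are the only costs that matter in this setting.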

3. Distributed Query Processing

In distributed databases:

  • Data is stored across multiple sites
  • Queries may involve data from different locations
  • System must coordinate execution across nodes

Example

SELECT c.name
FROM Customer c
JOIN Account a ON c.id = a.cid;

Suppose the two tables are stored at different sites:

  • Customer → Site 1
  • Account → Site 2

The system must decide:

  • Where to perform the join
  • Which data to move
  • How to minimize communication cost

Key Challenge: Communication cost often dominates performance in distributed systems, especially when sites are connected over a slow or wide-area network.
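
The sketch below simulates this situation with two separate in-memory SQLite databases standing in for Site 1 and Site 2. It assumes one possible decision (join at Site 2, so Customer rows are moved) and counts the shipped rows as a rough stand-in for communication cost. The sample rows and the acc_no column are hypothetical, and a real system would send the rows over a network rather than through a Python list.

```python
import sqlite3

# Two separate in-memory databases stand in for Site 1 and Site 2.
site1 = sqlite3.connect(":memory:")
site2 = sqlite3.connect(":memory:")

# Hypothetical sample data: Customer lives at Site 1, Account at Site 2.
site1.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT)")
site1.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi"), (3, "Meera")],
)
site2.execute("CREATE TABLE Account (acc_no INTEGER PRIMARY KEY, cid INTEGER)")
site2.executemany(
    "INSERT INTO Account VALUES (?, ?)",
    [(101, 1), (102, 1), (103, 3)],
)

# Decision assumed here: perform the join at Site 2, so Customer must move.
shipped = site1.execute("SELECT id, name FROM Customer").fetchall()
rows_sent = len(shipped)  # rough stand-in for communication cost

# Site 2 stores the shipped rows in a temporary table and joins locally.
site2.execute("CREATE TEMP TABLE Customer (id INTEGER, name TEXT)")
site2.executemany("INSERT INTO Customer VALUES (?, ?)", shipped)
joined = site2.execute(
    "SELECT c.name FROM Customer c JOIN Account a ON c.id = a.cid "
    "ORDER BY a.acc_no"
).fetchall()
names = [name for (name,) in joined]
print(rows_sent, names)
```

Shipping Account to Site 1 instead would give the same answer with a different number of rows sent, and that difference is what the optimizer has to weigh.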

4. Major Differences

Aspect             | Centralized System | Distributed System
Data Location      | Single site        | Multiple sites
Execution          | Single server      | Multiple nodes
Communication      | None               | Required
Optimization       | Simpler            | Complex
Performance Factor | CPU & I/O          | CPU, I/O & Network

5. Key Steps in Distributed Query Processing

1. Query Decomposition

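Translate the SQL query into a relational algebra expression over the global relations, validating and simplifying it so that it can be split into smaller operations.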

2. Data Localization

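Determine where the data is stored: map each global relation to its fragments and the sites that hold them, and drop fragments that cannot contribute to the result.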

3. Global Optimization

Find the best strategy for executing the query across sites: the order of operations, the site where each one runs, and which data to transfer.

4. Local Optimization

Each site optimizes its part of the query.

5. Execution

Each site runs its part of the plan, and the partial results are combined to produce the final output.
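
The five steps can be sketched as a toy pipeline in Python. Everything here is an assumption made for illustration: Customer is taken to be horizontally fragmented by city, the CATALOG and SITE_DATA dictionaries and all function names are hypothetical, and global optimization is reduced to "contact only the sites that can hold matching rows".

```python
# Hypothetical catalog: Customer is horizontally fragmented by city.
CATALOG = {
    "Customer": [
        {"site": "site1", "city": "Delhi"},
        {"site": "site2", "city": "Mumbai"},
        {"site": "site3", "city": "Chennai"},
    ]
}

# Hypothetical fragment contents held at each site.
SITE_DATA = {
    "site1": [{"name": "Asha", "city": "Delhi"}, {"name": "Meera", "city": "Delhi"}],
    "site2": [{"name": "Ravi", "city": "Mumbai"}],
    "site3": [{"name": "Kiran", "city": "Chennai"}],
}

def decompose(city):
    # Step 1: express the query as a selection plus a projection.
    return {"relation": "Customer", "select": ("city", city), "project": "name"}

def localize(query):
    # Step 2: keep only the fragments that can hold matching rows.
    column, value = query["select"]
    return [f["site"] for f in CATALOG[query["relation"]] if f[column] == value]

def run_at_site(site, query):
    # Step 4 and the local part of step 5: filter and project one fragment.
    column, value = query["select"]
    return [row[query["project"]] for row in SITE_DATA[site] if row[column] == value]

def execute(city):
    query = decompose(city)
    # Step 3 is trivial in this sketch: the plan is to contact only relevant sites.
    sites = localize(query)
    partials = [run_at_site(site, query) for site in sites]
    # Step 5: combine the partial results at the coordinator.
    return sites, [name for part in partials for name in part]

sites_contacted, result = execute("Delhi")
print(sites_contacted, result)
```

Because localization prunes the Mumbai and Chennai fragments, only one site is contacted for this query.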


6. Data Shipping vs Function Shipping

Data Shipping

Move data to the site where computation is performed.

Function Shipping

Move computation to the site where data is stored.

Optimization Rule: Ship the smaller data set, or move the computation to the data, whichever sends less over the network.
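
A small sketch of the two strategies for a simple selection, counting how many rows cross the network in each case. The REMOTE_CUSTOMERS list and both function names are hypothetical, and the "network" is simulated by copying Python lists.

```python
# Hypothetical Customer rows stored at a remote site.
REMOTE_CUSTOMERS = [
    ("Asha", "Delhi"), ("Ravi", "Mumbai"), ("Meera", "Delhi"),
    ("Kiran", "Chennai"), ("Neha", "Pune"),
]

def data_shipping(city):
    # Ship every row to the requesting site, then filter there.
    transferred = list(REMOTE_CUSTOMERS)
    result = [name for name, c in transferred if c == city]
    return result, len(transferred)

def function_shipping(city):
    # Send the predicate to the remote site; only matching rows come back.
    transferred = [(name, c) for name, c in REMOTE_CUSTOMERS if c == city]
    result = [name for name, _ in transferred]
    return result, len(transferred)

ds_result, ds_rows = data_shipping("Delhi")
fs_result, fs_rows = function_shipping("Delhi")
print(ds_rows, fs_rows)
```

Both strategies return the same names, but function shipping moves only the matching rows. Data shipping can still be the better choice when the requesting site reuses the same data for many later queries.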

7. Example Comparison

Centralized

JOIN(Customer, Account)

All data is already available locally.

Distributed

Option 1: Move Customer to Site 2
Option 2: Move Account to Site 1
Option 3: Process partially at both sites (for example, a semijoin that first ships only the join column)

The optimizer selects the best option based on cost.
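
One rough way to picture this choice is a cost function that estimates the bytes sent for each option. This is a simplified sketch, not how any particular optimizer works: all sizes are invented, Option 3 is modelled as a semijoin finished at Site 2, and the cost of returning the final result to the user is ignored.

```python
def transfer_costs(cust_rows, cust_row_bytes, acct_rows, acct_row_bytes,
                   distinct_cids, key_bytes, matching_cust_rows):
    """Estimate the bytes sent over the network for each strategy."""
    return {
        "move Customer to Site 2": cust_rows * cust_row_bytes,
        "move Account to Site 1": acct_rows * acct_row_bytes,
        # Semijoin: Site 2 sends its distinct cid values to Site 1,
        # and Site 1 replies with only the Customer rows that match.
        "semijoin": distinct_cids * key_bytes
                    + matching_cust_rows * cust_row_bytes,
    }

# Hypothetical statistics, chosen only to illustrate the comparison.
costs = transfer_costs(
    cust_rows=10_000, cust_row_bytes=100,
    acct_rows=50_000, acct_row_bytes=40,
    distinct_cids=2_000, key_bytes=4,
    matching_cust_rows=2_000,
)
best = min(costs, key=costs.get)
print(costs)
print(best)
```

With these numbers the semijoin wins because few customers have accounts. If almost every customer matched, it would send nearly the whole Customer table plus the keys, and Option 1 would be cheaper.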


8. Challenges Unique to Distributed Systems

  • Network latency
  • Data consistency
  • Site failures
  • Heterogeneous databases

9. Advantages of Distributed Query Processing

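  • Parallel execution across sites
  • Improved performance on large data sets, since work is spread across nodes
  • Scalability by adding more nodes
  • Fault tolerance, when data is replicated across sites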
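
Parallel execution can be sketched with Python threads standing in for sites. The FRAGMENTS dictionary of account balances is hypothetical. Each simulated site sums its own fragment at the same time, and only one number per site is sent back to be combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical account balances, one fragment per site.
FRAGMENTS = {
    "site1": [120, 80, 45],
    "site2": [300, 10],
    "site3": [55, 90, 35, 20],
}

def local_sum(site):
    # Each site aggregates its own fragment; only one number travels back.
    return sum(FRAGMENTS[site])

# Threads stand in for sites working concurrently.
with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
    partial_sums = list(pool.map(local_sum, FRAGMENTS))

# The coordinator combines the partial results.
total_balance = sum(partial_sums)
print(partial_sums, total_balance)
```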

10. Real-World Insight

Modern systems like cloud databases use distributed query processing to handle large-scale data efficiently.

They rely on:

  • Advanced query optimizers
  • Parallel processing
  • Efficient data distribution strategies

Conclusion

Distributed query processing is fundamentally different from centralized processing due to the involvement of multiple sites and network communication.

While centralized systems are simpler, distributed systems offer scalability and performance advantages at the cost of more complex optimization and added network overhead.

Understanding these differences is essential for designing modern, efficient database systems.


