Query processing is the procedure by which a database system interprets and executes an SQL query. The basic goal is the same everywhere (retrieve correct results efficiently), but the approach differs significantly between centralized and distributed database systems.
1. What is Query Processing?
Query processing involves:
- Parsing the SQL query
- Optimizing the query
- Executing the query plan
In centralized systems, all these steps happen within a single server. In distributed systems, they are spread across multiple nodes.
2. Centralized Query Processing
In a centralized database system:
- All data is stored in one location
- Query execution happens on a single server
- No data transfer across the network is required
Example
SELECT name FROM Customer WHERE city = 'Delhi';
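The system evaluates the query entirely on the local server: it scans the Customer table (or uses an index on city, if one exists) and returns the matching names.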
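To make the centralized case concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a single-server database. The sample rows and the helper names `make_local_db` and `delhi_customers` are assumptions made for illustration, and an `ORDER BY` is added only so the output order is predictable.

```python
import sqlite3

def make_local_db():
    """Build a tiny in-memory Customer table (sample rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO Customer VALUES (?, ?, ?)",
        [(1, "Asha", "Delhi"), (2, "Ravi", "Mumbai"), (3, "Meena", "Delhi")],
    )
    return conn

def delhi_customers(conn):
    """Run the example query; parsing, optimization and execution are all local."""
    rows = conn.execute(
        "SELECT name FROM Customer WHERE city = ? ORDER BY name", ("Delhi",)
    ).fetchall()
    return [name for (name,) in rows]

conn = make_local_db()
print(delhi_customers(conn))

# Ask the local optimizer which plan it chose. The exact wording of the
# plan text depends on the SQLite version, so it is printed, not asserted.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM Customer WHERE city = 'Delhi'"
):
    print(row)
```

Nothing in this sketch crosses a network: the only costs involved are local CPU and I/O.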
3. Distributed Query Processing
In distributed databases:
- Data is stored across multiple sites
- Queries may involve data from different locations
- The system must coordinate execution across nodes
Example
SELECT c.name FROM Customer c, Account a WHERE c.id = a.cid;
If:
- Customer → Site 1
- Account → Site 2
The system must decide:
- Where to perform the join
- Which data to move
- How to minimize communication cost
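One of these choices can be sketched by simulating the two sites as two separate in-memory SQLite databases. This is only an illustration, not how a real distributed DBMS is built: the sample rows, the `acc_no` column, the temporary table `Customer_copy`, and the function names are all assumptions. The sketch ships the needed Customer columns to Site 2 and performs the join there.

```python
import sqlite3

def make_sites():
    """Create two independent databases standing in for Site 1 and Site 2."""
    site1 = sqlite3.connect(":memory:")  # holds Customer
    site2 = sqlite3.connect(":memory:")  # holds Account
    site1.execute("CREATE TABLE Customer (id INTEGER, name TEXT, city TEXT)")
    site1.executemany(
        "INSERT INTO Customer VALUES (?, ?, ?)",
        [(1, "Asha", "Delhi"), (2, "Ravi", "Mumbai"), (3, "Meena", "Delhi")],
    )
    site2.execute("CREATE TABLE Account (acc_no INTEGER, cid INTEGER)")
    site2.executemany(
        "INSERT INTO Account VALUES (?, ?)", [(100, 1), (101, 3), (102, 3)]
    )
    return site1, site2

def join_at_site2(site1, site2):
    """Ship Customer to Site 2 and join there; return (names, rows shipped)."""
    # Only the columns the query needs are sent, which reduces transfer size.
    shipped = site1.execute("SELECT id, name FROM Customer").fetchall()
    site2.execute("CREATE TEMP TABLE Customer_copy (id INTEGER, name TEXT)")
    site2.executemany("INSERT INTO Customer_copy VALUES (?, ?)", shipped)
    rows = site2.execute(
        "SELECT c.name FROM Customer_copy c, Account a "
        "WHERE c.id = a.cid ORDER BY a.acc_no"
    ).fetchall()
    return [name for (name,) in rows], len(shipped)

names, shipped_rows = join_at_site2(*make_sites())
print(names, "rows shipped:", shipped_rows)
```

The number of shipped rows is a rough proxy for communication cost. Joining at Site 1 instead would ship the Account rows, and the cheaper direction depends on the relative table sizes.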
4. Major Differences
| Aspect | Centralized System | Distributed System |
|---|---|---|
| Data Location | Single site | Multiple sites |
| Execution | Single server | Multiple nodes |
| Communication | None between sites | Required between sites |
| Optimization | Simpler | More complex (must also account for data transfer) |
| Performance Factor | CPU & I/O | CPU, I/O & Network |
5. Key Steps in Distributed Query Processing
1. Query Decomposition
Translate the SQL query into a relational algebra expression over the global relations, then validate and simplify it.
2. Data Localization
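Use the distribution catalog to map each global relation to the fragments or copies that hold its data and to the sites where they are stored.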
3. Global Optimization
Find the best strategy for executing the query across sites.
4. Local Optimization
Each site optimizes its part of the query.
5. Execution
Each site runs its part of the query, and the partial results are combined to produce the final output.
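The data localization step can be illustrated with a small sketch. It assumes, purely for illustration, that Customer is horizontally fragmented by city and that a catalog records which site holds each fragment; the `Fragment` class, the `CATALOG` contents, and the `localize` function are hypothetical names, not part of any real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    name: str   # fragment name
    site: str   # site that stores the fragment
    city: str   # fragmentation predicate: rows with city = <this value>

# Hypothetical distribution catalog for one global relation.
CATALOG = {
    "Customer": [
        Fragment("Customer_delhi", "Site 1", "Delhi"),
        Fragment("Customer_mumbai", "Site 2", "Mumbai"),
    ]
}

def localize(relation, city=None):
    """Return the fragments that can contain rows relevant to the query.

    With no city predicate, every fragment must be read. With one, any
    fragment whose predicate contradicts it is pruned.
    """
    fragments = CATALOG[relation]
    if city is None:
        return list(fragments)
    return [f for f in fragments if f.city == city]

# WHERE city = 'Delhi' only needs to contact one site.
print([f.site for f in localize("Customer", city="Delhi")])
```

Pruning fragments at this stage means the later optimization steps only consider sites that can actually contribute rows.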
6. Data Shipping vs Function Shipping
Data Shipping
Move data to the site where computation is performed.
Function Shipping
Move computation to the site where data is stored.
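The difference shows up in how much data crosses the network. The sketch below is a toy model rather than a real protocol: `REMOTE_CUSTOMER` stands in for a table held at another site, and both function names are assumptions. Each function returns the query result together with the number of items transferred.

```python
# Rows stored at a remote site: (id, name, city). Sample data is made up.
REMOTE_CUSTOMER = [
    (1, "Asha", "Delhi"),
    (2, "Ravi", "Mumbai"),
    (3, "Meena", "Delhi"),
    (4, "Kiran", "Pune"),
]

def data_shipping(remote_rows, city):
    """Copy the whole table to the requesting site, then filter locally."""
    transferred = list(remote_rows)  # every row crosses the network
    result = [name for (_id, name, c) in transferred if c == city]
    return result, len(transferred)

def function_shipping(remote_rows, city):
    """Evaluate the filter at the remote site and send back only the matches."""
    transferred = [name for (_id, name, c) in remote_rows if c == city]
    return transferred, len(transferred)

print(data_shipping(REMOTE_CUSTOMER, "Delhi"))
print(function_shipping(REMOTE_CUSTOMER, "Delhi"))
```

Both approaches return the same names, but function shipping transfers fewer items whenever the filter is selective. Data shipping can still be the better choice when the requesting site caches the data or the remote site is overloaded.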
7. Example Comparison
Centralized
JOIN(Customer, Account)
All data is already available locally.
Distributed
- Option 1: Move Customer to Site 2
- Option 2: Move Account to Site 1
- Option 3: Process partially at both sites
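The optimizer estimates the cost of each option, which in a distributed setting often depends mainly on the amount of data transferred, and selects the cheapest one.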
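A simplified version of that decision can be sketched as follows. The model counts only bytes transferred and ignores CPU, I/O, and the cost of returning the final result, and it interprets "process partially at both sites" as a semijoin (send the join keys to Site 2, get back only the matching Account rows). The byte counts and function names are illustrative assumptions.

```python
def transfer_costs(customer_bytes, account_bytes, join_key_bytes, matching_account_bytes):
    """Estimate bytes moved over the network for each join strategy."""
    return {
        "move Customer to Site 2": customer_bytes,
        "move Account to Site 1": account_bytes,
        # Semijoin: ship Customer.id values to Site 2, then ship back
        # only the Account rows that have a matching customer.
        "semijoin at both sites": join_key_bytes + matching_account_bytes,
    }

def choose_plan(costs):
    """Pick the strategy with the lowest estimated transfer cost."""
    return min(costs, key=costs.get)

# Large tables, few matching accounts: the semijoin moves the least data.
costs = transfer_costs(
    customer_bytes=50_000,
    account_bytes=200_000,
    join_key_bytes=4_000,
    matching_account_bytes=20_000,
)
print(choose_plan(costs), costs)
```

With a very small Customer table, the same function would instead favor moving Customer to Site 2, since the extra round trip of the semijoin no longer pays off.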
8. Challenges Unique to Distributed Systems
- Network latency
- Data consistency
- Site failures
- Heterogeneous databases
9. Advantages of Distributed Query Processing
- Parallel execution
- Improved performance on large workloads
- Scalability
- Fault tolerance
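Parallel execution can be sketched as a scatter-gather pattern: each site computes a partial result over its own partition, and a coordinator combines them. In this sketch threads stand in for sites, and the `PARTITIONS` data and function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# City column of the Customer rows held at each site (sample data is made up).
PARTITIONS = {
    "Site 1": ["Delhi", "Delhi", "Mumbai"],
    "Site 2": ["Pune", "Delhi"],
    "Site 3": ["Mumbai"],
}

def local_count(rows, city):
    """Work done at one site: count matching rows in its own partition."""
    return sum(1 for c in rows if c == city)

def parallel_count(partitions, city):
    """Scatter the count to every site, then gather and add the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(
            pool.map(lambda rows: local_count(rows, city), partitions.values())
        )
    return sum(partials)

print(parallel_count(PARTITIONS, "Delhi"))
```

Only one small number per site travels back to the coordinator, which is why aggregates such as COUNT and SUM parallelize well across partitions.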
10. Real-World Insight
Modern systems such as cloud databases use distributed query processing to handle data sets that are too large for a single server.
They rely on:
- Advanced query optimizers
- Parallel processing
- Efficient data distribution strategies
Conclusion
Distributed query processing is fundamentally different from centralized processing due to the involvement of multiple sites and network communication.
Centralized systems are simpler, while distributed systems offer scalability and performance advantages at the cost of more complex optimization and coordination.
Understanding these differences is essential for designing modern, efficient database systems.