Data Shipping vs Function Shipping: Which is Better? - BunksAllowed

In distributed database systems, data is stored across multiple locations. When executing a query, the system must decide how to process data efficiently across these locations.

Two important strategies used in distributed query processing are:

  • Data Shipping
  • Function Shipping

Core Question: Should we move data to computation, or move computation to data?

1. What is Data Shipping?

In Data Shipping, the required data is transferred from the site where it is stored to the site where the computation (query processing) will take place.

Example

If Table A is at Site 1 and Table B is at Site 2:

Step 1: Transfer Table A to Site 2
Step 2: Perform JOIN at Site 2

Definition: Move data → process at one location.
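
The two steps above can be sketched in Python. This is only a toy simulation, not a real network protocol: each "site" is a plain dictionary, and the names `site1`, `site2`, `ship_table`, `join_on_id`, and the sample rows are assumptions made up for illustration.

```python
# Toy simulation of data shipping: each "site" is a dict of tables.
# All names and rows here are hypothetical, for illustration only.
site1 = {"A": [{"id": 1, "x": "a1"}, {"id": 2, "x": "a2"}, {"id": 3, "x": "a3"}]}
site2 = {"B": [{"id": 2, "y": "b2"}, {"id": 3, "y": "b3"}]}


def ship_table(source, target, name):
    """Copy a whole table to another site; return the number of rows sent."""
    target[name] = list(source[name])
    return len(target[name])


def join_on_id(left, right):
    """Hash join of two row lists on the 'id' column."""
    by_id = {}
    for row in right:
        by_id.setdefault(row["id"], []).append(row)
    return [{**l, **r} for l in left for r in by_id.get(l["id"], [])]


# Step 1: transfer Table A to Site 2 (every row crosses the network).
rows_shipped = ship_table(site1, site2, "A")

# Step 2: perform the JOIN at Site 2, where both tables now live.
result = join_on_id(site2["A"], site2["B"])
print(rows_shipped, result)
```

Note that all three rows of Table A are shipped even though only two of them find a match. That wasted transfer is the communication cost discussed in the disadvantages.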

Advantages

  • Simple to implement
  • Centralized processing logic

Disadvantages

  • High communication cost, because entire tables may cross the network
  • Inefficient for large datasets

2. What is Function Shipping?

In Function Shipping, instead of moving data, the query or computation is sent to the site where the data resides.

Example

Step 1: Send query to Site 1
Step 2: Process data locally at Site 1
Step 3: Send only the result back to the requesting site

Definition: Move computation → process locally.
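
A minimal Python sketch of these three steps follows. The `Site` class, its `execute` method, and the sample data are hypothetical. A real system would send a query or query plan over the network rather than a Python function.

```python
# Toy simulation of function shipping. The Site class and its data are
# hypothetical; a real system would send a query plan over the network.
class Site:
    def __init__(self, rows):
        self.rows = rows  # data that lives at this site and never leaves it

    def execute(self, func):
        """Run a shipped function against local data; return only its result."""
        return func(self.rows)


site1 = Site([{"id": i, "balance": 100 * i} for i in range(1, 1001)])


def total_high_balances(rows):
    """The shipped computation: filter and aggregate locally."""
    return sum(r["balance"] for r in rows if r["balance"] > 90_000)


# Step 1: send the query to Site 1.
# Step 2: Site 1 processes its own data.
# Step 3: only the result (one number, not 1000 rows) comes back.
answer = site1.execute(total_high_balances)
print(answer)
```

The saving comes from the result being much smaller than the data it was computed from. If the shipped computation returned most of the rows, little would be gained.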

Advantages

  • Reduces data transfer, as long as the result is smaller than the data it is computed from
  • Efficient for large datasets

Disadvantages

  • Requires processing capability at remote sites
  • More complex coordination

3. Comparison Table

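| Feature      | Data Shipping         | Function Shipping     |
|--------------|-----------------------|-----------------------|
| Approach     | Move data             | Move computation      |
| Network Cost | High                  | Low                   |
| Performance  | Slower for large data | Faster for large data |
| Complexity   | Simple                | Complex               |
| Best Use     | Small datasets        | Large datasets        |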

4. Example Scenario

Query:

SELECT c.name
FROM Customer c, Account a
WHERE c.id = a.cid;

Assume:

  • Customer → Site 1
  • Account → Site 2

Option 1: Data Shipping

Move Customer to Site 2
Perform JOIN at Site 2

Option 2: Function Shipping

Send the Customer part of the query to Site 1
Process Customer locally (keep only id and name)
Send the partial result to Site 2 and complete the JOIN there

Better Option: Depends on data size and network cost.
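
The two options can be compared with a small experiment. In this sketch two in-memory SQLite databases stand in for Site 1 and Site 2. The extra `address` and `accno` columns, the sample rows, and the `run_option` helper are assumptions added for illustration. Option 2 is read here as shipping the Customer part of the query to Site 1, so that only the columns the JOIN needs travel to Site 2.

```python
import sqlite3

# Two in-memory SQLite databases stand in for Site 1 and Site 2.
# Extra columns and sample rows are hypothetical, added for illustration.
site1 = sqlite3.connect(":memory:")
site1.execute("CREATE TABLE Customer (id INTEGER, name TEXT, address TEXT)")
site1.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Asha", "addr1"), (2, "Bikram", "addr2"), (3, "Chitra", "addr3")],
)
site2 = sqlite3.connect(":memory:")
site2.execute("CREATE TABLE Account (accno INTEGER, cid INTEGER)")
site2.executemany("INSERT INTO Account VALUES (?, ?)", [(10, 1), (11, 1), (12, 3)])

JOIN_SQL = (
    "SELECT c.name FROM Customer c, Account a "
    "WHERE c.id = a.cid ORDER BY a.accno"
)


def run_option(sql_at_site1, columns):
    """Fetch rows from Site 1, load them at Site 2, and run the JOIN there.

    Returns (number of values that crossed the network, join result).
    """
    shipped = site1.execute(sql_at_site1).fetchall()
    site2.execute("DROP TABLE IF EXISTS Customer")
    site2.execute(f"CREATE TEMP TABLE Customer ({', '.join(columns)})")
    marks = ", ".join("?" * len(columns))
    site2.executemany(f"INSERT INTO Customer VALUES ({marks})", shipped)
    names = [name for (name,) in site2.execute(JOIN_SQL)]
    return sum(len(row) for row in shipped), names


# Option 1: ship the whole Customer table.
cost1, names1 = run_option("SELECT * FROM Customer", ["id", "name", "address"])

# Option 2: ship a projection query to Site 1; only id and name come back.
cost2, names2 = run_option("SELECT id, name FROM Customer", ["id", "name"])

print(cost1, names1)
print(cost2, names2)
```

Both options produce the same names, but Option 2 moves fewer values. With only three rows the difference is small. It grows with the number of rows and with the width of the columns that the query does not need.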

5. When to Use Data Shipping?

  • When data size is small
  • When the computation is complex and easier to run at a single site
  • When remote sites have limited processing power

6. When to Use Function Shipping?

  • When data size is large
  • When network bandwidth is limited
  • When remote sites can process queries efficiently

7. Hybrid Approach

In practice, most systems use a hybrid approach:

  • Some data is moved
  • Some computation is moved

Goal: Minimize total cost by combining both strategies.
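
One possible way to combine the two strategies is a semijoin-style reduction: ship a small amount of data (the join keys) in one direction, run a filter where the larger table lives, and ship back only the matching rows. The article does not name this technique, so treat the sketch below as an assumption about what a hybrid plan might look like. The table contents and variable names are hypothetical.

```python
# Hybrid sketch (semijoin-style): ship a little data one way, then run a
# filter at the other site. Site contents below are hypothetical.
customers_at_site1 = [(i, f"name{i}") for i in range(1, 101)]  # (id, name)
accounts_at_site2 = [(1000, 5), (1001, 5), (1002, 42), (1003, 77)]  # (accno, cid)

# Data moved: Site 2 sends only the distinct join keys to Site 1.
keys_shipped = sorted({cid for _, cid in accounts_at_site2})

# Computation moved: Site 1 filters Customer locally using those keys.
key_set = set(keys_shipped)
matching_customers = [row for row in customers_at_site1 if row[0] in key_set]

# Data moved: only the matching customers travel back to Site 2 for the join.
names = dict(matching_customers)
result = [names[cid] for _, cid in accounts_at_site2 if cid in names]

rows_moved = len(keys_shipped) + len(matching_customers)
print(rows_moved, result)
```

Here six small rows cross the network in total, compared with 100 rows if the whole Customer table were shipped to Site 2.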

8. Real-World Insight

Many modern distributed systems decide dynamically whether to ship data or functions, based on cost estimation.

This decision is made by the query optimizer.
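
To make the idea of cost estimation concrete, here is a deliberately simplified cost model. It is not how any particular optimizer works. The `Estimate` fields, the `choose_strategy` function, and the formula (transfer time plus processing time) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    input_bytes: int        # size of the data the operation reads at the remote site
    result_bytes: int       # estimated size of the result if it runs remotely
    remote_cpu_secs: float  # estimated time to run the operation at the remote site
    local_cpu_secs: float   # estimated time to run it at the coordinating site


def choose_strategy(est, bytes_per_sec):
    """Return the cheaper plan and both estimated costs in seconds."""
    # Data shipping: move the input, then compute locally.
    data_cost = est.input_bytes / bytes_per_sec + est.local_cpu_secs
    # Function shipping: compute remotely, then move only the result.
    func_cost = est.result_bytes / bytes_per_sec + est.remote_cpu_secs
    plan = "function shipping" if func_cost < data_cost else "data shipping"
    return plan, data_cost, func_cost


# Large table, small result, 10 MB/s link: moving the computation wins.
big = Estimate(input_bytes=10**9, result_bytes=10**6,
               remote_cpu_secs=5.0, local_cpu_secs=2.0)
big_plan, _, _ = choose_strategy(big, bytes_per_sec=10**7)

# Small table and a slow remote site: moving the data wins.
small = Estimate(input_bytes=10**4, result_bytes=5 * 10**3,
                 remote_cpu_secs=3.0, local_cpu_secs=0.1)
small_plan, _, _ = choose_strategy(small, bytes_per_sec=10**7)

print(big_plan, small_plan)
```

The two sample calls mirror the guidelines in sections 5 and 6: small data and weak remote sites favor data shipping, while large data and limited bandwidth favor function shipping.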


Conclusion

Both data shipping and function shipping are essential strategies in distributed query processing.

There is no one-size-fits-all solution:

  • Data shipping is simple but costly for large data
  • Function shipping is efficient but more complex

The best approach depends on:

  • Data size
  • Network conditions
  • System capabilities

Understanding these concepts helps in designing efficient and scalable distributed database systems.



Happy Exploring!
