2020-05-03

-- Apr 29 In-Class Exercise Thread
Naïve:
S sends its whole table: (c2, d1), (c5, d2), (c2, d3), (c6, d4)
R computes the join: (a2, b2, c2, d1), (a2, b2, c2, d3)
Semi-join:
S sends column C: (c2, c5, c2, c6)
R computes R⋉S = (a2, b2, c2) and sends it back to S
S computes R(X,Y)⋈S(Y,Z): (a2, b2, c2, d1), (a2, b2, c2, d3)
The semi-join approach sends 7 entries (4 for column C plus 3 for R⋉S) and the naïve approach sends 8 entries, so the semi-join approach is more efficient here.
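The counts in this post can be checked with a small simulation. This is only a sketch: the thread confirms the S table and the single R row (a2, b2, c2), while the other three R rows, the function names, and the "one entry per attribute value" cost measure are assumptions made for illustration.

```python
# S is taken from the post. Only the R row (a2, b2, c2) is confirmed by the
# thread; the other three R rows are assumed for illustration.
R = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3"), ("a4", "b4", "c4")]
S = [("c2", "d1"), ("c5", "d2"), ("c2", "d3"), ("c6", "d4")]


def entries(rows):
    """Count transmitted entries as one per attribute value (assumed cost measure)."""
    return sum(len(row) for row in rows)


def naive_join(R, S):
    """S ships its whole table to R's machine, which computes the join."""
    sent = entries(S)
    result = [r + s[1:] for r in R for s in S if r[2] == s[0]]
    return result, sent


def semi_join_s_initiated(R, S):
    """S ships column C, R ships back R semi-join S, S computes the join."""
    c_column = [(s[0],) for s in S]          # duplicates kept, as in the post
    sent = entries(c_column)
    c_values = {c for (c,) in c_column}
    reduced_r = [r for r in R if r[2] in c_values]
    sent += entries(reduced_r)
    result = [r + s[1:] for r in reduced_r for s in S if r[2] == s[0]]
    return result, sent


print(naive_join(R, S)[1])             # 8
print(semi_join_s_initiated(R, S)[1])  # 7
```

Both functions return the same two joined rows; only the number of entries sent differs.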

-- Apr 29 In-Class Exercise Thread
 Semi-joins:
 The machine with R computes the values of C and sends them to the machine with S: {c1, c2, c3, c4}
 The machine with S computes the join of S with these values and sends it to the machine with R: { {c2, d1}, {c2, d3} }
 The machine with R uses these values to compute S natural join R; the final answer is { {a2, b2, c2, d1}, {a2, b2, c2, d3} }
 Send all of S:
 The machine with S sends { {c2, d1}, {c5, d2}, {c2, d3}, {c6, d4} } to the machine with R
 The machine with R computes R natural join S, which gives { {a2, b2, c2, d1}, {a2, b2, c2, d3} }
 In this case, both methods send the same amount of data (8 pieces).
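The variant in this post starts from the machine holding R instead. A minimal sketch of it follows, assuming the same "one piece per attribute value" count; the C values of R and the S table come from the post, while the A and B values other than a2 and b2 and the function name are hypothetical.

```python
# C values of R ({c1, c2, c3, c4}) and the S table are from the post.
# The A and B values other than a2, b2 are assumed for illustration.
R = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3"), ("a4", "b4", "c4")]
S = [("c2", "d1"), ("c5", "d2"), ("c2", "d3"), ("c6", "d4")]


def semi_join_r_initiated(R, S):
    """R ships its distinct C values, S ships back its matching rows, R joins."""
    c_values = sorted({r[2] for r in R})            # projection of R onto C
    sent = len(c_values)                            # 4 pieces
    matching_s = [s for s in S if s[0] in c_values] # S semi-join R
    sent += sum(len(s) for s in matching_s)         # 2 rows of 2 pieces each
    result = [r + s[1:] for r in R for s in matching_s if r[2] == s[0]]
    return result, sent


result, sent = semi_join_r_initiated(R, S)
print(sent)                          # 8
print(sum(len(s) for s in S))        # 8, the cost of sending all of S
```

With this data the two approaches tie at 8 pieces, matching the count given in the post.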
2020-05-04

-- Apr 29 In-Class Exercise Thread
Semi-Join:
1. S sends over column C with 4 entries
2. R semi-join S {[a2, b2, c2]} is computed and sent to S
3. S gets the result and uses it to compute the natural join {[a2, b2, c2, d1], [a2, b2, c2, d3]}
Naive:
1. Since S is smaller, send a copy of S table to R {(c2,d1), (c5, d2), (c2, d3), (c6, d4)}
2. R computes join and gets {(a2, b2, c2, d1), (a2, b2, c2, d3)}
The semi-join method sends 7 total entries and communicates twice, while the naive approach sends 8 total entries but communicates once.
(Edited: 2020-05-04)

-- Apr 29 In-Class Exercise Thread
Distributed Setting Join:
  1. The machine with R computes the projection onto the join column C, then sends it to the machine with S (smaller table)
  2. The machine with S computes the semi-join then sends it to the machine with R
  3. The machine with R uses it to compute the full join
Send all of S:
  1. The machine with S (smaller table) sends table to the machine with R
  2. The machine with R uses it to compute the whole join
The semi-join approach sends less data between the machines when only a small fraction of the tuples in S match a tuple in R.
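One way to compare the two procedures above is to count the entries each one sends. The following is a rough sketch, not part of the original post: it assumes cost is measured in attribute values sent, that R sends its distinct C values, and it introduces k_S as a symbol for the number of columns in S.

```latex
\begin{align*}
\text{cost}_{\text{semi}}  &= |\pi_C(R)| + k_S \cdot |S \ltimes R| \\
\text{cost}_{\text{naive}} &= k_S \cdot |S| \\
\text{cost}_{\text{semi}} < \text{cost}_{\text{naive}}
  &\iff |\pi_C(R)| < k_S \cdot \bigl(|S| - |S \ltimes R|\bigr)
\end{align*}
```

With the numbers given earlier in the thread (4 distinct C values in R, 4 tuples in S of which 2 match, and k_S = 2), the semi-join side is 4 + 2·2 = 8 and the send-all side is 2·4 = 8, so this variant ties on the exercise data and wins only when more of S fails to match.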