-- Feb 16 In-Class Exercise
The following rewrite of the algorithm MAT-VEC (and the helper functions) uses spawn/sync to parallelize the loops, including the inner for loop on j:
MAT-VEC(A, x):
n = A.rows
let y be a new vector of length n
MAT-VEC-FIRST-LOOP(y, 1, n)
MAT-VEC-SECOND-LOOP(A, x, y, n, 1, n)
return y
MAT-VEC-FIRST-LOOP(y, i, i'):
if i == i':
y[i] = 0
else:
mid = floor((i + i')/2)
first = spawn MAT-VEC-FIRST-LOOP(y, i, mid)
second = MAT-VEC-FIRST-LOOP(y, mid + 1, i')
sync
return first + second
MAT-VEC-SECOND-LOOP(A, x, y, n, i, i'):
if i == i':
MAT-VEC-THIRD-LOOP(A, x, y, i, 1, n)
else:
mid = floor((i + i')/2)
spawn MAT-VEC-SECOND-LOOP(A, x, y, n, i, mid)
MAT-VEC-SECOND-LOOP(A, x, y, n, mid + 1, i')
sync
MAT-VEC-THIRD-LOOP(A, x, y, i, j, j'):
if j == j':
y[i] = y[i] + a[i][j] * x[j]
else:
mid = floor((j + j')/2)
first = spawn MAT-VEC-THIRD-LOOP(A, x, y, i, j, mid)
second = MAT-VEC-THIRD-LOOP(A, x, y, i, mid + 1, j')
sync
return first + second
T_1 = theta(n^2) since we have doubly nested loops
span(MAT-VEC-FIRST-LOOP) = theta(log n) since each iteration is theta(1)
span(MAT-VEC-THIRD-LOOP) = theta(log n) since each iteration is theta(1)
span(MAT-VEC-SECOND-LOOP) = theta(log n + log n) = theta(2 log n) since each iteration is theta(log n)
T_inf = theta(log n) + max(span(MAT-VEC-FIRST-LOOP)) + theta(log n) + max(span(MAT-VEC-SECOND-LOOP)) = theta(log n) + theta(log n) + theta(log n) + theta(2 log n) = theta(log n) dominates
T_1/T_inf = theta(n^2)/theta(log n) = theta(n^2/(log n))
The added complexity is not worth it since the parallelism grows exponentially, indicating that given a large fixed number of processors, the algorithm will exhibit nearly perfect linear speedup which is not significant enough at larger values.
(
Edited: 2022-02-16)
The following rewrite of the algorithm MAT-VEC (and the helper functions) uses spawn/sync to parallelize the loops, including the inner for loop on j:
MAT-VEC(A, x):
n = A.rows
let y be a new vector of length n
MAT-VEC-FIRST-LOOP(y, 1, n)
MAT-VEC-SECOND-LOOP(A, x, y, n, 1, n)
return y
MAT-VEC-FIRST-LOOP(y, i, i'):
if i == i':
y[i] = 0
else:
mid = floor((i + i')/2)
first = spawn MAT-VEC-FIRST-LOOP(y, i, mid)
second = MAT-VEC-FIRST-LOOP(y, mid + 1, i')
sync
return first + second
MAT-VEC-SECOND-LOOP(A, x, y, n, i, i'):
if i == i':
MAT-VEC-THIRD-LOOP(A, x, y, i, 1, n)
else:
mid = floor((i + i')/2)
spawn MAT-VEC-SECOND-LOOP(A, x, y, n, i, mid)
MAT-VEC-SECOND-LOOP(A, x, y, n, mid + 1, i')
sync
MAT-VEC-THIRD-LOOP(A, x, y, i, j, j'):
if j == j':
y[i] = y[i] + a[i][j] * x[j]
else:
mid = floor((j + j')/2)
first = spawn MAT-VEC-THIRD-LOOP(A, x, y, i, j, mid)
second = MAT-VEC-THIRD-LOOP(A, x, y, i, mid + 1, j')
sync
return first + second
T_1 = theta(n^2) since we have doubly nested loops
span(MAT-VEC-FIRST-LOOP) = theta(log n) since each iteration is theta(1)
span(MAT-VEC-THIRD-LOOP) = theta(log n) since each iteration is theta(1)
span(MAT-VEC-SECOND-LOOP) = theta(log n + log n) = theta(2 log n) since each iteration is theta(log n)
T_inf = theta(log n) + max(span(MAT-VEC-FIRST-LOOP)) + theta(log n) + max(span(MAT-VEC-SECOND-LOOP)) = theta(log n) + theta(log n) + theta(log n) + theta(2 log n) = theta(log n) dominates
T_1/T_inf = theta(n^2)/theta(log n) = theta(n^2/(log n))
The added complexity is not worth it since the parallelism grows exponentially, indicating that given a large fixed number of processors, the algorithm will exhibit nearly perfect linear speedup which is not significant enough at larger values.