Parallel package: Difference between revisions
Revision as of 11:18, 10 June 2019
The Parallel execution package provides utilities to work with clusters, but also functions to parallelize work among cores of a single machine.
To install: pkg install -forge parallel
Then, once in each Octave session: pkg load parallel
multicore parallelization (parcellfun, pararrayfun)
See also the NDpar package for an extension of these functions to N-dimensional arrays.
calculation on a single array
Code: simple
    # fun is the function to apply
    fun = @(x) x^2;
    vector_x = 1:10;
    vector_y = pararrayfun(nproc, fun, vector_x)
should output
parcellfun: 10/10 jobs done
vector_y =
1 4 9 16 25 36 49 64 81 100
nproc returns the number of CPUs available (the number of cores, or twice that with hyperthreading). One can use nproc - 1 instead, for instance to leave one CPU free.
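As a minimal sketch of leaving one CPU free: the variable name n_workers is only illustrative, and the max(1, ...) guard is an added assumption so that the call still gets one process on a single-core machine.

```octave
pkg load parallel

# Leave one CPU free, but never ask for fewer than one process.
# n_workers is an illustrative name, not part of the package.
n_workers = max (1, nproc - 1);

fun = @(x) x^2;
vector_x = 1:10;
vector_y = pararrayfun (n_workers, fun, vector_x)
```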
fun can be replaced by @myfun if the function resides in the file myfun.m.
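A small sketch of the named-function variant. Here myfun is a hypothetical function; it is defined inline so that the snippet runs as one script, whereas on disk the function definition alone would make up myfun.m.

```octave
pkg load parallel

# Hypothetical function: on disk, this definition would be the
# sole content of myfun.m somewhere on the load path.
function y = myfun (x)
  y = x^2;
endfunction

vector_x = 1:10;
# Pass a handle to the named function instead of an anonymous one.
vector_y = pararrayfun (nproc, @myfun, vector_x)
```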
In the previous example, the function was executed once for each element of the input vector_x. If the function is vectorized (it can act on a vector and not just on a scalar input), it can be much more efficient to use the "Vectorized", true option.
Code: vectorized
    # fun is the function to apply, vectorized (see the dot)
    fun = @(x) x.^2;
    vector_x = 1:10;
    vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)
should output
parcellfun: 4/4 jobs done
vector_y =
1 4 9 16 25 36 49 64 81 100
The "ChunksPerProc" option is mandatory with "Vectorized", true. A value of 1 means that each process does its job in one shot (chunk). This number can be increased, for instance to use less memory. A higher "ChunksPerProc" also allows more flexibility for long calculations on a busy machine: if one CPU has finished all its jobs, it can take over the pending jobs of another.
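The effect of a larger "ChunksPerProc" can be sketched as follows. The value 4 and the input length 1000 are arbitrary choices for illustration; the result should be the same as with a single chunk, only the way the work is split changes.

```octave
pkg load parallel

# Vectorized function (note the dot in .^).
fun = @(x) x.^2;
vector_x = 1:1000;

# Each process handles its share in several smaller chunks
# instead of one large one (4 is an arbitrary example value).
vector_y = pararrayfun (nproc, fun, vector_x, ...
                        "Vectorized", true, "ChunksPerProc", 4);

# Compare against the serial result.
max_error = max (abs (vector_y(:) - (vector_x.^2)(:)))
```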
Output in cell arrays
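The following sample code was an answer to this question: http://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way. The goal was to diagonalize 2x2 matrices stored as rows of a 2D array (each row of the array being a flattened 2x2 matrix).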
Code: diagonalize NxN matrices contained in an array
    A = [0.6060168 0.8340029 0.0064574 0.7133187;
         0.6325375 0.0919912 0.5692567 0.7432627;
         0.8292699 0.5136958 0.4171895 0.2530783;
         0.7966113 0.1975865 0.6687064 0.3226548;
         0.0163615 0.2123476 0.9868179 0.1478827];
    N = 2;
    [eigenvectors, eigenvalues] = pararrayfun(nproc,
        @(row_idx) eig(reshape(A(row_idx, :), N, N)),
        1:rows(A), "UniformOutput", false)
With "UniformOutput", false, the outputs are contained in cell arrays (one cell per slice). In the sample above, both eigenvectors and eigenvalues are 1x5 cell arrays.
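As a sketch of how the cell outputs might be used afterwards: the names k, M, V, D, residual and lambda are illustrative only. The snippet repeats the sample so that it runs on its own, then checks one decomposition and gathers the eigenvalues into a plain matrix.

```octave
pkg load parallel

A = [0.6060168 0.8340029 0.0064574 0.7133187;
     0.6325375 0.0919912 0.5692567 0.7432627;
     0.8292699 0.5136958 0.4171895 0.2530783;
     0.7966113 0.1975865 0.6687064 0.3226548;
     0.0163615 0.2123476 0.9868179 0.1478827];
N = 2;

[eigenvectors, eigenvalues] = pararrayfun (nproc, ...
    @(row_idx) eig (reshape (A(row_idx, :), N, N)), ...
    1:rows (A), "UniformOutput", false);

# Curly braces extract the content of one cell.
k = 3;                         # any row index from 1 to rows (A)
M = reshape (A(k, :), N, N);   # the matrix that was diagonalized
V = eigenvectors{k};           # N x N, eigenvectors as columns
D = eigenvalues{k};            # N x N, eigenvalues on the diagonal

# M * V should equal V * D up to rounding error.
residual = norm (M * V - V * D)

# Collect the eigenvalues into an N x rows (A) matrix, one column per row of A.
lambda = cell2mat (cellfun (@diag, eigenvalues, "UniformOutput", false));
size (lambda)
```

The residual is expected to be close to machine precision rather than exactly zero.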
cluster operation
Documentation can be found in the README.parallel or README.bw files, located inside the doc directory of the parallel package.