Parallel package: Difference between revisions
Revision as of 08:40, 21 September 2014
The Parallel execution package provides utilities to work with clusters, but also functions to parallelize work among cores of a single machine.
To install: pkg install -forge parallel
Then, once in each Octave session: pkg load parallel
multicore parallelization (parcellfun, pararrayfun)
See also the NDpar package, for an extension of these functions to N-dimensional arrays
calculation on a single array
Code: simple
# fun is the function to apply
fun = @(x) x^2;
vector_x = 1:10;
vector_y = pararrayfun(nproc, fun, vector_x)
should output
parcellfun: 10/10 jobs done
vector_y =
1 4 9 16 25 36 49 64 81 100
nproc returns the number of CPUs available (the number of cores, or twice as many with hyperthreading). One can use nproc - 1 instead, for instance to leave one CPU free.
fun can be replaced by @myfun if the function resides in the file myfun.m.
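As a minimal sketch of the two variations just described, the snippet below passes a named function handle and uses nproc - 1 processes. The function myfun is hypothetical and is defined inline here only so the example is self-contained; normally it would live in its own myfun.m file. The max(1, ...) guard is an added assumption, so that a single-core machine still gets one process.

```octave
pkg load parallel

# Hypothetical named function; normally it would live in its own file, myfun.m
function y = myfun(x)
  y = x^2;
endfunction

# Leave one CPU free, but never ask for fewer than one process (assumed guard)
nworkers = max(1, nproc - 1);

vector_x = 1:10;
vector_y = pararrayfun(nworkers, @myfun, vector_x);
disp(vector_y)
```

The result should match the squares shown in the output above; only the number of processes used differs.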
In the previous example, the function was executed once for each element of the input vector_x.
If the function is vectorized (it can act on a vector and not just on scalar input), then it can be much more efficient to use the "Vectorized", true option.
Code: vectorized
# fun is the function to apply, vectorized (see the dot)
fun = @(x) x.^2;
vector_x = 1:10;
vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)
should output
parcellfun: 4/4 jobs done
vector_y =
1 4 9 16 25 36 49 64 81 100
The "ChunksPerProc" option is mandatory with "Vectorized", true. A value of 1 means that each process does its job in one shot (chunk). This number can be increased, for instance to use less memory. A higher "ChunksPerProc" also allows more flexibility for long calculations on a busy machine: if one CPU has finished all its jobs, it can take over the pending jobs of another.
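To illustrate a higher "ChunksPerProc", here is a short sketch that splits the work into several chunks per process and checks the result against a plain serial call. The input size (1000) and the value 4 are arbitrary choices for illustration, not recommendations from the package.

```octave
pkg load parallel

# Vectorized function: acts on a whole chunk of the input at once
fun = @(x) x.^2;
vector_x = 1:1000;

# 4 chunks per process (arbitrary value): smaller pieces, so a process that
# finishes early can pick up chunks that are still pending
vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 4);

# Sanity check: the parallel result should equal the serial one
assert(isequal(vector_y, fun(vector_x)))
```

The number of jobs reported by parcellfun should grow with "ChunksPerProc", since each chunk is dispatched as a separate job.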