Difference between revisions of "Parallel package"
m (Remove redundant Category:Packages. Categories at bottom.) |
|||
(7 intermediate revisions by 4 users not shown) | |||
Line 8: | Line 8: | ||
− | See also the [[NDpar]] | + | See also the [[NDpar package]], for an extension of these functions to N-dimensional arrays |
=== calculation on a single array === | === calculation on a single array === | ||
Line 21: | Line 21: | ||
</pre> | </pre> | ||
}} | }} | ||
+ | |||
should output | should output | ||
− | <code> | + | <code><pre> |
− | <pre> | ||
parcellfun: 10/10 jobs done | parcellfun: 10/10 jobs done | ||
Line 30: | Line 30: | ||
1 4 9 16 25 36 49 64 81 100 | 1 4 9 16 25 36 49 64 81 100 | ||
− | </pre> | + | </pre></code> |
− | </code> | ||
{{Codeline|nproc}} returns the number of cpus available (number of cores or twice as much with hyperthreading). One can use {{Codeline|nproc - 1}} instead, in order to leave one cpu free for instance. | {{Codeline|nproc}} returns the number of cpus available (number of cores or twice as much with hyperthreading). One can use {{Codeline|nproc - 1}} instead, in order to leave one cpu free for instance. | ||
Line 51: | Line 50: | ||
should output | should output | ||
− | <code> | + | <code><pre> |
− | <pre> | ||
parcellfun: 4/4 jobs done | parcellfun: 4/4 jobs done | ||
vector_y = | vector_y = | ||
1 4 9 16 25 36 49 64 81 100 | 1 4 9 16 25 36 49 64 81 100 | ||
+ | </pre></code> | ||
+ | |||
+ | The {{Codeline|"ChunksPerProc"}} option is mandatory with {{Codeline|"Vectorized", true}}. {{Codeline|1}} means that each proc will do its job in one shot (chunk). This number can be increased to use less memory for instance. A higher number of {{Codeline|"ChunksPerProc"}} allows also more flexibility in case of long calculations on a busy machine. If one cpu has finished all its jobs, it can take over the pending jobs of another. | ||
+ | |||
+ | === Output in cell arrays === | ||
+ | |||
+ | The following sample code was an answer to [http://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way this question]. The goal was to diagonalize 2x2 matrices contained as rows of a 2d array (each row of the array being a flattened 2x2 matrix). | ||
+ | |||
+ | {{code|diagonalize NxN matrices contained in an array| | ||
+ | <pre> | ||
+ | A = [0.6060168 0.8340029 0.0064574 0.7133187; | ||
+ | 0.6325375 0.0919912 0.5692567 0.7432627; | ||
+ | 0.8292699 0.5136958 0.4171895 0.2530783; | ||
+ | 0.7966113 0.1975865 0.6687064 0.3226548; | ||
+ | 0.0163615 0.2123476 0.9868179 0.1478827]; | ||
+ | |||
+ | N = 2; | ||
+ | [eigenvectors, eigenvalues] = pararrayfun(nproc, | ||
+ | @(row_idx) eig(reshape(A(row_idx, :), N, N)), | ||
+ | 1:rows(A), "UniformOutput", false) | ||
</pre> | </pre> | ||
− | + | }} | |
− | + | ||
+ | With {{codeline|"UniformOutput", false}}, the outputs are contained in cell arrays (one cell per slice). In the sample above, both {{codeline|eigenvectors}} and {{codeline|eigenvalues}} are {{codeline|1x5}} cell arrays. | ||
+ | |||
+ | == cluster operation == | ||
+ | |||
+ | Documentation can be found in the {{codeline|README.parallel}} or {{codeline|README.bw}} files, located inside the {{codeline|doc}} directory of the parallel package. | ||
+ | |||
+ | [[Category:Octave Forge]] |
Latest revision as of 04:18, 10 June 2019
The Parallel execution package provides utilities to work with clusters, but also functions to parallelize work among cores of a single machine.
To install: pkg install -forge parallel
Then, once in each Octave session: pkg load parallel
multicore parallelization (parcellfun, pararrayfun)
See also the NDpar package, for an extension of these functions to N-dimensional arrays.
calculation on a single array
Code: simple
    # fun is the function to apply
    fun = @(x) x^2;
    vector_x = 1:10;
    vector_y = pararrayfun(nproc, fun, vector_x)
should output
parcellfun: 10/10 jobs done
vector_y =
1 4 9 16 25 36 49 64 81 100
nproc returns the number of CPUs available (the number of cores, or twice that with hyperthreading). One can use nproc - 1 instead, for instance to leave one CPU free.
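As a small sketch of the nproc - 1 idea: on a machine that reports a single CPU, nproc - 1 evaluates to 0, so it may be worth clamping the worker count to at least 1. The names ncpus and nworkers are only illustrative and are not part of the package.

```octave
# Number of CPUs reported by Octave
ncpus = nproc;

# Leave one CPU free, but never ask for fewer than one worker
# (nworkers is an illustrative name, not part of the package)
nworkers = max (1, ncpus - 1);

printf ("using %d of %d CPUs\n", nworkers, ncpus);
```

nworkers can then be passed as the first argument in place of nproc, for example pararrayfun(nworkers, fun, vector_x).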
fun can be replaced by @myfun if the function resides in the myfun.m file.
In the previous example, the function was executed once for each element of the input vector_x.
If the function is vectorized (it can act on a whole vector and not just on scalar input), then it can be much more efficient to use the "Vectorized", true option.
Code: vectorized
    # fun is the function to apply, vectorized (see the dot)
    fun = @(x) x.^2;
    vector_x = 1:10;
    vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)
should output
parcellfun: 4/4 jobs done
vector_y =
1 4 9 16 25 36 49 64 81 100
The "ChunksPerProc" option is mandatory with "Vectorized", true. A value of 1 means that each process does its job in one shot (chunk). This number can be increased, for instance to use less memory. A higher "ChunksPerProc" also allows more flexibility for long calculations on a busy machine: if one CPU has finished all its jobs, it can take over the pending jobs of another.
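To make the notion of a chunk more concrete, here is a serial sketch in plain Octave. It assumes that the input is cut into roughly (number of processes) x "ChunksPerProc" contiguous pieces and that the function is called once per piece. This is an illustration only: the splitting rule, the worker count of 4, and the names nworkers, chunks_per_proc, edges and chunks are assumptions, and the package's own strategy may differ.

```octave
# Serial sketch of chunked evaluation. This only illustrates the idea;
# the package's own splitting strategy may differ.
fun = @(x) x.^2;          # vectorized function, as in the example above
vector_x = 1:10;

nworkers = 4;             # assumed number of processes
chunks_per_proc = 2;      # assumed "ChunksPerProc" value

n = numel (vector_x);
nchunks = min (n, nworkers * chunks_per_proc);

# Integer boundaries that split 1:n into nchunks contiguous pieces
edges = floor ((0:nchunks) * n / nchunks);

chunks = cell (1, nchunks);
for k = 1:nchunks
  chunks{k} = vector_x(edges(k)+1 : edges(k+1));
endfor

# Each chunk is one job: fun is called once per chunk
results = cellfun (fun, chunks, "UniformOutput", false);
vector_y = [results{:}]

# With the parallel package loaded, the corresponding call would be:
#   vector_y = pararrayfun (nproc, fun, vector_x, ...
#                           "Vectorized", true, "ChunksPerProc", 2)
```

Smaller chunks mean each call to fun handles less data at once, which is where the memory saving and the finer-grained load balancing described above would come from.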
Output in cell arrays
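The following sample code was an answer to this Stack Overflow question: http://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way. The goal was to diagonalize 2x2 matrices stored as rows of a 2D array (each row of the array being a flattened 2x2 matrix).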
Code: diagonalize NxN matrices contained in an array
    A = [0.6060168 0.8340029 0.0064574 0.7133187;
         0.6325375 0.0919912 0.5692567 0.7432627;
         0.8292699 0.5136958 0.4171895 0.2530783;
         0.7966113 0.1975865 0.6687064 0.3226548;
         0.0163615 0.2123476 0.9868179 0.1478827];

    N = 2;
    [eigenvectors, eigenvalues] = pararrayfun(nproc,
        @(row_idx) eig(reshape(A(row_idx, :), N, N)),
        1:rows(A), "UniformOutput", false)
With "UniformOutput", false, the outputs are contained in cell arrays (one cell per slice). In the sample above, both eigenvectors and eigenvalues are 1x5 cell arrays.
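The sketch below shows one way to work with such cell outputs. So that it runs without the package, it uses the core function arrayfun as a serial stand-in for pararrayfun (that substitution is an assumption made for this illustration); the variables k, M, V, D and residual are illustrative names.

```octave
A = [0.6060168 0.8340029 0.0064574 0.7133187;
     0.6325375 0.0919912 0.5692567 0.7432627;
     0.8292699 0.5136958 0.4171895 0.2530783;
     0.7966113 0.1975865 0.6687064 0.3226548;
     0.0163615 0.2123476 0.9868179 0.1478827];
N = 2;

# Serial stand-in: arrayfun is used here instead of pararrayfun
[eigenvectors, eigenvalues] = arrayfun ( ...
    @(row_idx) eig (reshape (A(row_idx, :), N, N)), ...
    1:rows (A), "UniformOutput", false);

# One cell per row of A
size (eigenvectors)

# Pick the results for one of the matrices, here the third row
k = 3;
M = reshape (A(k, :), N, N);
V = eigenvectors{k};   # columns are the eigenvectors
D = eigenvalues{k};    # diagonal matrix of eigenvalues

# Residual of M*V = V*D, expected to be close to zero
residual = norm (M * V - V * D)
```

Curly-brace indexing (eigenvectors{k}) returns the matrix stored in a cell, whereas parentheses (eigenvectors(k)) would return a 1x1 cell.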
cluster operation
Documentation can be found in the README.parallel or README.bw files, located inside the doc directory of the parallel package.