Difference between revisions of "Parallel package"

m (Remove redundant Category:Packages. Categories at bottom.)
(Overhaul page.)
Line 1: Line 1:
The Parallel execution package provides utilities to work with clusters, but also functions to parallelize work among cores of a single machine.
+
The {{Forge|parallel|parallel package}} is part of the Octave Forge project. See its {{Forge|parallel|homepage}} for the latest release.
  
To install: {{Codeline|pkg install -forge parallel}}
+
This package provides utilities to work with clusters<ref>[https://octave.sourceforge.io/parallel/package_doc/ Package documentation]</ref>, but also functions to parallelize work among cores of a single machine.
  
And then, once on each octave session, {{Codeline|pkg load parallel}}
+
* Install: {{Codeline|pkg install -forge parallel}}
 +
* Load: {{Codeline|pkg load parallel}}
  
== multicore parallelization (parcellfun, pararrayfun) ==
+
== Multicore parallelization (parcellfun, pararrayfun) ==
  
 +
=== Calculation on a single array ===
  
See also the [[NDpar package]], for an extension of these functions to N-dimensional arrays
+
<syntaxhighlight lang="octave">
 
 
=== calculation on a single array ===
 
 
 
{{Code|simple|<pre>
 
 
# fun is the function to apply  
 
# fun is the function to apply  
 
fun = @(x) x^2;
 
fun = @(x) x^2;
Line 19: Line 17:
  
 
vector_y = pararrayfun(nproc, fun, vector_x)
 
vector_y = pararrayfun(nproc, fun, vector_x)
</pre>
+
</syntaxhighlight>
}}
 
  
 
should output
 
should output
  
<code><pre>
+
<syntaxhighlight lang="plain">
 
parcellfun: 10/10 jobs done
 
parcellfun: 10/10 jobs done
  
Line 30: Line 27:
  
 
     1    4    9    16    25    36    49    64    81  100
 
     1    4    9    16    25    36    49    64    81  100
</pre></code>
+
</syntaxhighlight>
  
 
{{Codeline|nproc}} returns the number of cpus available (number of cores or twice as much with hyperthreading). One can use {{Codeline|nproc - 1}} instead, in order to leave one cpu free for instance.
 
{{Codeline|nproc}} returns the number of cpus available (number of cores or twice as much with hyperthreading). One can use {{Codeline|nproc - 1}} instead, in order to leave one cpu free for instance.
Line 39: Line 36:
 
If the function is vectorized (can act on a vector and not just on scalar input), then it can be much more efficient to use the {{Codeline|"Vectorized", true}} option.
 
If the function is vectorized (can act on a vector and not just on scalar input), then it can be much more efficient to use the {{Codeline|"Vectorized", true}} option.
  
{{Code|vectorized|<pre>
+
<syntaxhighlight lang="octave">
 
# fun is the function to apply, vectorized (see the dot)
 
# fun is the function to apply, vectorized (see the dot)
 
fun = @(x) x.^2;
 
fun = @(x) x.^2;
Line 46: Line 43:
  
 
vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)
 
vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)
</pre>
+
</syntaxhighlight>
}}
 
 
should output
 
should output
  
<code><pre>
+
<syntaxhighlight lang="plain">
 
parcellfun: 4/4 jobs done
 
parcellfun: 4/4 jobs done
 
vector_y =
 
vector_y =
  
 
     1    4    9    16    25    36    49    64    81  100
 
     1    4    9    16    25    36    49    64    81  100
</pre></code>
+
</syntaxhighlight>
  
 
The {{Codeline|"ChunksPerProc"}} option is mandatory with {{Codeline|"Vectorized", true}}. {{Codeline|1}} means that each proc will do its job in one shot (chunk). This number can be increased to use less memory for instance. A higher number of {{Codeline|"ChunksPerProc"}} allows also more flexibility in case of long calculations on a busy machine. If one cpu has finished all its jobs, it can take over the pending jobs of another.
 
The {{Codeline|"ChunksPerProc"}} option is mandatory with {{Codeline|"Vectorized", true}}. {{Codeline|1}} means that each proc will do its job in one shot (chunk). This number can be increased to use less memory for instance. A higher number of {{Codeline|"ChunksPerProc"}} allows also more flexibility in case of long calculations on a busy machine. If one cpu has finished all its jobs, it can take over the pending jobs of another.
Line 61: Line 57:
 
=== Output in cell arrays ===
 
=== Output in cell arrays ===
  
The following sample code was an answer to [http://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way this question]. The goal was to diagonalize 2x2 matrices contained as rows of a 2d array (each row of the array being a flattened 2x2 matrix).
+
The following sample code was an answer to [https://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way this question]. The goal was to diagonalize 2x2 matrices contained as rows of a 2d array (each row of the array being a flattened 2x2 matrix).
  
{{code|diagonalize NxN matrices contained in an array|
+
<syntaxhighlight lang="octave">
<pre>
 
 
A = [0.6060168 0.8340029 0.0064574 0.7133187;
 
A = [0.6060168 0.8340029 0.0064574 0.7133187;
0.6325375 0.0919912 0.5692567 0.7432627;
+
    0.6325375 0.0919912 0.5692567 0.7432627;
0.8292699 0.5136958 0.4171895 0.2530783;
+
    0.8292699 0.5136958 0.4171895 0.2530783;
0.7966113 0.1975865 0.6687064 0.3226548;
+
    0.7966113 0.1975865 0.6687064 0.3226548;
0.0163615 0.2123476 0.9868179 0.1478827];
+
    0.0163615 0.2123476 0.9868179 0.1478827];
  
 
N = 2;
 
N = 2;
Line 75: Line 70:
 
                                 @(row_idx) eig(reshape(A(row_idx, :), N, N)),  
 
                                 @(row_idx) eig(reshape(A(row_idx, :), N, N)),  
 
                                 1:rows(A), "UniformOutput", false)
 
                                 1:rows(A), "UniformOutput", false)
</pre>
+
</syntaxhighlight>
}}
 
  
 
With {{codeline|"UniformOutput", false}}, the outputs are contained in cell arrays (one cell per slice). In the sample above, both {{codeline|eigenvectors}} and {{codeline|eigenvalues}} are {{codeline|1x5}} cell arrays.
 
With {{codeline|"UniformOutput", false}}, the outputs are contained in cell arrays (one cell per slice). In the sample above, both {{codeline|eigenvectors}} and {{codeline|eigenvalues}} are {{codeline|1x5}} cell arrays.
  
== cluster operation ==
+
== References ==
 +
 
 +
<references />
 +
 
 +
== See also ==
  
Documentation can be found in the {{codeline|README.parallel}} or {{codeline|README.bw}} files, located inside the {{codeline|doc}} directory of the parallel package.
+
* [[File:]] - examples of how to use <code>pararrayfun</code>
 +
* [[NDpar package]] - an extension of these functions to N-dimensional arrays
  
 
[[Category:Octave Forge]]
 
[[Category:Octave Forge]]

Revision as of 20:04, 3 March 2021

The parallel package is part of the Octave Forge project. See its homepage for the latest release.

This package provides utilities to work with clusters[1], but also functions to parallelize work among cores of a single machine.

  • Install: pkg install -forge parallel
  • Load: pkg load parallel

Multicore parallelization (parcellfun, pararrayfun)

Calculation on a single array

# fun is the function to apply 
fun = @(x) x^2;

vector_x = 1:10;

vector_y = pararrayfun(nproc, fun, vector_x)

should output

parcellfun: 10/10 jobs done

vector_y =

     1     4     9    16    25    36    49    64    81   100

nproc returns the number of CPUs available (the number of cores, or twice that with hyperthreading). To leave one CPU free, for instance, pass nproc - 1 instead.
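A minimal sketch of reserving one CPU. The variable name nworkers is illustrative only, and the max guard is an assumption added here so that a single-CPU machine still requests one worker.

```octave
pkg load parallel

# Leave one CPU free, but never request fewer than one worker.
# (nworkers is an illustrative name, not part of the package.)
nworkers = max(nproc - 1, 1);

fun = @(x) x^2;
vector_x = 1:10;

# Same call as in the example above, with the reduced worker count.
vector_y = pararrayfun(nworkers, fun, vector_x);
```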

fun can be replaced by @myfun if the function resides in the myfun.m file.
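As a sketch of the named-function form: myfun is defined inline below only to keep the example self-contained, on the assumption that a function defined in a script behaves like one stored in myfun.m for this purpose.

```octave
pkg load parallel

# Normally this function would live in its own file, myfun.m.
function y = myfun(x)
  y = x^2;
endfunction

vector_x = 1:10;

# Pass a handle to the named function instead of an anonymous one.
vector_y = pararrayfun(nproc, @myfun, vector_x);
```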

In the previous example, the function was executed once for each element of the input vector_x. If the function is vectorized (can act on a vector and not just on scalar input), then it can be much more efficient to use the "Vectorized", true option.

# fun is the function to apply, vectorized (see the dot)
fun = @(x) x.^2;

vector_x = 1:10;

vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)

should output

parcellfun: 4/4 jobs done
vector_y =

     1     4     9    16    25    36    49    64    81   100

The "ChunksPerProc" option is mandatory with "Vectorized", true. A value of 1 means that each process does its whole share of the work in a single chunk. Increasing the value splits the work into smaller chunks, which can, for instance, reduce memory use. More chunks per process also give more flexibility for long calculations on a busy machine: a CPU that has finished all its own jobs can take over the pending jobs of another.
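A hedged sketch of a larger "ChunksPerProc" value. The input length of 1000 and the value 4 are arbitrary choices for illustration, not recommendations from the package.

```octave
pkg load parallel

# Vectorized function: it receives a whole chunk of the input at once.
fun = @(x) x.^2;
vector_x = 1:1000;

# Four chunks per process: each piece is smaller, and a process that
# finishes early can pick up chunks that are still pending.
vector_y = pararrayfun(nproc, fun, vector_x, ...
                       "Vectorized", true, "ChunksPerProc", 4);
```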

Output in cell arrays

The following sample code was an answer to this question. The goal was to diagonalize 2x2 matrices stored as the rows of a 2-D array (each row being a flattened 2x2 matrix).

A = [0.6060168 0.8340029 0.0064574 0.7133187;
     0.6325375 0.0919912 0.5692567 0.7432627;
     0.8292699 0.5136958 0.4171895 0.2530783;
     0.7966113 0.1975865 0.6687064 0.3226548;
     0.0163615 0.2123476 0.9868179 0.1478827];

N = 2;
[eigenvectors, eigenvalues] = pararrayfun(nproc, 
                                @(row_idx) eig(reshape(A(row_idx, :), N, N)), 
                                1:rows(A), "UniformOutput", false)

With "UniformOutput", false, the outputs are contained in cell arrays (one cell per slice). In the sample above, both eigenvectors and eigenvalues are 1x5 cell arrays.
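The cell-array outputs can then be unpacked with ordinary cell indexing. The sketch below repeats the sample so that it runs on its own; the names V1, D1 and all_eigs are illustrative and not part of the package.

```octave
pkg load parallel

A = [0.6060168 0.8340029 0.0064574 0.7133187;
     0.6325375 0.0919912 0.5692567 0.7432627;
     0.8292699 0.5136958 0.4171895 0.2530783;
     0.7966113 0.1975865 0.6687064 0.3226548;
     0.0163615 0.2123476 0.9868179 0.1478827];
N = 2;

[eigenvectors, eigenvalues] = pararrayfun(nproc, ...
    @(row_idx) eig(reshape(A(row_idx, :), N, N)), ...
    1:rows(A), "UniformOutput", false);

# Each cell holds the result for one row of A.
V1 = eigenvectors{1};   # N-by-N matrix of eigenvectors for row 1
D1 = eigenvalues{1};    # N-by-N diagonal matrix of eigenvalues for row 1

# Gather the eigenvalues of every matrix as the columns of one array.
all_eigs = cell2mat(cellfun(@diag, eigenvalues, "UniformOutput", false));
```

Here all_eigs has one column per row of A, which is often easier to work with than a cell array of diagonal matrices.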

References

See also

  • [[File:]] - examples of how to use pararrayfun
  • NDpar package - an extension of these functions to N-dimensional arrays