Difference between revisions of "Parallel package"

From Octave
m (Remove redundant Category:Packages. Categories at bottom.)
m
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
The Parallel execution package provides utilities to work with clusters, but also functions to parallelize work among cores of a single machine.
+
The {{Forge|parallel|parallel package}} is part of the Octave Forge project. See its {{Forge|parallel|homepage}} for the latest release.
  
To install: {{Codeline|pkg install -forge parallel}}
+
This package provides utilities to work with clusters<ref>[https://octave.sourceforge.io/parallel/package_doc/ Package documentation]</ref>, but also functions to parallelize work among cores of a single machine.
  
And then, once on each octave session, {{Codeline|pkg load parallel}}
+
* Install: {{Codeline|pkg install -forge parallel}}
 +
* Load: {{Codeline|pkg load parallel}}
  
== multicore parallelization (parcellfun, pararrayfun) ==
+
== Multicore parallelization (parcellfun, pararrayfun) ==
  
 +
=== Calculation on a single array ===
  
See also the [[NDpar package]], for an extension of these functions to N-dimensional arrays
+
<syntaxhighlight lang="octave">
 
 
=== calculation on a single array ===
 
 
 
{{Code|simple|<pre>
 
 
# fun is the function to apply  
 
# fun is the function to apply  
 
fun = @(x) x^2;
 
fun = @(x) x^2;
Line 19: Line 17:
  
 
vector_y = pararrayfun(nproc, fun, vector_x)
 
vector_y = pararrayfun(nproc, fun, vector_x)
</pre>
+
</syntaxhighlight>
}}
 
  
 
should output
 
should output
  
<code><pre>
+
<syntaxhighlight lang="text">
 
parcellfun: 10/10 jobs done
 
parcellfun: 10/10 jobs done
  
Line 30: Line 27:
  
 
     1    4    9    16    25    36    49    64    81  100
 
     1    4    9    16    25    36    49    64    81  100
</pre></code>
+
</syntaxhighlight>
  
 
{{Codeline|nproc}} returns the number of cpus available (number of cores or twice as much with hyperthreading). One can use {{Codeline|nproc - 1}} instead, in order to leave one cpu free for instance.
 
{{Codeline|nproc}} returns the number of cpus available (number of cores or twice as much with hyperthreading). One can use {{Codeline|nproc - 1}} instead, in order to leave one cpu free for instance.
Line 39: Line 36:
 
If the function is vectorized (can act on a vector and not just on scalar input), then it can be much more efficient to use the {{Codeline|"Vectorized", true}} option.
 
If the function is vectorized (can act on a vector and not just on scalar input), then it can be much more efficient to use the {{Codeline|"Vectorized", true}} option.
  
{{Code|vectorized|<pre>
+
<syntaxhighlight lang="octave">
 
# fun is the function to apply, vectorized (see the dot)
 
# fun is the function to apply, vectorized (see the dot)
 
fun = @(x) x.^2;
 
fun = @(x) x.^2;
Line 46: Line 43:
  
 
vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)
 
vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)
</pre>
+
</syntaxhighlight>
}}
 
 
should output
 
should output
  
<code><pre>
+
<syntaxhighlight lang="text">
 
parcellfun: 4/4 jobs done
 
parcellfun: 4/4 jobs done
 
vector_y =
 
vector_y =
  
 
     1    4    9    16    25    36    49    64    81  100
 
     1    4    9    16    25    36    49    64    81  100
</pre></code>
+
</syntaxhighlight>
  
 
The {{Codeline|"ChunksPerProc"}} option is mandatory with {{Codeline|"Vectorized", true}}. {{Codeline|1}} means that each proc will do its job in one shot (chunk). This number can be increased to use less memory for instance. A higher number of {{Codeline|"ChunksPerProc"}} allows also more flexibility in case of long calculations on a busy machine. If one cpu has finished all its jobs, it can take over the pending jobs of another.
 
The {{Codeline|"ChunksPerProc"}} option is mandatory with {{Codeline|"Vectorized", true}}. {{Codeline|1}} means that each proc will do its job in one shot (chunk). This number can be increased to use less memory for instance. A higher number of {{Codeline|"ChunksPerProc"}} allows also more flexibility in case of long calculations on a busy machine. If one cpu has finished all its jobs, it can take over the pending jobs of another.
Line 61: Line 57:
 
=== Output in cell arrays ===
 
=== Output in cell arrays ===
  
The following sample code was an answer to [http://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way this question]. The goal was to diagonalize 2x2 matrices contained as rows of a 2d array (each row of the array being a flattened 2x2 matrix).
+
The following sample code was an answer to [https://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way this question]. The goal was to diagonalize 2x2 matrices contained as rows of a 2d array (each row of the array being a flattened 2x2 matrix).
  
{{code|diagonalize NxN matrices contained in an array|
+
<syntaxhighlight lang="octave">
<pre>
 
 
A = [0.6060168 0.8340029 0.0064574 0.7133187;
 
A = [0.6060168 0.8340029 0.0064574 0.7133187;
0.6325375 0.0919912 0.5692567 0.7432627;
+
    0.6325375 0.0919912 0.5692567 0.7432627;
0.8292699 0.5136958 0.4171895 0.2530783;
+
    0.8292699 0.5136958 0.4171895 0.2530783;
0.7966113 0.1975865 0.6687064 0.3226548;
+
    0.7966113 0.1975865 0.6687064 0.3226548;
0.0163615 0.2123476 0.9868179 0.1478827];
+
    0.0163615 0.2123476 0.9868179 0.1478827];
  
 
N = 2;
 
N = 2;
Line 75: Line 70:
 
                                 @(row_idx) eig(reshape(A(row_idx, :), N, N)),  
 
                                 @(row_idx) eig(reshape(A(row_idx, :), N, N)),  
 
                                 1:rows(A), "UniformOutput", false)
 
                                 1:rows(A), "UniformOutput", false)
</pre>
+
</syntaxhighlight>
}}
 
  
 
With {{codeline|"UniformOutput", false}}, the outputs are contained in cell arrays (one cell per slice). In the sample above, both {{codeline|eigenvectors}} and {{codeline|eigenvalues}} are {{codeline|1x5}} cell arrays.
 
With {{codeline|"UniformOutput", false}}, the outputs are contained in cell arrays (one cell per slice). In the sample above, both {{codeline|eigenvectors}} and {{codeline|eigenvalues}} are {{codeline|1x5}} cell arrays.
  
== cluster operation ==
+
== References ==
 +
 
 +
<references />
 +
 
 +
== See also ==
  
Documentation can be found in the {{codeline|README.parallel}} or {{codeline|README.bw}} files, located inside the {{codeline|doc}} directory of the parallel package.
+
* [[File:Examples_of_how_to_use_parrarrayfun.pdf]]
 +
* [[NDpar package]] - an extension of these functions to N-dimensional arrays
  
 
[[Category:Octave Forge]]
 
[[Category:Octave Forge]]

Latest revision as of 19:09, 3 March 2021

The parallel package is part of the Octave Forge project. See its homepage for the latest release.

This package provides utilities to work with clusters[1], but also functions to parallelize work among cores of a single machine.

  • Install: pkg install -forge parallel
  • Load: pkg load parallel
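
As a minimal sketch of the two steps above, the following checks whether the package is already installed before loading it. It relies only on `pkg("list")`, which returns a cell array of structs describing the installed packages; the message text is illustrative.

```octave
# Minimal sketch: load the parallel package only if it is installed.
installed = pkg("list");   # cell array of structs, one per installed package
names = cellfun(@(p) p.name, installed, "UniformOutput", false);

if any(strcmp(names, "parallel"))
  pkg load parallel
else
  printf("parallel is not installed; run: pkg install -forge parallel\n");
endif
```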

Multicore parallelization (parcellfun, pararrayfun)

Calculation on a single array

# fun is the function to apply 
fun = @(x) x^2;

vector_x = 1:10;

vector_y = pararrayfun(nproc, fun, vector_x)

should output

parcellfun: 10/10 jobs done

vector_y =

     1     4     9    16    25    36    49    64    81   100

nproc returns the number of CPUs available (the number of cores, or twice as many with hyperthreading). Use nproc - 1 instead to leave one CPU free, for instance.

fun can be replaced by @myfun if the function resides in the myfun.m file.
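
The two remarks above can be combined in a short sketch. Here `myfun` is a hypothetical stand-in, defined inline for illustration, for a function that would normally live in `myfun.m`; `nworkers` is likewise an illustrative name. The `max` guard is an added assumption so that a single-CPU machine still gets one process.

```octave
pkg load parallel

# Hypothetical stand-in for a function that would normally live in myfun.m
function y = myfun(x)
  y = x^2;
endfunction

# Leave one CPU free, but never request fewer than one process
nworkers = max(nproc - 1, 1);

vector_x = 1:10;
vector_y = pararrayfun(nworkers, @myfun, vector_x);
```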

In the previous example, the function was executed once for each element of the input vector_x. If the function is vectorized (can act on a vector and not just on scalar input), then it can be much more efficient to use the "Vectorized", true option.

# fun is the function to apply, vectorized (see the dot)
fun = @(x) x.^2;

vector_x = 1:10;

vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 1)

should output

parcellfun: 4/4 jobs done
vector_y =

     1     4     9    16    25    36    49    64    81   100

The "ChunksPerProc" option is mandatory with "Vectorized", true. A value of 1 means that each process does its whole share of the work in a single chunk. Increasing this number reduces memory use, for instance. A higher "ChunksPerProc" also gives more flexibility for long calculations on a busy machine: a CPU that has finished all its jobs can take over the pending jobs of another.
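
As a hedged illustration of a larger chunk count, the sketch below splits the work into four chunks per process and checks the result against a serial evaluation. The input size and the value 4 are arbitrary choices for the example, not recommendations.

```octave
pkg load parallel

fun = @(x) x.^2;   # vectorized: works on a whole chunk at once
vector_x = 1:1000;

# Four chunks per process: each job handles a smaller slice of vector_x,
# and idle processes can pick up chunks that are still pending
vector_y = pararrayfun(nproc, fun, vector_x, "Vectorized", true, "ChunksPerProc", 4);

# The result should match the serial computation
serial_y = fun(vector_x);
assert(isequal(vector_y(:), serial_y(:)));
```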

Output in cell arrays

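The following sample code was an answer to a Stack Overflow question (https://stackoverflow.com/questions/27422219/for-every-row-reshape-and-calculate-eigenvectors-in-a-vectorized-way). The goal was to diagonalize 2x2 matrices stored as rows of a 2D array (each row of the array being a flattened 2x2 matrix).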

A = [0.6060168 0.8340029 0.0064574 0.7133187;
     0.6325375 0.0919912 0.5692567 0.7432627;
     0.8292699 0.5136958 0.4171895 0.2530783;
     0.7966113 0.1975865 0.6687064 0.3226548;
     0.0163615 0.2123476 0.9868179 0.1478827];

N = 2;
[eigenvectors, eigenvalues] = pararrayfun(nproc, 
                                @(row_idx) eig(reshape(A(row_idx, :), N, N)), 
                                1:rows(A), "UniformOutput", false)

With "UniformOutput", false, the outputs are contained in cell arrays (one cell per slice). In the sample above, both eigenvectors and eigenvalues are 1x5 cell arrays.
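
A minimal sketch of how such cell outputs might be used, on a two-row subset of the same data. The names `V1`, `D1`, `M1` and `all_eigenvalues` are illustrative and not part of the original answer.

```octave
pkg load parallel

A = [0.6060168 0.8340029 0.0064574 0.7133187;
     0.6325375 0.0919912 0.5692567 0.7432627];
N = 2;

[eigenvectors, eigenvalues] = pararrayfun(nproc,
                                @(row_idx) eig(reshape(A(row_idx, :), N, N)),
                                1:rows(A), "UniformOutput", false);

# Each cell holds the result for one row of A
V1 = eigenvectors{1};   # N x N matrix whose columns are eigenvectors
D1 = eigenvalues{1};    # N x N diagonal matrix of eigenvalues

# Check the defining relation M*V = V*D for the first matrix
M1 = reshape(A(1, :), N, N);
assert(norm(M1 * V1 - V1 * D1) < 1e-10);

# Collect the eigenvalues of every matrix as the columns of one array
all_eigenvalues = cell2mat(cellfun(@diag, eigenvalues, "UniformOutput", false));
```

Since `eig` with two outputs returns the eigenvalues as a diagonal matrix, `diag` is applied to each cell to extract them as a vector before concatenating.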

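References

  1. Package documentation: https://octave.sourceforge.io/parallel/package_doc/

See also

  • File:Examples_of_how_to_use_parrarrayfun.pdf
  • NDpar package - an extension of these functions to N-dimensional arrays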