OEP:pkg

Revision as of 20:48, 18 November 2012 by Carandraug

Abstract

This OEP describes the design of Octave's pkg system. The purpose of this system is to handle the installation, loading, and removal of Octave packages.

The current implementation of pkg has problems, mainly when there are both local and global installations of packages, and when multiple Octave and package versions try to coexist. This document attempts to design a solution to these problems.

The main idea of the solution is to keep database files recording each package's location and dependencies, and to allow such files to be merged.

To allow reinstallation of packages, we propose keeping each package's source. This would also make it easier to run the packages' tests.

Rationale and examples

This design is meant to allow the following:

 * keep multiple versions of the same package installed and load a specific one
 * keep packages installed for multiple versions of Octave, especially in the case of
   .oct files, which need to be rebuilt for each Octave version
 * reinstall a package from its cache after installing a new Octave version
 * run the tests from packages (including tests found in .cc sources)
 * clean the package cache
 * use alternate database files
 * use packages in remote directories that may not be available at all
   times

Available vs Loaded

To avoid confusion while reading this document, the distinction between available and loaded packages should be made early. An available package is one that pkg can currently load, unload, or reinstall: it is already installed but not necessarily loaded. A loaded package is an installed package whose functions have been added to Octave's function search path.

Types of package installs

This design supports 3 types of package installations: global (relative to the Octave installation), local (user specific), and external (anywhere else). Note that Octave itself can be installed in several different ways. It might be a system-wide installation (located somewhere in /usr/local/, for example), a local installation by a normal user (somewhere under /home/user/), or an installation in the home directory of a system user (which can be anywhere).

Global installs

Packages installed globally will be available to everyone from startup. This is, for example, the type of package installation a system administrator would perform. The meaning of global here is relative to the Octave installation, though. If an Octave installation is local (installed by a user in ~/usr/local), a global installation of a package will still place its files in that user's home directory (in ~/usr/local/ as well).

A global installation is performed automatically if the user installing the package has write permissions on the relevant directories (localfcnfiledir and localapioctfiledir). If the user lacks these permissions, a local package installation is performed instead.
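A minimal Python sketch of this decision, assuming the two target directories have already been looked up (per this proposal they would be taken from octave_config_info). The function name choose_install_scope and the local fallback path are hypothetical. os.access only checks permissions up front, so a real implementation would still need to handle failures during the actual copy.

```python
import os


def choose_install_scope(global_dirs, local_dir):
    """Decide between a global and a local package install (hypothetical helper).

    global_dirs: directories a global install writes to, for example the
    values of localfcnfiledir and localapioctfiledir.
    local_dir: the user's own package directory, used as the fallback.
    """
    # os.access returns False for paths that do not exist, so a missing
    # global directory also triggers the local fallback in this sketch.
    if all(os.access(d, os.W_OK) for d in global_dirs):
        return "global", list(global_dirs)
    return "local", [local_dir]
```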

Local installs

Local packages are specific to a user. They are located in a .octave directory inside that user's home directory. As with global package installations, they are available from startup. Unlike global packages, they are only available to the user who installed them. A local install for one user can be an external install for some other user.

External installs

These are like local packages but in a non-standard location. Octave does not know about these installations at startup, even if the installation was done with the same Octave version. These can be packages installed on a filesystem that is not always mounted, local package installs of another user on the same system, or anything else really.

An external package is still installed with pkg; it is simply not tracked by Octave at all times. An external package install has an associated .db file, just like the .db files for local installs. To load an external package, the path to its .db file must be passed to pkg and the db given a name. Packages from that db can then be loaded.

For example, after starting an Octave session, one can load two .db files: one named labdev (/mnt/labdev/octave_packages.db) and the other named friendA (/home/friendA/.octave/octave_packages.db). Once these two external dbs are loaded, the packages associated with them are made available to pkg and can be loaded normally. The same package name and version may exist in both dbs, hence the need to name them, so that it is possible to specify from which one a package should be loaded.
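The naming of external dbs can be pictured as a small registry. This is a minimal Python sketch, assuming each .db file has already been parsed into a mapping from (package name, version) to install path. The class PackageRegistry and its methods are hypothetical and not part of pkg. When no db is named and a package version exists in more than one db, the sketch refuses to guess and asks for a db name.

```python
class PackageRegistry:
    """Hypothetical in-memory view of the package .db files loaded in one session."""

    def __init__(self):
        # db name -> {(package name, version): install path}
        self.dbs = {}

    def add_db(self, db_name, entries):
        """Register an already-parsed .db file under a user-chosen name."""
        if db_name in self.dbs:
            raise ValueError(f"a db named {db_name!r} is already loaded")
        # Package names are compared case-insensitively, as proposed for pkg.
        self.dbs[db_name] = {(name.lower(), version): path
                             for (name, version), path in entries.items()}

    def locate(self, name, version, db_name=None):
        """Return the install path of name/version, optionally from a named db."""
        key = (name.lower(), version)
        if db_name is not None:
            return self.dbs[db_name][key]
        hits = [db for db, pkgs in self.dbs.items() if key in pkgs]
        if not hits:
            raise KeyError(f"{name} {version} is not available")
        if len(hits) > 1:
            raise ValueError(f"{name} {version} exists in several dbs "
                             f"({', '.join(hits)}); specify which one to use")
        return self.dbs[hits[0]][key]
```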

Package names

For parsing commands and files, some limitations on package names are required, and the rules chosen determine what syntax pkg commands can use. For example, if package names are allowed to contain hyphens, then commands such as "pkg load image-2.0.0" can no longer be used to load a specific package version, since image-2.0.0 could itself be a package name. Something such as "pkg load image::2.0.0" would have to be used instead. Using this alternative syntax means that package names cannot contain colons.
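A minimal Python sketch of parsing the "name::version" form; the function name parse_pkg_spec is hypothetical. Under this syntax "image-2.0.0" parses as a bare package name, which is exactly the ambiguity that hyphenated names would introduce.

```python
def parse_pkg_spec(spec):
    """Split "name" or "name::version" into (name, version); version is None if absent."""
    name, sep, version = spec.partition("::")
    # Names may not contain colons, and "name::" without a version is malformed.
    if not name or ":" in name or (sep and (not version or ":" in version)):
        raise ValueError(f"invalid package spec: {spec!r}")
    return name, (version if sep else None)
```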

This is not limited to package versions. As pkg is to be extended to load pkg databases from other files (packages in a directory that is not always mounted, for example), it becomes possible for more than one package with the same name and version to be available to "pkg load". This means it becomes necessary to specify which package to load. Something like "pkg load image-lab-2.0.0" could be used. A nicer syntax would be "pkg load image-2.0.0 from lab", but that would impose one of the following 2 limitations: either no package can be named "from", or pkg load becomes limited to loading only one package at a time.

Also, supporting multiple package versions means that the word "all", used to refer to all packages, becomes ambiguous. Should we load only the latest version of each package? And if there are multiple packages with the same version in various dbs, which one should be loaded? I'd propose the default to be, in order of preference:

- load the latest version available
- load the local install of the package
- load the global install of the package
- load the package from an external .db, starting with the most recently added one if there is more than one
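The proposed default can be expressed as a single sort key. This is a minimal Python sketch, assuming versions are purely numeric dotted strings and that each candidate records its scope and, for external dbs, the order in which its .db was added. The Install class and pick_default function are hypothetical.

```python
from dataclasses import dataclass

# Lower rank is preferred: local, then global, then external.
SCOPE_RANK = {"local": 0, "global": 1, "external": 2}


@dataclass
class Install:
    name: str
    version: str
    scope: str       # "local", "global" or "external"
    added: int = 0   # external dbs only: higher means the .db was added later


def version_key(version):
    # Assumes purely numeric dotted versions such as "2.0.0".
    return tuple(int(part) for part in version.split("."))


def pick_default(candidates):
    """Latest version first, then local over global over external,
    then the most recently added external db."""
    return max(candidates,
               key=lambda c: (version_key(c.version), -SCOPE_RANK[c.scope], c.added))
```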

For package names, the proposal is to limit them to the same rules as variable names (which makes it easy to check validity with isvarname). So a package name must start with a letter, and otherwise consist only of alphanumeric and underscore characters. Unlike variable names, package names will not be case sensitive, since that would create problems when installing packages on filesystems that are not case sensitive (creating directories named Image and image would not be possible on FAT filesystems).
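A minimal Python sketch of the proposed naming rule and case folding; the helper names are hypothetical. Octave's isvarname may apply further checks (such as rejecting keywords), which this sketch omits.

```python
import re

# Proposed rule: a letter followed by letters, digits or underscores (ASCII only).
_PKG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_pkg_name(name):
    return _PKG_NAME.fullmatch(name) is not None


def canonical_pkg_name(name):
    """Validate a package name and return the case-folded form used for storage."""
    if not is_valid_pkg_name(name):
        raise ValueError(f"invalid package name: {name!r}")
    return name.lower()
```

Because "Image" and "image" map to the same canonical name, pkg can treat them as one package instead of trying to create two directories that a case-insensitive filesystem cannot hold.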


Use cases

Case 1

Denise installs Octave 3.4.3 and installs the latest versions of the financial (1.0.4) and image (2.0.0) packages with "pkg install -forge financial image". After installation, pkg keeps the tarballs in a cache for future use. The financial package consists only of .m function files, while the image package is a mixture of .m and .oct files. After installation, she runs `pkg test financial image`, which runs all tests in both packages (using the cached sources to run the tests in the image package's .cc files).


Later, Denise installs Octave 3.6.2 but keeps the previous version of Octave on the system, since some of her old code does not run correctly with the new version. Loading the financial package is no problem, but loading the image package fails with the error

 pkg: image package not built for current version of Octave. Run `pkg reinstall image`

Denise runs `pkg reinstall image`, which reinstalls the package (effectively keeping the .m files and simply rebuilding the .oct files for the new version). Depending on which Octave version she runs, different paths will be loaded even though the package is the same.
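One way to picture this is to keep the .m directory shared and record one .oct directory per Octave version the package was built for. This is a minimal Python sketch, assuming that layout; the InstalledPackage class, its fields, and the example paths are hypothetical. The error text mirrors the message in this use case.

```python
from dataclasses import dataclass, field


@dataclass
class InstalledPackage:
    name: str
    m_dir: str                                    # .m files, shared by all Octave versions
    oct_dirs: dict = field(default_factory=dict)  # Octave version -> dir of .oct files
    has_oct: bool = False                         # does the package compile any .oct files?


def load_paths(pkg, octave_version):
    """Return the directories to add to the search path for this Octave version."""
    if not pkg.has_oct:
        return [pkg.m_dir]
    if octave_version not in pkg.oct_dirs:
        raise RuntimeError(f"pkg: {pkg.name} package not built for current version "
                           f"of Octave. Run `pkg reinstall {pkg.name}`")
    return [pkg.m_dir, pkg.oct_dirs[octave_version]]
```

Under this sketch, `pkg reinstall image` would amount to rebuilding the .oct files and adding an entry for "3.6.2" to oct_dirs, leaving the 3.4.3 build in place for the older Octave.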

A new version of the financial package (1.2.0) is released, which requires Octave 3.6.0. While using Octave 3.6.2, Denise installs the new version of the package with "pkg install -forge financial". The files for the previous version of the package are kept, although "pkg load financial" will only load the latest version. However, when Denise is using Octave 3.4.3, since financial 1.2.0 requires Octave 3.6.0, pkg load will only load financial 1.0.4.
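The rule in this case (load the newest installed version whose Octave requirement is met) can be sketched in Python as follows. It assumes each installed version records its minimum Octave version; the 3.4.0 requirement used for financial 1.0.4 in the test data is made up for illustration, and latest_loadable is a hypothetical name.

```python
def version_key(version):
    # Assumes purely numeric dotted versions such as "3.6.2".
    return tuple(int(part) for part in version.split("."))


def latest_loadable(installed, octave_version):
    """installed: list of (package version, minimum Octave version) pairs.

    Returns the newest package version usable with octave_version, or None.
    """
    current = version_key(octave_version)
    usable = [pkg_ver for pkg_ver, min_octave in installed
              if version_key(min_octave) <= current]
    if not usable:
        return None
    return max(usable, key=version_key)
```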

comments

Shouldn't `rebuild` be used instead of `reinstall`?

Case 2

Owen is stuck using financial package 1.0.4 because some of his code does not work with the latest versions. However, the latest version of financial is 1.2.0, and "pkg install -forge financial" would install that version instead. He installs the old version of the package with "pkg install -forge financial-1.0.4".

But Owen wants to fix his code for the new version, so he also installs the new version of the package to experiment with. In his code, he then uses "pkg load financial-1.0.4", while "pkg load financial" always loads the latest version of the package.

Case 3

Lisa is using Octave on a remote machine in the biochemistry department. The system administrator installed Octave 3.6.2 and the signal 1.2.0 and general 1.0.0 packages. Lisa uses all of them, but she also requires the image package. However, the system administrator does not have time to assess security issues with the package and tells her to install that package locally. She runs "pkg install -forge image", which installs the package in her home directory. When she runs "pkg list" she sees both the global packages and her own packages.

When Octave 3.6.3 is released, Lisa wants to use the new version since it fixes a bug that has been annoying her for a long time, but the system administrator does not want to make the update and tells her to build it herself locally.

Case 4

Diana is a student who wants to run her code on the departmental cluster. However, the system does not have an installation of Octave, so she needs to install it in her home directory. When she installs packages, these installations are global (relative to her Octave installation in her home directory) since she has write permissions on the directory where Octave is installed. She installs the signal and image packages.

Ligia is a colleague of Diana who wants to use the same cluster but wants to save herself the trouble of building Octave, so she uses Diana's install of Octave. Since all packages were installed globally, Ligia has no trouble using the same packages. However, Ligia also needs the struct package and installs it with "pkg install -forge struct". Since she does not have permission to write to Diana's home directory, her install of the struct package is local. When Diana runs Octave she does not see the struct package installed; it only shows up for Ligia.

Diana wants to use the same version of the struct package that Ligia already installed, but that package was installed locally in Ligia's home directory. She uses "pkg load-list /home/ligia/.octave/packages.db" to add Ligia's packages to her own list of available packages, from which she can then load them.

comments

Why not store "packages.db" together with the packages, instead of loading a separate package database file? Then Diana could just say pkg addpath ~Ligia/octave
Because she might want to use only some of those packages, not all. Adding the .db file to her Octave session does not load any package; she still needs to load them, and she may want to load only some of them.

Case 5

John is a professor of biomechanics and uses Octave in his classes. Most of the exercises he gives require multiple packages from Octave Forge, and the required packages differ from class to class. He creates a metapackage for his students listing all required packages. The students install it with "pkg install -url path-to-his-metapackage". The metapackage contains no files; it simply lists a set of packages as dependencies. Since pkg resolves these dependencies automatically, it displays a message listing which packages will be installed before installing them.
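Resolving a metapackage amounts to collecting its dependencies transitively, dependencies first, and showing that list before installing. This is a minimal Python sketch, assuming the dependency lists have already been read into a dict; resolve_install_order and the package names in the example are hypothetical. A real implementation would also skip packages that are already installed and honour version constraints.

```python
def resolve_install_order(targets, deps):
    """Return targets plus all their transitive dependencies, dependencies first.

    deps maps a package name to the list of package names it depends on.
    """
    order, done = [], set()

    def visit(name, path):
        if name in path:
            raise ValueError("dependency cycle: " + " -> ".join(path + [name]))
        if name in done:
            return
        for dep in deps.get(name, []):
            visit(dep, path + [name])
        done.add(name)
        order.append(name)

    for target in targets:
        visit(target, [])
    return order


if __name__ == "__main__":
    # Illustrative data only: a metapackage with no files, just dependencies.
    deps = {"course_meta": ["pkg_a", "pkg_b"], "pkg_a": ["pkg_c"], "pkg_b": ["pkg_c"]}
    plan = resolve_install_order(["course_meta"], deps)
    print("The following packages will be installed: " + ", ".join(plan))
```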

Where to install things

These paths should not be hardcoded but taken from octave_config_info. It provides many paths, whose purposes are explained in the Octave sources in build-aux/common.mk (see the "Where To Install Things" and "Octave-specific directories" sections of that file).