Summer of Code - Getting Started

From Octave
Revision as of 14:47, 27 February 2015 by Nir (talk | contribs) (→‎Nonlinear and constrained least squares: updated description based on maintainers list discussion)

The following is distilled from the Projects page for the benefit of potential Google and ESA Summer of Code (SoC) students. Although students are welcome to attempt any of the projects on that page or any of their own choosing, here we offer some suggestions on what good student projects might be.

Steps Toward a Successful Application

If you like any of the projects described below, these are the steps you need to follow to apply:

  • Help Us Get To Know You
If you aren't communicating with us before the application is due, your application will not be accepted.
Join the maintainers mailing list or read the archives and see what topics we discuss and how the developers interact with each other.
Hang out in our IRC channel. Ask questions, submit patches, show us that you are motivated and well-prepared. There will be more applicants than we can effectively mentor, so do ask for feedback on your public application to increase the strength of your proposal!
  • Find Something That Interests You
It's critical that you find a project that excites you. You'll be spending most of the summer working on it (we expect you to treat the SoC as a full-time job). But don't just tell us how interested you are, show us. You can do that by fixing a few bugs or interacting with us on IRC well before the deadline. Our experience shows us that successful SoC students demonstrate their interest early and often.
  • Prepare Your Proposal With Us
By working with us to prepare your proposal, you'll be getting to know us and showing us how you approach problems. The best place for this is your wiki user page and the IRC channel.
  • Complete Your Application
Fill out our public application template.
This is best done by creating an account at this wiki and copying the template from its page.
You only need to copy and answer the public part there. There is no need to showcase everything else to everybody reading your user page!
Fill out our private application template.
This is best done by copying the template from its page and adding the required information to your application at Google (melange) or at ESA.
Only the organization admin and the possible mentors will see this data. You can still edit it after submitting until the deadline!

Things You'll be Expected to Know or Quickly Learn

Octave is mostly written in C++ and its own scripting language that is mostly compatible with Matlab. There are bits and pieces of Fortran, Perl, C, awk, and Unix shell scripts here and there. In addition to being familiar with C++ and Octave's scripting language, successful applicants will be familiar with or able to quickly learn about Octave's infrastructure. You can't spend the whole summer learning how to build Octave or prepare a changeset and still successfully complete your project.

  • The Build System
The GNU build system is used to build Octave.
While you generally don't need to understand too much unless you actually want to change how Octave is built, you should be able to understand enough to get a general idea of how to build Octave.
If you've ever done a configure && make && make install series of commands, you have already used the GNU build system.
You must demonstrate that you are able to build the development version of Octave from sources before the application deadline.
  • The Version Control System
We use Mercurial (abbreviated hg).
Mercurial is the distributed version control system (DVCS) we use for managing our source code. You should have some basic understanding of how a DVCS works, but hg is pretty easy to pick up, especially if you already know a VCS like git or svn.
  • The Procedure for Contributing Changesets
You will be expected to follow the same procedures as other contributors and core developers.
You will be helping current and future Octave developers by using the same style for changes, commit messages, and so on. You should also read the same contributing guidelines we have for everyone.
This page describes the procedures students are expected to use to display their progress in a public Mercurial repository during their work.
  • The Maintainers Mailing List
We primarily use mailing lists for communication among developers.
The mailing list is used most often for discussions about non-trivial changes to Octave, or for setting the direction of development.
You should follow basic mailing list etiquette. For us, this mostly means "do not top post".
  • The IRC Channel
We also have the #octave IRC channel on Freenode.
You should be familiar with the IRC channel. It's very helpful for new contributors (you) to get immediate feedback on ideas and code.
Unless your primary mentor has a strong preference for some other method of communication, the IRC channel will likely be your primary means of communicating with your mentor and Octave developers.
  • The Octave Forge Project
Octave-Forge is a project closely related to Octave where packages reside. They are somewhat analogous to Matlab's toolboxes.
  • Related Skills
In addition, you probably should know some mathematics, engineering, or experimental science or something of the sort.
If you've used Matlab before, you probably have already been exposed to the kinds of problems that Octave is used for.

Criteria by which applications are judged

These might vary somewhat depending on the mentors and coordinators for a particular Summer of Code, but typically the main factors considered would be:

  • Applicant has demonstrated an ability to make substantial modifications to Octave
The most important thing is that your application has some interesting code samples to judge you by. It's ok during the application period to ask for help on how to format these code samples, which normally are Mercurial patches.
  • Applicant shows understanding of topic
Your application should make it clear that you're reasonably well versed in the subject area and won't need all summer just to read up on it.
  • Applicant shows understanding of and interest in Octave development
The best evidence for this is previous contributions and interactions.
  • Well thought out, adequately detailed, realistic project plan
"I'm good at this, so trust me" isn't enough. You should describe which algorithms you'll use and how you'll integrate with existing Octave code. You should also prepare a full timeline and goals for the midterm and final evaluations.

Suggested projects

The following projects are broadly grouped by category and probable skills required to tackle each. Remember to check Projects for more ideas if none of these suit you, and your own ideas are always welcome.

These are suggested projects, but you are welcome to propose your own provided you find an Octave mentor.

Numerical

These projects involve implementing certain mathematical functions in Octave.

Improve logm, sqrtm, funm

The goal here is to implement some missing Matlab functions related to matrix functions like the matrix exponential. There is a general discussion of the problem.

Potential mentor: Jordi Gutiérrez Hermoso

Generalised eigenvalue problem

Certain calling forms of the eig function are currently missing, including preliminary balancing; computing left eigenvectors as a third output; and choosing among generalized eigenvalue algorithms. See also this discussion.

Required skills: C++; familiarity with numerical linear algebra and LAPACK.

Difficulty: medium.

Potential mentor: Nir Krakauer

Nonlinear and constrained least squares

The Optimization package is missing the functions lsqcurvefit, lsqlin, lsqnonlin to conveniently solve least-squares problems that are nonlinear and/or constrained. The first priority is to implement these as wrappers to algorithms already present in the Optimization package, cf. the documentation for leasqr, nonlin_residmin, nonlin_curvefit, nonlin_min. Implementing related missing optimization functions such as fmincon could also be part of the project. A possible extension with lower priority would be to add new optimization algorithms or variants to those already implemented, perhaps based on free implementations in other languages, such as minpack in Fortran and levmar in C.

Required skills: m-file scripting; familiarity with optimization problems, terminology, concepts, and algorithms.

Difficulty: medium.

Potential mentor: Nir Krakauer

TISEAN package

TISEAN is a suite of code for nonlinear time series analysis. It is old, but it contains many algorithms that haven't been re-implemented as libre software. The objective is to integrate TISEAN as an Octave package, as was done for the Control package. The functions could be integrated into the existing time series analysis package.

Required skills: m-file scripting, C/C++ and Fortran API knowledge.

Difficulty: easy/medium

Mentor: User:KaKiLa

Symbolic package

Octave's Symbolic package handles symbolic computing and other CAS tools. The main component of Symbolic is a pure m-file class "@sym" which uses the Python package SymPy to do (most of) the actual computations. The package aims to expose the full functionality of SymPy while also providing a high level of compatibility with the Matlab Symbolic Math Toolbox. Currently, communication between Octave and Python is handled with a pipe (see "help popen2") and by parsing text. However, this is fragile when things go wrong: for example, catching exceptions from Python is a bit ad hoc.

The main aim of this proposed project is to implement (or even better co-opt an existing) C/C++ oct-file interface that interacts with Python as a library, and e.g., deals gracefully with exceptions. This could either supplement the existing IPC or replace it altogether.
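To make the fragility concrete, here is a minimal Python sketch of pipe-based communication in which the child process reports failures as structured data instead of free text. It is an illustration only: it is neither the Symbolic package's current code nor the proposed oct-file interface, and CHILD_SCRIPT, ChildError, and divide_in_child are hypothetical names invented for this example.

```python
import json
import subprocess
import sys

# Hypothetical child program: reads one JSON request from stdin and
# always answers with one JSON reply, even when the computation fails.
CHILD_SCRIPT = r"""
import json, sys
request = json.loads(sys.stdin.readline())
try:
    reply = {"ok": True, "value": request["a"] / request["b"]}
except Exception as exc:
    reply = {"ok": False, "error": type(exc).__name__, "message": str(exc)}
print(json.dumps(reply))
"""


class ChildError(RuntimeError):
    """Raised in the parent when the child reports an exception."""


def divide_in_child(a, b):
    """Compute a / b in a separate Python process over a pipe."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps({"a": a, "b": b}) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    reply = json.loads(completed.stdout)
    if not reply["ok"]:
        # The exception type and message survive the trip through the pipe.
        raise ChildError(f'{reply["error"]}: {reply["message"]}')
    return reply["value"]
```

With this protocol the parent never has to guess whether a line of output is a result or an error message. An interface that uses Python as a library could go further and inspect the exception object directly, without any text round trip.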

Required skills: m-file scripting, C/C++, and Python

Difficulty: easy/medium

Mentor: Colin B. Macdonald

Interval package

The recent GNU Octave interval package provides several arithmetic functions with accurate and guaranteed error bounds. Its development started at the end of 2014 and there is some fundamental functionality left to be implemented:

  1. Currently, everything is console/text only and the @infsup class needs functions for plotting intervals in graphs as lines, rectangles or boxes. For examples of how the result might look, see images [1] and [2].
  2. The functions polyval, fsolve, and possibly roots shall be implemented for intervals (as m-files). Algorithms can be migrated from the C-XSC Toolbox (C++ code) from [3] (see rpeval.cpp, nlinsys.cpp, and cpzero.cpp respectively). All arithmetic operations required by these algorithms already exist in the package.

The second task requires knowledge of basic interval arithmetic concepts. If that would be a problem, the first task can be extended at will.

Required skills: m-file scripting, basic knowledge of computer arithmetic (especially floating-point computations)
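For readers new to interval arithmetic, the following Python sketch shows the basic idea behind evaluating a polynomial over an interval with Horner's scheme. It is an illustration under stated assumptions, not the algorithm from the C-XSC Toolbox and not code from the interval package: the Interval class and interval_polyval are hypothetical names, and widening each result by one floating-point step with math.nextafter (Python 3.9 or later) is a simple stand-in for proper directed rounding.

```python
import math


class Interval:
    """Closed interval [lo, hi]; every operation rounds outward."""

    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(lo if hi is None else hi)

    @staticmethod
    def _outward(lo, hi):
        # Widen by one floating-point step on each side so that rounding
        # errors can never push the true result outside the bounds.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other):
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(products), max(products))

    def contains(self, x):
        return self.lo <= x <= self.hi


def interval_polyval(coeffs, x):
    """Enclose p(x) over the interval x using Horner's scheme.

    coeffs lists the coefficients from the highest power down,
    the same ordering Octave's polyval uses.
    """
    result = Interval(0.0)
    for c in coeffs:
        result = result * x + Interval(c)
    return result


# Enclosure of p(x) = x^2 - 2 over [1, 2]; the exact range is [-1, 2].
enclosure = interval_polyval([1.0, 0.0, -2.0], Interval(1.0, 2.0))
```

The returned interval is guaranteed to contain the true range but can be wider than it, since plain Horner evaluation treats each occurrence of x as independent. Algorithms such as those referenced above aim for tighter enclosures.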

Difficulty: medium

Mentor: Oliver Heimlich

Infrastructure

Octave Package management

Octave's management of installed packages is performed by a single function, pkg, which does pretty much everything. This function has a few limitations which are hard to overcome with the current codebase, and fixing them will most likely require a full rewrite.

The planned improvements are:

  • support for multiple Octave installs
  • support for multiple versions of a package
  • support for system-wide and user installed packages
  • automatic handling of dependencies
  • more flexibility on dependencies, e.g., depending on specific Octave build options or on one of multiple packages
  • management of tests and demos in C++ sources of packages
  • think ahead for multiple
  • easily load or check specific package versions
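As a rough sketch of what automatic handling of dependencies involves, the Python example below computes an installation order in which every package comes after the packages it depends on. It is not part of pkg or of any existing plan: install_order, the dependency table, and the package names pkg_a to pkg_d are all hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency table: package name -> packages it depends on.
EXAMPLE_DEPENDS = {
    "pkg_a": [],
    "pkg_b": ["pkg_a"],
    "pkg_c": ["pkg_b"],
    "pkg_d": [],
}


def install_order(target, depends):
    """Return target and its transitive dependencies, dependencies first.

    Raises KeyError for an unknown package and graphlib.CycleError
    when the dependencies form a cycle.
    """
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        if name not in depends:
            raise KeyError(f"unknown package: {name}")
        needed[name] = list(depends[name])
        stack.extend(needed[name])
    # TopologicalSorter expects a mapping of node -> predecessors,
    # which is exactly "package -> what must be installed before it".
    return list(TopologicalSorter(needed).static_order())


order = install_order("pkg_c", EXAMPLE_DEPENDS)
```

Only packages reachable from the requested one are considered, so pkg_d is left out of the order for pkg_c. Version constraints and "one of multiple packages" alternatives from the list above would need a richer model than this plain graph.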

The current pkg also performs some functions which it probably should not. Instead, a package for developers should be created with such tools.

Many of these problems have been solved in other languages. Familiarity with how other languages handle this problem will be useful to come up with elegant solutions. In some cases, there are standards to follow. For example, there are specifications published by freedesktop.org about where files should go (base directory spec) and Windows seems to have its own standards. See bugs #36477 and #40444 for more details.

In addition, package names may start to collide very easily. One horrible way to work around this is to choose increasingly complex package names that give no hint of the package's purpose. A much better option is to provide an Authority category like Perl 6 does. Nested packages are also an easy way to provide packages for specialized subjects (think image::morphology). A new pkg should consider all these things now, or allow their implementation at a later time. Read the unfinished plan for more details.

Minimum requirements: Ability to read and write Octave code, experience with Octave packages, and understanding of the basics of autotools. The most important skill is software design.

Difficulty: Easy to Medium

Mentor: Carnë Draug

Image Analysis

Improvements to N-dimensional image processing

The image package has partial functionality for N-dimensional images. These images exist for example in medical imaging, where slices from scans are assembled to form anatomical 3D images. If taken over time and at different laser wavelengths or light filters, they can also result in 5D images. Albeit less common, images with even more dimensions also exist. However, the number of dimensions is largely irrelevant, since most image processing operations are mathematical operations which are independent of it.

As part of GSoC 2013, the core functions for image IO, imwrite and imread, were extended to better support this type of image. Likewise, many functions in the image package, mostly morphology operators, were expanded to deal with this type of image. Since then, many other functions have been improved, sometimes completely rewritten, to abstract from the number of dimensions. In a certain way, supporting ND images is also related to choosing good algorithms, since such images tend to be quite large.

This project will continue the previous work, and be mentored by the previous GSoC student and current image package maintainer. Planning the project requires selecting functions that lack ND support and identifying their dependencies. For example, supporting imclose and imopen was better implemented by supporting imerode and imdilate, which then propagated ND support to all functions that depend on them. These dependencies need to be discovered first since often they are not being used yet, and may even be missing functions. This project can also be about implementing functions that have not yet been implemented. Also note that while some functions in the image package will accept ND images as input, they are actually not correctly implemented and will give incorrect results.
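The point that closing and opening inherit ND support from erosion and dilation can be illustrated with a small Python sketch. It is not the image package's implementation: it assumes a binary image stored as a set of integer coordinate tuples on an unbounded grid, and dilate_nd, erode_nd, close_nd, and open_nd are hypothetical names. Nothing in the code mentions the number of dimensions.

```python
def dilate_nd(image, selem):
    """Binary dilation: every foreground point shifted by every offset."""
    return {tuple(a + b for a, b in zip(p, s)) for p in image for s in selem}


def erode_nd(image, selem):
    """Binary erosion: points p where p + s is foreground for all offsets s."""
    # Only points of the form (foreground - offset) can possibly qualify.
    candidates = {tuple(a - b for a, b in zip(p, s))
                  for p in image for s in selem}
    return {p for p in candidates
            if all(tuple(a + b for a, b in zip(p, s)) in image
                   for s in selem)}


def close_nd(image, selem):
    """Closing is dilation followed by erosion."""
    return erode_nd(dilate_nd(image, selem), selem)


def open_nd(image, selem):
    """Opening is erosion followed by dilation."""
    return dilate_nd(erode_nd(image, selem), selem)


# Two isolated voxels in a 3D image, with a 3-voxel line along the first axis
# as structuring element. Closing fills the one-voxel gap between them.
voxels = {(0, 0, 0), (2, 0, 0)}
line = {(-1, 0, 0), (0, 0, 0), (1, 0, 0)}
closed = close_nd(voxels, line)
```

Because close_nd and open_nd are written purely in terms of the two primitives, making the primitives dimension-independent is enough. Real implementations work on dense arrays and must also decide how to treat image borders, which this sketch sidesteps.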

Required skills: m-file scripting, and a fair amount of C++ since a lot of image analysis cannot be vectorized. Familiarity with common CS algorithms and willingness to read literature describing new algorithms will be useful.

Difficulty: difficult

Potential mentor: Carnë Draug

Improve Octave's image IO

There are a lot of image formats. To handle this, Octave uses GraphicsMagick (GM), a library capable of handling a lot of them in a single C++ interface. However, GraphicsMagick still has its limitations. The most important are:

  • GM has a build option, quantum, which defines the bitdepth to use when reading an image. Building GM with a high quantum means that images of smaller bitdepth will take a lot more memory when read, but building it with too low a quantum will make it impossible to read images of higher bitdepth. It also means that the image always needs to be rescaled to the correct range.
  • GM supports unsigned integers only, and thus incorrectly reads files such as TIFF with floating point data.
  • GM hides away details of the image such as whether the image file is indexed. This makes it hard to access the real data stored on file.
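The rescaling mentioned in the first item can be made concrete with a little arithmetic. The Python sketch below assumes samples are mapped linearly between the full ranges of the two bitdepths, which is a simplified model and not GraphicsMagick's code; rescale_sample is a hypothetical helper.

```python
def rescale_sample(sample, from_bits, to_bits):
    """Map an unsigned sample linearly from one bitdepth's range to another.

    Uses integer arithmetic with rounding to the nearest value.
    """
    from_max = (1 << from_bits) - 1
    to_max = (1 << to_bits) - 1
    return (sample * to_max + from_max // 2) // from_max


# An 8-bit sample held in a 16-bit quantum occupies twice the memory
# and has to be scaled back down before it matches the file's range.
stored = rescale_sample(128, 8, 16)
recovered = rescale_sample(stored, 16, 8)

# A 16-bit sample squeezed into an 8-bit quantum loses information.
squeezed = rescale_sample(1000, 16, 8)
restored = rescale_sample(squeezed, 8, 16)
```

Under this model the round trip from a lower to a higher bitdepth and back is lossless (recovered equals 128 again), while the opposite direction is not (restored differs from 1000), which is why a quantum that is too low makes higher bitdepth images unreadable in practice.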

This project would implement better image IO for scientific file formats while leaving GM to handle the others. Since TIFF is the de facto standard for scientific images, this should be done first. Among the targets for the project are:

  • implement the Tiff class, which is a wrapper around libtiff, using classdef. To avoid creating too many private __oct functions, this project could also create a C++ interface to declare new Octave classdef functions.
  • improve imread, imwrite, and imfinfo for tiff files using the newly created Tiff class
  • port the bioformats into Octave and prepare a package for it
  • investigate other image IO libraries
  • clean up and finish the dicom package to include into Octave core
  • prepare a Matlab-compatible implementation of the FITS package for inclusion in Octave core

Required skills: knowledge of C++ and C since most libraries are written in those languages

Difficulty: medium

Potential mentor: Carnë Draug