JWE Project Ideas: Difference between revisions

m (Protected "JWE Project Ideas" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite)) [cascading])
 
(13 intermediate revisions by the same user not shown)
<!-- This file should be edited at https://wiki.octave.org/JWE_Project_Ideas -->


'''2021-11-19: This page is out of date -- jwe'''
The following are projects that I would like to work on, roughly prioritized by my level of interest.  I intend to provide expanded explanations for each of these items.


== Language and functions ==
== Comment parsing ==


=== classdef issues ===
Refactor comment handling in lexer and parser.


==== Compatibility issues ====
* [<span style="color:DarkGreen">Done</span>] Gather and attach all comments to tokens in the lexer, never in the parser.  This change will allow us to simplify the grammar by eliminating the stash_comment nonterminal symbol in the parser.
* [<span style="color:DarkOrange">In Progress - still need to handle some separator tokens</span>] Store tokens (with location info and comments attached) in the parse tree instead of imprecise data like "leading_comment" or "trailing_comment".  This change will allow better location info for error reporting and easier and more accurate access to comments so that we can more easily find documentation strings or test and demo blocks (some of that work is already done).
* Recognize and tag comments that look like test or demo blocks.  Skip those when looking for doc strings.
* Update the parse tree classes to provide access to the new information stored in them.
* Update the tree_print_code class to use the new info to provide better output.  Allow comments to be omitted from the output.
* Combine and simplify start_function, finish_function, and recover_from_parsing_function functions in the parser into a single make_function function. (This job is somewhat separate from the comment handling changes but now seems like a good time to do it.)
* Modify Octave's demo function to find demo comment blocks that are associated with classdef methods.


Make a list here, pointing to individual bug reports?
== [<span style="color:DarkGreen">Done</span>] Location info in the parse tree ==


==== Load/save for classdef ====
Once all tokens are stored in the parse tree, we can eliminate the separate storage of line and column info and eliminate two arguments from nearly every tree_* class constructor.  Any location info that is needed later for error messages, debugging, or code generation can be obtained from individual tokens.


See also general load/save issues.
== MException object ==


==== Improve / simplify implementation ====
* Provide a mostly Matlab-compatible `MException` object.
* Make the `MException` constructor a built-in function (this job may require some changes to the classdef implementation).
* Fix the Octave interpreter to create an `MException` object when it throws an error and make that object available in `catch` blocks.


Although the basic features that are implemented now appear to mostly work, the implementation seems overly complicated, making it difficult to debug and modify.  There seems to be quite a bit of room for improvement here.
== FLTK graphics widget ==


=== Syntax, semantics, and data types ===
Eliminate fltk graphics widget or move to external package.


==== Matlab-compatible argument validation blocks ====
== gnuplot graphics widget ==


New language feature, syntax is accepted by parser now but argument validation is not performed.
Eliminate gnuplot graphics widget or move to external package.


==== Function handle refactoring ====
== Function objects ==


* Load/save for all types of function handles and all data formats (ascii, binary, hdf5, mat5)
Refactor function objects.
* Use std::shared_ptr for function objects instead of bare pointer to octave_function.


==== String class ====
== arguments blocks ==


Matlab now uses "" to create string objects that behave differently from Octave double-quoted strings.  We could start by creating a compatible string class, then hooking it up to the "" syntax.  No matter what, the transition will be difficult because Matlab's "" strings still treat "\n" as two characters (backslash and n) rather than a single character (newline).
Finish implementation of Matlab-compatible argument validation blocks.


==== Other new data types ====
== local functions ==


Andrew Janke has implementations of these classes (FIX: link to repos here)
Implement Matlab-compatible local functions in script files.


* table
== string object ==
* datetime, duration, calendarDuration
* categorical
* timetable
* timeseries


==== single / integer valued ranges ====
== load_path class ==


This is a compatibility issue.
Refactor (or rewrite) load_path class.


==== Refactor load-path ====
== Broadcasting ==


* Directories are not properly removed from load path (FIX: link to bug report here)
* Refactor broadcasting.
* Should we really have ADD_PKG and DEL_PKG files?  If so, how can we make them safe?
* Make broadcasting work for sparse matrices.


==== Eliminate special matrix types ====
== GUI command widget ==


Although the special range, diagonal matrix, and permutation matrix data types in Octave require less memory than storing full matrices, they tend to cause trouble when people expect full compatibility or exactly the same results when performing arithmetic on Ranges vs. Matrices.  Now that we have broadcasting operators, the need for diagonal matrices is not as great.
Make common command widget for Windows and Unixy systems work well enough to become the default command line interface for the GUI.


==== Special case FOR loop limits ====
== OpenGL graphics ==
 
Currently, "for i = 1:N ..." uses a Range object for the "1:N" loop bounds.  If we eliminate Ranges as a special space-saving type, then we should handle this syntax as a special case.  Even if we don't eliminate Ranges, that might be a good idea, as we could handle "for i = 1:Inf ..." easily without having to worry about how to deal with that in an ordinary Range object vs. FOR loop bounds.
 
==== Local functions ====
 
The semantics for local functions in scripts is different from the
way Octave currently handles functions that are defined in script
files.
 
==== Matlab packages ====
 
+DIR directories in the loadpath; related to classdef
 
Octave already searches for files in package directories and
understands the PKG.fcn syntax and functionality.  The big missing
piece is implementation of the "import" functionality and handling
it efficiently and in a way that is compatible with Matlab.
 
==== Refactor broadcasting ====
 
Are there better ways to use templates to handle function calls rather than using macros to define a set of functions for array/array, array/scalar, and scalar/array ops as in DEFMXBINOP in mx-inlines.cc?
 
==== Sparse matrix issues ====
 
==== Broadcasting ====
 
Broadcasting does not work for sparse matrices.  This seems like a big missing feature.
 
==== Structural zeros ====
 
Octave currently skips structural zeros for most (all?) sparse matrix operations.  Matlab returns a sparse matrix filled with NaNs for something like "sprand (5, 5, 0.1) .^ NaN".  Should we go for full compatibility?  Mathematical correctness?  Traditional behavior of sparse matrix libraries?  It seems no one really agrees on what is correct or best.  Maybe compatibility should win?
 
==== Indexed assignment ====
 
In an assignment like Sparse_object(idx) = GrB_object(idx), Octave does not attempt to apply a conversion operator to transform the RHS type to the LHS type.  Is this also a problem for assignments of objects with conversion operators to full matrix objects?
 
==== graph and digraph ====
 
Would it be difficult to provide these objects?
 
== GUI ==
 
=== Communication with interpreter ===
 
Currently, communication between the GUI and the interpreter
mostly happens when the interpreter is otherwise idle and waiting
for user input at the command prompt and the implementation is
somewhat complicated.  We need to determine whether this is the
best we can do, or if there is some other implementation that
would be more flexible and reliable.
 
=== [[GUI terminal widget|GUI command window]] ===
 
The implementation of the GUI command window for Unix-like systems
is a completely separate implementation from the one used on
Windows systems.  There should be only one, and the GUI should be
completely in charge of user input and output.  This will probably
require implementing some kind of simple output pager internally
instead of using an external program, but overall user interaction
could be improved.
 
=== GUI code editor ===
 
Make it possible to use external editors such as Emacs, vim, or
others with the GUI in addition to Octave's built-in code editor
 
== Graphics ==
 
=== Publication-quality figures ===
 
Generating EPS or PDF versions of figures needs improvement.
 
=== OpenGL graphics ===


* Modernize our use of OpenGL graphics to use shader programs instead of the legacy OpenGL API.
* Scaling plot data values/ranges to fit in single-precision OpenGL values
* Performance issues
* Lack of WYSIWYG


=== FLTK widget ===
== classdef ==


With the rest of the GUI using Qt widgets, we should eliminate the FLTK plotting widget.  It duplicates functionality and requires additional effort to maintain.  Maybe we no longer need the octave-cli binary (the one that is not linked with Qt libraries)?
Refactor (or rewrite) classdef implementation.


=== Qt toolkit threading ===
== Qt graphics widget ==


It seems likely that the locking of the gh_manager object is insufficient or even incorrect in some cases.
Refactor (or rewrite) Qt graphics widget.


=== classdef graphics objects ===
== graphics properties ==


This is a large project, but one that will likely have to be tackled at some point.
Refactor graphics properties classes.


== Miscellaneous ==
== graphics threading issues ==


=== Handle UTF-8 ===
Fix handling of graphics properties to be properly thread safe.


We need to handle UTF-8 (or whatever) characters properly in all parts of Octave.  Try to do this in a Matlab-compatible way.
== HDF5 load and save ==


=== Load / Save ===
Implement Matlab compatible HDF5-based load and save functions.


* Make the load and save commands compatible with Matlab's HDF5-based file format.  Matlab users expect this and we need something like this to support large arrays anyway.  As much as possible, the initial implementation should be written in Octave's scripting language and the proposed [[Low-level interface to HDF5 functions]] so that it can easily be updated and patched as needed while we are still working out the details.  Only later should we consider translating performance-critical parts to C++, and then, only if really necessary.  
* Make the load and save commands compatible with Matlab's HDF5-based file format.  Matlab users expect this and we need something like this to support large arrays anyway.  As much as possible, the initial implementation should be written in Octave's scripting language and the proposed Low-level interface to HDF5 functions so that it can easily be updated and patched as needed while we are still working out the details.  Only later should we consider translating performance-critical parts to C++, and then, only if really necessary.
* Phase out Octave's own text and binary formats.  Too much effort is required to maintain the code to support all the various formats.
* Low-level interface to HDF5 functions.  Create a thin wrapper for the HDF5 library.  As much as possible, make it compatible with the [https://www.mathworks.com/help/matlab/low-level-functions.html Matlab interface to HDF5].  However, we may support newer functions (as of 2020/10/30, the list of Matlab functions appears to correspond to an older version of the library than is presently available in the HDF5 library itself) and support for legacy functions has a low priority.


=== Low-level interface to HDF5 functions ===
== External editors ==


Create a thin wrapper for the HDF5 library.  As much as possible, make it compatible with the [https://www.mathworks.com/help/matlab/low-level-functions.html Matlab interface to HDF5].  However, we may support newer functions (as of 2020/10/30, the list of Matlab functions appears to correspond to an older version of the library than is presently available in the HDF5 library itself) and support for legacy functions has a low priority.
Make it possible to use external editors such as Emacs, vim, or others with the GUI in addition to Octave's built-in code editor.


Also as of 2020/10/30, [[User:jwe|jwe]] is working on this project.  Help is welcome!
== who -file ==


=== RNG issues ===
Fix who -file to just read file and list info, not create dummy scope.


RandStream and Other RNG issues
== import ==


This is likely a large project, but it would be nice to have updated, compatible interfaces.
Make "import" work in a Matlab-compatible way.


=== MEX Interface ===
== Code quality ==


Implement mxMakeReal and mxMakeComplex functions.
=== JIT compiler ===
A proof-of-concept implementation was done several years ago by a
Google Summer of Code student.  It was never complete and little
work has been done since.  It also depends on an old version of
LLVM.  In addition to LLVM, we should consider the JIT library
features of GCC.
This is probably the most difficult item (at least for me) since it
will require fairly advanced knowledge of compiler infrastructure
and Octave internals.
=== loadlibrary ===
This feature might be nice to have but it has a low priority.
=== Complex integers ===
Should we support this feature?  Should we refactor the implementation of array objects to make this job easier?
=== who -file option ===
Should just read file and list info, not create dummy scope.  Likewise for whos function.
== Maintenance and packaging ==
=== General code quality ===
* Use C++11 features where possible.
* Better and more complete use of C++ namespaces.
* Better use of C++ features, especially standard library features as their implementations become more widely available. For example, we might be able to simplify some things in Octave by using the C++17 filesystem and special functions libraries, if they provide results that are at least as good as what we are using now.
* Eliminate C preprocessor macros where possible
* Eliminate C preprocessor macros where possible.
* added_static must go! (not sure about this now)
* Use const in more parse tree functions.
* Should not expose symbol_record in call_stack functions if possible
* Remove, replace, or at least rename the "added static" concept in the symbol_record class.
* Remove unused symbol_table/scope/record functions
* Should not expose symbol_record in call_stack functions if possible.
* Use const in more parse tree functions
* Remove unused symbol_table/scope/record functions.
* Do recursive functions work properly with load/save now?
* Use enums for options internally (typically to replace bool values)
* Use enums for options internally (typically to replace bool values).
* Audit global variables and eliminate them where possible
* Audit global variables and eliminate them where possible.
 
* Audit use of panic_* functions and replace with calls to error where possible.
=== Symbol visibility ===
* Fix symbol visibility so we are mostly tagging namespace decls, not individual functions.
 
* Complete use of dispatch types for functions (search for "classes:" to find the few current examples).
We really should be tagging the functions that we wish to export from shared libraries.
* Tag for built-in functions to specify maximum number of inputs.
 
=== Dispatch types for functions ===
 
Search for "classes:" in sources to find the few current examples.
 
=== min/max nargin values ===
 
Should we do this, and allow the interpreter to automatically error when a function is given too few/many arguments?
 
=== Toolboxes ===
 
Move some core toolboxes (communications, control systems, image
processing, optimization, signal processing, and statistics), to
core Octave so development is managed along with Octave. Core
Octave developers are already responsible for these packages
anyway, and users don't seem to understand why they need to
install them separately.  Core parts of the ordinary differential
equations package have already been moved to Octave.
 
=== Documentation ===
 
* Docs for call stack with examples and illustrations
* Docs for lexer and parser with examples and illustrations
* Docs for fcn_info object
* Docs for load_path object
* Docs for classdef internals
* Docs for Qt graphics toolkit internals
* Docs for Qt GUI and communication with interpreter
* Improve other Doxygen docs for internals to make it easier for new contributors to understand the Octave code base.


=== Windows distribution ===
== Windows distribution ==


Eliminate the following msys packages.  Some might be removed

Latest revision as of 05:12, 4 April 2024


The following are projects that I would like to work on, roughly prioritized by my level of interest. I intend to provide expanded explanations for each of these items.

Comment parsing

Refactor comment handling in lexer and parser.

  • [Done] Gather and attach all comments to tokens in the lexer, never in the parser. This change will allow us to simplify the grammar by eliminating the stash_comment nonterminal symbol in the parser.
  • [In Progress - still need to handle some separator tokens] Store tokens (with location info and comments attached) in the parse tree instead of imprecise data like "leading_comment" or "trailing_comment". This change will allow better location info for error reporting and easier and more accurate access to comments so that we can more easily find documentation strings or test and demo blocks (some of that work is already done).
  • Recognize and tag comments that look like test or demo blocks. Skip those when looking for doc strings.
  • Update the parse tree classes to provide access to the new information stored in them.
  • Update the tree_print_code class to use the new info to provide better output. Allow comments to be omitted from the output.
  • Combine and simplify start_function, finish_function, and recover_from_parsing_function functions in the parser into a single make_function function. (This job is somewhat separate from the comment handling changes but now seems like a good time to do it.)
  • Modify Octave's demo function to find demo comment blocks that are associated with classdef methods.
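
To make the first two items concrete, here is a minimal C++ sketch of a lexer that attaches comments to the token that follows them, so the parser never has to stash comments itself. The `source_pos`, `token`, and `toy_lex` names are hypothetical and exist only for this illustration; Octave's real lexer and token classes are more involved, and this toy version does not understand strings, block comments, or separators.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not Octave's actual classes.
struct source_pos
{
  int line = 1;
  int column = 1;
};

struct token
{
  std::string text;                    // the token's own characters
  source_pos beg;                      // position of the first character
  std::vector<std::string> comments;   // comments seen before this token
};

// Split INPUT into whitespace-separated tokens.  A '%' or '#' starts a
// comment that runs to the end of the line; it is stored on the next
// token instead of being handed to the parser.
std::vector<token> toy_lex (const std::string& input)
{
  std::vector<token> tokens;
  std::vector<std::string> pending;
  int line = 1;
  int col = 1;
  std::size_t i = 0;

  while (i < input.size ())
    {
      char c = input[i];

      if (c == '\n')
        {
          line++;
          col = 1;
          i++;
        }
      else if (c == ' ' || c == '\t')
        {
          col++;
          i++;
        }
      else if (c == '%' || c == '#')
        {
          std::size_t end = input.find ('\n', i);
          if (end == std::string::npos)
            end = input.size ();

          pending.push_back (input.substr (i, end - i));
          col += static_cast<int> (end - i);
          i = end;
        }
      else
        {
          std::size_t end = i;
          while (end < input.size () && input[end] != ' '
                 && input[end] != '\t' && input[end] != '\n'
                 && input[end] != '%' && input[end] != '#')
            end++;

          token tok;
          tok.text = input.substr (i, end - i);
          tok.beg.line = line;
          tok.beg.column = col;
          tok.comments = std::move (pending);
          pending.clear ();
          tokens.push_back (std::move (tok));

          col += static_cast<int> (end - i);
          i = end;
        }
    }

  return tokens;
}
```

With this arrangement, code that looks for doc strings or test and demo blocks only has to inspect the `comments` of the relevant token, and the location of each comment's owner is available from the same object.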

[Done] Location info in the parse tree

Once all tokens are stored in the parse tree, we can eliminate the separate storage of line and column info and eliminate two arguments from nearly every tree_* class constructor. Any location info that is needed later for error messages, debugging, or code generation can be obtained from individual tokens.
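
As a rough illustration of the idea, the following C++ sketch shows a parse tree node that stores its tokens and derives location info from them on demand, instead of taking separate line and column constructor arguments. The `token` and `binary_expr` types are hypothetical simplifications, not the actual tree_* classes.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified types; Octave's real token and tree_* classes differ.
struct token
{
  std::string text;
  int line;
  int column;
};

// A binary expression node that keeps its tokens instead of taking
// separate line and column constructor arguments.
class binary_expr
{
public:

  binary_expr (token lhs, token op, token rhs)
    : m_lhs (std::move (lhs)), m_op (std::move (op)), m_rhs (std::move (rhs))
  { }

  // Location info is derived on demand from the stored tokens.
  int line () const { return m_lhs.line; }
  int column () const { return m_lhs.column; }
  int op_column () const { return m_op.column; }

  std::string describe_location () const
  {
    return "line " + std::to_string (line ())
           + ", column " + std::to_string (column ());
  }

private:

  token m_lhs;
  token m_op;
  token m_rhs;
};
```

An error message can then point at the operator (`op_column`) or at the start of the expression (`line`, `column`) without any extra bookkeeping in the constructor.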

MException object

  • Provide a mostly Matlab-compatible `MException` object.
  • Make the `MException` constructor a built-in function (this job may require some changes to the classdef implementation).
  • Fix the Octave interpreter to create an `MException` object when it throws an error and make that object available in `catch` blocks.

FLTK graphics widget

Eliminate the FLTK graphics widget or move it to an external package.

gnuplot graphics widget

Eliminate the gnuplot graphics widget or move it to an external package.

Function objects

Refactor function objects.

arguments blocks

Finish implementation of Matlab-compatible argument validation blocks.

local functions

Implement Matlab-compatible local functions in script files.

string object

load_path class

Refactor (or rewrite) load_path class.

Broadcasting

  • Refactor broadcasting.
  • Make broadcasting work for sparse matrices.
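
One possible direction for the refactoring is a single function template that covers the array/array, array/scalar, and scalar/array cases, rather than a set of macro-generated functions. The sketch below is only an illustration under that assumption: `matrix`, `bcast_dim`, and `bcast_binary_op` are hypothetical names, the array type is a minimal two-dimensional stand-in rather than Octave's array classes, and sparse storage is not addressed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal 2-D array (column-major); not Octave's array classes.
template <typename T>
struct matrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<T> data;

  matrix (std::size_t r, std::size_t c, const T& val = T ())
    : rows (r), cols (c), data (r * c, val)
  { }

  T& operator () (std::size_t i, std::size_t j)
  { return data[i + j * rows]; }

  const T& operator () (std::size_t i, std::size_t j) const
  { return data[i + j * rows]; }
};

// Result extent for one dimension: equal sizes, or one operand has size 1.
inline std::size_t bcast_dim (std::size_t a, std::size_t b)
{
  if (a == b)
    return a;
  if (a == 1)
    return b;
  if (b == 1)
    return a;

  throw std::invalid_argument ("nonconformant arguments");
}

// One template covers array/array, array/scalar (1x1), and scalar/array.
template <typename T, typename Op>
matrix<T> bcast_binary_op (const matrix<T>& a, const matrix<T>& b, Op op)
{
  matrix<T> result (bcast_dim (a.rows, b.rows), bcast_dim (a.cols, b.cols));

  for (std::size_t j = 0; j < result.cols; j++)
    for (std::size_t i = 0; i < result.rows; i++)
      {
        // A dimension of size 1 is reused for every index along it.
        const T& x = a (a.rows == 1 ? 0 : i, a.cols == 1 ? 0 : j);
        const T& y = b (b.rows == 1 ? 0 : i, b.cols == 1 ? 0 : j);
        result (i, j) = op (x, y);
      }

  return result;
}
```

Passing the operation as a callable means the same loop serves addition, multiplication, comparisons, and so on. Whether this performs as well as the existing specialized loops would need to be measured.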

GUI command widget

Make the common command widget for Windows and Unix-like systems work well enough to become the default command line interface for the GUI.

OpenGL graphics

  • Modernize our use of OpenGL graphics to use shader programs instead of the legacy OpenGL API.
  • Scale plot data values and ranges to fit in single-precision OpenGL values.
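
The scaling problem can be sketched as follows: plot data is stored in double precision, but values far outside the single-precision range (or with a large offset) cannot be passed to OpenGL directly. One possible approach, shown here only as an assumption about how it might be done, is to map the data onto [0, 1] in double precision first and convert afterwards. The function name `normalize_for_gl` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Map double-precision data onto [0, 1] before converting to float, so
// the values sent to OpenGL stay within single-precision range.
// Assumes all values are finite.
std::vector<float> normalize_for_gl (const std::vector<double>& data)
{
  std::vector<float> out;
  out.reserve (data.size ());

  if (data.empty ())
    return out;

  auto mm = std::minmax_element (data.begin (), data.end ());
  const double lo = *mm.first;
  const double hi = *mm.second;

  // Work with halved values so that hi - lo cannot overflow.
  const double half_range = hi / 2 - lo / 2;

  for (double x : data)
    {
      if (half_range > 0)
        out.push_back (static_cast<float> ((x / 2 - lo / 2) / half_range));
      else
        out.push_back (0.0f);   // all values equal
    }

  return out;
}
```

The original limits `lo` and `hi` would still have to be kept in double precision for tick labels and for mapping mouse positions back to data coordinates.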

classdef

Refactor (or rewrite) classdef implementation.

Qt graphics widget

Refactor (or rewrite) Qt graphics widget.

graphics properties

Refactor graphics properties classes.

graphics threading issues

Fix handling of graphics properties to be properly thread safe.
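
As a minimal sketch of what "thread safe" could mean here, the following C++ fragment guards every read and write of a property value with a mutex owned by the store itself. The `property_store` class is hypothetical and much simpler than the actual graphics property classes; it only illustrates the locking pattern.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical property store; Octave's graphics property classes differ.
class property_store
{
public:

  void set (const std::string& name, double value)
  {
    std::lock_guard<std::mutex> guard (m_mutex);
    m_values[name] = value;
  }

  // Returns a copy so callers never hold a reference into shared state.
  double get (const std::string& name, double fallback = 0.0) const
  {
    std::lock_guard<std::mutex> guard (m_mutex);
    auto it = m_values.find (name);
    return it == m_values.end () ? fallback : it->second;
  }

private:

  mutable std::mutex m_mutex;
  std::map<std::string, double> m_values;
};
```

Keeping the lock inside the accessor functions means callers in the interpreter thread and the GUI thread cannot forget to take it, at the cost of one lock operation per access.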

HDF5 load and save

Implement Matlab-compatible HDF5-based load and save functions.

  • Make the load and save commands compatible with Matlab's HDF5-based file format. Matlab users expect this and we need something like this to support large arrays anyway. As much as possible, the initial implementation should be written in Octave's scripting language and the proposed Low-level interface to HDF5 functions so that it can easily be updated and patched as needed while we are still working out the details. Only later should we consider translating performance-critical parts to C++, and then, only if really necessary.
  • Phase out Octave's own text and binary formats. Too much effort is required to maintain the code to support all the various formats.
  • Low-level interface to HDF5 functions. Create a thin wrapper for the HDF5 library. As much as possible, make it compatible with the Matlab interface to HDF5. However, we may support newer functions (as of 2020/10/30, the list of Matlab functions appears to correspond to an older version of the library than is presently available in the HDF5 library itself) and support for legacy functions has a low priority.

External editors

Make it possible to use external editors such as Emacs, vim, or others with the GUI in addition to Octave's built-in code editor.

who -file

Fix who -file to just read the file and list its contents, without creating a dummy scope.

import

Make "import" work in a Matlab-compatible way.

Code quality

  • Better and more complete use of C++ namespaces.
  • Better use of C++ features, especially standard library features as their implementations become more widely available. For example, we might be able to simplify some things in Octave by using the C++17 filesystem and special functions libraries, if they provide results that are at least as good as what we are using now.
  • Eliminate C preprocessor macros where possible.
  • Use const in more parse tree functions.
  • Remove, replace, or at least rename the "added static" concept in the symbol_record class.
  • Avoid exposing symbol_record in call_stack functions if possible.
  • Remove unused symbol_table/scope/record functions.
  • Do recursive functions work properly with load/save now?
  • Use enums for options internally (typically to replace bool values).
  • Audit global variables and eliminate them where possible.
  • Audit use of panic_* functions and replace with calls to error where possible.
  • Fix symbol visibility so we are mostly tagging namespace decls, not individual functions.
  • Complete use of dispatch types for functions (search for "classes:" to find the few current examples).
  • Add a tag for built-in functions to specify the maximum number of inputs.
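
To illustrate the item about enums, here is a small C++ sketch in which an option that might otherwise be a bare bool argument is expressed as a scoped enum. The `comment_mode` type and `print_code` function are hypothetical names chosen for the example; they are not taken from Octave's sources.

```cpp
#include <string>

// Hypothetical example; these names are not from Octave's sources.
enum class comment_mode
{
  keep,
  omit
};

// With a bool parameter, a call like print_code (code, comment, false)
// says nothing about what "false" means.  The enum makes the call site
// self-describing and prevents accidentally passing an unrelated flag.
std::string print_code (const std::string& code, const std::string& comment,
                        comment_mode mode)
{
  return mode == comment_mode::keep ? code + "  " + comment : code;
}
```

A call then reads `print_code (code, comment, comment_mode::omit)`, and adding a third mode later does not require changing the function signature.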

Windows distribution

Eliminate the following msys packages. Some might be removed entirely if they are unnecessary for running Octave or building Octave Forge packages. Otherwise, we should be building them from source as we do all other tools and libraries that are distributed with Octave. The difficulty is that although the msys packages are typically based on old versions of these packages, they sometimes have fixes that are needed to allow them to run properly on Windows systems. Note also that we distribute a termcap library, but the msys version of less depends on the msys termcap library.

  • bash
  • coreutils
  • diffutils
  • dos2unix
  • file
  • findutils
  • gawk
  • grep
  • gzip
  • less
  • libcrypt
  • libiconv
  • libintl
  • libmagic
  • libopenssl
  • make
  • msys-core
  • patch
  • perl
  • regex
  • sed
  • tar
  • termcap
  • unzip
  • wget
  • zip
  • zlib