The IO package is part of the
octave-forge project and provides input/output from/in external formats.
For the Excel/COM interface:
* A windows computer with Excel installed. Note that 64-bit MS-Office has no support for COMx /ActiveX so you might have to resort to the Java interfaces below
-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.
For the Java / Apache POI / JExcelAPI interfaces (general):
-forge java-1.2.8 package or later version on Linux* octave -forge java-1.2.8 with latest svn fixes on Windows/MingW
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); octave not. But this can be simulated outside xlsread.
** Matlab's xlsread flags some spreadsheet errors,
octave-forge just returns blank cells.** Octave -forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:
**# it relies on cached range values and thus may be out-of-date;
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.
** When using the Java interface, reading and writing xls-files by
octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic" option) – and then differently than under Windows.....
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave
-forge's xlsread doesn't.
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.
-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns;
octave-forge's xlswrite doesn't.
** When invoking Excel/COM interface,
octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).
==== Comparison of interfaces & usage ====
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back wen writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work, worse yet, you may completely loose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in
octave-forge/Java or JExcelAPI ? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JexcelAPI, unlike ApachePOI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is a more versatile than JExcelAPI, while it doesn't support BIFF5 it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard to catch up for Apache POI. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can checkout a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)