IO package: Difference between revisions

Jump to navigation Jump to search
900 bytes added ,  18 October 2019
m
Fix typo
m (Fix typo)
(11 intermediate revisions by 6 users not shown)
Line 1: Line 1:
The IO package is part of the octave-forge project and provides input/output from/in external formats.
The {{Forge|io|IO package}} is part of the Octave Forge project and provides input/output from/in external formats.
 
<div class="tocinline">__TOC__</div>


=== About read/write support ===
=== About read/write support ===
Line 36: Line 38:
* First open the spreadsheet file using xlsopen (for Excel or gnumeric files) or odsopen (.ods or .gnumeric).  
* First open the spreadsheet file using xlsopen (for Excel or gnumeric files) or odsopen (.ods or .gnumeric).  
'''NOTE''': the output of these functions is a file pointer handle that you should treat carefully!
'''NOTE''': the output of these functions is a file pointer handle that you should treat carefully!
* (for reading data) Read the data using raw_data = xls2oct (fileptr [,sheet#] [,cellrange] [,options])
* (for reading data) Read the data using raw_data = xls2oct (<fileptr> [,sheet#] [,cellrange] [,options])
* Next, optionally split the data in numerical, text and raw data and optionally get the limits of where these came from:
* Next, optionally split the data in numerical, text and raw data and optionally get the limits of where these came from:
[num, txt, raw, lims] = parsecell (data, <fileptr.lims>)
[num, txt, raw, lims] = parsecell (data, <fileptr.lims>)
* (for writing data) Write the data using <fileptr> = oct2xls (data, <fileptr> [,sheet#] [,cellrange] [,options])
* (for writing data) Write the data using <fileptr> = oct2xls (data, <fileptr> [,sheet#] [,cellrange] [,options])
* When you're finished, DO NOT FORGET to colse the file pointer handle:
* When you're finished, DO NOT FORGET to close the file pointer handle:
<fileptr> = xlsclose (<fileptr>)</pre>
<fileptr> = xlsclose (<fileptr>)</pre>


Line 47: Line 49:


Obviously this is much more flexible (and FASTER) than xlsread and xlswrite. In fact, Octave's io package xlsread is a mere wrapper for an xlsopen-xls2oct-parsecell-xlsclose sequence. Similarly for xlswrite, odsread, and odswrite.
Obviously this is much more flexible (and FASTER) than xlsread and xlswrite. In fact, Octave's io package xlsread is a mere wrapper for an xlsopen-xls2oct-parsecell-xlsclose sequence. Similarly for xlswrite, odsread, and odswrite.


==== .xls ~= .xlsx ====
==== .xls ~= .xlsx ====
Line 94: Line 95:


And so on ...
And so on ...
==== 32 vs. 64-bit issues ====
Generally, if you use a Java-based interface for spreadsheet I/O, it doesn't matter much whether you use Octave 32-bit or Octave 64-bit.
However, the UNO interface (for LibreOffice & OpenOffice.org) is more pertinent: 64-bit Octave can only use a 64-bit LibreOffice with UNO, same for 32 bit.




Line 126: Line 133:
</nowiki></pre>
</nowiki></pre>


An easier way is to collect all required Java class libs fo spreadsheet I/O (the .jar files) in one subdir and have chk_spreadsheet_support .m sort it all out:
An easier way is to collect all required Java class libs for spreadsheet I/O (the .jar files) in one subdir and have chk_spreadsheet_support .m sort it all out:
<pre><nowiki>
<pre><nowiki>
octave:8>  chk_spreadsheet_support ('/full/path/to/subdir/with/.jar/files')
octave:8>  chk_spreadsheet_support ('/full/path/to/subdir/with/.jar/files')
Line 158: Line 165:


And reading/writing .xls files should work.
And reading/writing .xls files should work.
POI is located differently on every system. To easily find where are the files and their names try to search for: libapache-poi-java.list.
in terminal:
<pre><nowiki>
find / -name "libapache-poi-java*" 2>/dev/null
</nowiki></pre>
in octave (poi path in ubuntu):
<pre><nowiki>
fid = fopen ("/var/lib/dpkg/info/libapache-poi-java.list");
line = fgetl (fid);
while (line != -1)
    javaaddpath(line);
    line = fgetl (fid);
endwhile
fclose (fid);
pkg load io;
disp(chk_spreadsheet_support); % should be 2
</nowiki></pre>


== Detailed Information (TL) ==
== Detailed Information (TL) ==
Line 384: Line 409:
For the Excel/COM interface:
For the Excel/COM interface:
* A windows computer with Excel installed. Note that 64-bit MS-Office has no support for COMx /ActiveX so you might have to resort to the Java interfaces below
* A windows computer with Excel installed. Note that 64-bit MS-Office has no support for COMx /ActiveX so you might have to resort to the Java interfaces below
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.
* Octave Forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.


For the Java / Apache POI / JExcelAPI interfaces (general):
For the Java / Apache POI / JExcelAPI interfaces (general):
* octave-forge java-1.2.8 package or later version on Linux
* octave Forge java-1.2.8 package or later version on Linux
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW
* octave Forge java-1.2.8 with latest svn fixes on Windows/MingW
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.


Line 485: Line 510:
* xlsread
* xlsread
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); octave not. But this can be simulated outside xlsread.
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); octave not. But this can be simulated outside xlsread.
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.
** Matlab's xlsread flags some spreadsheet errors, Octave Forge just returns blank cells.
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:
** Octave Forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:
**# it relies on cached range values and thus may be out-of-date;
**# it relies on cached range values and thus may be out-of-date;
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic" option) – and then differently than under Windows.....
** When using the Java interface, reading and writing xls-files by Octave Forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic" option) – and then differently than under Windows.....
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave Forge's xlsread doesn't.
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.


* xlswrite
* xlswrite
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).
** Octave Forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; Octave Forge's xlswrite doesn't.


* xlsfinfo
* xlsfinfo
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).
** When invoking Excel/COM interface, Octave Forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).


==== Comparison of interfaces & usage ====
==== Comparison of interfaces & usage ====
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.


'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back wen writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work, worse yet, you may completely loose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI ? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JexcelAPI, unlike ApachePOI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back wen writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work, worse yet, you may completely loose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in Octave Forge/Java or JExcelAPI ? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JexcelAPI, unlike ApachePOI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.


'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is a more versatile than JExcelAPI, while it doesn't support BIFF5 it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard to catch up for Apache POI. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is a more versatile than JExcelAPI, while it doesn't support BIFF5 it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard to catch up for Apache POI. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.
Line 580: Line 605:
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can checkout a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can checkout a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)


[[Category:Octave-Forge]]
[[Category:Octave Forge]]
21

edits

Navigation menu