IO package: Difference between revisions

Jump to navigation Jump to search
1,044 bytes added ,  24 July 2022
 
(5 intermediate revisions by the same user not shown)
Line 29: Line 29:
</nowiki></pre>
</nowiki></pre>


Be aware that writing to worksheets using the OCT interface wipes all formatting.


==== xlswrite / odswrite versus xlsopen / odsopen ..... xlsclose / odsclose ====
==== xlswrite / odswrite versus xlsopen / odsopen ..... xlsclose / odsclose ====
Line 55: Line 56:
* .xls - is an outdated default binary file format from <= Office 2003 - '''try to avoid this format!'''
* .xls - is an outdated default binary file format from <= Office 2003 - '''try to avoid this format!'''
* .xlsx - is the new default file format since Office 2007. [https://en.wikipedia.org/wiki/OOXML It consists of xml files stored in a .zip container.] - '''always save in or convert to this format!'''
* .xlsx - is the new default file format since Office 2007. [https://en.wikipedia.org/wiki/OOXML It consists of xml files stored in a .zip container.] - '''always save in or convert to this format!'''
* The ''(new)'' OCT interface can read ''(since version 1.2.5)'' and write ''(since version 2.2.0)'' .xlsx files dependency-free! No need of MS Windows+Office nor Java.
* The ''(new)'' OCT interface can read ''(since version 1.2.5)'' and write ''(since version 2.2.0)'' .xlsx files dependency-free! No need of MS Windows+Office nor Java, with one caveat: when writing it doesn't retain formatting. For that you still need one of the other interfaces.
* Windows is notorious for hiding "known" file extensions. However in Windows Explorer it is easy to change this and have Windows show all file extensions.
* Windows is notorious for hiding "known" file extensions. However in Windows Explorer it is easy to change this and have Windows show all file extensions.


==== different interfaces ====
==== different interfaces ====
Line 422: Line 422:
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''' or '''xmlbeans.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''' or '''xmlbeans.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)
* For POI-4.x.x additional .jar files are needed: commons-collection4-4.x.jar, xml-apis.jar
A note on Apache POI:
This used to be the most reliable and reasonably fast spreadsheet I/O interface SW. But sadly, each major Apache POI release turns out to have become much more complicated in the sense of finding out which .jar files (Java libraries) are needed for Octave. As of Apache 5.x.x, I (= IO package maintainer) have given up on hunting this down. AFAICS for 5.x.x commons-io, log4j-api and maybe others are also required.


JExcelAPI specific:
JExcelAPI specific:
Line 490: Line 495:


==== Spreadsheet formula support ====
==== Spreadsheet formula support ====
When using the COM, POI, JXL, and UNO interfaces you can:
When using the OCT, COM, POI, JXL, and UNO interfaces you can:
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings (also works with OCT interface);
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings (also works with OCT interface);
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.
Line 502: Line 507:
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on ,unless you open them in Excel and write them back to disk.
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on ,unless you open them in Excel and write them back to disk.


While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet). <br />
Worse, older Excel versions feature less functions than newer versions. So be wary as this may make for interesting confusion.
Worse, older Excel versions feature less functions than newer versions. So be wary as this may make for interesting confusion. <br />
Reading and writing "shared formulas" with the OCT interface isn't (yet) supported; see bug
[https://savannah.gnu.org/bugs/?52875 bug #52875].


==== Matlab compatibility ====
==== Matlab compatibility ====
Line 595: Line 602:
=== OCT interface ===
=== OCT interface ===


Since io package version 1.2.4, an interface called "OCT" was added. Except for unzip, it has no dependencies. It's still experimental but fast! Feel free to test it and give us a feedback.
Since io package version 1.2.4, an interface called "OCT" was added. Except for unzip, it has no dependencies and it is faster than the Java-based interfaces.
Currently it supports reading and writing .xlsx, .ods and .gnumeric files (the latter in yet-to-be-released io-2.2.2).
If  
If  
<pre>chk_spreadsheet_support == 0</pre>
<pre>chk_spreadsheet_support == 0</pre>
Line 604: Line 610:
<pre>m = xlsread ('file.xlsx', 1, [], 'OCT');</pre>
<pre>m = xlsread ('file.xlsx', 1, [], 'OCT');</pre>


Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can checkout a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)
About development: <br />
The OCT interface makes use of regular expressions for parsing the XML contents of OOXML, ODS and gnumeric formats. While frowned upon by XML gurus (see for example [https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 here] for some amusing postings), using regexps is much faster than any current XML parser. But the trade-off is that regexps are fragile, esp. withregard to the order in which XML tags appear in XML nodes. <br />
Just for reassurance: to date we haven't seen any problems with the OCT interface for reading and writing regular data.
 


[[Category:Octave Forge]]
[[Category:Octave Forge]]
99

edits

Navigation menu