The IO package is part of the octave-forge project and provides input from and output to external file formats.
== ODS support ==
(ODS = OpenDocument Spreadsheet, the spreadsheet data format used by e.g. LibreOffice and OpenOffice.org)
=== Files content ===
* '''odsread.m''': no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.
* '''odswrite.m''': no-hassle write script for writing to an ODS file.
=== Required support software ===
For Windows (MingW):
For ODS access, you'll need to choose at least one of the following collections of Java class files:
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, and 0.8.7 work OK!) & xercesImpl.jar. Get them here:
** http://odftoolkit.org/projects/odfdom/pages/Home
** http://odftoolkit.org/projects/odfdom/downloads/directory/current-version
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave)
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory); the <OOo>/program/ directory also needs to be in the classpath.
These must be referenced with full pathnames in your javaclasspath.
Hint: add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.
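As a rough sketch, the javaaddpath statements in octaverc could look like the lines below. The directory and the jar file names are placeholders (assumptions for illustration only); substitute the full pathnames of the jars you actually installed.

```octave
# Hypothetical location of the downloaded jars; adjust to your own system.
jar_dir = "/home/user/java";
jars = {"odfdom.jar", "xercesImpl.jar"};

for k = 1:numel (jars)
  jar_path = fullfile (jar_dir, jars{k});   # build the full pathname
  if (exist (jar_path, "file"))
    javaaddpath (jar_path);                 # add to the dynamic Java class path
  else
    warning ("jar file not found: %s", jar_path);
  endif
endfor
```

Checking for the file first merely turns a mistyped path into a visible warning at startup.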
=== Usage ===
(see “help ods<function_filename>” in the Octave terminal.)
If you use odsopen / ods2oct / … / oct2ods / … / odsclose, DO NOT FORGET to invoke odsclose at the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.
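One way to make sure odsclose is always reached, even when reading fails halfway, is to wrap the work in an unwind_protect block. This is only a sketch: the file name "example.ods", the sheet number and the range are placeholder values, and the check for an empty file pointer assumes that a failed odsopen returns an empty value.

```octave
# Placeholder file name, worksheet number and range (assumptions).
fname = "example.ods";
ods = odsopen (fname);                # open for reading, get a file pointer struct
if (isempty (ods))                    # assumed result when opening failed
  error ("could not open %s", fname);
endif

unwind_protect
  # Read the raw cell contents of the first worksheet.
  [raw, ods] = ods2oct (ods, 1, "A1:C10");
unwind_protect_cleanup
  ods = odsclose (ods);               # always release the file pointer
end_unwind_protect
```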
=== Spreadsheet formula support ===
When using the OTK or UNO interface you can:
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.
=== Gotchas ===
I know of one big gotcha: reading dates (and times). A less obvious one is the Java memory pool allocation size.
==== Date and time in ODS ====
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.
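To illustrate the two cases, here is a minimal sketch (not the package's own code) that converts both a date string and a left-over Excel serial number into an Octave datenum. The example values are made up. The offset datenum (1899, 12, 30) is an assumption that only holds for Excel's 1900 date system and for dates from March 1, 1900 onward, because of the leap year error mentioned above.

```octave
# Case 1: an OOo date cell value, i.e. a text string like "yyyy-mm-dd".
dn_from_string = datenum ("2011-05-01", "yyyy-mm-dd");

# Case 2: an old Excel serial day number for the same date.
# The offset absorbs Excel's fictitious February 29, 1900.
excel_offset = datenum (1899, 12, 30);
excel_serial = 40664;
dn_from_serial = excel_serial + excel_offset;

# Both routes should give the same Octave date number.
disp (datestr (dn_from_string, "yyyy-mm-dd"));
disp (datestr (dn_from_serial, "yyyy-mm-dd"));
```

The conversion itself is simple; the hard part remains deciding which of the two cases a given cell belongs to.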
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in Octave cell arrays.
==== Java memory pool allocation size ====
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.
After processing a large chunk of spreadsheet information you might notice that Octave's memory footprint does not shrink, so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the Octave terminal for a few seconds to get a more reasonable memory footprint.
==== Reading cells containing errors ====
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in either ODF Toolkit or jOpenDocument (like there is in Apache POI for Excel).
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).
* a valid range MUST be specified; I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).
NOT fixed in version 1.2 final:
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.
=== Matlab compatibility ===
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.
odsread is fairly function-compatible to xlsread, however.
Same goes for odswrite, odsfinfo and xlsfinfo; however, odsfinfo has better functionality IMO.
=== Comparison of interfaces ===
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet, in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 have things been simplified for developers.
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.
However, UNO is not stable yet (see below).
=== Troubleshooting ===
Some hints for troubleshooting ODS support are given here.
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.
# Check Java memory settings. Try javamem
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)
## If it doesn't work, do:<br><code>
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')
##:rt.gc
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</code>
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase Java's memory pre-reservation.
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), without the outer quotes. See above under [[#Required support software]] which classes should be mentioned.
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.
=== Development ===
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.
But IMO data sets larger than 5·10<sup>5</sup> cells should not be kept in spreadsheets anyway. Use real databases for such data sets.
=== ODFDOM versions ===
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((
In addition, processing ODS files became significantly slower (up to 7 times!).
At the end of August 2010 I implemented support for odfdom-0.8.6.jar; that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving into the older TableTable API. Later on (early 2011) version 0.8.7 was tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:
* oct2ods.m (revision 7159)
[[Category:Packages|Package documentation]]
[[Category:OctaveForge|Packages]]
[[Category: |