https://wiki.octave.org/wiki/api.php?action=feedcontributions&user=83.163.225.168&feedformat=atomOctave - User contributions [en]2024-03-29T06:19:26ZUser contributionsMediaWiki 1.39.2https://wiki.octave.org/wiki/index.php?title=Mapping_package&diff=13395Mapping package2020-10-11T09:11:36Z<p>83.163.225.168: /* Missing functions */</p>
<hr />
<div>The {{Forge|mapping|mapping package}} is part of the Octave Forge project.<br />
<br />
== Development ==<br />
=== Roadmap ===<br />
<br />
Targets for next mapping package releases:<br />
* Add rasterwrite.m (complementary to rasterread). Please contact the package maintainer; a C++ skeleton already exists.<br />
* Maybe add wrapper functions around rasterread and rasterwrite (e.g., for GeoTIFF, ASCII grid, etc.).<br />
* Add more options to mapshow.<br />
* Implement support for projections. I (the current mapping package maintainer) have neither much need for nor much experience with this subject, so help is welcome! Note that the Octave Forge proj package offers some of this functionality.<br />
* Add geodesy functions. Patches have been submitted and integrated. The current roadmap is to further integrate Felipe Nievinsky's geodesy toolbox, updated by M. Hirsch.<br />
<br />
Several functions in the current mapping package release (1.4.0) and the upcoming one (1.4.1) haven't had much testing. Please try them out and report issues in the bug tracker with the "[octave forge](mapping)" tag in the title.<br />
<br />
=== Missing functions ===<br />
The following is an incomplete list of functions that are missing from the mapping package and would be needed for Matlab compatibility. Bugs are not listed here; [https://savannah.gnu.org/bugs/?func=search&group=octave search] for them and [https://savannah.gnu.org/bugs/?func=additem&group=octave report] them on the bug tracker instead.<br />
<br />
As a number of polygon functions in the mapping package are geometry-related, chances are that some of the missing functionality is already present in the [http://wiki.octave.org/Geometry_package geometry package]. In fact, there is a discussion about which functions belong where: Matlab compatibility suggests the mapping package, but based on similar functionality the geometry package is probably a better home.<br />
<br />
Recent versions of Matlab's Mapping Toolbox are classdef-based. It is not yet clear whether we need to follow this route, as classdef support in Octave is still experimental and has no file I/O.<br />
<br />
{{Note|This entire section is about the upcoming version 1.4.1 (expected May 2020). If a Matlab function is missing from the list and does not appear in the current release of the package, confirm that it is also missing in the [http://hg.code.sf.net/p/octave/mapping/file sources] (see especially the INDEX file) before adding it.}}<br />
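One way to run a first check against an installed release is to ask Octave whether each candidate name resolves to a function. The following is a minimal sketch, not an official procedure: the names in <code>candidates</code> are arbitrary examples, and the check only covers whatever package version happens to be installed, so the INDEX file in the sources remains the reference for the development version.

```octave
% Minimal sketch: check which candidate functions the installed packages provide.
% Assumption: the mapping package may or may not be installed on this system.
try
  pkg load mapping
catch
  warning ("mapping package not installed; only core functions will be found");
end_try_catch

% Arbitrary example names; replace them with the functions you want to check.
candidates = {"rasterread", "geotiffread", "polyxpoly"};
found = false (size (candidates));

for i = 1:numel (candidates)
  % exist returns 2 for a function file on the load path, 3 for an
  % oct-/mex-file, 5 for a built-in function and 103 for a command-line function.
  code = exist (candidates{i});
  found(i) = any (code == [2, 3, 5, 103]);
  if (found(i))
    printf ("%-12s available (exist code %d)\n", candidates{i}, code);
  else
    printf ("%-12s not found\n", candidates{i});
  endif
endfor
```

A name reported as "not found" is only a candidate for the list; it may already be implemented in the development sources or in another package such as geometry.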
<br />
==== Alphabetical list ====<br />
<div style="column-count:4;-moz-column-count:4;-webkit-column-count:4"><br />
* arcgridread '''[1]'''<br />
* areaint<br />
* areamat<br />
* areaquad<br />
* avhrrgoode<br />
* avhrrlambert<br />
* axesm<br />
* axesmui<br />
* bufferm<br />
* bufgeoquad<br />
* camposm<br />
* camtargm<br />
* camupm<br />
* clabelm<br />
* clegendm<br />
* clipdata<br />
* clma<br />
* clrmenu<br />
* contour3m<br />
* contourcbar<br />
* contourcmap<br />
* contourfm<br />
* contourm<br />
* defaultm<br />
* demcmap<br />
* demdataui<br />
* distortcalc<br />
* dted<br />
* dteds<br />
* ecef2lv<br />
* ellipse1<br />
* etopo<br />
* etopo5<br />
* flatearthpoly<br />
* framem<br />
* gc2sc<br />
* gcm<br />
* gcpmap<br />
* gcxgc<br />
* gcxsc<br />
* geoloc2grid<br />
* geopoint<br />
* geoquadline<br />
* geoquadpt<br />
* georasterref<br />
* geoshape<br />
* geoshow<br />
* geotiff2mstruct<br />
* geotiffinfo '''[2]'''<br />
* geotiffread '''[1]'''<br />
* geotiffwrite<br />
* getm<br />
* getworldfilename<br />
* globedem<br />
* globedems<br />
* gradientm<br />
* grid2image<br />
* gridm<br />
* gshhs<br />
* gtextm<br />
* gtopo30<br />
* gtopo30s<br />
* handlem<br />
* imbedm<br />
* ingeoquad<br />
* inputm<br />
* interpm<br />
* intersectgeoquad<br />
* intrplat<br />
* intrplon<br />
* ismap<br />
* ispolycw '''[4]'''<br />
* ispolyccw '''[4]'''<br />
* kmlwrite<br />
* kmlwriteline<br />
* kmlwritepoint<br />
* lightm<br />
* lightmui<br />
* linecirc<br />
* linem<br />
* los2<br />
* ltln2val<br />
* lv2ecef<br />
* makeattribspec<br />
* makedbfspec<br />
* makerefmat<br />
* map.geodesy.AuthalicLatitudeConverter<br />
* map.geodesy.ConformalLatitudeConverter<br />
* map.geodesy.IsometricLatitudeConverter<br />
* map.geodesy.RectifyingLatitudeConverter<br />
* map.geodesy.isdegree<br />
* map.rasterref.GeographicRasterReference<br />
* map.rasterref.MapRasterReference<br />
* maplist<br />
* mapoutline<br />
* mappoint<br />
* mapprofile<br />
* maprasterref<br />
* maps<br />
* mapshape<br />
* mapshow '''[3]'''<br />
* maptool<br />
* maptrim<br />
* maptriml<br />
* maptrimp<br />
* maptrims<br />
* mapview<br />
* mdistort<br />
* meridianfwd<br />
* meshlsrm<br />
* meshm<br />
* mfwdtran<br />
* minvtran<br />
* mlabel<br />
* mlabelzero22pi<br />
* newpole<br />
* northarrow<br />
* oblateSpheroid<br />
* org2pol<br />
* originui<br />
* outlinegeoquad<br />
* panzoom<br />
* parallelui<br />
* pcolorm<br />
* plabel<br />
* plot3m<br />
* plotm<br />
* polcmap<br />
* poly2ccw '''[4]'''<br />
* poly2cw '''[4]'''<br />
* poly2fv<br />
* polybool '''[4]'''<br />
* polyjoin '''[4]'''<br />
* polymerge<br />
* polysplit '''[4]'''<br />
* polyxpoly<br />
* projfwd<br />
* projinv<br />
* projlist<br />
* putpole<br />
* quiver3m<br />
* quiverm<br />
* reducem<br />
* referenceSphere<br />
* refmatToGeoRasterReference<br />
* refmatToMapRasterReference<br />
* refmatToWorldFileMatrix<br />
* refvecToGeoRasterReference<br />
* resizem<br />
* rhxrh<br />
* rotatem<br />
* rotatetext<br />
* rsphere<br />
* satbath<br />
* scaleruler<br />
* scatterm<br />
* scircle1<br />
* scircle2<br />
* scircleg<br />
* scirclui<br />
* scxsc<br />
* sdtsdemread '''[1]'''<br />
* sdtsinfo '''[2]'''<br />
* sectorg<br />
* setm<br />
* shaderel<br />
* showaxes<br />
* stem3m<br />
* surfacem<br />
* surflm<br />
* surflsrm<br />
* surfm<br />
* symbolm<br />
* tbase<br />
* tightmap<br />
* tissot<br />
* track1<br />
* track2<br />
* trackg<br />
* trackui<br />
* unwrapMultipart<br />
* usamap<br />
* usgs24kdem<br />
* usgsdem<br />
* usgsdems<br />
* utmgeoid<br />
* utmzoneui<br />
* vec2mtx<br />
* vfwdtran<br />
* viewshed<br />
* vinvtran<br />
* vmap0data<br />
* vmap0read<br />
* vmap0rhead<br />
* vmap0ui<br />
* webmap<br />
* WebMapServer<br />
* wmcenter<br />
* wmclose<br />
* wmlimits<br />
* wmline<br />
* wmmarker<br />
* wmprint<br />
* wmremove<br />
* WMSCapabilities<br />
* wmsfind<br />
* wmsinfo<br />
* WMSLayer<br />
* WMSMapRequest<br />
* wmsread<br />
* wmsupdate<br />
* wmzoom<br />
* worldFileMatrixToRefmat<br />
* worldfileread<br />
* worldfilewrite<br />
* worldmap<br />
* zdatam<br />
</div><br />
<br />
* [1] ''As of mapping-1.2.1, rasterread can read any raster file that the GDAL library supports; see http://www.gdal.org/frmt_various.html. No separate functions for individual file formats are required. There is still some work to do on unifying output formats.''<br />
* [2] ''As with [1], rasterinfo does the job.''<br />
* [3] ''As of mapping-1.2.0, there's a basic mapshow.''<br />
* [4] ''Implemented in OF geometry-4.0.0''<br />
* [5] ''See OF image package''<br />
* * ''Implemented in dev version''<br />
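As an illustration of note [1], a GeoTIFF that Matlab code would open with geotiffread can be passed to rasterread instead. This is a hedged sketch only: the file name is hypothetical, and the two-output call form is an assumption based on the function's help text, so check <code>help rasterread</code> for the installed version, especially as the output formats are still being unified.

```octave
% Hedged sketch: read a GeoTIFF with rasterread instead of geotiffread.
% Requires the mapping package built with GDAL support.
fname = "dem.tif";   % hypothetical file name

if (exist (fname, "file") == 2)
  pkg load mapping
  % Assumed call form: first output holds the band data, second the raster metadata.
  [bands, info] = rasterread (fname);
  disp (info);       % inspect the metadata rather than relying on specific field names
else
  printf ("%s not found; point fname at an existing raster file\n", fname);
endif
```

The same call should apply to the other formats marked [1], since the format is detected by GDAL rather than selected by the caller.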
<br />
==== Grouped list ====<br />
(needs update relative to table above and newer mapping toolbox releases)<br />
(see numbered notes above)<br />
{| class="wikitable"<br />
! Function !! Category !! Subcategory<br />
|-<br />
| arcgridread '''[1]'''||File Import and Export||Standard File Formats<br />
|-<br />
| geotiff2mstruct||File Import and Export||Standard File Formats<br />
|-<br />
| geotiffinfo '''[2]'''||File Import and Export||Standard File Formats<br />
|-<br />
| geotiffread '''[1]'''||File Import and Export||Standard File Formats<br />
|-<br />
| geotiffwrite||File Import and Export||Standard File Formats<br />
|-<br />
| sdtsdemread '''[1]'''||File Import and Export||Standard File Formats<br />
|-<br />
| sdtsinfo '''[2]'''||File Import and Export||Standard File Formats<br />
|-<br />
| worldfileread||File Import and Export||Standard File Formats<br />
|-<br />
| worldfilewrite||File Import and Export||Standard File Formats<br />
|-<br />
| getworldfilename||File Import and Export||Standard File Formats<br />
|-<br />
| gpxread||File Import and Export||Standard File Formats<br />
|-<br />
| kmlwrite||File Import and Export||Standard File Formats<br />
|-<br />
| kmlwriteline||File Import and Export||Standard File Formats<br />
|-<br />
| kmlwritepoint||File Import and Export||Standard File Formats<br />
|-<br />
| makeattribspec||File Import and Export||Standard File Formats<br />
|-<br />
| makedbfspec||File Import and Export||Standard File Formats<br />
|-<br />
| imread '''[5]'''||File Import and Export||Standard File Formats<br />
|-<br />
| imwrite '''[5]'''||File Import and Export||Standard File Formats<br />
|-<br />
| demdataui||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| dted||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| dteds||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| etopo||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| etopo5||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| globedem||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| globedems||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| gtopo30||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| gtopo30s||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| satbath||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| tbase||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| usgs24kdem||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| usgsdem '''[1]'''||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| usgsdems '''[1]'''||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| avhrrgoode||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| avhrrlambert||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| egm96geoid||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| gshhs||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0data||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0read||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0rhead||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0ui||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| WebMapServer||Web Maps||Web Map Service<br />
|-<br />
| WMSCapabilities||Web Maps||Web Map Service<br />
|-<br />
| WMSLayer||Web Maps||Web Map Service<br />
|-<br />
| WMSMapRequest||Web Maps||Web Map Service<br />
|-<br />
| wmsfind||Web Maps||Web Map Service<br />
|-<br />
| wmsinfo||Web Maps||Web Map Service<br />
|-<br />
| wmsread||Web Maps||Web Map Service<br />
|-<br />
| wmsupdate||Web Maps||Web Map Service<br />
|-<br />
| webmap||Web Maps||Web Map Display<br />
|-<br />
| wmclose||Web Maps||Web Map Display<br />
|-<br />
| wmprint||Web Maps||Web Map Display<br />
|-<br />
| wmmarker||Web Maps||Web Map Display<br />
|-<br />
| wmline||Web Maps||Web Map Display<br />
|-<br />
| wmremove||Web Maps||Web Map Display<br />
|-<br />
| wmcenter||Web Maps||Web Map Display<br />
|-<br />
| wmzoom||Web Maps||Web Map Display<br />
|-<br />
| wmlimits||Web Maps||Web Map Display<br />
|-<br />
| axesm||Map Display||Map Layout and Axes<br />
|-<br />
| axesmui||Map Display||Map Layout and Axes<br />
|-<br />
| clma||Map Display||Map Layout and Axes<br />
|-<br />
| gcm||Map Display||Map Layout and Axes<br />
|-<br />
| getm||Map Display||Map Layout and Axes<br />
|-<br />
| handlem||Map Display||Map Layout and Axes<br />
|-<br />
| handlem-ui||Map Display||Map Layout and Axes<br />
|-<br />
| ismap||Map Display||Map Layout and Axes<br />
|-<br />
| setm||Map Display||Map Layout and Axes<br />
|-<br />
| showaxes||Map Display||Map Layout and Axes<br />
|-<br />
| tightmap||Map Display||Map Layout and Axes<br />
|-<br />
| usamap||Map Display||Map Layout and Axes<br />
|-<br />
| worldmap||Map Display||Map Layout and Axes<br />
|-<br />
| namem||Map Display||Map Layout and Axes<br />
|-<br />
| tagm||Map Display||Map Layout and Axes<br />
|-<br />
| framem||Map Display||Map Layout and Axes<br />
|-<br />
| ingeoquad||Map Display||Map Layout and Axes<br />
|-<br />
| gridm||Map Display||Map Layout and Axes<br />
|-<br />
| angl2str||Map Display||Map Layout and Axes<br />
|-<br />
| mlabel||Map Display||Map Layout and Axes<br />
|-<br />
| mlabelzero22pi||Map Display||Map Layout and Axes<br />
|-<br />
| northarrow||Map Display||Map Layout and Axes<br />
|-<br />
| plabel||Map Display||Map Layout and Axes<br />
|-<br />
| rotatetext||Map Display||Map Layout and Axes<br />
|-<br />
| scaleruler||Map Display||Map Layout and Axes<br />
|-<br />
| geoshow||Map Display||Vector and Raster Map Display<br />
|-<br />
| grid2image||Map Display||Vector and Raster Map Display<br />
|-<br />
| linem||Map Display||Vector and Raster Map Display<br />
|-<br />
| mapshow '''[3]'''||Map Display||Vector and Raster Map Display<br />
|-<br />
| meshm||Map Display||Vector and Raster Map Display<br />
|-<br />
| pcolorm||Map Display||Vector and Raster Map Display<br />
|-<br />
| plotm||Map Display||Vector and Raster Map Display<br />
|-<br />
| plot3m||Map Display||Vector and Raster Map Display<br />
|-<br />
| surfm||Map Display||Vector and Raster Map Display<br />
|-<br />
| usamap||Map Display||Vector and Raster Map Display<br />
|-<br />
| worldmap||Map Display||Vector and Raster Map Display<br />
|-<br />
| camposm||Map Display||3-D Map Display<br />
|-<br />
| camtargm||Map Display||3-D Map Display<br />
|-<br />
| camupm||Map Display||3-D Map Display<br />
|-<br />
| daspectm||Map Display||3-D Map Display<br />
|-<br />
| demcmap||Map Display||3-D Map Display<br />
|-<br />
| lightm||Map Display||3-D Map Display<br />
|-<br />
| lightmui||Map Display||3-D Map Display<br />
|-<br />
| meshlsrm||Map Display||3-D Map Display<br />
|-<br />
| shaderel||Map Display||3-D Map Display<br />
|-<br />
| surflm||Map Display||3-D Map Display<br />
|-<br />
| surflsrm||Map Display||3-D Map Display<br />
|-<br />
| surfacem||Map Display||3-D Map Display<br />
|-<br />
| zdatam||Map Display||3-D Map Display<br />
|-<br />
| clabelm||Map Display||Contour Maps<br />
|-<br />
| clegendm||Map Display||Contour Maps<br />
|-<br />
| contourcbar||Map Display||Contour Maps<br />
|-<br />
| contourcmap||Map Display||Contour Maps<br />
|-<br />
| contourm||Map Display||Contour Maps<br />
|-<br />
| contour3m||Map Display||Contour Maps<br />
|-<br />
| contourfm||Map Display||Contour Maps<br />
|-<br />
| quiverm||Map Display||Thematic Maps<br />
|-<br />
| quiver3m||Map Display||Thematic Maps<br />
|-<br />
| scatterm||Map Display||Thematic Maps<br />
|-<br />
| stem3m||Map Display||Thematic Maps<br />
|-<br />
| symbolm||Map Display||Thematic Maps<br />
|-<br />
| clrmenu||Map Display||Interaction with Maps<br />
|-<br />
| gcpmap||Map Display||Interaction with Maps<br />
|-<br />
| gtextm||Map Display||Interaction with Maps<br />
|-<br />
| inputm||Map Display||Interaction with Maps<br />
|-<br />
| maptool||Map Display||Interaction with Maps<br />
|-<br />
| maptrim||Map Display||Interaction with Maps<br />
|-<br />
| mapview||Map Display||Interaction with Maps<br />
|-<br />
| originui||Map Display||Interaction with Maps<br />
|-<br />
| parallelui||Map Display||Interaction with Maps<br />
|-<br />
| bufferm||Data Analysis||Vector Data<br />
|-<br />
| closePolygonParts||Data Analysis||Vector Data<br />
|-<br />
| extractfield||Data Analysis||Vector Data<br />
|-<br />
| flatearthpoly||Data Analysis||Vector Data<br />
|-<br />
| interpm||Data Analysis||Vector Data<br />
|-<br />
| intrplat||Data Analysis||Vector Data<br />
|-<br />
| intrplon||Data Analysis||Vector Data<br />
|-<br />
| linecirc||Data Analysis||Vector Data<br />
|-<br />
| polcmap||Data Analysis||Vector Data<br />
|-<br />
| polyjoin||Data Analysis||Vector Data<br />
|-<br />
| polymerge||Data Analysis||Vector Data<br />
|-<br />
| polysplit||Data Analysis||Vector Data<br />
|-<br />
| reducem||Data Analysis||Vector Data<br />
|-<br />
| removeExtraNanSeparators||Data Analysis||Vector Data<br />
|-<br />
| ispolycw||Data Analysis||Vector Data<br />
|-<br />
| poly2ccw||Data Analysis||Vector Data<br />
|-<br />
| poly2cw||Data Analysis||Vector Data<br />
|-<br />
| poly2fv||Data Analysis||Vector Data<br />
|-<br />
| polybool '''[4]'''||Data Analysis||Vector Data<br />
|-<br />
| polyxpoly||Data Analysis||Vector Data<br />
|-<br />
| geopoint||Data Analysis||Vector Data<br />
|-<br />
| geoshape||Data Analysis||Vector Data<br />
|-<br />
| mappoint||Data Analysis||Vector Data<br />
|-<br />
| mapshape||Data Analysis||Vector Data<br />
|-<br />
| map.rasterref.GeographicRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| map.rasterref.MapRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| geoloc2grid||Data Analysis||Raster Data and Representations<br />
|-<br />
| imbedm||Data Analysis||Raster Data and Representations<br />
|-<br />
| ltln2val||Data Analysis||Raster Data and Representations<br />
|-<br />
| mapoutline||Data Analysis||Raster Data and Representations<br />
|-<br />
| resizem||Data Analysis||Raster Data and Representations<br />
|-<br />
| limitm||Data Analysis||Raster Data and Representations<br />
|-<br />
| georasterref||Data Analysis||Raster Data and Representations<br />
|-<br />
| makerefmat||Data Analysis||Raster Data and Representations<br />
|-<br />
| maprasterref||Data Analysis||Raster Data and Representations<br />
|-<br />
| refmatToGeoRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| refmatToMapRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| refmatToWorldFileMatrix||Data Analysis||Raster Data and Representations<br />
|-<br />
| refvecToGeoRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| worldFileMatrixToRefmat||Data Analysis||Raster Data and Representations<br />
|-<br />
| mapprofile||Data Analysis||Conversion Between Vector and Raster Data<br />
|-<br />
| vec2mtx||Data Analysis||Conversion Between Vector and Raster Data<br />
|-<br />
| gradientm||Data Analysis||Terrain Data Analysis<br />
|-<br />
| los2||Data Analysis||Terrain Data Analysis<br />
|-<br />
| viewshed||Data Analysis||Terrain Data Analysis<br />
|-<br />
| wgs84Ellipsoid||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| earthRadius||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| rcurve||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| rsphere||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| geocentricLatitude||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| parametricLatitude||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| geodeticLatitudeFromGeocentric||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| geodeticLatitudeFromParametric||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| axes2ecc||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| majaxis||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| minaxis||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| ecc2flat||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| flat2ecc||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| ecc2n||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| n2ecc||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| oblateSpheroid||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| referenceEllipsoid||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| referenceSphere||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.AuthalicLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.ConformalLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.IsometricLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.RectifyingLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| str2angle||Coordinates, Geodesy, and Projections||Lengths and Angles<br />
|-<br />
| unwrapMultipart||Coordinates, Geodesy, and Projections||Lengths and Angles<br />
|-<br />
| map.geodesy.isdegree||Coordinates, Geodesy, and Projections||Lengths and Angles<br />
|-<br />
| azimuth||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| departure||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| distance||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| gc2sc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| gcxgc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| gcxsc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| meridianarc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| meridianfwd||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| reckon||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| rhxrh||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| track1||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| track2||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| trackg||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| trackui||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| ellipse1||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| deg2km||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| deg2nm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| deg2sm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| gcxsc||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| rad2nm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| rad2sm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scircle1||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scircle2||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scircleg||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scirclui||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scxsc||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| sectorg||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| areaint||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| areamat||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| areaquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| bufgeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| geoquadline||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| geoquadpt||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| ingeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| intersectgeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| outlinegeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| map.rasterref.GeographicRasterReference||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| map.rasterref.MapRasterReference||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| antipode||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| mfwdtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| minvtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| newpole||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| org2pol||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| projfwd||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| projinv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| putpole||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| rotatem||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| defaultm||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geotiff2mstruct||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maplist||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maps||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| projlist||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| vfwdtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| vinvtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| clipdata||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| distortcalc||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maptriml||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maptrimp||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maptrims||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| mdistort||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| tissot||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| utmgeoid||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| utmzone||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| utmzoneui||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2enu||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2ned||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2enu||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2ned||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2enu||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2ned||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2enuv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2nedv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2ecefv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2ecefv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
|}<br />
<br />
=== Missing options ===<br />
TBD<br />
<br />
=== Contributing ===<br />
* See for example [[User:Sandeepmv#Y:_Your_task]]<br />
* geod toolbox [https://drive.google.com/file/d/0B-I95wETyqQidnZWbm5TbzZRcHc/edit?usp=sharing] (BSD-licensed, available from its author outside of File Exchange)<br />
<br />
[[Category:Octave Forge]]<br />
[[Category:Missing functions]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Mapping_package&diff=10141Mapping package2017-03-24T20:13:44Z<p>83.163.225.168: /* Missing functions */</p>
<hr />
<div>The {{Forge|mapping|mapping package}} is part of the octave-forge project.<br />
<br />
== Development ==<br />
=== Roadmap ===<br />
<br />
Targets for next mapping package releases:<br />
* Add rasterwrite.m (complementary to rasterread). Please contact the package maintainer; a C++ skeleton already exists.<br />
* Maybe add wrapper functions around rasterread and rasterwrite (e.g., for GeoTIFF, ASCII grid, etc.).<br />
* Add more options to mapshow.<br />
* Implement support for projections. I (the current mapping package maintainer) have neither much need for nor much experience with this subject, so help is welcome!<br />
* Add geodesy functions. Patches have been submitted but not yet integrated.<br />
<br />
Several functions in the current mapping package release (1.2.1) haven't had much testing. Please try them out and report issues in the bug tracker with the "[OF] mapping pkg:" tag in the title.<br />
<br />
=== Missing functions ===<br />
The following is an incomplete list of functions that are missing from the mapping package and would be needed for Matlab compatibility. Bugs are not listed here; [https://savannah.gnu.org/bugs/?func=search&group=octave search] for them and [https://savannah.gnu.org/bugs/?func=additem&group=octave report] them on the bug tracker instead.<br />
<br />
As a number of polygon functions in the mapping package are geometry-related, chances are that some of the missing functionality is already present in the [http://wiki.octave.org/Geometry_package geometry package]. In fact, there is a discussion about which functions belong where: Matlab compatibility suggests the mapping package, but based on similar functionality the geometry package might also be a good home.<br />
<br />
{{Note|This entire section is about the current version, 1.2.1. If a Matlab function is missing from the list and does not appear in the current release of the package, confirm that it is also missing in the [https://sourceforge.net/p/octave/mapping/development sources] before adding it.}}<br />
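<br />
A quick first check is to ask Octave whether a name resolves in the installed release. The following is only a sketch: it assumes the mapping package is installed, the three names are arbitrary examples, and it says nothing about the development sources, which still have to be checked by hand.<br />

```octave
pkg load mapping

% Example names only; replace them with the functions you want to check.
names = {"rasterread", "geotiffread", "gpxread"};

for i = 1:numel (names)
  % exist() returns 2 for a function file on the path, 3 for an oct/mex
  % file, 5 for a built-in function, and 0 if the name is not found.
  code = exist (names{i});
  if (any (code == [2, 3, 5]))
    printf ("%s: available\n", names{i});
  else
    printf ("%s: not found\n", names{i});
  endif
endfor
```

A name reported as "not found" here may still exist in the development sources, so this narrows the search but does not replace it.<br />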
<br />
==== Alphabetical list ====<br />
<div style="column-count:4;-moz-column-count:4;-webkit-column-count:4"><br />
* aer2ecef<br />
* aer2enu<br />
* aer2geodetic<br />
* aer2ned<br />
* angl2str<br />
* arcgridread '''[1]'''<br />
* areaint<br />
* areamat<br />
* areaquad<br />
* avhrrgoode<br />
* avhrrlambert<br />
* axes2ecc<br />
* axesm<br />
* axesmui<br />
* bufferm<br />
* bufgeoquad<br />
* camposm<br />
* camtargm<br />
* camupm<br />
* clabelm<br />
* clegendm<br />
* clipdata<br />
* clma<br />
* closePolygonParts<br />
* clrmenu<br />
* contour3m<br />
* contourcbar<br />
* contourcmap<br />
* contourfm<br />
* contourm<br />
* defaultm<br />
* deg2nm<br />
* deg2sm<br />
* demcmap<br />
* demdataui<br />
* departure<br />
* distortcalc<br />
* dted<br />
* dteds<br />
* earthRadius<br />
* ecc2flat<br />
* ecc2n<br />
* ecef2aer<br />
* ecef2enu<br />
* ecef2enuv<br />
* ecef2geodetic<br />
* ecef2lv<br />
* ecef2ned<br />
* ecef2nedv<br />
* egm96geoid<br />
* ellipse1<br />
* enu2aer<br />
* enu2ecef<br />
* enu2ecefv<br />
* enu2geodetic<br />
* etopo<br />
* etopo5<br />
* flat2ecc<br />
* flatearthpoly<br />
* framem<br />
* gc2sc<br />
* gcm<br />
* gcpmap<br />
* gcxgc<br />
* gcxsc<br />
* geocentricLatitude<br />
* geodetic2aer<br />
* geodetic2ecef<br />
* geodetic2enu<br />
* geodetic2ned<br />
* geodeticLatitudeFromGeocentric<br />
* geodeticLatitudeFromParametric<br />
* geoloc2grid<br />
* geopoint<br />
* geoquadline<br />
* geoquadpt<br />
* georasterref<br />
* geoshape<br />
* geoshow<br />
* geotiff2mstruct<br />
* geotiffinfo '''[2]'''<br />
* geotiffread '''[1]'''<br />
* geotiffwrite<br />
* getm<br />
* getworldfilename<br />
* globedem<br />
* globedems<br />
* gpxread '''[6]'''<br />
* gradientm<br />
* grid2image<br />
* gridm<br />
* gshhs<br />
* gtextm<br />
* gtopo30<br />
* gtopo30s<br />
* handlem<br />
* imbedm<br />
* ingeoquad<br />
* inputm<br />
* interpm<br />
* intersectgeoquad<br />
* intrplat<br />
* intrplon<br />
* ismap<br />
* ispolycw<br />
* kmlwrite<br />
* kmlwriteline<br />
* kmlwritepoint<br />
* lightm<br />
* lightmui<br />
* linecirc<br />
* linem<br />
* los2<br />
* ltln2val<br />
* lv2ecef<br />
* majaxis<br />
* makeattribspec<br />
* makedbfspec<br />
* makerefmat<br />
* map.geodesy.AuthalicLatitudeConverter<br />
* map.geodesy.ConformalLatitudeConverter<br />
* map.geodesy.IsometricLatitudeConverter<br />
* map.geodesy.RectifyingLatitudeConverter<br />
* map.geodesy.isdegree<br />
* map.rasterref.GeographicRasterReference<br />
* map.rasterref.MapRasterReference<br />
* maplist<br />
* mapoutline<br />
* mappoint<br />
* mapprofile<br />
* maprasterref<br />
* maps<br />
* mapshape<br />
* mapshow '''[3]'''<br />
* maptool<br />
* maptrim<br />
* maptriml<br />
* maptrimp<br />
* maptrims<br />
* mapview<br />
* mdistort<br />
* meridianarc<br />
* meridianfwd<br />
* meshlsrm<br />
* meshm<br />
* mfwdtran<br />
* minaxis<br />
* minvtran<br />
* mlabel<br />
* mlabelzero22pi<br />
* n2ecc<br />
* ned2aer<br />
* ned2ecef<br />
* ned2ecefv<br />
* ned2geodetic<br />
* newpole<br />
* northarrow<br />
* oblateSpheroid<br />
* org2pol<br />
* originui<br />
* outlinegeoquad<br />
* panzoom<br />
* parallelui<br />
* parametricLatitude<br />
* pcolorm<br />
* plabel<br />
* plot3m<br />
* plotm<br />
* polcmap<br />
* poly2ccw<br />
* poly2cw<br />
* poly2fv<br />
* polybool '''[4]'''<br />
* polyjoin<br />
* polymerge<br />
* polysplit<br />
* polyxpoly<br />
* projfwd<br />
* projinv<br />
* projlist<br />
* putpole<br />
* quiver3m<br />
* quiverm<br />
* rad2nm<br />
* rad2sm<br />
* rcurve<br />
* reducem<br />
* referenceEllipsoid<br />
* referenceSphere<br />
* refmatToGeoRasterReference<br />
* refmatToMapRasterReference<br />
* refmatToWorldFileMatrix<br />
* refvecToGeoRasterReference<br />
* resizem<br />
* rhxrh<br />
* rotatem<br />
* rotatetext<br />
* rsphere<br />
* satbath<br />
* scaleruler<br />
* scatterm<br />
* scircle1<br />
* scircle2<br />
* scircleg<br />
* scirclui<br />
* scxsc<br />
* sdtsdemread '''[1]'''<br />
* sdtsinfo '''[2]'''<br />
* sectorg<br />
* setm<br />
* shaderel<br />
* showaxes<br />
* sm2deg<br />
* sm2km<br />
* sm2nm<br />
* sm2rad<br />
* stem3m<br />
* str2angle<br />
* surfacem<br />
* surflm<br />
* surflsrm<br />
* surfm<br />
* symbolm<br />
* tbase<br />
* tightmap<br />
* tissot<br />
* track1<br />
* track2<br />
* trackg<br />
* trackui<br />
* unwrapMultipart<br />
* usamap<br />
* usgs24kdem<br />
* usgsdem<br />
* usgsdems<br />
* utmgeoid<br />
* utmzone<br />
* utmzoneui<br />
* vec2mtx<br />
* vfwdtran<br />
* viewshed<br />
* vinvtran<br />
* vmap0data<br />
* vmap0read<br />
* vmap0rhead<br />
* vmap0ui<br />
* webmap<br />
* WebMapServer<br />
* wgs84Ellipsoid<br />
* wmcenter<br />
* wmclose<br />
* wmlimits<br />
* wmline<br />
* wmmarker<br />
* wmprint<br />
* wmremove<br />
* WMSCapabilities<br />
* wmsfind<br />
* wmsinfo<br />
* WMSLayer<br />
* WMSMapRequest<br />
* wmsread<br />
* wmsupdate<br />
* wmzoom<br />
* worldFileMatrixToRefmat<br />
* worldfileread<br />
* worldfilewrite<br />
* worldmap<br />
* zdatam<br />
</div><br />
<br />
* [1] ''In mapping-1.2.1, rasterread can read any raster file that the GDAL library supports; see http://www.gdal.org/frmt_various.html. No separate functions for individual file formats are required. Some work remains on unifying the output formats.''<br />
* [2] ''As [1], rasterinfo does the job.''<br />
* [3] ''As of mapping-1.2.0, there's a basic mapshow.''<br />
* [4] ''See oc_polybool in the OF octclip package.''<br />
* [5] ''See OF image package''<br />
* [6] ''There's a basic gpxread in version 1.2.2 (to be released).''<br />
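<br />
To illustrate notes [1] and [2]: instead of a format-specific reader, one calls rasterread and rasterinfo on the file. This is a minimal sketch, assuming a mapping package built with GDAL support; the file name is hypothetical, and the layout of the returned values is deliberately not assumed here because it may differ between releases (see the help text of both functions).<br />

```octave
pkg load mapping

fname = "elevation.tif";    % hypothetical file; any GDAL-readable raster should do

data = rasterread (fname);  % stands in for geotiffread, arcgridread, sdtsdemread
info = rasterinfo (fname);  % stands in for geotiffinfo, sdtsinfo

% Inspect what came back rather than assuming its structure.
disp (class (data));
disp (class (info));
```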
<br />
==== Grouped list ====<br />
(see numbered notes above)<br />
{| {{table}}<br />
| arcgridread '''[1]'''||File Import and Export||Standard File Formats<br />
|-<br />
| geotiff2mstruct||File Import and Export||Standard File Formats<br />
|-<br />
| geotiffinfo '''[2]'''||File Import and Export||Standard File Formats<br />
|-<br />
| geotiffread '''[1]'''||File Import and Export||Standard File Formats<br />
|-<br />
| geotiffwrite||File Import and Export||Standard File Formats<br />
|-<br />
| sdtsdemread '''[1]'''||File Import and Export||Standard File Formats<br />
|-<br />
| sdtsinfo '''[2]'''||File Import and Export||Standard File Formats<br />
|-<br />
| worldfileread||File Import and Export||Standard File Formats<br />
|-<br />
| worldfilewrite||File Import and Export||Standard File Formats<br />
|-<br />
| getworldfilename||File Import and Export||Standard File Formats<br />
|-<br />
| gpxread||File Import and Export||Standard File Formats<br />
|-<br />
| kmlwrite||File Import and Export||Standard File Formats<br />
|-<br />
| kmlwriteline||File Import and Export||Standard File Formats<br />
|-<br />
| kmlwritepoint||File Import and Export||Standard File Formats<br />
|-<br />
| makeattribspec||File Import and Export||Standard File Formats<br />
|-<br />
| makedbfspec||File Import and Export||Standard File Formats<br />
|-<br />
| imread '''[5]'''||File Import and Export||Standard File Formats<br />
|-<br />
| imwrite '''[5]'''||File Import and Export||Standard File Formats<br />
|-<br />
| demdataui||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| dted||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| dteds||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| etopo||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| etopo5||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| globedem||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| globedems||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| gtopo30||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| gtopo30s||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| satbath||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| tbase||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| usgs24kdem||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| usgsdem '''[1]'''||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| usgsdems '''[1]'''||File Import and Export||Gridded Terrain and Bathymetry Products<br />
|-<br />
| avhrrgoode||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| avhrrlambert||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| egm96geoid||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| gshhs||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0data||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0read||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0rhead||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| vmap0ui||File Import and Export||Specific Vector and Gridded Data Products<br />
|-<br />
| WebMapServer||Web Maps||Web Map Service<br />
|-<br />
| WMSCapabilities||Web Maps||Web Map Service<br />
|-<br />
| WMSLayer||Web Maps||Web Map Service<br />
|-<br />
| WMSMapRequest||Web Maps||Web Map Service<br />
|-<br />
| wmsfind||Web Maps||Web Map Service<br />
|-<br />
| wmsinfo||Web Maps||Web Map Service<br />
|-<br />
| wmsread||Web Maps||Web Map Service<br />
|-<br />
| wmsupdate||Web Maps||Web Map Service<br />
|-<br />
| webmap||Web Maps||Web Map Display<br />
|-<br />
| wmclose||Web Maps||Web Map Display<br />
|-<br />
| wmprint||Web Maps||Web Map Display<br />
|-<br />
| wmmarker||Web Maps||Web Map Display<br />
|-<br />
| wmline||Web Maps||Web Map Display<br />
|-<br />
| wmremove||Web Maps||Web Map Display<br />
|-<br />
| wmcenter||Web Maps||Web Map Display<br />
|-<br />
| wmzoom||Web Maps||Web Map Display<br />
|-<br />
| wmlimits||Web Maps||Web Map Display<br />
|-<br />
| axesm||Map Display||Map Layout and Axes<br />
|-<br />
| axesmui||Map Display||Map Layout and Axes<br />
|-<br />
| clma||Map Display||Map Layout and Axes<br />
|-<br />
| gcm||Map Display||Map Layout and Axes<br />
|-<br />
| getm||Map Display||Map Layout and Axes<br />
|-<br />
| handlem||Map Display||Map Layout and Axes<br />
|-<br />
| handlem-ui||Map Display||Map Layout and Axes<br />
|-<br />
| ismap||Map Display||Map Layout and Axes<br />
|-<br />
| setm||Map Display||Map Layout and Axes<br />
|-<br />
| showaxes||Map Display||Map Layout and Axes<br />
|-<br />
| tightmap||Map Display||Map Layout and Axes<br />
|-<br />
| usamap||Map Display||Map Layout and Axes<br />
|-<br />
| worldmap||Map Display||Map Layout and Axes<br />
|-<br />
| namem||Map Display||Map Layout and Axes<br />
|-<br />
| tagm||Map Display||Map Layout and Axes<br />
|-<br />
| framem||Map Display||Map Layout and Axes<br />
|-<br />
| ingeoquad||Map Display||Map Layout and Axes<br />
|-<br />
| gridm||Map Display||Map Layout and Axes<br />
|-<br />
| angl2str||Map Display||Map Layout and Axes<br />
|-<br />
| mlabel||Map Display||Map Layout and Axes<br />
|-<br />
| mlabelzero22pi||Map Display||Map Layout and Axes<br />
|-<br />
| northarrow||Map Display||Map Layout and Axes<br />
|-<br />
| plabel||Map Display||Map Layout and Axes<br />
|-<br />
| rotatetext||Map Display||Map Layout and Axes<br />
|-<br />
| scaleruler||Map Display||Map Layout and Axes<br />
|-<br />
| geoshow||Map Display||Vector and Raster Map Display<br />
|-<br />
| grid2image||Map Display||Vector and Raster Map Display<br />
|-<br />
| linem||Map Display||Vector and Raster Map Display<br />
|-<br />
| mapshow '''[3]'''||Map Display||Vector and Raster Map Display<br />
|-<br />
| meshm||Map Display||Vector and Raster Map Display<br />
|-<br />
| pcolorm||Map Display||Vector and Raster Map Display<br />
|-<br />
| plotm||Map Display||Vector and Raster Map Display<br />
|-<br />
| plot3m||Map Display||Vector and Raster Map Display<br />
|-<br />
| surfm||Map Display||Vector and Raster Map Display<br />
|-<br />
| usamap||Map Display||Vector and Raster Map Display<br />
|-<br />
| worldmap||Map Display||Vector and Raster Map Display<br />
|-<br />
| camposm||Map Display||3-D Map Display<br />
|-<br />
| camtargm||Map Display||3-D Map Display<br />
|-<br />
| camupm||Map Display||3-D Map Display<br />
|-<br />
| daspectm||Map Display||3-D Map Display<br />
|-<br />
| demcmap||Map Display||3-D Map Display<br />
|-<br />
| lightm||Map Display||3-D Map Display<br />
|-<br />
| lightmui||Map Display||3-D Map Display<br />
|-<br />
| meshlsrm||Map Display||3-D Map Display<br />
|-<br />
| shaderel||Map Display||3-D Map Display<br />
|-<br />
| surflm||Map Display||3-D Map Display<br />
|-<br />
| surflsrm||Map Display||3-D Map Display<br />
|-<br />
| surfacem||Map Display||3-D Map Display<br />
|-<br />
| zdatam||Map Display||3-D Map Display<br />
|-<br />
| clabelm||Map Display||Contour Maps<br />
|-<br />
| clegendm||Map Display||Contour Maps<br />
|-<br />
| contourcbar||Map Display||Contour Maps<br />
|-<br />
| contourcmap||Map Display||Contour Maps<br />
|-<br />
| contourm||Map Display||Contour Maps<br />
|-<br />
| contour3m||Map Display||Contour Maps<br />
|-<br />
| contourfm||Map Display||Contour Maps<br />
|-<br />
| quiverm||Map Display||Thematic Maps<br />
|-<br />
| quiver3m||Map Display||Thematic Maps<br />
|-<br />
| scatterm||Map Display||Thematic Maps<br />
|-<br />
| stem3m||Map Display||Thematic Maps<br />
|-<br />
| symbolm||Map Display||Thematic Maps<br />
|-<br />
| clrmenu||Map Display||Interaction with Maps<br />
|-<br />
| gcpmap||Map Display||Interaction with Maps<br />
|-<br />
| gtextm||Map Display||Interaction with Maps<br />
|-<br />
| inputm||Map Display||Interaction with Maps<br />
|-<br />
| maptool||Map Display||Interaction with Maps<br />
|-<br />
| maptrim||Map Display||Interaction with Maps<br />
|-<br />
| mapview||Map Display||Interaction with Maps<br />
|-<br />
| originui||Map Display||Interaction with Maps<br />
|-<br />
| parallelui||Map Display||Interaction with Maps<br />
|-<br />
| bufferm||Data Analysis||Vector Data<br />
|-<br />
| closePolygonParts||Data Analysis||Vector Data<br />
|-<br />
| extractfield||Data Analysis||Vector Data<br />
|-<br />
| flatearthpoly||Data Analysis||Vector Data<br />
|-<br />
| interpm||Data Analysis||Vector Data<br />
|-<br />
| intrplat||Data Analysis||Vector Data<br />
|-<br />
| intrplon||Data Analysis||Vector Data<br />
|-<br />
| linecirc||Data Analysis||Vector Data<br />
|-<br />
| polcmap||Data Analysis||Vector Data<br />
|-<br />
| polyjoin||Data Analysis||Vector Data<br />
|-<br />
| polymerge||Data Analysis||Vector Data<br />
|-<br />
| polysplit||Data Analysis||Vector Data<br />
|-<br />
| reducem||Data Analysis||Vector Data<br />
|-<br />
| removeExtraNanSeparators||Data Analysis||Vector Data<br />
|-<br />
| ispolycw||Data Analysis||Vector Data<br />
|-<br />
| poly2ccw||Data Analysis||Vector Data<br />
|-<br />
| poly2cw||Data Analysis||Vector Data<br />
|-<br />
| poly2fv||Data Analysis||Vector Data<br />
|-<br />
| polybool '''[4]'''||Data Analysis||Vector Data<br />
|-<br />
| polyxpoly||Data Analysis||Vector Data<br />
|-<br />
| geopoint||Data Analysis||Vector Data<br />
|-<br />
| geoshape||Data Analysis||Vector Data<br />
|-<br />
| mappoint||Data Analysis||Vector Data<br />
|-<br />
| mapshape||Data Analysis||Vector Data<br />
|-<br />
| map.rasterref.GeographicRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| map.rasterref.MapRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| geoloc2grid||Data Analysis||Raster Data and Representations<br />
|-<br />
| imbedm||Data Analysis||Raster Data and Representations<br />
|-<br />
| ltln2val||Data Analysis||Raster Data and Representations<br />
|-<br />
| mapoutline||Data Analysis||Raster Data and Representations<br />
|-<br />
| resizem||Data Analysis||Raster Data and Representations<br />
|-<br />
| limitm||Data Analysis||Raster Data and Representations<br />
|-<br />
| georasterref||Data Analysis||Raster Data and Representations<br />
|-<br />
| makerefmat||Data Analysis||Raster Data and Representations<br />
|-<br />
| maprasterref||Data Analysis||Raster Data and Representations<br />
|-<br />
| refmatToGeoRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| refmatToMapRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| refmatToWorldFileMatrix||Data Analysis||Raster Data and Representations<br />
|-<br />
| refvecToGeoRasterReference||Data Analysis||Raster Data and Representations<br />
|-<br />
| worldFileMatrixToRefmat||Data Analysis||Raster Data and Representations<br />
|-<br />
| mapprofile||Data Analysis||Conversion Between Vector and Raster Data<br />
|-<br />
| vec2mtx||Data Analysis||Conversion Between Vector and Raster Data<br />
|-<br />
| gradientm||Data Analysis||Terrain Data Analysis<br />
|-<br />
| los2||Data Analysis||Terrain Data Analysis<br />
|-<br />
| viewshed||Data Analysis||Terrain Data Analysis<br />
|-<br />
| wgs84Ellipsoid||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| earthRadius||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| rcurve||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| rsphere||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| geocentricLatitude||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| parametricLatitude||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| geodeticLatitudeFromGeocentric||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| geodeticLatitudeFromParametric||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| axes2ecc||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| majaxis||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| minaxis||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| ecc2flat||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| flat2ecc||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| ecc2n||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| n2ecc||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| oblateSpheroid||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| referenceEllipsoid||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| referenceSphere||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.AuthalicLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.ConformalLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.IsometricLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| map.geodesy.RectifyingLatitudeConverter||Coordinates, Geodesy, and Projections||Modeling the Earth<br />
|-<br />
| str2angle||Coordinates, Geodesy, and Projections||Lengths and Angles<br />
|-<br />
| unwrapMultipart||Coordinates, Geodesy, and Projections||Lengths and Angles<br />
|-<br />
| map.geodesy.isdegree||Coordinates, Geodesy, and Projections||Lengths and Angles<br />
|-<br />
| azimuth||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| departure||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| distance||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| gc2sc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| gcxgc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| gcxsc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| meridianarc||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| meridianfwd||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| reckon||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| rhxrh||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| track1||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| track2||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| trackg||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| trackui||Coordinates, Geodesy, and Projections||Great Circles, Geodesics, and Rhumb Lines<br />
|-<br />
| ellipse1||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| deg2km||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| deg2nm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| deg2sm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| gcxsc||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| rad2nm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| rad2sm||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scircle1||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scircle2||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scircleg||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scirclui||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| scxsc||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| sectorg||Coordinates, Geodesy, and Projections||Small Circles, Ellipses, and Spherical Distance<br />
|-<br />
| areaint||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| areamat||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| areaquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| bufgeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| geoquadline||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| geoquadpt||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| ingeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| intersectgeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| outlinegeoquad||Coordinates, Geodesy, and Projections||Zones, Lunes, Quadrangles, and Other Areas<br />
|-<br />
| map.rasterref.GeographicRasterReference||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| map.rasterref.MapRasterReference||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| antipode||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| mfwdtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| minvtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| newpole||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| org2pol||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| projfwd||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| projinv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| putpole||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| rotatem||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| defaultm||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geotiff2mstruct||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maplist||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maps||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| mfwdtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| minvtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| projlist||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| vfwdtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| vinvtran||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| clipdata||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| distortcalc||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maptriml||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maptrimp||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| maptrims||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| mdistort||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| tissot||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| utmgeoid||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| utmzone||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| utmzoneui||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2enu||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2ned||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| geodetic2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2geodetic||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2enu||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2ned||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2ecef||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2enu||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| aer2ned||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2aer||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2enuv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ecef2nedv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| enu2ecefv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|-<br />
| ned2ecefv||Coordinates, Geodesy, and Projections||Coordinate Systems<br />
|}<br />
<br />
=== Missing options ===<br />
TBD<br />
<br />
=== Contributing ===<br />
* See for example [[User:Sandeepmv#Y:_Your_task]]<br />
* geod toolbox [https://drive.google.com/file/d/0B-I95wETyqQidnZWbm5TbzZRcHc/edit?usp=sharing] (BSD-licensed, available from its author outside of File Exchange)<br />
<br />
[[Category:Octave-Forge]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Status_of_bugs&diff=7435Status of bugs2016-01-30T21:27:27Z<p>83.163.225.168: /* Bugs */</p>
<hr />
<div>To report a bug, please use the [https://savannah.gnu.org/bugs/?group=octave bug tracker].<br />
<br />
This page is intended to allow those developers without the authority to change the status of bugs on the bug tracker to point out bugs that have been fixed, or other suggested status changes, to those who do have the authority to update the bug tracker. The goal is that this will be a short list, which is easier for the core developers to scan than the entire bug tracker list. Currently, it is just Lachlan's thoughts on the 50 or so "Patch submitted" bugs.<br />
<br />
N/C means no change.<br />
<br />
"Types" are chosen so that importance typically decreases when the table is sorted by type: C=crash, E=error (wrong result), M=gives error on code Matlab accepts, O=bug in existing Octave functionality that is not part of Matlab, P=performance, X=extension (request to accept code Matlab doesn't accept).<br />
<br />
==Bugs==<br />
{|class="wikitable sortable"<br />
!Bug ID !! Status !! Suggested action !! Type !! comments<br />
|-<br />
|{{bug|31626}} || Patch || Apply patch || X ||<br />
|-<br />
|{{bug|32008}} || Patch || N/C || M || Needs more documentation so a committer can understand it thoroughly<br />
|-<br />
|{{bug|32088}} || Patch || "In progress" || || Existing patch does not meet requirements<br />
|-<br />
<!-- {{bug|32839}} || Patch || N/C || O || I (Lachlan) don't understand pkg build / install<br />
|- --><br />
|{{bug|32885}} || Patch || Review latest patch || E || Patch is simple.<br />
|-<br />
|{{bug|32924}} || Patch || "Postponed" || E || Jordi was going to fix this in 2011. Should it be re-allocated to "None"?<br />
|-<br />
|{{bug|33503}} || Patch || "closed" || -- || This is just a matter of roundoff<br />
|-<br />
|{{bug|33523}} || Confirmed || Review patch || M ||<br />
|-<br />
|{{bug|34363}} || Patch || "None" || P || All supplied patches have been applied<br />
|-<br />
|{{bug|34624}} || Patch || "Duplicate"? || O || Duplicate of #44095?<br />
|-<br />
|{{bug|36372}} || Patch || See if "reverse ordinal" is useful || P,X || Patch fails to apply to dev: "5 out of 5 hunks failed"<br />
|-<br />
|{{bug|36646}} || Patch || "In progress"? || M || The patches fix some statistical tests but not all. No progress for two years.<br />
|-<br />
|{{bug|41315}} || Open || "Fixed" || || Fixed in 4.0.1<br />
|-<br />
|{{bug|41512}} || Open || Review patch || M ||<br />
|-<br />
|{{bug|42705}} || Open || Review patch || O || Patch will need polishing after approval-in-principle<br />
|-<br />
|{{bug|42825}} || Confirmed || Review patch || E ||<br />
|-<br />
|{{bug|42850}} || Confirmed || Review patch || C ||<br />
|-<br />
|{{bug|43038}} || Needs Info || Mark as duplicate of {{bug|33523}} || M ||<br />
|-<br />
|{{bug|43511}} || None || Review patch || E ||<br />
|-<br />
|{{bug|43925}} || None || Review patch || E || Patch also fixes {{bug|44498}}<br />
|-<br />
|{{bug|44498}} || None || See {{bug|43925}} || E || Disregard patch; use patch for {{bug|43925}}.<br />
|-<br />
|{{bug|45219}} || None || Review patch || M ||<br />
|-<br />
|{{bug|45654}} || Open || Review patch || O || Patch will need polishing after approval-in-principle<br />
|-<br />
|{{bug|46933}} || Confirmed || Review patch || C || The patch solves the crash problem. Please review<br />
|-<br />
|}<br />
<br />
==Breakpoint bugs==<br />
{|class="wikitable sortable"<br />
!Bug ID !! Fixed? !! comments<br />
|-<br />
|{{bug|33411}} || No || Patch supplied<br />
|-<br />
|{{bug|34804}} || No || Patch supplied<br />
|-<br />
|{{bug|36576}} || Yes || <br />
|-<br />
|{{bug|39329}} || I think so || <br />
|-<br />
|{{bug|41514}} || No || <br />
|-<br />
|{{bug|41540}} || No || The fix is simple.<br />
|-<br />
|{{bug|41556}} || Yes || <br />
|-<br />
|{{bug|41845}} || Almost || Must use dbstop syntax, but functionality is there<br />
|-<br />
|{{bug|42708}} || Yes || By Dan Sebald's patch<br />
|-</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Windows_Installer&diff=7049Windows Installer2015-12-12T23:52:31Z<p>83.163.225.168: /* Creating Octave development versions for Windows with mxe-octave */</p>
<hr />
<div>:''This article is about how to make the Windows installer; if you'd like just to use the installer, see [[Octave for Microsoft Windows]].''<br />
GNU Octave is primarily developed on GNU/Linux and other POSIX-conformant systems. There have been many efforts in the past to build ports of GNU Octave for Windows. Take a look at the various ports of Octave available for Windows [http://wiki.octave.org/Octave_for_Windows here].<br />
<br />
Recently some work has been done in maintaining a unified build system [http://wiki.octave.org/MXE '''mxe-octave'''] (a fork of [http://mxe.cc/ MXE]) which anyone can use to produce cross as well as native builds of GNU Octave for Windows and Mac OS X platforms. This page contains instructions about creating a Windows installer using mxe-octave.<br />
<br />
==Steps to create Windows Installer==<br />
<br />
# [http://wiki.octave.org/Windows_Installer#Installing_requirements_of_MXE_Octave Install all requirements of MXE Octave].<br />
# <code>hg clone http://hg.octave.org/mxe-octave/</code><br />
# <code>cd mxe-octave</code><br />
# <code>autoconf</code><br />
# <code>./configure</code><br />
# <code>make nsis-installer</code><br />
<br />
===Tweaks===<br />
<br />
* Use <code>make tar-dist</code> or <code>make zip-dist</code> instead of <code>nsis-installer</code> if you want to build just an archive of the files to install on Windows instead of an installer wizard.<br />
* By default, packages will be built one at a time, but you may use <code>make JOBS=4</code> (choose a number other than 4 that is appropriate for your system) to build each package in parallel. You may also combine this with the <code>-j</code> option for Make to build more than one package at a time, but be careful as using <code>make -j4 JOBS=4</code> can result in as many as 16 jobs running at once.<br />
* Use <code>./configure --disable-strip-dist-files</code> if you want to keep debug symbols in the installed binaries for debugging on Windows.<br />
* Include gdb in the installer by running <code>make gdb</code> before making the <code>nsis-installer</code> target.<br />
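The worst-case process count mentioned in the second bullet is simply the product of the two settings. A minimal sketch of that arithmetic in POSIX shell follows; the function name <code>max_jobs</code> is made up for this illustration and is not part of mxe-octave.<br />

```shell
#!/bin/sh
# max_jobs is a hypothetical helper, not an mxe-octave command.
# $1 = value passed to make's -j option (packages built at once)
# $2 = value of JOBS (parallel jobs within each package)
max_jobs () {
  echo $(( $1 * $2 ))
}

max_jobs 4 4   # make -j4 JOBS=4          -> prints 16
max_jobs 1 4   # make JOBS=4 (no -j)      -> prints 4
```

Picking the two numbers so that their product stays near the number of CPU cores is one way to avoid overloading the build machine.<br />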
<br />
===Creating Octave development versions for Windows with mxe-octave===<br />
To roll your own Octave for Windows version with your favorite mods and patches, proceed as follows:<br />
<br />
# Make the cross-build environment for Octave (=mxe-octave; see above)<br />
# Build an Octave dist archive in Linux<br />
# Move that into mxe-octave and cross-build Octave + windows installer.<br />
For subsequent builds after the first, you only need to follow step 2 plus a slightly amended step 3 (see Step 3A below).<br />
<br />
====Step 1: Prepare mxe-octave====<br />
Clone the mxe-octave repo to some directory of your choice:<br />
 hg clone http://hg.octave.org/mxe-octave <name of mxe-octave build dir><br />
where <name of mxe-octave build dir> is a name of your choice other than the default "mxe-octave". <br />
Once downloaded, go into the <name of mxe-octave build dir> subdir and do:<br />
 autoconf<br />
./configure <options you want><br />
make nsis-installer JOBS=<some number><br />
The author usually uses "--enable-devel-tools --enable-windows-64 --enable-octave=default --enable-binary-packages" as configure options and JOBS=7 on a Core i5 system.<br />
* the first configure option also includes gdb and an MSYS shell in the binary<br />
* the second avoids the ~700 MB max. array size limit for 32-bit executables but Octave will only run on 64-bit Windows (most Windows systems are 64 bit anyway these days). Note: this option does NOT imply 64-bit indexing<br />
* the third option is just a placeholder; it invokes src/default-octave.mk, one of the three Octave .mk files in mxe-octave (the other two are src/stable-octave.mk and src/octave.mk; each corresponds to a value of the "--enable-octave=" configure option). I found that octave.mk lags a bit behind.<br />
* the fourth option cross-compiles the binary modules in Octave-Forge packages, which will save time when installing them later in Windows.<br />
If you seriously want to work with gdb, also add --disable-strip-dist-files as a configure option. However, in that case chances are that you can no longer build an .exe installer, as it becomes too big for NSIS (which has a 2 GB installer file size limit), so instead of "make nsis-installer" you'll need to invoke <br />
make zip-dist <options><br />
...and this results in all Octave dependencies being built in mxe-octave, plus (stable) Octave, plus an initial version of a binary Octave-Windows installer in the <mxe-octave build>/dist/ subdirectory.<br />
<br />
You may run into problems with Java. To build Octave with built-in Java support, mxe-octave needs:<br />
* A Java JDK (Java Development Kit) on the '''host''' system. IOW, the javac (Java compiler) and jar (Java archiver) executables should be in the PATH.<br />
* Java include files for windows (win32, even for w64 builds). They should reside in "<mxe-octave build dir>/usr/x86_64-w64-mingw32/include/java/win32". If not present, mxe-octave downloads them but this can occasionally go wrong. On a multi-boot system a solution (note: dirty hack warning!) is symlinking to the Windows include files on the Windows partition from the mxe-octave location.<br />
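Whether the two JDK programs from the first bullet are reachable can be checked before starting a long build. The following is only a sketch; <code>missing_tools</code> is a hypothetical helper name, not something mxe-octave provides.<br />

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints each named
# program that cannot be found in the PATH, one per line.
missing_tools () {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The two JDK programs that mxe-octave needs on the host system.
missing=$(missing_tools javac jar)
if [ -n "$missing" ]; then
  echo "Not found in PATH:" $missing
else
  echo "javac and jar found"
fi
```

This only tests the PATH; it says nothing about the Windows Java include files from the second bullet.<br />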
<br />
====Step 2: To build your first Octave-for Windows development version:====<br />
* build Octave on Linux (in separate source and build trees) including your favorite mods and patches.<br />
* once Octave runs fine in Linux (using make check and trying your mods using ./run-octave & from the build dir, all of this still on the Linux side), do:<br />
make all dist<br />
* This will produce a dist archive called "octave-<version>.tar.gz" in the top directory. Move or copy this dist archive to the <mxe-octave build>/pkg folder (or symlink to it from there)<br />
<br />
Note that this step requires that Octave be configured with Java (i.e., you need javac and jar on your system).<br />
==== Step 3: Building the Octave installer====<br />
* be sure to adapt <mxe-octave build>/src/default-octave.mk to read "## No Checksum" at the $(PKG)_CHECKSUM line and check octave version and archive type (tar.gz rather than tar.bz2). The checksum is only needed when you download a dist archive from the Internet, not so much when you copy it within your own home network, let alone your own computer.<br />
* check that near the top of the main Makefile "default-octave" is given for OCTAVE_TARGET rather than "stable-octave" or just "octave" (that name refers to the .mk filename in the src folder).<br />
* ... and then run (in the <mxe-octave build> folder)<br />
make nsis-installer <options><br />
-or-<br />
make zip-dist <options><br />
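The checksum edit from the first bullet can also be scripted. The sketch below ASSUMES that the line in question has the form <code>$(PKG)_CHECKSUM := <hash></code>; check your copy of src/default-octave.mk before relying on it. The helper name <code>disable_checksum</code> is hypothetical.<br />

```shell
#!/bin/sh
# disable_checksum is a hypothetical helper. It ASSUMES the .mk file
# contains a line of the form "$(PKG)_CHECKSUM := <hash>" and rewrites
# that line to "$(PKG)_CHECKSUM := ## No Checksum". Other lines are
# left untouched.
disable_checksum () {
  mkfile=$1
  sed 's/^\(\$(PKG)_CHECKSUM *:=\).*/\1 ## No Checksum/' "$mkfile" > "$mkfile.tmp" &&
    mv "$mkfile.tmp" "$mkfile"
}

# Example, run from the <mxe-octave build> folder (not executed here):
#   disable_checksum src/default-octave.mk
#   grep CHECKSUM src/default-octave.mk
```

The Octave version and archive type in the same file still have to be checked by hand, as described above.<br />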
====Step 3A (second and later builds)====<br />
For next builds, mxe-octave is already configured and all dependencies have been built so the only thing to do is having a new Octave version + installer built:<br />
* move/copy the dist archive from step 2 into mxe-octave's pkg subdir<br />
* in <mxe-octave build> root dir do:<br />
touch src/default-octave.mk<br />
(to be sure mxe-octave picks up the new Octave archive). If you've renamed the dist archive, be sure it matches with the package name in src/default-octave.mk.<br />
Then do:<br />
make nsis-installer<br />
-or-<br />
make zip-dist<br />
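The two preparatory actions of this step (copy the archive, touch the .mk file) can be wrapped in a small function. This is a sketch only; <code>stage_archive</code> and the example paths are hypothetical.<br />

```shell
#!/bin/sh
# stage_archive is a hypothetical helper, not part of mxe-octave.
# $1 = path to the dist archive produced in step 2
# $2 = the <mxe-octave build> root dir
stage_archive () {
  cp "$1" "$2/pkg/" &&
    touch "$2/src/default-octave.mk"
}

# Example with placeholder paths (not executed here):
#   stage_archive ~/octave-build/octave-VERSION.tar.gz ~/mxe-build &&
#     (cd ~/mxe-build && make nsis-installer)
```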
====Step 4: Install on Windows====<br />
* move the installer in <mxe-octave build>/dist/ to the Windows side (USB thumb drive, LAN copy, whatever).<br />
* install it there.<br />
If you've made a zip-dist you'll have to manually create the desktop and Start Menu shortcuts (for octave and the MSYS-shell).<br />
<br />
====Remarks====<br />
* If you have several mxe-octave build dirs (e.g., for stable and several development versions) it is handy to have a separate pkg subdir symlinked to from all mxe-octave build dirs. That will save a lot of download bandwidth.<br />
* To keep mxe-octave up-to-date, from time to time do:<br />
hg -v pull<br />
hg -v update<br />
* However, do not keep mxe-octave build dirs for too long. I'd suggest to wipe a build dir after at most two or three months and start over with a fresh clone a la Step 1.<br />
* In the meantime, regularly clean up <mxe-octave build>/log to save disk space. After a first successful build there's no more use for the log subdirs for each package, so you can wipe them all.<br />
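The shared pkg subdir from the first remark could be set up along the following lines. This is a sketch under assumptions: <code>share_pkg_dir</code> is a hypothetical helper, and the shared directory must be given as an absolute path so that the symlink resolves correctly.<br />

```shell
#!/bin/sh
# share_pkg_dir is a hypothetical helper: it replaces the pkg subdir
# of an mxe-octave build dir with a symlink to a shared directory.
# $1 = shared download directory (absolute path)
# $2 = <mxe-octave build> root dir
share_pkg_dir () {
  mkdir -p "$1" || return 1
  # Keep any archives that were already downloaded into the build dir.
  if [ -d "$2/pkg" ] && [ ! -L "$2/pkg" ]; then
    cp -R "$2/pkg/." "$1/" && rm -rf "$2/pkg" || return 1
  fi
  # Create the symlink unless one is already in place.
  [ -L "$2/pkg" ] || ln -s "$1" "$2/pkg"
}

# Example with placeholder paths (not executed here):
#   share_pkg_dir "$HOME/mxe-pkg" "$HOME/mxe-stable"
#   share_pkg_dir "$HOME/mxe-pkg" "$HOME/mxe-devel"
```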
<br />
====If things go wrong====<br />
It is possible that, for example, the build of Octave in step 2 works but that it fails in step 3. Here are some troubleshooting tips.<br />
* The error message displayed by make is simply the last 10 lines of the log file. This may truncate the actual error message.<br />
* Sometimes running "make" a second time without changing anything will fix the problem. In particular, autotools rebuilds some files in the first make which may cause the second make to succeed.<br />
* If it is building Octave that failed, the source will be left in <mxe-octave build>/tmp-default-octave and it is possible to run "configure && make" in that directory.<br />
* The configuration will be for the target system, not your own. In particular, if you have not installed all of the packages that MXE-octave installs, then your configuration will be different. However, some configuration variables will differ even if you have the same packages, and some compiler features may be available on the host system that are not available in cross-compile mode.<br />
* A possible cause of build failure is having files in your local source or build directory that are not listed in the module.mk files; these are not copied into the dist archive.<br />
* (philip) On my Core i5 desktop system with a fast SSD, mxe-octave builds usually fail at libmng, presumably because of a race condition related to disk I/O. A way to get past this is to run "make nsis-installer JOBS=1", repeatedly if required (sometimes 5 or 6 times), interrupt the build in the next step/dependency once libmng has been built fine, and restart with "make nsis-installer JOBS=<higher number>". As of Dec. 2015 it is only libmng that has this issue.<br />
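Re-running a failing command several times, as suggested in the tips above, can be automated with a small loop. The following is a generic sketch; <code>retry</code> is a hypothetical helper, and interrupting the build after libmng and restarting with a higher JOBS value remains a manual step.<br />

```shell
#!/bin/sh
# retry is a hypothetical helper: run a command up to $1 times and
# stop at the first success. Returns 0 on success, 1 if every
# attempt failed.
retry () {
  attempts=$1
  shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $n of $attempts failed" >&2
    n=$((n + 1))
  done
  return 1
}

# Example, run from the <mxe-octave build> folder (not executed here):
#   retry 6 make nsis-installer JOBS=1
```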
<br />
==Installing requirements of MXE Octave==<br />
MXE Octave requires a recent Unix system where all components as stated below are installed.<br />
<br />
===Debian (GNU/kFreeBSD & GNU/Linux)===<br />
aptitude install -R autoconf automake bash bison bzip2 \<br />
cmake flex gettext git g++ intltool \<br />
libffi-dev libtool libltdl-dev \<br />
mercurial openssl libssl-dev \<br />
libxml-parser-perl make patch perl \<br />
pkg-config scons sed unzip wget \<br />
xz-utils yasm autopoint zip<br />
<br />
On 64-bit Debian, install also:<br />
<br />
aptitude install -R g++-multilib libc6-dev-i386<br />
<br />
If you are using Ubuntu, then you can do <code>apt-get install foo</code> instead of <code>aptitude install -R foo</code>.<br />
<br />
On a fresh Linux Mint 16 x86_64, in addition to the above also install:<br />
<br />
sudo apt-get install libc6-dev-i386 gcc-multilib libgmp3-dev libmpfr4 libmpfr-dev<br />
sudo apt-get build-dep gcc-4.8<br />
<br />
If not installed you will get error messages like "/usr/include/stdc-predef.h:30:26: fatal error: bits/predefs.h: No such file or directory" or "/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc.a when searching for -lgcc" when compiling ocaml-core.<br />
The packages libgmp3-dev libmpfr4 libmpfr-dev libmpc-dev are needed for compiling the build-gcc.<br />
<br />
===Fedora===<br />
yum install autoconf automake bash bison bzip2 cmake \<br />
flex gcc-c++ gettext git intltool make sed \<br />
libffi-devel libtool openssl-devel patch perl pkgconfig \<br />
scons yasm unzip wget xz<br />
On 64-bit Fedora, there are [http://wiki.octave.org/Windows_Installer#Open_Issues_with_NSIS open issues with the NSIS] package. <br />
<br />
===FreeBSD===<br />
pkg_add -r automake111 autoconf268 bash bison cmake \<br />
flex gettext git gmake gsed intltool libffi libtool \<br />
openssl patch perl p5-XML-Parser pkg-config \<br />
scons unzip wget yasm<br />
<br />
Ensure that /usr/local/bin precedes /usr/bin in your $PATH:<br><br />
For C style shells, edit .cshrc<br />
setenv PATH /usr/local/bin:$PATH<br />
For Bourne shells, edit .profile<br />
 export PATH=/usr/local/bin:$PATH<br />
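Whether the ordering requirement is met can be verified with a short check. This is a sketch for Bourne-type shells; <code>path_precedes</code> is a hypothetical helper name.<br />

```shell
#!/bin/sh
# path_precedes is a hypothetical helper: it succeeds if directory $1
# appears before directory $2 in the colon-separated list $3.
# It fails if $2 comes first or if $1 is not in the list at all.
path_precedes () {
  old_ifs=$IFS
  IFS=:
  result=1
  for dir in $3; do
    if [ "$dir" = "$1" ]; then result=0; break; fi
    if [ "$dir" = "$2" ]; then break; fi
  done
  IFS=$old_ifs
  return $result
}

if path_precedes /usr/local/bin /usr/bin "$PATH"; then
  echo "/usr/local/bin precedes /usr/bin"
else
  echo "/usr/local/bin is missing from PATH or comes after /usr/bin"
fi
```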
<br />
On 64-bit FreeBSD, there are [http://wiki.octave.org/Windows_Installer#Open_Issues_with_NSIS open issues with the NSIS] package.<br />
<br />
===Frugalware===<br />
pacman-g2 -S autoconf automake bash bzip2 bison cmake \<br />
flex gcc gettext git intltool make sed libffi libtool \<br />
openssl patch perl perl-xml-parser pkgconfig \<br />
scons unzip wget xz xz-lzma yasm<br />
On 64-bit Frugalware, there are [http://wiki.octave.org/Windows_Installer#Open_Issues_with_NSIS open issues with the NSIS] package. <br />
<br />
===Gentoo===<br />
emerge sys-devel/autoconf sys-devel/automake \<br />
app-shells/bash sys-devel/bison app-arch/bzip2 \<br />
dev-util/cmake sys-devel/flex sys-devel/gcc \<br />
sys-devel/gettext dev-vcs/git \<br />
dev-util/intltool sys-devel/make sys-apps/sed \<br />
dev-libs/libffi sys-devel/libtool dev-libs/openssl sys-devel/patch \<br />
dev-lang/perl dev-perl/XML-Parser \<br />
dev-util/pkgconfig dev-util/scons app-arch/unzip \<br />
net-misc/wget app-arch/xz-utils dev-lang/yasm<br />
<br />
===Mac OS X===<br />
Install [http://developer.apple.com/xcode/ Xcode 4] and [http://www.macports.org/ MacPorts], then run:<br />
<br />
sudo port install autoconf automake bison cmake flex \<br />
gettext git-core gsed intltool libffi libtool \<br />
openssl p5-xml-parser pkgconfig scons \<br />
wget xz yasm<br />
<br />
Mac OS X versions ≤ 10.6 are no longer supported.<br />
<br />
===MingW===<br />
Make sure to update and upgrade packages as some of the default versions of packages are too old to work correctly.<br />
mingw-get update<br />
mingw-get upgrade<br />
<br />
And then get required packages.<br />
mingw-get install autoconf bash msys-bison msys-flex gcc gcc-c++ \<br />
gcc-fortran gettext msys-m4 msys-make msys-sed \<br />
libiconv msys-openssl msys-patch msys-perl \<br />
msys-libarchive msys-unzip msys-wget msys-bsdtar<br />
<br />
You will also need to install Windows versions of Python and Ghostscript and ensure they are visible in the PATH.<br />
<br />
===OpenSUSE===<br />
zypper install -R autoconf automake bash bison bzip2 \<br />
cmake flex gcc-c++ gettext-tools git \<br />
intltool libffi-devel libtool make openssl \<br />
libopenssl-devel patch perl \<br />
perl-XML-Parser pkg-config scons \<br />
sed unzip wget xz yasm<br />
<br />
On 64-bit openSUSE, install also:<br />
zypper install -R gcc-32bit glibc-devel-32bit \<br />
libgcc46-32bit libgomp46-32bit \<br />
libstdc++46-devel-32bit<br />
<br />
==Creating an NSIS based installer==<br />
The <code>make nsis-installer</code> command produces an NSIS installer that is ready to be distributed. <br />
<br />
[[Category:Packaging]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=5031IO package2014-08-19T18:53:47Z<p>83.163.225.168: /* xlswrite / odswrite versus xlsopen / odsopen ..... xlsclose / odsclose */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input from and output to external file formats.<br />
<br />
=== About read/write support ===<br />
<br />
Most people need this package to read and write Excel files. But the io package can read/write Open/Libre Office, Gnumeric and some less important files too.<br />
<br />
<pre><nowiki><br />
File extension COM POI POI/OOXML JXL OXS UNO OTK JOD OCT<br />
--------------------------------------------------------------<br />
.xls (Excel95) R R R<br />
.xls (Excel97-2003) + + + + + +<br />
.xlsx (Excel2007+) ~ + (+) + +<br />
.xlsb, .xlsm ~ ? R R?<br />
.wk1 + R<br />
.wks + R<br />
.dbf + +<br />
.ods ~ + + + +<br />
.sxc + +<br />
.fods +<br />
.uos +<br />
.dif + +<br />
.csv + R<br />
.gnumeric +<br />
--------------------------------------------------------------<br />
<br />
R : only read; + : full read/write; ~ : dependent on Excel version<br />
</nowiki></pre><br />
<br />
<br />
==== xlswrite / odswrite versus xlsopen / odsopen ..... xlsclose / odsclose ====<br />
<br />
Matlab users are used to xlsread and xlswrite, functions that can only read data from, or write data to, one sheet in a spreadsheet file at a time. For each operation, xlsread and xlswrite first have to read the entire spreadsheet file; for write operations, xlswrite finally also has to write it out completely to disk.<br />
There are faster ways, but then you'll have to dive into ActiveX/COM/VisualBasic programming.<br />
<br />
If you want to move multiple pieces of data to/from a spreadsheet file, the io package offers a much more versatile scheme:<br />
* First open the spreadsheet file using xlsopen (for Excel or gnumeric files) or odsopen (.ods or .gnumeric). <br />
'''NOTE''': the output of these functions is a file pointer handle that you should treat carefully!<br />
* (for reading data) Read the data using raw_data = xls2oct (<fileptr> [,sheet#] [,cellrange] [,options])<br />
* Next, optionally split the data in numerical, text and raw data and optionally get the limits of where these came from:<br />
 [num, txt, raw, lims] = parsecell (raw_data, <fileptr.lims>)<br />
* (for writing data) Write the data using <fileptr> = oct2xls (data, <fileptr> [,sheet#] [,cellrange] [,options])<br />
* When you're finished, DO NOT FORGET to close the file pointer handle:<br />
 <fileptr> = xlsclose (<fileptr>)<br />
<br />
Mixing read and write operations in any order is permitted (the only exception: not with the JXL -JExcelAPI- interface).<br />
The same goes for odsopen-ods2oct-oct2ods-odsclose sequences.<br />
<br />
Obviously this is much more flexible (and FASTER) than xlsread and xlswrite. In fact, Octave's io package xlsread is a mere wrapper for an xlsopen-xls2oct-parsecell-xlsclose sequence. Similarly for xlswrite, odsread, and odswrite.<br />
<br />
==== .xls ~= .xlsx ====<br />
<br />
'''This is the most important information you have to keep in mind when you have to work with "Excel" files.''' <br />
* .xls - is an outdated default binary file format from <= Office 2003 - '''try to avoid this format!'''<br />
* .xlsx - is the new default file format since Office 2007. [https://en.wikipedia.org/wiki/OOXML It consists of xml files stored in a .zip container.] - '''always save in or convert to this format!'''<br />
* The ''(new)'' OCT interface can read ''(since version 1.2.5)'' and write ''(since version 2.2.0)'' .xlsx files dependency-free! No need for MS Windows + Office or Java.<br />
* Windows is notorious for hiding "known" file extensions. However in Windows Explorer it is easy to change this and have Windows show all file extensions.<br />
<br />
<br />
==== different interfaces ====<br />
<br />
The io package comes with different interfaces to read/write different file formats.<br />
# COM<br />
## This ''(interface)'' is only available on MS Windows '''and''' with an MS Office installation.<br />
# [POI, POI/OOXML, JXL, OXS, UNO, OTK, JOD]<br />
## These are Java-based interfaces. They are generally slower than Octave's native OCT interface; OTOH they offer more flexibility. Generally the OCT interface offers sufficient flexibility and speed. <br />
# OCT<br />
## This is the new impressive and fast ''(mostly written in Octave itself! + two C files to bypass bottlenecks)'' interface which presently supports .xlsx, .ods and .gnumeric files.<br />
(Note that .ods is a complicated file format with many gotchas that doesn't lend itself to fast file I/O. So unfortunately the fastest .ods interface is the Java-based jOpenDocument (JOD) (luckily it is GPL). However if speed is not an issue or if you hate Java, the OCT interface still performs fast enough.)<br />
<br />
So, if you want to read/write '''.xlsx''' files, you'll only need the io-package >=2.2.0. <br />
<br />
But if you have to read/write '''.xls''' files, you'll need either<br />
* MS Windows with MS Office installed - or<br />
* Octave built with --enable-java, + a Java JRE or -JDK, and one or more of the Java interfaces (i.e., the class libs)!<br />
<br />
If you want to read/write .gnumeric files, the OCT interface is the only option.<br />
<br />
For some rarely used file formats you'll need LibreOffice + Octave built with Java enabled + a Java JRE or JDK. But once that is in place, you can enjoy formats like Unified Office Format, Data Interchange Format, SYLK, OpenDocument Flat XML, the old OpenOffice.org .sxc format and some others you may have heard of ;-)<br />
<br />
<br />
==== force an interface ====<br />
<br />
If you don't want the io package's interface autodetection to take control, you can easily force the use of a particular interface. Examples:<br />
<br />
Force native OCT interface - only for .xlsx, .ods, .gnumeric<br />
<pre>OCT = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Force COM interface - may only work with .xls and .xlsx, on Windows with an MS Office installation available.<br />
<pre>COM = xlsread ('file.xlsx', 1, [], 'COM');</pre><br />
<br />
Force POI interface - may only work if you've run javaaddpath for the Apache POI .jar files - only .xls<br />
<pre>POI = xlsread ('file.xls', 1, [], 'POI');</pre><br />
<br />
And so on ...<br />
<br />
<br />
==== Java example ====<br />
<br />
# Again: You only need Java if you have to read/write .xls files! You don't need this for .xlsx files!<br />
# Make sure you've set up everything with Java correctly<br />
# Get, e.g., the Apache POI jar library files and add them with javaaddpath<br />
<pre><nowiki><br />
octave:1> javaaddpath('~/poi_library/poi-3.8-20120326.jar');<br />
octave:2> javaaddpath('~/poi_library/poi-ooxml-3.8-20120326.jar');<br />
octave:3> javaaddpath('~/poi_library/poi-ooxml-schemas-3.8-20120326.jar');<br />
octave:4> javaaddpath('~/poi_library/xmlbeans-2.3.0.jar');<br />
octave:5> javaaddpath('~/poi_library/dom4j-1.6.1.jar');<br />
octave:6> <br />
octave:6> pkg load io<br />
octave:7> chk_spreadsheet_support <br />
ans = 6<br />
octave:8> javaclasspath <br />
STATIC JAVA PATH<br />
<br />
- empty -<br />
<br />
DYNAMIC JAVA PATH<br />
<br />
/home/markus/poi_library/poi-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-schemas-3.8-20120326.jar<br />
/home/markus/poi_library/xmlbeans-2.3.0.jar<br />
/home/markus/poi_library/dom4j-1.6.1.jar<br />
<br />
</nowiki></pre><br />
<br />
An easier way is to collect all required Java class libs for spreadsheet I/O (the .jar files) in one subdir and have chk_spreadsheet_support.m sort it all out:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('/full/path/to/subdir/with/.jar/files')<br />
</nowiki></pre><br />
<br />
For UNO (LibreOffice-behind-the-scenes) the call is a bit different:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('', 0, '/full/path/to/LibreOffice/installation')<br />
</nowiki></pre><br />
<br />
(On Windows, the io package tries to automatically find all required Java class libs and LibreOffice. To help it, put the Java class libs in your user profile (home directory) in a subdir "java", e.g., C:\Users\Eddy\java. chk_spreadsheet_support searches that location automagically.<br />
On Linux this automatic searching has been disabled as the io package took ages (well, minutes) to load...)<br />
<br />
<br />
Anyway, the chk_spreadsheet_support output should now be > 0.<br />
<br />
<pre><nowiki><br />
0 No spreadsheet I/O support found<br />
---------- XLS (Excel) interfaces: ----------<br />
1 = COM (ActiveX / Excel)<br />
2 = POI (Java / Apache POI)<br />
4 = POI+OOXML (Java / Apache POI)<br />
8 = JXL (Java / JExcelAPI)<br />
16 = OXS (Java / OpenXLS)<br />
--- ODS (OpenOffice.org Calc) interfaces ----<br />
32 = OTK (Java/ ODF Toolkit)<br />
64 = JOD (Java / jOpenDocument)<br />
----------------- XLS & ODS: ----------------<br />
128 = UNO (Java / UNO bridge - OpenOffice.org)<br />
</nowiki></pre><br />
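The return value appears to be the sum of the values of all interfaces found; for instance, the value 6 in the session above matches POI (2) plus POI+OOXML (4). Under that assumption, a value can be split into its parts as sketched below in plain shell arithmetic; <code>decode_support</code> is a hypothetical helper, not part of the io package.<br />

```shell
#!/bin/sh
# decode_support is a hypothetical helper, not part of the io package.
# It splits the number returned by chk_spreadsheet_support into the
# interface names from the table above, ASSUMING the number is the
# sum of the listed values.
decode_support () {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "none"
    return 0
  fi
  found=""
  for pair in 1:COM 2:POI 4:POI+OOXML 8:JXL 16:OXS 32:OTK 64:JOD 128:UNO; do
    bit=${pair%%:*}    # part before the colon: the numeric value
    name=${pair#*:}    # part after the colon: the interface name
    if [ $(( code & bit )) -ne 0 ]; then
      found="$found $name"
    fi
  done
  echo $found
}

decode_support 6    # prints: POI POI+OOXML
```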
<br />
And reading/writing .xls files should work.<br />
<br />
== Detailed Information (TL) ==<br />
<br />
The following might be more interesting if you're interested in how things work inside the io package.<br />
<br />
=== ODS support ===<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
==== Files content ====<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support!):<br />
* Nothing except unzip<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.3 (final) is the most recent one and recommended for Octave).<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org or www.libreoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the Java classpath.<br />
<br />
==== Usage ====<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ODS files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data in limited-size chunks, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
<br />
==== Spreadsheet formula support ====<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using Octave.<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
==== Gotchas ====<br />
I know of one big gotcha: reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
===== Date and time in ODS =====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year). (Why mention MS-Excel here? See below:) <br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
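<br />
A quick way to check a suspicious date value is to convert both representations to Octave date numbers by hand. The sketch below uses only core Octave functions; the sample values are made up for illustration, and the fixed offset for Excel serial numbers only holds for dates after February 28, 1900 (because of Excel's fictitious leap day):<br />
<pre><nowiki><br />
# A date as the text string stored by OpenOffice.org<br />
dn_ods = datenum ('2000-01-01', 'yyyy-mm-dd');<br />
# The same date as an Excel serial number (illustrative value)<br />
excel_serial = 36526;<br />
dn_xls = excel_serial + datenum (1899, 12, 30);<br />
# Both should give the same Octave date number<br />
isequal (dn_ods, dn_xls)<br />
datestr (dn_xls, 'yyyy-mm-dd')<br />
</nowiki></pre><br />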
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
===== Java memory pool allocation size =====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size and 512 = maximum size in this example; m stands for megabyte. Suitable numbers are system-dependent.)<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
===== Reading cells containing errors =====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc due to e.g., invalid formulas, may have a 0 (zero) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas:<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
<br />
NOT fixed in jOpenDocument version 1.2 & 1.3 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
==== Matlab compatibility ====<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
==== Comparison of interfaces ====<br />
The OCT interface (present as of io-1.2.4) offers read support for ODS 1.2, complete with all the options of ODFtoolkit and UNO, but fairly slow.<br />
<br />
The OTK interface (ODFtoolkit) is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details for the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The JOD (jOpenDocument) interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild-auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), in both cases w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
** unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
** juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
** The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and -clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
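<br />
The diagnostic steps above might be condensed into a few commands at the Octave prompt. This is a sketch, not a complete procedure; the .jar location shown is a hypothetical example and must be replaced by the actual fully qualified path name on your system:<br />
<pre><nowiki><br />
# Empty path string and debug level 3: echo diagnostics to the screen<br />
chk_spreadsheet_support ('', 3);<br />
# List all class path entries to see which class libs are present<br />
jcp = javaclasspath ('-all')<br />
# Add a missing class lib by its fully qualified path name (hypothetical path)<br />
javaaddpath ('/usr/lib/java/odfdom.jar');<br />
</nowiki></pre><br />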
<br />
==== Development ====<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
==== ODFDOM versions ====<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet. Version 0.8.9 introduced an undocumented dependency on some obscure Java class lib - I think due to a bit sloppy development procedures. Anyway I couldn't get it to work.<br />
So at the moment (Summer 2013 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
=== XLS support ===<br />
==== Files content ====<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support for OOXML (.xlsx)!):<br />
* Nothing except unzip<br />
<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed. Note that 64-bit MS-Office has no support for COM / ActiveX so you might have to resort to the Java interfaces below<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''' or '''xmlbeans.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.<br />
<br />
==== Usage ====<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen - xls2oct - xlsclose - parsecell''' and '''xlsopen - oct2xls - xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files the OCT interface suffices (specify "oct" for the REQINTF parameter). For other Excel file types you need MS-Excel for Windows (or later version) and the windows package (specify "com" for REQINTF), and/or Apache POI and Java support (then the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and, obviously, the complete POI interface must have been installed).<br />
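<br />
For example, listing the worksheets of an OOXML file through the OCT interface might look like the sketch below. The file name is hypothetical, and the exact layout of the second output (sheet names, possibly accompanied by data ranges) should be checked against the xlsfinfo help text of your io package version:<br />
<pre><nowiki><br />
# Hypothetical file name; 'oct' is the REQINTF value for the OCT interface<br />
[filetype, sh_names] = xlsfinfo ('test.xlsx', 'oct');<br />
disp (sh_names)   # worksheet names found in the file<br />
</nowiki></pre><br />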
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
 # by xlsopen you can request that. But xlsopen still checks<br />
 # proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
==== Spreadsheet formula support ====<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings (also works with OCT interface);<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
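<br />
A possible round trip, writing a formula and reading its text back later, is sketched below. Several things are assumptions here: the file and worksheet names are hypothetical, the 'poi' interface is assumed to be installed, and the option struct is assumed to be passed as the last argument of oct2xls and xls2oct:<br />
<pre><nowiki><br />
# Write three numbers and a formula summing them (hypothetical file name)<br />
xlh = xlsopen ('test.xls', 1, 'poi');<br />
opts.formulas_as_text = 0;              # enter the formula as a formula<br />
xlh = oct2xls ({1; 2; 3; '=SUM(A1:A3)'}, xlh, 'Sheet1', 'A1:A4', opts);<br />
xlh = xlsclose (xlh);                   # write the workbook to disk<br />
<br />
# Read the same range back, now asking for the literal formula text<br />
xlh = xlsopen ('test.xls', 0, 'poi');<br />
opts.formulas_as_text = 1;<br />
[raw, xlh] = xls2oct (xlh, 'Sheet1', 'A1:A4', opts);<br />
xlh = xlsclose (xlh);<br />
</nowiki></pre><br />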
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
==== Matlab compatibility ====<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
==== Comparison of interfaces & usage ====<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back when writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard to catch up for Apache POI. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
'''OCT''' offers read support for OOXML files (.xlsx) only, but it is by far the fastest read option; faster than Excel itself.<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5·10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), w/o the outer quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
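<br />
The last step can be repeated for several interfaces in one go. The following is only a sketch: the file name 'test.xls' is a placeholder, the list of interfaces is an arbitrary selection, and it assumes (as in the step above) that a working interface makes xlsopen return a struct. Any error (missing io package, missing support software) is simply counted as "failed".<br />

```octave
## Probe which spreadsheet interfaces can open a given .xls file.
## 'test.xls' is a placeholder file name.
fname   = 'test.xls';
intfs   = {'com', 'poi', 'jxl'};     # interfaces to try, in this order
status  = {'failed', 'works'};
results = struct ();

for k = 1:numel (intfs)
  intf = intfs{k};
  try
    xls = xlsopen (fname, 1, intf);  # same call as in the step above
    ok  = isstruct (xls);            # a pointer struct means the interface works
    if ok
      xls = xlsclose (xls);          # always close what was opened
    end
  catch
    ok = false;                      # io package or support software missing
  end
  results.(intf) = ok;
  printf ('%-4s: %s\n', upper (intf), status{ok + 1});
end
```

Afterwards the struct results holds one logical field per interface name, which may be handy in bug reports.<br />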
<br />
==== Development ====<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the quality and documentation of the interface's support libraries. Separating the file access functions from the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
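<br />
Until such automatic date conversion exists, it can be done by hand. A minimal sketch, assuming Excel's default 1900 date system; the names XLS_OFFSET, excel2datenum and datenum2excel are made up for this example and are not part of the io package. The offset 693960 equals datenum (1899, 12, 30).<br />

```octave
## Offset between Excel serial dates (1900 date system) and Octave datenums;
## 693960 equals datenum (1899, 12, 30).
XLS_OFFSET = 693960;
excel2datenum = @(x) x + XLS_OFFSET;   # hypothetical helper name
datenum2excel = @(d) d - XLS_OFFSET;   # hypothetical helper name

d = excel2datenum (25569);             # 25569 is Excel's serial for 1970-01-01
disp (datestr (d, 'yyyy-mm-dd'))

## Caveat: Excel counts a non-existent 1900-02-29 (serial 60), so serials
## below 61 come out one day early with this simple offset.
```

Because of the leap-year bug mentioned above, the simple offset is only exact for dates from 1900-03-01 onwards.<br />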
<br />
=== OCT interface ===<br />
<br />
Since io package version 1.2.4, an interface called "OCT" has been available. Except for unzip, it has no dependencies. It's still experimental but fast! Feel free to test it and give us feedback.<br />
Currently it supports reading and writing .xlsx, .ods and .gnumeric files (the latter in yet-to-be-released io-2.2.2).<br />
If <br />
<pre>chk_spreadsheet_support == 0</pre><br />
<br />
it's used automatically (default interface). Otherwise you can force the usage like <br />
<br />
<pre>m = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can checkout a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)<br />
<br />
[[Category:Octave-Forge]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=5030IO package2014-08-19T18:52:32Z<p>83.163.225.168: /* About read/write support */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
=== About read/write support ===<br />
<br />
Most people need this package to read and write Excel files. But the io package can read/write Open/Libre Office, Gnumeric and some less important files too.<br />
<br />
<pre><nowiki><br />
File extension COM POI POI/OOXML JXL OXS UNO OTK JOD OCT<br />
--------------------------------------------------------------<br />
.xls (Excel95) R R R<br />
.xls (Excel97-2003) + + + + + +<br />
.xlsx (Excel2007+) ~ + (+) + +<br />
.xlsb, .xlsm ~ ? R R?<br />
.wk1 + R<br />
.wks + R<br />
.dbf + +<br />
.ods ~ + + + +<br />
.sxc + +<br />
.fods +<br />
.uos +<br />
.dif + +<br />
.csv + R<br />
.gnumeric +<br />
--------------------------------------------------------------<br />
<br />
R : only read; + : full read/write; ~ : dependent on Excel version<br />
</nowiki></pre><br />
<br />
<br />
==== xlswrite / odswrite versus xlsopen / odsopen ..... xlsclose / odsclose ====<br />
<br />
Matlab users are used to xlsread and xlswrite, functions that can only read data from, or write data to, one sheet in a spreadsheet file at a time. For each operation, xlsread and xlswrite first have to read the entire spreadsheet file; for write operations, xlswrite finally also has to write it out completely to disk.<br />
There are faster ways, but then you'll have to dive into ActiveX/COM/VisualBasic programming.<br />
<br />
If you want to move multiple pieces of data to/from a spreadsheet file, the io package offers a much more versatile scheme:<br />
* First open the spreadsheet file using xlsopen (for Excel or gnumeric files) or odsopen (.ods or .gnumeric). <br />
'''NOTE''': the output of these functions is a file pointer handle that you should treat carefully!<br />
* (for reading data) Read the data using raw_data = xls2oct (fileptr [,sheet#] [,cellrange] [,options])<br />
* Next, optionally split the data in numerical, text and raw data and optionally get the limits of where these came from:<br />
[num, txt, raw, lims] = parsecell (data, <fileptr.lims>)<br />
* (for writing data) Write the data using <fileptr> = oct2xls (data, <fileptr> [,sheet#] [,cellrange] [,options])<br />
* When you're finished, DO NOT FORGET to close the file pointer handle:<br />
<fileptr> = xlsclose (<fileptr>)</pre><br />
<br />
Mixing read and write operations in any order is permitted (the only exception: not with the JXL -JExcelAPI- interface).<br />
The same goes for odsopen-ods2oct-oct2ods-odsclose sequences.<br />
<br />
Obviously this is much more flexible (and FASTER) than xlsread and xlswrite. In fact, Octave's io package xlsread is a mere wrapper for an xlsopen-xls2oct-parsecell-xlsclose sequence. Similarly for xlswrite, odsread, and odswrite.<br />
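<br />
As an illustration, a read sequence could look like the sketch below. The file name 'data.xlsx', the sheet numbers and the cell range are placeholders. Wrapping the reads in unwind_protect is a choice made for this example, not something the io package requires; it just makes sure xlsclose is called even when one of the reads fails.<br />

```octave
pkg load io

## 'data.xlsx', the sheet numbers and the range are placeholders.
xls = xlsopen ('data.xlsx');               # open once, get a file pointer
unwind_protect
  raw1 = xls2oct (xls, 1, 'A1:C10');       # a cell range on sheet 1
  raw2 = xls2oct (xls, 2);                 # all data on sheet 2
  [num1, txt1] = parsecell (raw1);         # split into numeric and text parts
  [num2, txt2] = parsecell (raw2);
unwind_protect_cleanup
  xls = xlsclose (xls);                    # runs even if a read above fails
end_unwind_protect
```

The file is opened and parsed only once, however many ranges are read in between.<br />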
<br />
<br />
==== .xls ~= .xlsx ====<br />
<br />
'''This is the most important information you have to keep in mind when you have to work with "Excel" files.''' <br />
* .xls - is an outdated default binary file format from <= Office 2003 - '''try to avoid this format!'''<br />
* .xlsx - is the new default file format since Office 2007. [https://en.wikipedia.org/wiki/OOXML It consists of xml files stored in a .zip container.] - '''always save in or convert to this format!'''<br />
* The ''(new)'' OCT interface can read ''(since version 1.2.5)'' and write ''(since version 2.2.0)'' .xlsx files dependency-free! No need of MS Windows+Office nor Java.<br />
* Windows is notorious for hiding "known" file extensions. However in Windows Explorer it is easy to change this and have Windows show all file extensions.<br />
<br />
<br />
==== different interfaces ====<br />
<br />
The io package comes with different interfaces to read/write different file formats.<br />
# COM<br />
## This ''(interface)'' is only available on MS Windows '''and''' with an MS Office installation.<br />
# [POI, POI/OOXML, JXL, OXS, UNO, OTK, JOD]<br />
## These are java-based interfaces. They are generally slower than Octave's native OCT interface; OTOH they offer more flexibility. Generally the OCT interface offers sufficient flexibility and speed. <br />
# OCT<br />
## This is the new impressive and fast ''(mostly written in Octave itself! + two C files to bypass bottlenecks)'' interface which presently supports .xlsx, .ods and .gnumeric files.<br />
(Note that .ods is a complicated file format with many gotchas that doesn't lend itself for fast file I/O. So unfortunately the fastest .ods interface is the Java-based jOpenDocument (JOD) (luckily it is GPL). However if speed is not an issue or if you hate Java, the OCT interface still performs fast enough.)<br />
<br />
So, if you want to read/write '''.xlsx''' files, you'll only need the io-package >=2.2.0. <br />
<br />
But if you have to read/write '''.xls''' files, you'll need either<br />
* MS Windows with MS Office installed - or<br />
* Octave built with --enable-java, + a Java JRE or -JDK, and one or more of the Java interfaces (i.e., the class libs)!<br />
<br />
If you want to read/write .gnumeric files, the OCT interface is even the only option.<br />
<br />
For some rarely used file formats you'll need LibreOffice + Octave built with Java enabled + a Java JRE or -JDK. But OK, once there you can enjoy formats like Unified Office Format, Data Interchange Format, SYLK, OpenDocument Flat XML, the old OpenOffice.org .sxc format and some others you may have heard of ;-)<br />
<br />
<br />
==== force an interface ====<br />
<br />
If you don't want the io package's autodetection to pick the interface, you can easily force the usage of a specific one. Examples:<br />
<br />
Force native OCT interface - only for .xlsx, .ods, .gnumeric<br />
<pre>OCT = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Force COM interface - may only work with .xls, .xlsx on Windows OS and available office installation.<br />
<pre>COM = xlsread ('file.xlsx', 1, [], 'COM');</pre><br />
<br />
Force POI interface - may only work if you've done a javaaddpath for the Apache POI .jar files - only .xls<br />
<pre>POI = xlsread ('file.xls', 1, [], 'POI');</pre><br />
<br />
And so on ...<br />
<br />
<br />
==== Java example ====<br />
<br />
# Again: You only need Java if you have to read/write .xls files! You don't need this for .xlsx files!<br />
# Make sure you've set up everything with java correctly<br />
# get e.g. apache poi jar library files and add them with javaaddpath<br />
<pre><nowiki><br />
octave:1> javaaddpath('~/poi_library/poi-3.8-20120326.jar');<br />
octave:2> javaaddpath('~/poi_library/poi-ooxml-3.8-20120326.jar');<br />
octave:3> javaaddpath('~/poi_library/poi-ooxml-schemas-3.8-20120326.jar');<br />
octave:4> javaaddpath('~/poi_library/xmlbeans-2.3.0.jar');<br />
octave:5> javaaddpath('~/poi_library/dom4j-1.6.1.jar');<br />
octave:6> <br />
octave:6> pkg load io<br />
octave:7> chk_spreadsheet_support <br />
ans = 6<br />
octave:8> javaclasspath <br />
STATIC JAVA PATH<br />
<br />
- empty -<br />
<br />
DYNAMIC JAVA PATH<br />
<br />
/home/markus/poi_library/poi-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-schemas-3.8-20120326.jar<br />
/home/markus/poi_library/xmlbeans-2.3.0.jar<br />
/home/markus/poi_library/dom4j-1.6.1.jar<br />
<br />
</nowiki></pre><br />
<br />
An easier way is to collect all required Java class libs for spreadsheet I/O (the .jar files) in one subdir and have chk_spreadsheet_support.m sort it all out:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('/full/path/to/subdir/with/.jar/files')<br />
</nowiki></pre><br />
<br />
For UNO (LibreOffice-behind-the-scenes) the call is a bit different:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('', 0, '/full/path/to/LibreOffice/installation')<br />
</nowiki></pre><br />
<br />
(On Windows, the io package tries to automatically find all required Java class libs and LibreOffice. To help it, put the Java class libs in your user profile (home directory) in a subdir "java", e.g., C:\Users\Eddy\java. chk_spreadsheet_support searches that location automagically.<br />
On Linux this automatic searching has been disabled as the io package took ages (well, minutes) to load...)<br />
<br />
<br />
Anyway, the chk_spreadsheet_support output should now be > 0.<br />
<br />
<pre><nowiki><br />
0 No spreadsheet I/O support found<br />
---------- XLS (Excel) interfaces: ----------<br />
1 = COM (ActiveX / Excel)<br />
2 = POI (Java / Apache POI)<br />
4 = POI+OOXML (Java / Apache POI)<br />
8 = JXL (Java / JExcelAPI)<br />
16 = OXS (Java / OpenXLS)<br />
--- ODS (OpenOffice.org Calc) interfaces ----<br />
32 = OTK (Java/ ODF Toolkit)<br />
64 = JOD (Java / jOpenDocument)<br />
----------------- XLS & ODS: ----------------<br />
128 = UNO (Java / UNO bridge - OpenOffice.org)<br />
</nowiki></pre><br />
<br />
And reading/writing .xls files should work.<br />
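<br />
The return value can be decoded in a few lines. This sketch assumes that the values listed above combine as bit flags (the "ans = 6" in the session above, with only the Apache POI jars loaded, suggests so); the bit values and names are simply copied from that list, and other io versions may define additional bits.<br />

```octave
## Bit values and names copied from the list above.
bits  = [1, 2, 4, 8, 16, 32, 64, 128];
names = {'COM', 'POI', 'POI+OOXML', 'JXL', 'OXS', 'OTK', 'JOD', 'UNO'};

retval = 6;                                # e.g. the value returned in the session above
found  = names(bitand (retval, bits) > 0); # keep the names whose bit is set
printf ('%s\n', strjoin (found, ', '));
```

In a live session, replace the literal 6 by retval = chk_spreadsheet_support.<br />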
<br />
== Detailed Information (TL) ==<br />
<br />
The following might be more interesting if you're interested in how things work inside the io package.<br />
<br />
=== ODS support ===<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
==== Files content ====<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support!):<br />
* Nothing except unzip<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.3 (final) is the most recent one and recommended for Octave).<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org or www.libreoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the Java classpath.<br />
<br />
==== Usage ====<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similar to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, a.o. oct2ods.<br />
<br />
odsfinfo can be used to explore ODS files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
<br />
==== Spreadsheet formula support ====<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
==== Gotchas ====<br />
I know of one big gotcha: i.e. reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
===== Date and time in ODS =====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year). (Why mention MS-Excel here? See below:) <br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
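<br />
A sketch of such a check, assuming that a raw cell array read from the spreadsheet holds date cells either as “yyyy-mm-dd” strings or as leftover Excel serial numbers. Whether and when ods2oct actually delivers either form depends on the interface and the file, so treat the input below as made-up sample data. The number 693960 is the Excel-to-Octave offset, equal to datenum (1899, 12, 30).<br />

```octave
## Made-up sample column: one OOo-style date string and one leftover
## Excel serial number, both meaning 2014-08-19.
raw = {'2014-08-19'; 41870};

dn = zeros (size (raw));
for k = 1:numel (raw)
  if ischar (raw{k})
    dn(k) = datenum (raw{k}, 'yyyy-mm-dd');  # date stored as a text string
  else
    dn(k) = raw{k} + 693960;                 # Excel serial -> Octave datenum
  end
end
disp (datestr (dn, 'yyyy-mm-dd'))
```

Both entries end up as the same Octave date number, which makes a mixed column comparable again.<br />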
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
===== Java memory pool allocation size =====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based XLS routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
===== Reading cells containing errors =====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc due to e.g., invalid formulas, may have a 0 (zero) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas:<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
<br />
NOT fixed in jOpenDocument version 1.2 & 1.3 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
==== Matlab compatibility ====<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
==== Comparison of interfaces ====<br />
The OCT interface (present as of io-1.2.4) offers read support for ODS 1.2, complete with all the options of ODFtoolkit and UNO, but fairly slow.<br />
<br />
The OTK interface (ODFtoolkit) is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The JOD (jOpenDocument) interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than OTK or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed, plus the following Java class libs (jars) and directory:<br />
#* unoil.jar, usually found in subdirectory Basis<version>/program/classes/ (or the like) of the OpenOffice.org (<OOo>) installation directory;<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
<br />
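The "try opening an ods file" step above can be scripted. The following is only a minimal sketch: it assumes the io package is installed and the working directory is writable, and 'test.ods' is the same placeholder file name as used in the steps above.<br />
<pre>
pkg load io
intfs  = {"otk", "jod"};          # interfaces to probe, as in the steps above
labels = {"failed", "works"};
for ii = 1:numel (intfs)
  ok = false;
  try
    ods = odsopen ("test.ods", 1, intfs{ii});
    ok  = isstruct (ods);         # a pointer struct means the interface works
    if (ok)
      ods = odsclose (ods);       # always close the file again
    endif
  catch
    ok = false;                   # any error counts as "interface not usable"
  end_try_catch
  printf ("%s: %s\n", upper (intfs{ii}), labels{ok + 1});
endfor
</pre><br />
If both interfaces are reported as failed, go back to the Java and class path checks before suspecting the spreadsheet file itself.<br />
<br />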
==== Development ====<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 times slower than e.g. OOXML!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument, but starting up OpenOffice.org for the first time can take tens of seconds. Note that UNO is still experimental: odsclose() will simply kill ALL other OpenOffice.org invocations, including those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
==== ODFDOM versions ====<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
At the end of August 2010 I implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving into the older TableTable API. Later on, versions 0.8.7 and 0.8.8 were tested too; these needed a few adjustments, so clearly the odfdom API (currently at main version 0) is not stable yet. Version 0.8.9 introduced an undocumented dependency on some obscure Java class lib, I think due to somewhat sloppy development procedures. Anyway, I couldn't get it to work.<br />
So at the moment (Summer 2013 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
=== XLS support ===<br />
==== Files content ====<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support for OOXML (.xlsx)!):<br />
* Nothing except unzip<br />
<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed. Note that 64-bit MS-Office has no support for COM / ActiveX so you might have to resort to the Java interfaces below<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''' or '''xmlbeans.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: OpenXLS.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
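As an illustration, the javaaddpath statements mentioned above could look like the sketch below (e.g., in octaverc). The directory /usr/lib/java is merely the usual Linux location named above, and the file name patterns are assumptions derived from the jar names listed in this section; adapt both to what is actually installed.<br />
<pre>
jardir   = "/usr/lib/java";              # assumption: location of the class libs
patterns = {"poi-*.jar", "jxl.jar"};     # assumption: jar file name patterns
jars = {};
for ii = 1:numel (patterns)
  found = glob (fullfile (jardir, patterns{ii}));
  jars  = [jars; found];                 # collect full pathnames
endfor
if (usejava ("jvm"))
  for ii = 1:numel (jars)
    javaaddpath (jars{ii});              # class libs need full pathnames
  endfor
endif
</pre><br />
Afterwards javaclasspath should list the added jars in the dynamic class path.<br />
<br />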
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.<br />
<br />
==== Usage ====<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen - xls2oct - xlsclose - parsecell''' and '''xlsopen - oct2xls - xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files the OCT interface suffices (specify "oct" for the REQINTF parameter). For other Excel file types you need MS-Excel for Windows (or later version) and the windows package (specify "com" for REQINTF), and/or Apache POI and Java support (then specify 'poi' (case-insensitive) for REQINTF; obviously the complete POI interface must have been installed).<br />
<br />
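For instance, the sheet names of an OOXML file could be listed as sketched below. This assumes the io package is installed; 'test.xlsx' is a placeholder file name.<br />
<pre>
pkg load io
# Request the OCT interface explicitly through the REQINTF argument
[filetype, sh_names] = xlsfinfo ("test.xlsx", "oct");
disp (filetype);
disp (sh_names);      # sheet names; further details depend on the interface
</pre><br />
<br />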
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want an interface other than the one auto-selected<br />
 # by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
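A concrete variant of the scenario above might look as follows. This is only a sketch: the file name, sheet name, range and data are made up for illustration, the interface is left to xlsopen's auto-selection, and write support for .xlsx is assumed to be available on your system.<br />
<pre>
pkg load io
fname = fullfile (tempdir (), "io_demo.xlsx");   # hypothetical file name

# Write a small heterogeneous cell array
xlh = xlsopen (fname, 1);                        # rw = 1: create if needed
newdata = {"id", "value"; 1, 2.5; 2, 3.5};
xlh = oct2xls (newdata, xlh, "Sheet1", "A1:B3");
xlh = xlsclose (xlh);                            # writes the workbook to disk

# Read the data back and separate numbers from text
xlh = xlsopen (fname);
[rawarr, xlh] = xls2oct (xlh, "Sheet1", "A1:B3");
[numarr, txtarr, lim] = parsecell (rawarr);
xlh = xlsclose (xlh);
clear xlh
</pre><br />
Writing and reading are done in two separate open/close cycles here so that the sketch also works with interfaces that cannot mix the two.<br />
<br />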
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
==== Spreadsheet formula support ====<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings (also works with OCT interface);<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
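A sketch of how the options struct could be passed when reading. It assumes an interface with formula support is installed and that 'test.xls' (a placeholder name) exists; the sheet number and range are arbitrary.<br />
<pre>
pkg load io
xlh = xlsopen ("test.xls");

opts.formulas_as_text = 0;                 # default: read formula results
[results, xlh] = xls2oct (xlh, 1, "A1:C10", opts);

opts.formulas_as_text = 1;                 # read the literal formula strings
[formulas, xlh] = xls2oct (xlh, 1, "A1:C10", opts);

xlh = xlsclose (xlh);
</pre><br />
Comparing the two cell arrays shows which cells hold formulas rather than plain values.<br />
<br />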
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
==== Matlab compatibility ====<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
** Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, which is not very user-friendly IMO.<br />
** When using the Java interfaces, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 1 = 1/1/1900); you can easily convert them into proper Octave date values yourself and display them using e.g. datestr(), see bottom of this document for more info. For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
<br />
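The conversion of Excel date values into Octave date numbers mentioned in the xlsread notes above can be sketched as follows. The offset 693960 follows from datenum (2000, 1, 1) = 730486 and Excel's serial value 36526 for that same day; it is only valid for dates from 1 March 1900 onward, because Excel wrongly treats 1900 as a leap year. The sample values are made up.<br />
<pre>
excel_dates  = [36526; 41000.5];        # Excel serial date values (example data)
octave_dates = excel_dates + 693960;    # shift to Octave's datenum epoch
disp (datestr (octave_dates, "yyyy-mm-dd HH:MM"));
</pre><br />
Subtracting the same offset converts Octave date numbers back into Excel serial values before writing.<br />
<br />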
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
** When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO), while Matlab's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, Octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
==== Comparison of interfaces & usage ====<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well, although still slower than Excel/COM. It is worrying that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you can only get the contents back when writing out all of the changes. Moreover, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield the expected results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release, it's hard for Apache POI to catch up. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
'''OCT''' offers read support for OOXML files (.xlsx) only, but it is by far the fastest read option; faster than Excel itself.<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")', w/o quotes). See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a broken Octave installation, or some other problem outside Octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
<br />
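The class path check above can be automated with a small helper. The sketch below is not part of the io package: the function name missing_jars is made up for illustration, and the name fragments are taken from the jar names listed under the required support software.<br />
<pre>
1;  # mark this file as a script so the function below can be defined in it

function missing = missing_jars (jcp, wanted)
  ## Return those entries of WANTED (jar name fragments) that do not
  ## occur in any entry of the class path JCP (cellstr or path string).
  if (ischar (jcp))
    jcp = strsplit (jcp, pathsep ());
  endif
  missing = {};
  for ii = 1:numel (wanted)
    hits = strfind (lower (jcp), lower (wanted{ii}));
    if (all (cellfun (@isempty, hits)))
      missing{end+1} = wanted{ii};
    endif
  endfor
endfunction

if (usejava ("jvm"))
  missing = missing_jars (javaclasspath ("-all"), {"poi-", "poi-ooxml", "jxl"})
endif
</pre><br />
An empty result means that every requested class lib was found somewhere in the class path; it says nothing about whether the jar versions are suitable.<br />
<br />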
==== Development ====<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions from the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
<br />
=== OCT interface ===<br />
<br />
Since io package version 1.2.4, an interface called "OCT" has been available. Except for unzip, it has no dependencies. It's still experimental but fast! Feel free to test it and give us feedback.<br />
Currently it supports reading and writing .xlsx, .ods and .gnumeric files (the latter in yet-to-be-released io-2.2.2).<br />
If <br />
<pre>chk_spreadsheet_support == 0</pre><br />
<br />
it's used automatically (default interface). Otherwise you can force the usage like <br />
<br />
<pre>m = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can checkout a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)<br />
<br />
[[Category:Octave-Forge]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=5029IO package2014-08-19T18:08:22Z<p>83.163.225.168: /* About read/write support (TL;DR) */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
=== About read/write support ===<br />
<br />
Most people need this package to read and write Excel files. But the io package can read/write Open/Libre Office, Gnumeric and some less important files too.<br />
<br />
<pre><nowiki><br />
File extension COM POI POI/OOXML JXL OXS UNO OTK JOD OCT<br />
--------------------------------------------------------------<br />
.xls (Excel95) R R R<br />
.xls (Excel97-2003) + + + + + +<br />
.xlsx (Excel2007+) ~ + (+) + +<br />
.xlsb, .xlsm ~ ? R R?<br />
.wk1 + R<br />
.wks + R<br />
.dbf + +<br />
.ods ~ + + + +<br />
.sxc + +<br />
.fods +<br />
.uos +<br />
.dif + +<br />
.csv + R<br />
.gnumeric +<br />
--------------------------------------------------------------<br />
<br />
R : only read; + : full read/write; ~ : dependent on Excel version<br />
</nowiki></pre><br />
<br />
<br />
==== xlswrite / odswrite versus xlsopen / odsopen ..... xlsclose / odsclose ====<br />
<br />
Matlab users are used to xlsread and xlswrite, functions that can only read data from, or write data to, one sheet in a spreadsheet file at a time. For each read, xlsread has to first read the entire spreadsheet file, for write operations xlswrite also has to finally write it out completely to disk.<br />
For those of you who love this for OpenOffice/LibreOffice files, I've made odswrite and odsread, which exhibit the same inefficiency.<br />
<br />
If you want to move multiple pieces of data to/from a spreadsheet file, the io package offers a much more versatile scheme:<br />
* First open the spreadsheet file using xlsopen (for Excel or gnumeric files) or odsopen (.ods or .gnumeric). <br />
'''NOTE''': the output of these functions is a file pointer handle that you should treat carefully!<br />
* (for reading data) Read the data using raw_data = xls2oct (fileptr [,sheet#] [,cellrange] [,options])<br />
* Next, optionally split the data into numerical and text data and optionally get the limits of where these came from:<br />
 [num, txt, lims] = parsecell (raw_data, <fileptr.lims>)<br />
* (for writing data) Write the data using <fileptr> = oct2xls (data, <fileptr> [,sheet#] [,cellrange] [,options])<br />
* When you're finished, DO NOT FORGET to close the file pointer handle:<br />
 <fileptr> = xlsclose (<fileptr>)<br />
<br />
Mixing read and write operations in any order is permitted (the only exception: not with the JXL -JExcelAPI- interface).<br />
The same goes for odsopen-ods2oct-oct2ods-odsclose sequences.<br />
<br />
Obviously this is much more flexible (and FASTER) than xlsread and xlswrite. In fact, Octave's io package xlsread is a mere wrapper for an xlsopen-xls2oct-parsecell-xlsclose sequence. Similarly for xlswrite, odsread, and odswrite.<br />
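<br />
The open / read / parse / write / close sequence described above could look roughly like the sketch below. The file name, sheet name and cell ranges are made up for illustration, and the exact argument defaults may differ between io package versions, so check "help xlsopen" and "help xls2oct" before relying on it.<br />
<pre><br />
pkg load io<br />
<br />
# 'measurements.xlsx' is a hypothetical file; the 1 requests read/write access<br />
xls = xlsopen ('measurements.xlsx', 1);<br />
<br />
# Read a block of raw data from the first sheet into a cell array<br />
raw = xls2oct (xls, 1, 'A1:C10');<br />
<br />
# Split the raw cell array into a numeric array and a text cell array<br />
[num, txt] = parsecell (raw);<br />
<br />
# Write derived data to another sheet through the same file pointer<br />
xls = oct2xls (num * 2, xls, 'Doubled', 'A1');<br />
<br />
# Closing the pointer is what actually writes the workbook to disk<br />
xls = xlsclose (xls);<br />
</pre><br />
The file is opened and written only once here, however many xls2oct and oct2xls calls sit between xlsopen and xlsclose.<br />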
<br />
<br />
==== .xls ~= .xlsx ====<br />
<br />
'''This is the most important information you have to keep in mind when you have to work with "Excel" files.''' <br />
* .xls - is an outdated default binary file format from <= Office 2003 - '''try to avoid this format!'''<br />
* .xlsx - is the new default file format since Office 2007. [https://en.wikipedia.org/wiki/OOXML It consists of xml files stored in a .zip container.] - '''always save in or convert to this format!'''<br />
* The ''(new)'' OCT interface can read ''(since version 1.2.5)'' and write ''(since version 2.2.0)'' .xlsx files dependency-free! No need of MS Windows+Office nor Java.<br />
* Windows is notorious for hiding "known" file extensions. However in Windows Explorer it is easy to change this and have Windows show all file extensions.<br />
<br />
<br />
==== different interfaces ====<br />
<br />
The io package comes with different interfaces to read/write different file formats.<br />
# COM<br />
## This ''(interface)'' is only available on MS Windows '''and''' with an MS Office installation.<br />
# [POI, POI/OOXML, JXL, OXS, UNO, OTK, JOD]<br />
## These are java-based interfaces. They are generally slower than Octave's native OCT interface; OTOH they offer more flexibility. Generally the OCT interface offers sufficient flexibility and speed. <br />
# OCT<br />
## This is the new impressive and fast ''(mostly written in Octave itself! + two C files to bypass bottlenecks)'' interface which presently supports .xlsx, .ods and .gnumeric files.<br />
(Note that .ods is a complicated file format with many gotchas that doesn't lend itself to fast file I/O. So unfortunately the fastest .ods interface is the Java-based jOpenDocument (JOD) (luckily it is GPL). However if speed is not an issue or if you hate Java, the OCT interface still performs fast enough.)<br />
<br />
So, if you want to read/write '''.xlsx''' files, you'll only need the io-package >=2.2.0. <br />
<br />
But if you have to read/write '''.xls''' files, you'll need either<br />
* MS Windows with MS Office backings - or<br />
* Octave built with --enable-java, + a Java JRE or -JDK, and one or more of the Java interfaces (i.e., the class libs)!<br />
<br />
If you want to read/write .gnumeric files, the OCT interface is even the only option.<br />
<br />
For some rarely used file formats you'll need LibreOffice + Octave built with Java enabled + a Java JRE or -JDK. Once that is in place, you can enjoy formats like Unified Office Format, Data Interchange Format, SYLK, OpenDocument Flat XML, the old OpenOffice.org .sxc format and some others you may have heard of ;-)<br />
<br />
<br />
==== force an interface ====<br />
<br />
If you don't want the io package's interface autodetection to take control, you can easily force the usage of an interface. Examples:<br />
<br />
Force native OCT interface - only for .xlsx, .ods, .gnumeric<br />
<pre>OCT = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Force COM interface - may only work with .xls, .xlsx on Windows OS and available office installation.<br />
<pre>COM = xlsread ('file.xlsx', 1, [], 'COM');</pre><br />
<br />
Force POI interface - may only work if you've done javaaddpath for the Apache POI .jar files - only .xls<br />
<pre>POI = xlsread ('file.xls', 1, [], 'POI');</pre><br />
<br />
And so on ...<br />
<br />
<br />
==== Java example ====<br />
<br />
# Again: You only need Java if you have to read/write .xls files! You don't need this for .xlsx files!<br />
# Make sure you've set up everything with Java correctly<br />
# get e.g. apache poi jar library files and add them with javaaddpath<br />
<pre><nowiki><br />
octave:1> javaaddpath('~/poi_library/poi-3.8-20120326.jar');<br />
octave:2> javaaddpath('~/poi_library/poi-ooxml-3.8-20120326.jar');<br />
octave:3> javaaddpath('~/poi_library/poi-ooxml-schemas-3.8-20120326.jar');<br />
octave:4> javaaddpath('~/poi_library/xmlbeans-2.3.0.jar');<br />
octave:5> javaaddpath('~/poi_library/dom4j-1.6.1.jar');<br />
octave:6> <br />
octave:6> pkg load io<br />
octave:7> chk_spreadsheet_support <br />
ans = 6<br />
octave:8> javaclasspath <br />
STATIC JAVA PATH<br />
<br />
- empty -<br />
<br />
DYNAMIC JAVA PATH<br />
<br />
/home/markus/poi_library/poi-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-schemas-3.8-20120326.jar<br />
/home/markus/poi_library/xmlbeans-2.3.0.jar<br />
/home/markus/poi_library/dom4j-1.6.1.jar<br />
<br />
</nowiki></pre><br />
<br />
An easier way is to collect all required Java class libs for spreadsheet I/O (the .jar files) in one subdir and have chk_spreadsheet_support.m sort it all out:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('/full/path/to/subdir/with/.jar/files')<br />
</nowiki></pre><br />
<br />
For UNO (LibreOffice-behind-the-scenes) the call is a bit different:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('', 0, '/full/path/to/LibreOffice/installation')<br />
</nowiki></pre><br />
<br />
(On Windows, the io package tries to automatically find all required Java class libs and LibreOffice. To help it, put the Java class libs in your user profile (home directory) in a subdir "java", e.g., C:\Users\Eddy\java. chk_spreadsheet_support searches that location automagically.<br />
On Linux this automatic searching has been disabled as the io package took ages (well, minutes) to load...)<br />
<br />
<br />
Anyway, the chk_spreadsheet_support output should be now > 0.<br />
<br />
<pre><nowiki><br />
0 No spreadsheet I/O support found<br />
---------- XLS (Excel) interfaces: ----------<br />
1 = COM (ActiveX / Excel)<br />
2 = POI (Java / Apache POI)<br />
4 = POI+OOXML (Java / Apache POI)<br />
8 = JXL (Java / JExcelAPI)<br />
16 = OXS (Java / OpenXLS)<br />
--- ODS (OpenOffice.org Calc) interfaces ----<br />
32 = OTK (Java/ ODF Toolkit)<br />
64 = JOD (Java / jOpenDocument)<br />
----------------- XLS & ODS: ----------------<br />
128 = UNO (Java / UNO bridge - OpenOffice.org)<br />
</nowiki></pre><br />
<br />
And reading/writing .xls files should work.<br />
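<br />
Since the values in the table above are powers of two, the return value can be read as a sum of the codes of all interfaces that were found. A minimal sketch for decoding it is given below; it uses the value 6 from the example session above rather than calling chk_spreadsheet_support, so it runs without the io package. The variable names are only illustrative.<br />
<pre><br />
retval = 6;   # example value; normally: retval = chk_spreadsheet_support<br />
<br />
# Codes and interface names as listed in the table above<br />
codes = [1 2 4 8 16 32 64 128];<br />
names = {'COM', 'POI', 'POI+OOXML', 'JXL', 'OXS', 'OTK', 'JOD', 'UNO'};<br />
<br />
# An interface is available when its bit is set in the return value<br />
found = names(bitand (retval, codes) > 0);<br />
printf ('%s\n', strjoin (found, ', '));<br />
</pre><br />
For the value 6 this lists POI and POI+OOXML, matching the Apache POI jars that were added in the session above.<br />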
<br />
== Detailed Information (TL) ==<br />
<br />
The following might be more interesting if you're interested in how things work inside the io package.<br />
<br />
=== ODS support ===<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
==== Files content ====<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support!):<br />
* Nothing except unzip<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.3 (final) is the most recent one and recommended for Octave).<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org or www.libreoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the Java classpath.<br />
<br />
==== Usage ====<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, a.o. oct2ods.<br />
<br />
odsfinfo can be used to explore odsfiles with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
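<br />
One way to make sure odsclose is always reached, even when a read fails halfway, is to wrap the work in an unwind_protect block. The sketch below reads two sheets from a single file pointer; the file name is hypothetical and the sheets are simply addressed by index.<br />
<pre><br />
pkg load io<br />
<br />
# 'survey.ods' is a hypothetical file; 0 means open for reading only<br />
ods = odsopen ('survey.ods', 0);<br />
unwind_protect<br />
  # The file is read once; each ods2oct call only extracts a sheet<br />
  raw1 = ods2oct (ods, 1);<br />
  raw2 = ods2oct (ods, 2);<br />
  [num1, txt1] = parsecell (raw1);<br />
  [num2, txt2] = parsecell (raw2);<br />
unwind_protect_cleanup<br />
  # Runs whether or not an error occurred above, so memory is released<br />
  ods = odsclose (ods);<br />
end_unwind_protect<br />
</pre><br />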
<br />
==== Spreadsheet formula support ====<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
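<br />
As an illustration of the formulas_as_text option described above, the sketch below reads the same range twice, once as literal formula text and once as formula results. The file name and range are hypothetical, the OTK interface is assumed to be installed, and the options struct is passed as the last argument as stated above.<br />
<pre><br />
pkg load io<br />
<br />
# 'budget.ods' is a hypothetical file; 'otk' forces the ODF Toolkit interface<br />
ods = odsopen ('budget.ods', 0, 'otk');<br />
<br />
# Read formulas as literal text strings<br />
opts.formulas_as_text = 1;<br />
formulas = ods2oct (ods, 1, 'A1:C5', opts);<br />
<br />
# Read the stored formula results (the default behaviour)<br />
opts.formulas_as_text = 0;<br />
results = ods2oct (ods, 1, 'A1:C5', opts);<br />
<br />
ods = odsclose (ods);<br />
</pre><br />
Keep in mind that with OTK the results are whatever was last stored in the file, since no formula evaluator is involved.<br />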
<br />
==== Gotchas ====<br />
I know of one big gotcha: i.e. reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
===== Date and time in ODS =====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year). (Why mention MS-Excel here? See below:) <br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e.,strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As Octave has no “date” or “time” data type, Octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in Octave cell arrays.<br />
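<br />
If a date cell does reach you unconverted, the two representations mentioned above can be brought to Octave date numbers by hand. The sketch below assumes the raw value is either an Excel serial number (for dates from 1 March 1900 onwards) or an ODS text string of the form yyyy-mm-dd; the helper names xl2dn and ods2dn are made up for this example.<br />
<pre><br />
# Excel serial numbers count days from 1900 and include the non-existent<br />
# 29 Feb 1900, so from 1 March 1900 onwards the offset to Octave date<br />
# numbers equals datenum (1899, 12, 30)<br />
xl2dn = @(serial) serial + datenum (1899, 12, 30);<br />
<br />
# ODS date cells hold text strings like "yyyy-mm-dd"<br />
ods2dn = @(str) datenum (str, 'yyyy-mm-dd');<br />
<br />
dn_from_excel = xl2dn (36526);          # Excel serial for 1 January 2000<br />
dn_from_ods   = ods2dn ('2000-01-01');<br />
<br />
disp (datestr (dn_from_excel, 'yyyy-mm-dd'));<br />
disp (dn_from_excel == dn_from_ods);<br />
</pre><br />
Both conversions end up at the same Octave date number, so after conversion the origin of the cell no longer matters.<br />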
<br />
===== Java memory pool allocation size =====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
===== Reading cells containing errors =====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc due to e.g., invalid formulas, may have a 0 (zero) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas:<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
<br />
NOT fixed in jOpenDocument version 1.2 & 1.3 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
==== Matlab compatibility ====<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
==== Comparison of interfaces ====<br />
The OCT interface (present as of io-1.2.4) offers read support for ODS 1.2, complete with all the options of ODFtoolkit and UNO, but fairly slow.<br />
<br />
The OTK interface (ODFtoolkit) is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The JOD (jOpenDocument) interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild-auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), both w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
** unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
** juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
** The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and -clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
<br />
==== Development ====<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5·10<sup>5</sup> cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
==== ODFDOM versions ====<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet. Version 0.8.9 introduced an undocumented dependency on some obscure Java class lib - I think due to a bit sloppy development procedures. Anyway I couldn't get it to work.<br />
So at the moment (Summer 2013 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
=== XLS support ===<br />
==== Files content ====<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support for OOXML (.xlsx)!):<br />
* Nothing except unzip<br />
<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed. Note that 64-bit MS-Office has no support for COM/ActiveX so you might have to resort to the Java interfaces below<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''' or '''xmlbeans.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
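<br />
A minimal sketch of such octaverc entries is shown below. The directory and the .jar file names are merely assumptions for illustration; substitute the locations and versions actually present on your system.<br />
<pre><nowiki><br />
# Hypothetical location and file names of the Java class libs<br />
jardir = '/usr/lib/java';<br />
jars = {'poi-3.8-20120326.jar', 'poi-ooxml-3.8-20120326.jar', 'jxl.jar'};<br />
for ii = 1:numel (jars)<br />
  jarfile = fullfile (jardir, jars{ii});<br />
  if (exist (jarfile, 'file'))<br />
    javaaddpath (jarfile);   # full pathname, as required<br />
  else<br />
    warning ('Java class lib not found: %s', jarfile);<br />
  endif<br />
endfor<br />
</nowiki></pre><br />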
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! Requires a working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.<br />
<br />
==== Usage ====<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen - xls2oct - xlsclose - parsecell''' and '''xlsopen - oct2xls - xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files the OCT interface suffices (specify "oct" for the REQINTF parameter). For other Excel file types you need MS-Excel for Windows (or later version) and the windows package (specify "com" for REQINTF), and/or Apache POI and Java support (then specify 'poi', case-insensitive, for REQINTF; obviously the complete POI interface must have been installed).<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
==== Spreadsheet formula support ====<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings (also works with OCT interface);<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
==== Matlab compatibility ====<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's doesn't. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO): while Matlab's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
==== Comparison of interfaces & usage ====<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. Two things are worrying. First, upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and you only get the contents back when writing out all of the changes. Second, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (Excel 95, read-only) and BIFF8 (Excel 97-2003). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield the expected results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard for Apache POI to catch up. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
'''OCT''' offers read support for OOXML files (.xlsx) only, but it is by far the fastest read option; faster than Excel itself.<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5 x 10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), both w/o quotes. See above under "Required support software" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
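<br />
The last step can be repeated for several interfaces in one go. The loop below is only a sketch: it assumes the io package is loaded and that a file 'test.xls' (a made-up name) exists in the working directory. Unlike the calls above it opens the file read-only (rw = 0), so nothing is written to disk.<br />
<pre><nowiki><br />
intfs = {'poi', 'jxl'};            # interfaces to try<br />
for ii = 1:numel (intfs)<br />
  works = false;<br />
  try<br />
    xls = xlsopen ('test.xls', 0, intfs{ii});<br />
    works = isstruct (xls);        # a pointer struct means the interface works<br />
    if (works)<br />
      xls = xlsclose (xls);<br />
    endif<br />
  catch<br />
    # xlsopen raised an error; treat the interface as not working<br />
  end_try_catch<br />
  printf ('%s interface works: %d\n', upper (intfs{ii}), works);<br />
endfor<br />
</nowiki></pre><br />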
<br />
==== Development ====<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
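<br />
Until such automatic conversion exists, date values can be converted by hand. The sketch below uses made-up sample values and derives the offset from datenum itself rather than hard-coding it. Because of Excel's 1900 leap year bug, it only holds for Excel dates from 1 March 1900 onward.<br />
<pre><nowiki><br />
# Excel serial date numbers as returned by xlsread (made-up sample values)<br />
xl_dates = [36526; 41000];<br />
# Offset between Excel's and Octave's date systems (valid from 1 March 1900 on)<br />
offset = datenum (1899, 12, 30);<br />
oct_dates = xl_dates + offset;<br />
disp (datestr (oct_dates, 'yyyy-mm-dd'))   # shows 2000-01-01 and 2012-04-01<br />
</nowiki></pre><br />
Subtracting the same offset converts Octave date numbers back into Excel ones.<br />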
<br />
=== OCT interface ===<br />
<br />
Since io package version 1.2.4, an interface called "OCT" has been available. Except for unzip, it has no dependencies. It's still experimental but fast! Feel free to test it and give us feedback.<br />
Currently it supports reading and writing .xlsx, .ods and .gnumeric files (the latter in yet-to-be-released io-2.2.2).<br />
If <br />
<pre>chk_spreadsheet_support == 0</pre><br />
<br />
it's used automatically (default interface). Otherwise you can force the usage like <br />
<br />
<pre>m = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can check out a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)<br />
<br />
[[Category:Octave-Forge]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=5025IO package2014-08-19T17:39:40Z<p>83.163.225.168: /* About read/write support (TL;DR) */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input from and output to external file formats.<br />
<br />
=== About read/write support (TL;DR) ===<br />
<br />
Most people need this package to read and write Excel files. But the io package can read/write Open/Libre Office, Gnumeric and some less important files too.<br />
<br />
<pre><nowiki><br />
File extension COM POI POI/OOXML JXL OXS UNO OTK JOD OCT<br />
--------------------------------------------------------------<br />
.xls (Excel95) R R R<br />
.xls (Excel97-2003) + + + + + +<br />
.xlsx (Excel2007+) ~ + (+) + +<br />
.xlsb, .xlsm ~ ? R R?<br />
.wk1 + R<br />
.wks + R<br />
.dbf + +<br />
.ods ~ + + + +<br />
.sxc + +<br />
.fods +<br />
.uos +<br />
.dif + +<br />
.csv + R<br />
.gnumeric +<br />
--------------------------------------------------------------<br />
<br />
R : only read; + : full read/write; ~ : dependent on Excel version<br />
</nowiki></pre><br />
<br />
<br />
==== .xls ~= .xlsx ====<br />
<br />
'''This is the most important information you have to keep in mind when you have to work with "Excel" files.''' <br />
* .xls - is an outdated default binary file format from <= Office 2003 - '''try to avoid this format!'''<br />
* .xlsx - is the new default file format since Office 2007. [https://en.wikipedia.org/wiki/OOXML It consists of xml files stored in a .zip container.] - '''always save in or convert to this format!'''<br />
** The ''(new)'' OCT interface can read ''(since version 1.2.5)'' and write ''(since version 2.2.0)'' .xlsx files dependency-free! No need for MS Windows + Office or Java.<br />
* Windows is notorious for hiding "known" file extensions. However in Windows Explorer it is easy to change this and have Windows show all file extensions.<br />
<br />
==== different interfaces ====<br />
<br />
The io package comes with different interfaces to read/write different file formats.<br />
# COM<br />
## This ''(interface)'' is only available on MS Windows '''and''' with an MS Office installation.<br />
# [POI, POI/OOXML, JXL, OXS, UNO, OTK, JOD]<br />
## These are java-based interfaces. They are generally slower than Octave's native OCT interface; OTOH they offer more flexibility. Generally the OCT interface offers sufficient flexibility and speed. <br />
# OCT<br />
## This is the new impressive and fast ''(mostly written in Octave itself! + two C files to bypass bottlenecks)'' interface which presently supports .xlsx, .ods and .gnumeric files.<br />
(Note that .ods is a complicated file format with many gotchas that doesn't lend itself to fast file I/O. So unfortunately the fastest .ods interface is the Java-based jOpenDocument (JOD) (luckily it is GPL). However if speed is not an issue or if you hate Java, the OCT interface still performs fast enough.)<br />
<br />
So, if you want to read/write '''.xlsx''' files, you'll only need the io-package >=2.2.0. <br />
<br />
But if you have to read/write '''.xls''' files, you'll need either<br />
* MS Windows with MS Office backings - or<br />
* Octave built with --enable-java, + a Java JRE or -JDK, and one or more of the Java interfaces (i.e., the class libs)!<br />
<br />
If you want to read/write .gnumeric files, the OCT interface is even the only option.<br />
<br />
For some rarely used file formats you'll need LibreOffice + Octave built with Java enabled + a Java JRE or -JDK. But once that is in place you can enjoy formats like Unified Office Format, Data Interchange Format, SYLK, OpenDocument Flat XML, the old OpenOffice.org .sxc format and some others you may have heard of ;-)<br />
<br />
==== force an interface ====<br />
<br />
If you don't want the io package's auto-detection to take control, you can easily force the usage of an interface. Examples:<br />
<br />
Force native OCT interface - only for .xlsx, .ods, .gnumeric<br />
<pre>OCT = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Force COM interface - may only work with .xls, .xlsx on Windows OS and available office installation.<br />
<pre>COM = xlsread ('file.xlsx', 1, [], 'COM');</pre><br />
<br />
Force POI interface - may only work if you've done javaaddpath for the Apache POI .jar files - only .xls<br />
<pre>POI = xlsread ('file.xls', 1, [], 'POI');</pre><br />
<br />
And so on ...<br />
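<br />
If you regularly force an interface, the choice can be derived from the file extension. The snippet below is just an illustration of the rules given above, not something the io package does for you; the file name is made up and picking 'POI' for .xls files is merely one possible preference.<br />
<pre><nowiki><br />
fname = 'file.xlsx';               # hypothetical file name<br />
[~, ~, ext] = fileparts (fname);<br />
switch (lower (ext))<br />
  case {'.xlsx', '.ods', '.gnumeric'}<br />
    reqintf = 'OCT';               # native interface, no dependencies<br />
  case '.xls'<br />
    reqintf = 'POI';               # needs Java + the Apache POI .jar files<br />
  otherwise<br />
    reqintf = '';                  # no preference for other file types<br />
endswitch<br />
# data = xlsread (fname, 1, [], reqintf);   # uncomment once the file exists<br />
</nowiki></pre><br />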
<br />
==== Java example ====<br />
<br />
# Again: You only need Java if you have to read/write .xls files! You don't need this for .xlsx files!<br />
# Make sure you've set up everything with Java correctly<br />
# Get e.g. the Apache POI jar library files and add them with javaaddpath<br />
<pre><nowiki><br />
octave:1> javaaddpath('~/poi_library/poi-3.8-20120326.jar');<br />
octave:2> javaaddpath('~/poi_library/poi-ooxml-3.8-20120326.jar');<br />
octave:3> javaaddpath('~/poi_library/poi-ooxml-schemas-3.8-20120326.jar');<br />
octave:4> javaaddpath('~/poi_library/xmlbeans-2.3.0.jar');<br />
octave:5> javaaddpath('~/poi_library/dom4j-1.6.1.jar');<br />
octave:6> <br />
octave:6> pkg load io<br />
octave:7> chk_spreadsheet_support <br />
ans = 6<br />
octave:8> javaclasspath <br />
STATIC JAVA PATH<br />
<br />
- empty -<br />
<br />
DYNAMIC JAVA PATH<br />
<br />
/home/markus/poi_library/poi-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-schemas-3.8-20120326.jar<br />
/home/markus/poi_library/xmlbeans-2.3.0.jar<br />
/home/markus/poi_library/dom4j-1.6.1.jar<br />
<br />
</nowiki></pre><br />
<br />
An easier way is to collect all required Java class libs for spreadsheet I/O (the .jar files) in one subdir and have chk_spreadsheet_support.m sort it all out:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('/full/path/to/subdir/with/.jar/files')<br />
</nowiki></pre><br />
<br />
For UNO (LibreOffice-behind-the-scenes) the call is a bit different:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('', 0, '/full/path/to/LibreOffice/installation')<br />
</nowiki></pre><br />
<br />
(On Windows, the io package tries to automatically find all required Java class libs and LibreOffice. To help it, put the Java class libs in your user profile (home directory) in a subdir "java", e.g., C:\Users\Eddy\java. chk_spreadsheet_support searches that location automagically.<br />
On Linux this automatic searching has been disabled as the io package took ages (well, minutes) to load...)<br />
<br />
<br />
Anyway, the chk_spreadsheet_support output should be now > 0.<br />
<br />
<pre><nowiki><br />
0 No spreadsheet I/O support found<br />
---------- XLS (Excel) interfaces: ----------<br />
1 = COM (ActiveX / Excel)<br />
2 = POI (Java / Apache POI)<br />
4 = POI+OOXML (Java / Apache POI)<br />
8 = JXL (Java / JExcelAPI)<br />
16 = OXS (Java / OpenXLS)<br />
--- ODS (OpenOffice.org Calc) interfaces ----<br />
32 = OTK (Java/ ODF Toolkit)<br />
64 = JOD (Java / jOpenDocument)<br />
----------------- XLS & ODS: ----------------<br />
128 = UNO (Java / UNO bridge - OpenOffice.org)<br />
</nowiki></pre><br />
<br />
And reading/writing .xls files should work.<br />
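<br />
The return value is a sum of the codes listed above, so the individual interfaces can be recovered with bitand. The sketch below decodes the value 6 from the example session; the variable names are made up for illustration.<br />
<pre><nowiki><br />
retval = 6;    # e.g., retval = chk_spreadsheet_support<br />
names = {'COM', 'POI', 'POI+OOXML', 'JXL', 'OXS', 'OTK', 'JOD', 'UNO'};<br />
bits  = [1, 2, 4, 8, 16, 32, 64, 128];<br />
# Keep the names whose bit is set in the return value<br />
found = names(bitand (retval, bits) > 0);<br />
printf ('%s ', found{:}); printf ('\n');   # prints: POI POI+OOXML<br />
</nowiki></pre><br />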
<br />
== Detailed Information (TL) ==<br />
<br />
The following might be more interesting if you're interested in how things work inside the io package.<br />
<br />
=== ODS support ===<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
==== Files content ====<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
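<br />
To give an idea of what separating raw data into numeric and text arrays means, here is a simplified illustration in plain Octave. It is not the actual parsecell code (which, among other things, also handles empty cells and reports the data limits); the sample data are made up.<br />
<pre><nowiki><br />
# Raw cell array as a read function might return it (made-up data)<br />
raw = {'Name', 'Value'; 'a', 1.5; 'b', 2};<br />
isnum = cellfun (@isnumeric, raw);    # logical mask of numeric cells<br />
numarr = NaN (size (raw));            # non-numeric positions stay NaN<br />
numarr(isnum) = [raw{isnum}];<br />
txtarr = raw;<br />
txtarr(isnum) = {''};                 # blank out the numeric positions<br />
</nowiki></pre><br />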
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support!):<br />
* Nothing except unzip<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.3 (final) is the most recent one and recommended for Octave).<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org or www.libreoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the Java classpath.<br />
<br />
==== Usage ====<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ods files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
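<br />
A minimal sketch of such a sequence is given below. The file name test.ods, the sheet numbers and the cell range are assumptions for illustration only; the unwind_protect block is just one way to make sure odsclose is always invoked, even when reading fails halfway.<br />
<pre><nowiki><br />
# Open the file read-only (0) and let odsopen pick an interface<br />
ods = odsopen ('test.ods', 0);<br />
if (isempty (ods))<br />
  error ('Could not open test.ods; check spreadsheet support');<br />
end<br />
unwind_protect<br />
  # Whole used range of the first sheet<br />
  [raw1, ods] = ods2oct (ods, 1);<br />
  # A limited range from the second sheet, using the same file pointer<br />
  [raw2, ods] = ods2oct (ods, 2, 'A1:C10');<br />
  # Split the raw cell array into numeric and text parts<br />
  [num1, txt1] = parsecell (raw1);<br />
unwind_protect_cleanup<br />
  # Always release the file pointer and the memory it holds<br />
  ods = odsclose (ods);<br />
end_unwind_protect<br />
</nowiki></pre><br />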
<br />
==== Spreadsheet formula support ====<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
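<br />
The following sketch reads the same range twice, once as formula text and once as formula results. The file name, sheet number and range are assumptions; the 'otk' interface is requested explicitly because formula support depends on the interface.<br />
<pre><nowiki><br />
ods = odsopen ('test.ods', 0, 'otk');<br />
# Read the literal formula strings<br />
opts.formulas_as_text = 1;<br />
[rawtxt, ods] = ods2oct (ods, 1, 'A1:C10', opts);<br />
# Read the (cached) formula results instead; this is the default<br />
opts.formulas_as_text = 0;<br />
[rawval, ods] = ods2oct (ods, 1, 'A1:C10', opts);<br />
ods = odsclose (ods);<br />
</nowiki></pre><br />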
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
==== Gotchas ====<br />
I know of one big gotcha: i.e. reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
===== Date and time in ODS =====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year). (Why mention MS-Excel here? See below:) <br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As Octave has no “date” or “time” data type, Octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in Octave cell arrays.<br />
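<br />
As an illustration of the two cases described above, the sketch below converts both an ODS-style date string and an Excel-style serial number into an Octave date number. The sample values are made up; the offset 693960 is only valid for Excel serial numbers from 1 March 1900 onward, because of Excel's erroneous 1900 leap day.<br />
<pre><nowiki><br />
# Case 1: OOo date cell, stored as a "yyyy-mm-dd" string<br />
dn_ods = datenum ('2010-08-31', 'yyyy-mm-dd');<br />
# Case 2: numerical date left over from Excel (days since 1900)<br />
excel2datenum = @(serial) serial + 693960;<br />
dn_xls = excel2datenum (36526);   # Excel serial number of 1 January 2000<br />
# Both are now ordinary Octave date numbers<br />
disp (datestr (dn_ods, 'yyyy-mm-dd'));<br />
disp (datestr (dn_xls, 'yyyy-mm-dd'));<br />
</nowiki></pre><br />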
<br />
===== Java memory pool allocation size =====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS Octave routines (and the Java-based xls routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
===== Reading cells containing errors =====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc due to e.g., invalid formulas, may have a 0 (zero) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas:<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
<br />
NOT fixed in jOpenDocument version 1.2 & 1.3 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
==== Matlab compatibility ====<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
==== Comparison of interfaces ====<br />
The OCT interface (present as of io-1.2.4) offers read support for ODS 1.2, complete with all the options of ODFtoolkit and UNO, but fairly slow.<br />
<br />
The OTK interface (ODFtoolkit) is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The JOD (jOpenDocument) interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read much more spreadsheet formats than just ODS; .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild-auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do jcp = javaclasspath ("-all") (under unix/linux, do jcp = javaclasspath; strsplit (jcp, ":")). See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
** unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
** juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
** The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
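<br />
The class path check in the steps above can be sketched as follows. The helper hasjar and the list of name fragments are hypothetical and only meant as an illustration; the fragments should be adapted to the interface you want to use. The call to chk_spreadsheet_support uses the arguments mentioned earlier in this section.<br />
<pre><nowiki><br />
# Hypothetical helper: true if any class path entry contains the given fragment<br />
hasjar = @(jcp, name) any (! cellfun ('isempty', strfind (lower (jcp), lower (name))));<br />
<br />
jcp = javaclasspath ('-all');<br />
# Older Java packages may return one long string rather than a cell array<br />
if (ischar (jcp))<br />
  jcp = strsplit (jcp, pathsep ());<br />
end<br />
<br />
# Name fragments of the jars to look for (examples for OTK and JOD)<br />
needed = {'odfdom', 'xerces', 'jopendocument'};<br />
for ii = 1:numel (needed)<br />
  if (hasjar (jcp, needed{ii}))<br />
    printf ('%-15s found\n', needed{ii});<br />
  else<br />
    printf ('%-15s MISSING\n', needed{ii});<br />
  end<br />
end<br />
<br />
# Let the io package run its own diagnostics at debug level 3<br />
chk_spreadsheet_support ('', 3);<br />
</nowiki></pre><br />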
<br />
==== Development ====<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
==== ODFDOM versions ====<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet. Version 0.8.9 introduced an undocumented dependency on some obscure Java class lib - I think due to a bit sloppy development procedures. Anyway I couldn't get it to work.<br />
So at the moment (Summer 2013 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
=== XLS support ===<br />
==== Files content ====<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support for OOXML (.xlsx)!):<br />
* Nothing except unzip<br />
<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed. Note that 64-bit MS-Office has no support for COM/ActiveX so you might have to resort to the Java interfaces below<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''' or '''xmlbeans.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.<br />
<br />
==== Usage ====<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen - xls2oct - xlsclose - parsecell''' and '''xlsopen - oct2xls - xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you can make do with the OCT interface (specify "oct" for the REQINTF parameter). For other Excel file types you need MS-Excel for Windows (or later version) and the windows package (specify "com" for REQINTF), and/or Apache POI and Java support (then the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and, obviously, the complete POI interface must have been installed).<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
# If you really want an other interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
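<br />
For illustration, requesting the POI interface explicitly could look like the sketch below. The file name data.xlsx and the sheet number are assumptions; the REQINTF values are the ones named in the text above.<br />
<pre><nowiki><br />
# Explore the file first: file type and sheet names, using Apache POI<br />
[ftype, shnames] = xlsfinfo ('data.xlsx', 'POI');<br />
<br />
# Low-level access with an explicitly requested interface (0 = read-only)<br />
xlh = xlsopen ('data.xlsx', 0, 'POI');<br />
if (isempty (xlh))<br />
  error ('POI interface not available; check the javaclasspath');<br />
end<br />
[raw, xlh] = xls2oct (xlh, 1);<br />
xlh = xlsclose (xlh);<br />
<br />
# The all-in-one wrapper accepts the same interface request;<br />
# the empty range means "whole used range"<br />
[num, txt] = xlsread ('data.xlsx', 1, '', 'POI');<br />
</nowiki></pre><br />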
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
==== Spreadsheet formula support ====<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings (also works with OCT interface);<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
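<br />
A small sketch of the writing side is given below. The file name, sheet name, ranges and the sample formula are assumptions. The POI interface is requested because it allows more than one oct2xls call per file pointer, which JExcelAPI does not.<br />
<pre><nowiki><br />
# Open (or create) a workbook for writing (1 = read/write)<br />
xlh = xlsopen ('formulas.xls', 1, 'POI');<br />
data = {1, 2, '=SUM(A1:B1)'};<br />
<br />
# Row 1: the third cell is entered as a real formula<br />
opts.formulas_as_text = 0;<br />
xlh = oct2xls (data, xlh, 'Sheet1', 'A1:C1', opts);<br />
<br />
# Row 2: the same string is entered as literal text<br />
opts.formulas_as_text = 1;<br />
xlh = oct2xls (data, xlh, 'Sheet1', 'A2:C2', opts);<br />
<br />
# Write the workbook to disk and release the file pointer<br />
xlh = xlsclose (xlh);<br />
</nowiki></pre><br />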
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
==== Matlab compatibility ====<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
** When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
==== Comparison of interfaces & usage ====<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you can only get the contents back when writing out all of the changes, is worrying - and any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: it doesn't support BIFF5, but it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard for Apache POI to catch up. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But JExcelAPI is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
'''OCT''' offers read support for OOXML files (.xlsx) only, but it is by far the fastest read option; faster than Excel itself.<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), both w/o the outer quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
<br />
==== Development ====<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be easy to accomplish. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the quality and documentation of the interface support libraries. Separating the file access functions from the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960, i.e., datenum (1899, 12, 30)). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
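<br />
As a rough illustration of the date conversion item in the list above (a sketch only, not code from the io package; all variable names are made up), Excel serial day numbers can be shifted to Octave datenums with a fixed offset. The offset 693960 equals datenum (1899, 12, 30). Because Excel counts a non-existent 29 February 1900, this offset is only valid for Excel serials of 61 (1 March 1900) and later:<br />
<pre><nowiki><br />
## Sketch only: Excel serial day numbers <-> Octave datenums.<br />
xl_offset = datenum (1899, 12, 30);    # = 693960<br />
xl_serial = [61 41870];                # Excel values for 1900-03-01 and 2014-08-19<br />
oct_dates = xl_serial + xl_offset;     # valid for serials >= 61 only<br />
datestr (oct_dates, 'yyyy-mm-dd')      # show the converted dates as text<br />
xl_again  = oct_dates - xl_offset;     # and back to Excel serials<br />
</nowiki></pre><br />
Serials below 61 would need an offset of 693961 instead, and dates before 1900 have no Excel serial at all, which is one reason why such a conversion is probably better left optional.<br />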
<br />
=== OCT interface ===<br />
<br />
Since io package version 1.2.4, an interface called "OCT" has been available. Except for unzip, it has no dependencies. It's still experimental but fast! Feel free to test it and give us feedback.<br />
Currently it supports reading and writing .xlsx, .ods and .gnumeric files (the latter in yet-to-be-released io-2.2.2).<br />
If <br />
<pre>chk_spreadsheet_support == 0</pre><br />
<br />
it's used automatically (default interface). Otherwise you can force the usage like <br />
<br />
<pre>m = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, and since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can check out a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)<br />
<br />
[[Category:Octave-Forge]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=5024IO package2014-08-19T17:37:11Z<p>83.163.225.168: /* About read/write support (TL;DR) */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
=== About read/write support (TL;DR) ===<br />
<br />
Most people need this package to read and write Excel files. But the io package can read/write Open/Libre Office, Gnumeric and some less important files too.<br />
<br />
<pre><nowiki><br />
File extension COM POI POI/OOXML JXL OXS UNO OTK JOD OCT<br />
--------------------------------------------------------------<br />
.xls (Excel95) R R R<br />
.xls (Excel97-2003) + + + + + +<br />
.xlsx (Excel2007+) ~ + (+) R +<br />
.xlsb, .xlsm ~ ? R R?<br />
.wk1 + R<br />
.wks + R<br />
.dbf + +<br />
.ods ~ + + + +<br />
.sxc + +<br />
.fods +<br />
.uos +<br />
.dif +<br />
.csv + R<br />
.gnumeric +<br />
--------------------------------------------------------------<br />
<br />
R : only read; + : full read/write; ~ : dependent on Excel version<br />
</nowiki></pre><br />
<br />
<br />
==== .xls ~= .xlsx ====<br />
<br />
'''This is the most important information you have to keep in mind when you have to work with "Excel" files.''' <br />
* .xls - is an outdated default binary file format from <= Office 2003 - '''try to avoid this format!'''<br />
* .xlsx - is the new default file format since Office 2007. [https://en.wikipedia.org/wiki/OOXML It consists of xml files stored in a .zip container.] - '''always save in or convert to this format!'''<br />
** The ''(new)'' OCT interface can read ''(since version 1.2.5)'' and write ''(since version 2.2.0)'' .xlsx files dependency-free! No need for MS Windows+Office or Java.<br />
* Windows is notorious for hiding "known" file extensions. However in Windows Explorer it is easy to change this and have Windows show all file extensions.<br />
<br />
==== different interfaces ====<br />
<br />
The io package comes with different interfaces to read/write different file formats.<br />
# COM<br />
## This ''(interface)'' is only available on MS Windows '''and''' with an MS Office installation.<br />
# [POI, POI/OOXML, JXL, OXS, UNO, OTK, JOD]<br />
## These are java-based interfaces. They are generally slower than Octave's native OCT interface; OTOH they offer more flexibility. Generally the OCT interface offers sufficient flexibility and speed. <br />
# OCT<br />
## This is the new impressive and fast ''(mostly written in Octave itself! + two C files to bypass bottlenecks)'' interface which presently supports .xlsx, .ods and .gnumeric files.<br />
(Note that .ods is a complicated file format with many gotchas that doesn't lend itself to fast file I/O. So unfortunately the fastest .ods interface is the Java-based jOpenDocument (JOD) (luckily it is GPL). However if speed is not an issue or if you hate Java, the OCT interface still performs fast enough.)<br />
<br />
So, if you want to read/write '''.xlsx''' files, you'll only need the io-package >=2.2.0. <br />
<br />
But if you have to read/write '''.xls''' files, you'll need either<br />
* MS Windows with an MS Office installation - or<br />
* Octave built with --enable-java, + a Java JRE or -JDK, and one or more of the Java interfaces (i.e., the class libs)!<br />
<br />
If you want to read/write .gnumeric files, the OCT interface is even the only option.<br />
<br />
For some rarely used file formats you'll need LibreOffice + Octave built with Java enabled + a Java JRE or -JDK. But once that is in place you can enjoy formats like Unified Office Format, Data Interchange Format, SYLK, OpenDocument Flat XML, the old OpenOffice.org .sxc format and some others you may have heard of ;-)<br />
<br />
==== force an interface ====<br />
<br />
If you don't want the io package's autodetection to take control, you can easily force the usage of an interface. Examples:<br />
<br />
Force native OCT interface - only for .xlsx, .ods, .gnumeric<br />
<pre>OCT = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Force COM interface - may only work with .xls, .xlsx on Windows OS and available office installation.<br />
<pre>COM = xlsread ('file.xlsx', 1, [], 'COM');</pre><br />
<br />
Force POI interface - may only work if you've done javaaddpath for the Apache POI .jar files - only .xls<br />
<pre>POI = xlsread ('file.xls', 1, [], 'POI');</pre><br />
<br />
And so on ...<br />
<br />
==== Java example ====<br />
<br />
# Again: You only need Java if you have to read/write .xls files! You don't need this for .xlsx files!<br />
# Make sure you've set up everything with Java correctly<br />
# Get e.g. the Apache POI jar library files and add them with javaaddpath<br />
<pre><nowiki><br />
octave:1> javaaddpath('~/poi_library/poi-3.8-20120326.jar');<br />
octave:2> javaaddpath('~/poi_library/poi-ooxml-3.8-20120326.jar');<br />
octave:3> javaaddpath('~/poi_library/poi-ooxml-schemas-3.8-20120326.jar');<br />
octave:4> javaaddpath('~/poi_library/xmlbeans-2.3.0.jar');<br />
octave:5> javaaddpath('~/poi_library/dom4j-1.6.1.jar');<br />
octave:6> <br />
octave:6> pkg load io<br />
octave:7> chk_spreadsheet_support <br />
ans = 6<br />
octave:8> javaclasspath <br />
STATIC JAVA PATH<br />
<br />
- empty -<br />
<br />
DYNAMIC JAVA PATH<br />
<br />
/home/markus/poi_library/poi-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-3.8-20120326.jar<br />
/home/markus/poi_library/poi-ooxml-schemas-3.8-20120326.jar<br />
/home/markus/poi_library/xmlbeans-2.3.0.jar<br />
/home/markus/poi_library/dom4j-1.6.1.jar<br />
<br />
</nowiki></pre><br />
<br />
An easier way is to collect all required Java class libs for spreadsheet I/O (the .jar files) in one subdir and have chk_spreadsheet_support.m sort it all out:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('/full/path/to/subdir/with/.jar/files')<br />
</nowiki></pre><br />
<br />
For UNO (LibreOffice-behind-the-scenes) the call is a bit different:<br />
<pre><nowiki><br />
octave:8> chk_spreadsheet_support ('', 0, '/full/path/to/LibreOffice/installation')<br />
</nowiki></pre><br />
<br />
(On Windows, the io package tries to automatically find all required Java class libs and LibreOffice. To help it, put the Java class libs in your user profile (home directory) in a subdir "java", e.g., C:\Users\Eddy\java. chk_spreadsheet_support searches that location automagically.<br />
On Linux this automatic searching has been disabled as the io package took ages (well, minutes) to load...)<br />
<br />
<br />
Anyway, the chk_spreadsheet_support output should now be > 0.<br />
<br />
<pre><nowiki><br />
0 No spreadsheet I/O support found<br />
---------- XLS (Excel) interfaces: ----------<br />
1 = COM (ActiveX / Excel)<br />
2 = POI (Java / Apache POI)<br />
4 = POI+OOXML (Java / Apache POI)<br />
8 = JXL (Java / JExcelAPI)<br />
16 = OXS (Java / OpenXLS)<br />
--- ODS (OpenOffice.org Calc) interfaces ----<br />
32 = OTK (Java/ ODF Toolkit)<br />
64 = JOD (Java / jOpenDocument)<br />
----------------- XLS & ODS: ----------------<br />
128 = UNO (Java / UNO bridge - OpenOffice.org)<br />
</nowiki></pre><br />
<br />
And reading/writing .xls files should work.<br />
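<br />
The return value is the sum of the values in the table above, so it can be taken apart with bitand. The lines below are only a sketch: the names and values are copied from the table, the variable names are made up, and retval is hard-coded to the 6 from the example session rather than obtained by calling chk_spreadsheet_support:<br />
<pre><nowiki><br />
## Sketch only: decode the sum returned by chk_spreadsheet_support.<br />
retval = 6;                            # value from the example session<br />
names  = {'COM', 'POI', 'POI+OOXML', 'JXL', 'OXS', 'OTK', 'JOD', 'UNO'};<br />
flags  = [1 2 4 8 16 32 64 128];       # values from the table above<br />
found  = names(bitand (retval, flags) > 0);<br />
printf ('Available interfaces: %s\n', strjoin (found, ', '));<br />
</nowiki></pre><br />
For the session shown earlier this lists POI and POI+OOXML, which matches the Apache POI jars that were added with javaaddpath.<br />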
<br />
== Detailed Information (TL) ==<br />
<br />
The following might be more interesting if you're interested in how things work inside the io package.<br />
<br />
=== ODS support ===<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
==== Files content ====<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support!):<br />
* Nothing except unzip<br />
<br />
For Windows (MinGW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.3 (final) is the most recent one and recommended for Octave).<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org or www.libreoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the Java classpath.<br />
<br />
==== Usage ====<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ods files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
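<br />
One way to make sure odsclose is always reached, even when reading fails halfway, is to wrap the work in unwind_protect. This is a sketch only, not a prescribed pattern: 'test.ods' is a hypothetical file assumed to hold at least two sheets, and the variable names are made up:<br />
<pre><nowiki><br />
## Sketch only: read two sheets through one file pointer, then always close it.<br />
pkg load io<br />
ods = odsopen ('test.ods');            # hypothetical file, opened read-only<br />
if (isempty (ods))<br />
  error ('could not open test.ods');<br />
endif<br />
unwind_protect<br />
  raw1 = ods2oct (ods, 1);             # raw cell array from sheet 1<br />
  raw2 = ods2oct (ods, 2);             # sheet 2, using the same file pointer<br />
  [num1, txt1] = parsecell (raw1);     # split into numeric and text arrays<br />
unwind_protect_cleanup<br />
  ods = odsclose (ods);                # runs even if ods2oct raised an error<br />
end_unwind_protect<br />
</nowiki></pre><br />
The same structure should carry over to the xlsopen / xls2oct / xlsclose family.<br />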
<br />
==== Spreadsheet formula support ====<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
==== Gotchas ====<br />
I know of one big gotcha: reading dates (& times). A less obvious one is Java memory pool allocation size.<br />
<br />
===== Date and time in ODS =====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year). (Why mention MS-Excel here? See below:) <br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
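<br />
If, as described above, date cells from a converted Excel file end up in Octave as a mix of “yyyy-mm-dd” strings and plain numbers, they can be brought to a common datenum scale along the lines below. This is a sketch only: the cell array raw is made up, and treating the numbers as Excel-style serials (day 0 = 30 December 1899) is an assumption you must check against your own file:<br />
<pre><nowiki><br />
## Sketch only: bring mixed date cells to Octave datenums.<br />
raw = {'2014-08-19', 41870};           # made-up cells: a date string and a serial<br />
dn  = zeros (size (raw));<br />
for k = 1:numel (raw)<br />
  if (ischar (raw{k}))<br />
    dn(k) = datenum (raw{k}, 'yyyy-mm-dd');    # OOo-style date string<br />
  else<br />
    dn(k) = raw{k} + datenum (1899, 12, 30);   # assumed Excel-style serial<br />
  endif<br />
endfor<br />
datestr (dn, 'yyyy-mm-dd')             # both cells now denote the same day<br />
</nowiki></pre><br />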
<br />
===== Java memory pool allocation size =====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
===== Reading cells containing errors =====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc due to e.g., invalid formulas, may have a 0 (zero) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas:<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
<br />
NOT fixed in jOpenDocument version 1.2 & 1.3 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
==== Matlab compatibility ====<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
==== Comparison of interfaces ====<br />
The OCT interface (present as of io-1.2.4) offers read support for ODS 1.2, complete with all the options of ODFtoolkit and UNO, but fairly slow.<br />
<br />
The OTK interface (ODFtoolkit) is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The JOD (jOpenDocument) interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than OTK or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), .dbf, etc.;<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), both w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
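<br />
The class path check described above can be sketched in plain Octave as follows. This is only an illustration and not what chk_spreadsheet_support.m does internally: the example class path in jcp and the jar name fragments in needed are assumptions made up for the demonstration; in a real session jcp would come from javaclasspath.<br />
<pre><br />
# Example class path entries (assumed; normally obtained from javaclasspath)<br />
jcp = {'/usr/lib/java/odfdom.jar', '/usr/lib/java/xercesImpl.jar'};<br />
# Jar name fragments to look for (illustrative list only)<br />
needed = {'odfdom', 'xercesImpl', 'jOpenDocument'};<br />
missing = {};<br />
for ii = 1:numel (needed)<br />
  # A jar counts as present if its name occurs in any class path entry<br />
  hits = strfind (lower (jcp), lower (needed{ii}));<br />
  if (all (cellfun (@isempty, hits)))<br />
    missing{end+1} = needed{ii};<br />
  endif<br />
endfor<br />
if (isempty (missing))<br />
  disp ('All listed jars found in the class path');<br />
else<br />
  printf ('Missing from class path: %s\n', strjoin (missing, ', '));<br />
endif</pre><br />
With the sample values above only jOpenDocument is reported as missing; such a jar would then be added using javaaddpath() with its full pathname.<br />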
<br />
==== Development ====<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
==== ODFDOM versions ====<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving into the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet. Version 0.8.9 introduced an undocumented dependency on some obscure Java class lib - I think due to somewhat sloppy development procedures. Anyway I couldn't get it to work.<br />
So at the moment (Summer 2013 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
=== XLS support ===<br />
==== Files content ====<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
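<br />
To illustrate what the raw output of xls2oct looks like and what separating it into numeric and text arrays means, here is a minimal sketch in plain Octave. The array rawarr is made-up sample data, and the code only imitates the idea behind parsecell; it is not parsecell's actual code and it ignores details such as trimming empty outer rows and columns.<br />
<pre><br />
# Made-up raw data: a 2D heterogeneous cell array with a header row<br />
rawarr = {'Name', 'Value'; 'a', 1.5; 'b', 2.5};<br />
# Numeric part: NaN wherever a cell does not hold a number<br />
isnum = cellfun (@(c) isnumeric (c) && ! isempty (c), rawarr);<br />
numarr = NaN (size (rawarr));<br />
numarr(isnum) = [rawarr{isnum}];<br />
# Text part: empty strings wherever a cell does not hold text<br />
istxt = cellfun (@ischar, rawarr);<br />
txtarr = cell (size (rawarr));<br />
txtarr(:) = {''};<br />
txtarr(istxt) = rawarr(istxt);</pre><br />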
<br />
==== Required support software ====<br />
For the OCT interface (since 1.2.4/1.2.5, read-only support for OOXML (.xlsx)!):<br />
* Nothing except unzip<br />
<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed. Note that 64-bit MS-Office has no support for COM / ActiveX so you might have to resort to the Java interfaces below.<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED. Currently (2013) windows-1.2.1 is the best option.<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java JRE or JDK > 1.6.0 (hasn't been tested with earlier versions). Although not an Octave issue, as to security you'd better get the latest Java version anyway.<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''' or '''xmlbeans.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: OpenXLS.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
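<br />
As an illustration of the javaaddpath route, the lines below sketch what could go into octaverc. The directory in libdir is an assumption and the jar names are just two of the files mentioned above; adapt both to your installation. Jars that are not found are merely reported, so the snippet is harmless to try.<br />
<pre><br />
# Assumed location of the class libs; adapt to your system<br />
libdir = '/usr/lib/java';<br />
jars = {'jxl.jar', 'dom4j-1.6.1.jar'};<br />
for ii = 1:numel (jars)<br />
  jarfile = fullfile (libdir, jars{ii});<br />
  if (exist (jarfile, 'file'))<br />
    javaaddpath (jarfile);            # add with full pathname<br />
  else<br />
    printf ('Not found, skipped: %s\n', jarfile);<br />
  endif<br />
endfor</pre><br />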
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.<br />
<br />
==== Usage ====<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen - xls2oct - xlsclose - parsecell''' and '''xlsopen - oct2xls - xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you can make do with the OCT interface (specify "oct" for the REQINTF parameter). For other Excel file types you need MS-Excel for Windows (or later version) and the windows package (specify "com" for REQINTF), and/or Apache POI and Java support (in which case REQINTF should be set to 'poi' (case-insensitive) and, obviously, the complete POI interface must have been installed).<br />
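<br />
A minimal sketch of such a call is given below. The file name is a placeholder, and the two-output form is an assumption based on the io package's help text; see help xlsfinfo for the authoritative signature.<br />
<pre><br />
# List the worksheets in an OOXML file through the OCT interface<br />
[filetype, sh_names] = xlsfinfo ('file.xlsx', 'oct');<br />
disp (sh_names);                      # worksheet names found in the file</pre><br />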
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
==== Spreadsheet formula support ====<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings (also works with OCT interface);<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
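<br />
A possible way of using this option is sketched below. File, worksheet and range names are placeholders, and the sketch assumes that the options struct is passed as the last argument of xls2oct and oct2xls; check help xls2oct and help oct2xls before relying on it.<br />
<pre><br />
opts.formulas_as_text = 1;            # read formulas as literal text strings<br />
xlh = xlsopen ('test.xls', 1);<br />
[rawarr, xlh] = xls2oct (xlh, 'Sheet1', 'A1:C10', opts);<br />
# ... edit the formula strings in rawarr here ...<br />
opts.formulas_as_text = 0;            # write them back as real formulas<br />
xlh = oct2xls (rawarr, xlh, 'Sheet1', 'A1:C10', opts);<br />
xlh = xlsclose (xlh);</pre><br />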
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
==== Matlab compatibility ====<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 1 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
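<br />
The "smallest rectangle encompassing all numerical values" mentioned above can be made concrete with a small sketch in plain Octave. The sample array is made up, and the code only illustrates the trimming rule; it is not taken from either xlsread implementation.<br />
<pre><br />
# Made-up raw data with a header row and a label column<br />
rawarr = {'Name', 'Value', ''; 'a', 1.5, 2; 'b', 2.5, 3};<br />
# Cells that hold actual numbers<br />
isnum = cellfun (@isnumeric, rawarr) & ! cellfun (@isempty, rawarr);<br />
[r, c] = find (isnum);<br />
# Smallest rectangle holding all numeric cells: here rows 2-3, columns 2-3<br />
rect = rawarr(min (r):max (r), min (c):max (c));</pre><br />
Everything outside rect (here the header row and the label column) is what gets dropped under the rule described above.<br />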
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
==== Comparison of interfaces & usage ====<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. It is worrying that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you can only get the contents back when writing out all of the changes. In addition, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield the expected results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard for Apache POI to catch up. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
'''OCT''' offers read support for OOXML files (.xlsx) only, but it is by far the fastest read option; faster than Excel itself.<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
==== Troubleshooting ====<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), both w/o quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
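<br />
The last step can be tried for several interfaces in one go. The loop below is only a sketch: it assumes a file test.xls in the working directory and treats either an error or an empty return value from xlsopen as "interface not working". UNO is left out on purpose, as closing it would kill other OpenOffice.org invocations.<br />
<pre><br />
fname = 'test.xls';                      # assumed to exist in the working directory<br />
intfs = {'com', 'poi', 'jxl', 'oxs'};    # values for the REQINTF parameter<br />
status = {'failed', 'OK'};<br />
works = false (1, numel (intfs));<br />
for ii = 1:numel (intfs)<br />
  try<br />
    xls = xlsopen (fname, 0, intfs{ii}); # open read-only with the requested interface<br />
    if (! isempty (xls))<br />
      xls = xlsclose (xls);              # close the pointer struct again<br />
      works(ii) = true;<br />
    endif<br />
  catch<br />
    # an error was raised somewhere: works(ii) stays false<br />
  end_try_catch<br />
  printf ('%-4s %s\n', intfs{ii}, status{works(ii) + 1});<br />
endfor</pre><br />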
<br />
==== Development ====<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
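<br />
What "an additional elseif stanza" amounts to can be sketched as follows. All names here are hypothetical: the field name xtype, the interface tag 'NEW' and the reader name xls2new2oct merely follow the xls2<...>2oct naming pattern and are not taken from the io package sources.<br />
<pre><br />
# Hypothetical pointer struct as it might be returned by xlsopen<br />
xls = struct ('xtype', 'NEW', 'filename', 'test.xls');<br />
if (strcmpi (xls.xtype, 'POI'))<br />
  reader = 'xls2jpoi2oct';<br />
elseif (strcmpi (xls.xtype, 'NEW'))      # the extra stanza for the new interface<br />
  reader = 'xls2new2oct';<br />
else<br />
  error ('Unknown interface: %s', xls.xtype);<br />
endif<br />
printf ('Data would be read by subfunction %s\n', reader);</pre><br />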
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
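<br />
Until such a conversion is built in, it can be done by hand along these lines. The sketch computes the offset with datenum instead of hard-coding it; because of Excel's fictitious 29 February 1900, this offset is only valid for Excel serial numbers from 1 March 1900 (serial 61) onward. The sample value is an assumption for illustration.<br />
<pre><br />
# Octave datenum of Excel's nominal day 0 (valid for serial numbers >= 61)<br />
xl_offset = datenum (1899, 12, 30);<br />
xl_date = 41000;                         # sample Excel serial date<br />
oct_date = xl_date + xl_offset;          # Excel -> Octave<br />
datestr (oct_date, 'yyyy-mm-dd')         # shows 2012-04-01<br />
back = oct_date - xl_offset;             # Octave -> Excel</pre><br />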
<br />
=== OCT interface ===<br />
<br />
Since io package version 1.2.4, an interface called "OCT" was added. Except for unzip, it has no dependencies. It's still experimental but fast! Feel free to test it and give us feedback.<br />
Currently it supports reading and writing .xlsx, .ods and .gnumeric files (the latter in yet-to-be-released io-2.2.2).<br />
If <br />
<pre>chk_spreadsheet_support == 0</pre><br />
<br />
it's used automatically (default interface). Otherwise you can force the usage like <br />
<br />
<pre>m = xlsread ('file.xlsx', 1, [], 'OCT');</pre><br />
<br />
Since io package version 2.2.0, the "OCT" interface has experimental write support for .xlsx and .ods formats, since io-2.2.2 (expected mid-May 2014) also for gnumeric. If you can't wait for gnumeric I/O you can checkout a snapshot from svn (see octave.sf.net, http://sourceforge.net/p/octave/code/HEAD/tree/trunk/octave-forge/main/io/)<br />
<br />
[[Category:Octave-Forge]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Java_package&diff=3230Java package2013-10-11T13:36:02Z<p>83.163.225.168: /* Make sure that the build environment is configured properly */</p>
<hr />
<div>Octave is an easy to use but powerful environment for mathematical calculations, which can easily be extended by packages. Its features are close to the commercial tool Matlab so that it can often be used as a replacement. Java on the other hand offers a rich, object oriented and platform independent environment for many applications. The core Java classes can be easily extended by many freely available libraries. This document refers to the package <code>java</code>, which is part of the GNU Octave project. This package allows you to access Java classes from inside Octave. Thus it is possible to use existing class files or complete Java libraries directly from Octave.<br />
<br />
This description is based on the Octave package {{Codeline|java-1.2.8}}. The {{Forge|java}} package usually installs its script files (.m) in the directory {{Path|.../share/Octave/packages/java-1.2.8}} and its binary (.oct) files in {{Path|.../libexec/Octave/packages/java-1.2.8}}. You can get help on specific functions in Octave by executing the help command<br />
with the name of a function from this package:<br />
octave> help javaObject<br />
<br />
You can view the whole doc file in Octave by executing the doc command with just the word java:<br />
octave> doc java<br />
<br />
Note on calling Octave from Java: the java package is designed for calling Java from Octave. If you want to call Octave from Java, you might want to use a library like [http://kenai.com/projects/javaOctave javaOctave] or [http://jopas.sourceforge.net joPas]. <br />
<br />
=FAQ=<br />
==How to distinguish between Octave and Matlab?==<br />
Octave and Matlab are very similar, but handle Java slightly differently. Therefore it may be necessary to [[Compatibility#Are_we_running_octave.3F|detect the environment]] and use the appropriate functions.<br />
<br />
==How to make Java classes available to Octave?==<br />
Java finds classes by searching a {{Codeline|classpath}}. This is a list of Java archive files and/or directories containing class files. In Octave and Matlab the {{Codeline|classpath}} is composed of two parts:<br />
*the static {{Codeline|classpath}} is initialized once at startup of the JVM, and;<br />
*the dynamic {{Codeline|classpath}} which can be modified at runtime.<br />
<br />
Octave searches the static {{Codeline|classpath}} first, then the dynamic {{Codeline|classpath}}. Classes appearing in the static as well as in the dynamic {{Codeline|classpath}} will therefore be found in the static {{Codeline|classpath}} and loaded from this location.<br />
<br />
Classes which shall be used regularly or must be available to all users should be added to the static {{Codeline|classpath}}. The static {{Codeline|classpath}} is populated once from the contents of a plain text file named {{Path|classpath.txt}} when the Java Virtual Machine starts. This file contains one line for each individual {{Codeline|classpath}} to be added to the static {{Codeline|classpath}}. These lines can identify single class files, directories containing class files or Java archives with complete class file hierarchies. Comment lines starting with a {{Codeline|#}} or a {{Codeline|%}} character are ignored.<br />
<br />
The search rules for the file {{Path|classpath.txt}} are:<br />
*First, Octave searches for the file {{Path|classpath.txt}} in your home directory. If such a file is found, it is read and defines the initial static {{Codeline|classpath}}. Thus it is possible to build an initial static {{Codeline|classpath}} on a "per user" basis.<br />
*Next, Octave looks for another file {{Path|classpath.txt}} in the package installation directory. This is where {{Path|javaclasspath.m}} resides, usually something like<br />
:<pre>...\share\Octave\packages\java-1.2.8.</pre><br />
:You can find this directory by executing the command {{Codeline|pkg list}}. If this file exists, its contents are also appended to the static {{Codeline|classpath}}. Note that the archives and class directories defined in this file will affect all users.<br />
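<br />
The file format described above (one entry per line, comment lines starting with {{Codeline|#}} or {{Codeline|%}}) can be illustrated with a small sketch. This is not the package's own parser; the file name, the paths and the variable names are made up for illustration, and skipping blank lines is an assumption of the sketch:<br />

```octave
% Write a small example file in classpath.txt format (paths are hypothetical)
fname = tempname ();
fid = fopen (fname, "w");
fputs (fid, "# classes for all my scripts\n");
fputs (fid, "C:/Octave/java_files/someclasses.jar\n");
fputs (fid, "% a directory containing .class files\n");
fputs (fid, "C:/Octave/java_files\n");
fclose (fid);

% Read it back, skipping blank lines and comment lines
entries = {};
fid = fopen (fname, "r");
txt = fgetl (fid);
while (ischar (txt))
  txt = strtrim (txt);
  if (! isempty (txt) && ! any (txt(1) == "#%"))
    entries{end+1} = txt;   % keep jar files and class directories
  end
  txt = fgetl (fid);
end
fclose (fid);
delete (fname);
disp (entries);
```

Only the two path lines survive in this sketch; the two comment lines are dropped.<br />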
<br />
Classes which are used only by a specific script should be placed in the dynamic {{Codeline|classpath}}. This portion of the {{Codeline|classpath}} can be modified at runtime using the {{Codeline|javaaddpath}} and {{Codeline|javarmpath}} functions. Example:<br />
octave> base_path = "C:/Octave/java_files";<br />
octave> % add two JARchives to the dynamic classpath<br />
octave> javaaddpath ([base_path, "/someclasses.jar"]);<br />
octave> javaaddpath ([base_path, "/moreclasses.jar"]);<br />
octave> % check the dynamic classpath<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/someclasses.jar<br />
octave> disp (p{2});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % remove the first element from the classpath<br />
octave> javarmpath ([base_path, "/someclasses.jar"]);<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % provoke an error<br />
octave> disp (p{2});<br />
error: A(I): Index exceeds matrix dimension.<br />
<br />
Another way to add files to the dynamic {{Codeline|classpath}} exclusively for your user account is to use the file {{Path|.octaverc}} which is stored in your home directory. All Octave commands in this file are executed each time you start a new instance of Octave. The following example adds the directory {{Path|~/octave}} to Octave’s search path and the archive {{Path|myclasses.jar}} in this directory to the Java search path.<br />
<br />
{{File|.octaverc|<pre><br />
addpath ("~/octave");<br />
javaaddpath ("~/octave/myclasses.jar");</pre>}}<br />
<br />
== How to create an instance of a Java class? ==<br />
If your code must work under Octave as well as Matlab, use the function {{Codeline|javaObject}} to create Java objects. The function {{Codeline|java_new}} is Octave-specific and does not exist in Matlab. The following example uses [[Compatibility#Are_we_running_octave.3F|{{Codeline|is_octave()}}]] to distinguish between the two environments:<br />
<br />
if (is_octave)<br />
Passenger = java_new ('package.FirstClass', row, seat); % works only in Octave <br />
else<br />
Passenger = javaObject ('package.FirstClass', row, seat); % actually works in both Octave and matlab<br />
end<br />
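<br />
Note that {{Codeline|is_octave}} is not a built-in function; it has to be defined by you. A minimal sketch of one common way to do this, assuming that the built-in {{Codeline|OCTAVE_VERSION}} exists only in Octave (the function handle form is just one possible choice):<br />

```octave
% OCTAVE_VERSION is a built-in function in Octave and unknown to Matlab,
% so exist() returns a non-zero value only when running under Octave.
is_octave = @() exist ('OCTAVE_VERSION', 'builtin') ~= 0;

if (is_octave ())
  disp ('running under Octave');
else
  disp ('running under Matlab');
end
```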
<br />
==How can I handle memory limitations?==<br />
In order to execute Java code Octave creates a Java Virtual Machine (JVM). Such a JVM allocates a fixed amount of initial memory and may expand this pool up to a fixed maximum memory limit. The default values depend on the Java version (see {{Codeline|help javamem}}). The memory pool is shared by all Java objects running in the JVM. This strict memory limit is intended mainly to prevent runaway applications inside web browsers or on enterprise servers from consuming all memory and crashing the system. When the maximum memory limit is hit, Java code will throw exceptions so that applications will fail or behave unexpectedly.<br />
<br />
In Octave as well as in Matlab, you can specify options for the creation of the JVM inside a file named {{Path|java.opts}}. This is a text file where you can enter lines containing {{Codeline|-X}} and {{Codeline|-D}} options handed to the JVM during initialization.<br />
<br />
In Octave, the Java options file must be located in the directory where {{Path|javaclasspath.m}} resides, i.e. the package installation directory, usually something like {{Path|...\share\Octave\packages\java-1.2.8}}. You can find this directory by executing {{Codeline|pkg list}}.<br />
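<br />
As a hedged sketch, the expected location of {{Path|java.opts}} can also be derived from within Octave by asking where {{Path|javaclasspath.m}} lives. The variable names are illustrative, and the sketch assumes the java package is loaded so that {{Codeline|which}} can find the script:<br />

```octave
jcp_file = which ("javaclasspath");   % full path of javaclasspath.m, or "" if not found
if (isempty (jcp_file))
  disp ("javaclasspath.m not found; is the java package installed and loaded?");
else
  opts_file = fullfile (fileparts (jcp_file), "java.opts");
  if (exist (opts_file, "file"))
    printf ("JVM options are read from %s:\n", opts_file);
    disp (fileread (opts_file));
  else
    printf ("No java.opts yet; it would have to be created as %s\n", opts_file);
  end
end
```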
<br />
In Matlab, the options file goes into the {{Path|MATLABROOT/bin/ARCH}} directory or in your personal Matlab startup directory (can be determined by a {{Codeline|pwd}} command). MATLABROOT is the Matlab root directory and ARCH is your system architecture, which you can find by issuing the commands {{Codeline|matlabroot}} and {{Codeline|computer('arch')}}, respectively.<br />
<br />
The {{Codeline|-X}} options control the memory limits of the JVM. For example, to increase the maximum amount of memory available to the JVM to 256 Megabytes, add the following line to the {{Path|java.opts}} file:{{File|java.opts|<pre>-Xmx256m</pre>}}<br />
<br />
The maximum possible amount of memory depends on your system. On a Windows system with 2 Gigabytes main memory you should be able to set this maximum to about 1 Gigabyte.<br />
<br />
If your application requires a large amount of memory from the beginning, you can also specify the initial amount of memory allocated to the JVM. Adding the following line to the {{Path|java.opts}} file starts the JVM with 64 Megabytes of initial memory:{{File|java.opts|<pre>-Xms64m</pre>}}<br />
<br />
For more details on the available {{Codeline|-X}} options of your Java Virtual Machine issue the command {{Codeline|java -X}} at the operating system command prompt and consult the Java documentation.<br />
<br />
The {{Codeline|-D}} options can be used to define system properties which can then be used by Java<br />
classes inside Octave. System properties can be retrieved by using the {{Codeline|getProperty()}}<br />
methods of the {{Codeline|java.lang.System}} class. The following example line defines the property<br />
{{Codeline|MyProperty}} and assigns it the string {{Codeline|12.34}}.<br />
-DMyProperty=12.34<br />
The value of this property can then be retrieved as a string by a Java object or in Octave:<br />
 octave> javaMethod ("getProperty", "java.lang.System", "MyProperty")<br />
ans = 12.34<br />
<br />
==How to install the java package in Octave?==<br />
===Uninstall the currently installed package java===<br />
Check whether the java package is already installed by issuing the {{Codeline|pkg list}} command:<br />
octave> pkg list<br />
Package Name | Version | Installation directory<br />
--------------+---------+-----------------------<br />
java *| 1.2.8 | /home/octavio/octave/java-1.2.8<br />
<br />
If the java package appears in the list you must uninstall it first by issuing the command<br />
octave> pkg uninstall java<br />
octave> pkg list<br />
<br />
Now the java package should not be listed anymore. If you have used the java package during the current session of Octave, you have to exit and restart Octave before you can uninstall the package. This is because the system keeps certain libraries in memory after they have been loaded once.<br />
<br />
===Make sure that the build environment is configured properly===<br />
The installation process requires that the environment variable {{Codeline|JAVA_HOME}} points to the Java Development Kit (JDK) on your computer.<br />
*Note that JDK is not equal to JRE (Java Runtime Environment). The JDK home directory contains subdirectories with include, library and executable files which are required to compile the java package. These files are not part of the JRE, so you definitely need the JDK.<br />
*Do not use backslashes but ordinary slashes in the path. Set the environment variable {{Codeline|JAVA_HOME}} according to your local JDK installation. Please adapt the path in the following examples according to the JDK installation on your system. If you are using a Windows system that might be:<br />
:<pre>octave> setenv ("JAVA_HOME", "C:/Program Files/Java/jdk1.6.0_33");</pre><br />
:On Linux systems the location of the Java JDK varies from distro to distro. It could look like:<br />
:<pre>octave> setenv ("JAVA_HOME", "/usr/local/jdk1.6.0_33");</pre><br />
:or maybe something like (on e.g., Mageia, Fedora and Ubuntu):<br />
:<pre>octave> setenv ("JAVA_HOME", "/usr/lib/jvm/java-1.7.0-openjdk.i386");</pre><br />
:If you are on Linux and can't find out what JAVA_HOME should look like, the following trick may help.<br />
:Start a shell and issue the command:<br />
: ls -l $(which javac)<br />
:Usually this shows a symlink, indicated by a "->", e.g., <br />
: lrwxrwxrwx 1 root root 21 Apr 28 21:00 /usr/bin/javac -> /etc/alternatives/javac*<br />
:Now just follow the targets (to the right of the "->") until you arrive at the real file:<br />
: ls -l /etc/alternatives/javac<br />
: lrwxrwxrwx 1 root root 44 Jul 17 23:41 /etc/alternatives/javac -> /usr/lib/jvm/java-1.7.0-openjdk.i386/bin/javac*<br />
: ls -l /usr/lib/jvm/java-1.7.0-openjdk.i386/bin/javac<br />
: -rwxr-xr-x 1 root root 3832 Jun 23 03:11 /usr/lib/jvm/java-1.7.0-openjdk.i386/bin/javac*<br />
:(The "real" file doesn't have the "l" file attribute and is probably only writable by root.)<br />
:Once you get there, JAVA_HOME should be set to the full path of the executable excluding the "/bin/javac" part (i.e., "/usr/lib/jvm/java-1.7.0-openjdk.i386").<br />
:Note that on all systems you must use the forward slash {{Codeline|/}} as the separator, not the backslash {{Codeline|\}}. If on a Windows system the environment variable {{Codeline|JAVA_HOME}} is already defined using the backslash, you can easily change this by issuing the following Octave command before starting the installation:<br />
:<pre>octave> setenv ("JAVA_HOME", strrep (getenv ("JAVA_HOME"), '\', '/'))</pre><br />
*The Java executables (especially the Java compiler, javac or javac.exe, and the Java archiver, jar or jar.exe) should be in the PATH. On Linux they're often symlinked to from /usr/bin (see above) but on Windows that is usually not the case. To that end, during installation of the Java package version 1.2.9+ a file "preinstall.m" is run; preinstall.m takes care of required settings, provided the JAVA_HOME environment variable has been set properly.<br />
:If you insist on manually adding the Java executables path to the Windows PATH, do as follows:<br />
:Check if by any chance the executables are in the PATH by issuing the Octave command:<br />
:<pre>octave> system ('javac -version 2> nul')</pre><br />
:If this returns zero you're OK. If it doesn't return zero (i.e., the command "javac -version" doesn't return normally), the command:<br />
:<pre>octave> setenv ("PATH", [ getenv("JAVA_HOME"), filesep, "bin", pathsep, getenv("PATH") ])</pre><br />
:should do the trick (watch out that 'getenv("PATH")' contains no spaces). It is better not to fiddle with the Windows PATH through the Control Panel etc. The above procedure (the same that preinstall.m invokes) only adapts the PATH for the current Octave session and ensures that the Java executables you need are first in the PATH, before any others on your system. Again: provided you've set up JAVA_HOME correctly.<br />
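<br />
Once the real location of the Java compiler is known, stripping the "/bin/javac" part can also be done in Octave. A minimal sketch, using the example path from the Linux listing above (replace it with the path found on your own system):<br />

```octave
% Real (non-symlink) location of javac; example value only
javac_real = "/usr/lib/jvm/java-1.7.0-openjdk.i386/bin/javac";

% First fileparts() drops "javac", the second drops "bin"
java_home = fileparts (fileparts (javac_real));

setenv ("JAVA_HOME", java_home);
disp (getenv ("JAVA_HOME"));
```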
<br />
===Compile and install the package in Octave===<br />
[[Octave_Forge#installing_packages|Install the package]] from octave-forge.<br />
<br />
Note:<br />
On Windows (MinGW) systems the Java package can be (slightly) miscompiled; until now errors have only been reported when using Java Swing stuff. To fix this, the following compiler flags have to be added:<br />
<br />
 -Wl,--kill-at to the $(MKOCTFILE) in the Makefile<br />
<br />
see:<br />
[http://sourceforge.net/mailarchive/forum.php?thread_name=CAB-99LuCbL3u0LmZuAmneRb_0G8pjmyha9viFpncMnjC%3DeBxJA%40mail.gmail.com&forum_name=octave-dev]<br />
(scroll down a bit for the relevant postings)<br />
<br />
<br />
On Linux 64-bit systems, the libjvm.so library is installed only in <JAVA_HOME>/jre/lib/<ARCH>/server/ because the <JAVA_HOME>/jre/lib/<ARCH>/client/ folder is not available. This will lead to runtime execution errors because the library is not found.<br />
<br />
There are two solutions:<br />
* install the java package as is and create a symlink named ''client'' in <JAVA_HOME>/jre/lib/<ARCH>/ pointing to the ''server'' folder;<br />
* patch the java package as indicated in [http://savannah.gnu.org/bugs/?39065 libjvm detection on 64 bit systems for java package] and then install it; no further modifications are needed.<br />
<br />
===Test the java package installation===<br />
The following code creates a Java string object, which however is automatically converted<br />
to an Octave string:<br />
octave> s = javaObject ("java.lang.String", "Hello OctaveString")<br />
s = Hello OctaveString<br />
<br />
Note that the java package automatically transforms the Java String object to an Octave<br />
string. This means that you cannot apply Java String methods to the result.<br />
<br />
This "auto boxing" scheme seems to be implemented for the following Java classes:<br />
*{{Codeline|java.lang.Integer}}<br />
*{{Codeline|java.lang.Double}}<br />
*{{Codeline|java.lang.Boolean}}<br />
*{{Codeline|java.lang.String}}<br />
<br />
If you instead create an object for which no "auto-boxing" is implemented, {{Codeline|javaObject}}<br />
returns the genuine Java object:<br />
octave> v = javaObject ("java.util.Vector")<br />
v =<br />
<Java object: java.util.Vector><br />
octave> v.add(12);<br />
octave> v.get(0)<br />
ans = 12<br />
<br />
If you have created such a Java object, you can apply all methods of the Java class to<br />
the returned object. Note also that for some objects you must specify an initializer:<br />
% not:<br />
octave> d = javaObject ("java.lang.Double")<br />
error: [java] java.lang.NoSuchMethodException: java.lang.Double<br />
% but:<br />
octave> d = javaObject ("java.lang.Double", 12.34)<br />
d = 12.340<br />
<br />
==Which TEX symbols are implemented in the dialog functions?==<br />
The dialog functions contain a translation table for TEX like symbol codes. Thus messages<br />
and labels can be tailored to show some common mathematical symbols or Greek characters.<br />
No further TEX formatting codes are supported. The characters are translated to their<br />
Unicode equivalent. However, not all characters may be displayable on your system. This<br />
depends on the font used by the Java system on your computer.<br />
<br />
Each TEX symbol code must be terminated by a space character to make it distinguishable from<br />
the surrounding text. Therefore the string {{Codeline|\alpha &#61;12.0}} will produce the<br />
desired result, whereas {{Codeline|\alpha&#61;12.0}} would produce the literal text {{Codeline|\alpha&#61;12.0}}.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=2578IO package2013-04-27T19:26:10Z<p>83.163.225.168: /* Required support software */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input from and output to external file formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and, if data have been transferred to a spreadsheet, save the data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MinGW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5 and 0.8.6+ work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is supported in io-1.0.19 and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
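<br />
To give an idea of the post-processing step, here is a simplified sketch of the kind of separation parsecell performs on a raw cell array. It is not the package's implementation (the real function does more, e.g. it also handles the outer limits of the data); the sample data and variable names are made up:<br />

```octave
% Raw data as it might come out of a spreadsheet: numbers, text, empty cells
raw = {1,   "abc", [];
       2.5, [],    "x"};

is_num = cellfun (@(c) isnumeric (c) && isscalar (c), raw);
is_txt = cellfun (@ischar, raw);

% Numeric array: NaN wherever the cell holds no number
num = NaN (size (raw));
num(is_num) = [raw{is_num}];

% Text array: empty string wherever the cell holds no text
txt = cell (size (raw));
txt(:) = {""};
txt(is_txt) = raw(is_txt);
```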
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore odsfiles with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
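<br />
A hedged sketch of such a sequence is given below. The file name {{Codeline|test.ods}}, the sheet numbers and the range are hypothetical, and the guard merely skips the example when the io package or the file is not available. Wrapping the reads in {{Codeline|unwind_protect}} is one way to make sure odsclose is always invoked:<br />

```octave
fname = "test.ods";                          % hypothetical spreadsheet file
if (exist ("odsopen") && exist (fname, "file"))
  ods = odsopen (fname, 0);                  % 0 = open for reading only
  if (isempty (ods))
    error ("could not open %s (is ODS support set up?)", fname);
  end
  unwind_protect
    [raw1, ods] = ods2oct (ods, 1);          % all data from the first sheet
    [raw2, ods] = ods2oct (ods, 2, "A1:C10");   % a limited range from the second sheet
    [num1, txt1] = parsecell (raw1);         % split into numeric and text arrays
  unwind_protect_cleanup
    ods = odsclose (ods);                    % always release the file pointer
  end_unwind_protect
else
  disp ("io package or test.ods not available; skipping example");
end
```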
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
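<br />
As an illustration, reading the literal formula strings from the first sheet could look like the sketch below. The file name is hypothetical, the "otk" interface is just one of the two interfaces mentioned above, and the guard skips the example when the io package or the file is absent:<br />

```octave
options.formulas_as_text = 1;        % read formulas as text instead of their results

fname = "test.ods";                  % hypothetical spreadsheet file
if (exist ("odsopen") && exist (fname, "file"))
  ods = odsopen (fname, 0, "otk");   % request the ODF Toolkit interface
  [raw, ods] = ods2oct (ods, 1, "", options);   % "" = all data in the sheet
  ods = odsclose (ods);
  disp (raw);
end
```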
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave.<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (and times). A less obvious one is the Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
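<br />
The two cases can be illustrated with plain Octave date functions. This is only a sketch with made-up sample values; the offset 693960 is the difference between Octave's day numbers and Excel's serial dates, and it holds for dates after February 1900 (because of the leap year quirk mentioned above):<br />

```octave
% Case 1: an ODS date cell arrives as a text string
ods_date = "2010-01-31";
dn_from_text = datenum (ods_date, "yyyy-mm-dd");

% Case 2: an old Excel serial date arrives as a plain number
excel_serial = 40209;                        % Excel's number for 2010-01-31
dn_from_excel = excel_serial + 693960;       % shift to Octave's day numbering

% Both should describe the same day
disp (datestr (dn_from_text, "yyyy-mm-dd"));
disp (datestr (dn_from_excel, "yyyy-mm-dd"));
```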
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild-auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), in both cases without the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar (usually found in the subdirectory Basis<version>/program/classes/, or the like, of the OpenOffice.org (<OOo>) installation directory);<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
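<br />
The "Try opening an ods file" step above can be turned into a small probing loop. What follows is only a minimal sketch, not code from the io package: it assumes the io package is loaded, test.ods is a placeholder file name, and the variable names intfs and works are made up for illustration. It uses the same odsopen / odsclose calls as shown in the steps; if odsopen is not available at all, that error is caught and printed too.<br />

```octave
intfs = {"otk", "jod"};          # ODF Toolkit and jOpenDocument, as named above
works = false (size (intfs));    # one flag per interface

for k = 1:numel (intfs)
  try
    ## Same call as in the troubleshooting step; "test.ods" is a placeholder
    ods = odsopen ("test.ods", 1, intfs{k});
    works(k) = isstruct (ods);
    if (works(k))
      ods = odsclose (ods);      # always close the file pointer again
    endif
  catch err
    printf ("%s: %s\n", intfs{k}, err.message);
  end_try_catch
endfor

disp (works)
```

An interface whose flag stays false most likely lacks one of its .jar files in the javaclasspath, or Java itself is not working.<br />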
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
At the end of August 2010 I implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving into the older TableTable API. Later on versions 0.8.7 and 0.8.8 were tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (June 2012 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook (using oct2xls), xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
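<br />
To illustrate what "separating the data in raw arrays" amounts to, here is a rough stand-alone sketch in plain Octave. It is not the code of parsecell.m (which does more, e.g., it also reports the cell range limits); the sample array rawarr and the names isnum and istxt are made up, and every non-empty numeric or logical cell is assumed to hold a scalar.<br />

```octave
## Made-up raw cell array, shaped like what a spreadsheet read might return
rawarr = {"Name", "Value", "Flag"; ...
          "a",    1.5,     true;   ...
          "b",    [],      false};

## Masks: which cells hold (non-empty) numeric/logical data, which hold text
isnum = cellfun (@(c) (isnumeric (c) || islogical (c)) && ! isempty (c), rawarr);
istxt = cellfun (@ischar, rawarr);

## Numeric array: NaN wherever a cell holds no number
numarr = NaN (size (rawarr));
numarr(isnum) = cellfun (@double, rawarr(isnum));

## Text array: empty string wherever a cell holds no text
txtarr = cell (size (rawarr));
txtarr(:) = {""};
txtarr(istxt) = rawarr(istxt);

disp (numarr)
disp (txtarr)
```

Both output arrays keep the shape of the raw array, so row and column positions still correspond to worksheet positions.<br />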
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed. Note that 64-bit MS-Office has no support for COM / ActiveX so you might have to resort to the Java interfaces below.<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.<br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
 # by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's doesn't. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.....<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (Excel's own date numbers, where 1 = 1/1/1900 – you can easily convert them into proper octave date values yourself and display them using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you can only get the contents back when writing out all of the changes, is worrying. Moreover, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard for Apache POI to catch up. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp, ":")'), both w/o the quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
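<br />
The class path check in the steps above can be scripted. The following is a minimal sketch in plain Octave, not part of the io package: the class path entries are made-up examples (on a real system, take them from javaclasspath as described above), and the names jcp, needed, found and missing are hypothetical. The name fragments to look for are taken from the lists under "Required support software".<br />

```octave
## Made-up class path entries; on a real system get them from javaclasspath
jcp = {"/usr/lib/java/poi-3.8-20120326.jar", ...
       "/usr/lib/java/jxl.jar"};

## Name fragments of the .jar files to look for
needed = {"poi-3", "poi-ooxml-3", "jxl"};

## For each fragment, check whether any class path entry contains it
found = cellfun (@(n) any (! cellfun (@isempty, strfind (jcp, n))), needed);

## Fragments for which no matching .jar file was found
missing = needed(! found);
disp (missing)
```

Any entry left in missing points at a .jar file that still has to be added with javaaddpath(), using its fully qualified pathname.<br />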
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the quality and documentation of the interface support libraries. Separating the file access functions from the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
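<br />
Until such an automatic date conversion exists, it can be done by hand. Below is a minimal sketch in plain Octave; the variable names are made up. It assumes Excel's 1900 date system and only holds for Excel date values from 1 March 1900 onwards (value 61 and up), because of Excel's non-existent 29 February 1900 mentioned above.<br />

```octave
## Offset between Excel date values and Octave datenums
xls_offset = datenum (1899, 12, 30);        # evaluates to 693960

## Excel -> Octave: 40179 is Excel's date value for 1 January 2010
xls_date = 40179;
oct_date = xls_date + xls_offset;
disp (datestr (oct_date, "yyyy-mm-dd"))

## Octave -> Excel: subtract the same offset
xls_back = datenum (2010, 1, 1) - xls_offset;
disp (xls_back)
```

Time-of-day fractions carry over unchanged, as both systems count in days.<br />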
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Projects&diff=1875Projects2012-08-30T15:20:00Z<p>83.163.225.168: /* Graphics */</p>
<hr />
<div>The list below summarizes features or bug fixes we would like to see in Octave. If you start working steadily on a project, please let octave-maintainers@octave.org know. We might have information that could help you. You should also read the [http://www.gnu.org/software/octave/doc/interpreter/Contributing-Guidelines.html#Contributing-Guidelines Contributing Guidelines chapter] in the [http://www.gnu.org/software/octave/docs.html Octave manual].<br />
<br />
This list is not exclusive -- there are many other things that might be good projects, but it might instead be something we already have. Also, some of the following items may not actually be considered good ideas now. So please check with octave-maintainers@octave.org before you start working on some large project.<br />
<br />
GSoC students, please see [[GSoC Project Ideas]].<br />
<br />
=Numerical=<br />
<br />
*Improve logm, and sqrtm (see this thread: http://octave.1599824.n4.nabble.com/matrix-functions-td3137935.html)<br />
<br />
*Improve complex mapper functions. See W. Kahan, "Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing's Sign Bit" (in The State of the Art in Numerical Analysis, eds. Iserles and Powell, Clarendon Press, Oxford, 1987) for explicit trigonometric formulae.<br />
<br />
*Make functions like gamma() return the right IEEE Inf or NaN values for extreme args or other undefined cases.<br />
<br />
*Improve sqp.<br />
<br />
*Fix CollocWt to handle Laguerre polynomials. Make it easy to extend it to other polynomial types.<br />
<br />
*Add optional arguments to colloc so that it's not restricted to Legendre polynomials.<br />
<br />
*Fix eig to also be able to solve the generalized eigenvalue problem, and to solve for eigenvalues and eigenvectors without performing a balancing step first.<br />
<br />
*Move rand, eye, xpow, xdiv, etc., functions to the matrix classes.<br />
<br />
*Use octave_allocator for memory management in Array classes once g++ supports static member templates.<br />
<br />
*Improve design of ODE, DAE, classes.<br />
<br />
*Make QR more memory efficient for large matrices when not all the columns of Q are required (apparently this is not handled by the lapack code yet).<br />
<br />
*Evaluate harmonics and cross-correlations of unevenly sampled and nonstationary time series, as in http://www.jstatsoft.org/v11/i02 (which has C code with interface to R).<br />
<br />
=GUI/IDE=<br />
<br />
*Søren Hauberg has suggested that we need C++ code that can:<br />
**Determine if a line of code could be fully parsed, i.e. it would return true for "plot (x, y);", but false for "while (true)".<br />
**Evaluate a line of code and return the output as a string (it would be best if it could provide three strings: output, warnings and errors).<br />
**Query defined variables, i.e. get a list of currently defined variables. Bonus points if it could tell you if anything had changed since the last time you checked the variables (could also be done with signals).<br />
*There is currently a GUI being developed, it's in [https://savannah.gnu.org/projects/octave/ savannah]. Further info can be found on the [http://octave-gsoc2012.blogspot.com GSoC 2012 GNU Octave GUI Development blog].<br />
<br />
=Sparse Matrices=<br />
<br />
*Improve QR factorization functions, using idea based on CSPARSE cs_dmsol.m<br />
<br />
*Improve QR factorization by replacing CXSPARSE code with SPQR code, and make the linear solve return 2-norm solutions for ill-conditioned matrices based on this new code<br />
<br />
*Implement the fourth argument to sprand and sprandn, and the additional arguments to sprandsym that the leading brand implements.<br />
<br />
*Sparse logical indexing in idx_vector class so that something like 'a=sprandn(1e6,1e6,1e-6); a(a<1) = 0' won't cause a memory overflow.<br />
<br />
*Other missing Functions<br />
**<strike>symmmd</strike> (Superseded by symamd)<br />
**<strike>colmmd</strike> (Superseded by colamd)<br />
**cholinc (or ichol)<br />
**<strike>bicg</strike> Moved into octave-core<br />
**<strike>gmres</strike>Moved into octave-core<br />
**lsqr<br />
**minres<br />
**qmr<br />
**symmlq<br />
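<br />
As to the sparse logical indexing item above: the mask a<1 is true for every implicit zero, so it is almost completely filled, which is where the memory goes. Until that is solved, one possible workaround is to filter only the stored nonzeros. The following is just a sketch on a small made-up matrix; the variable names are hypothetical.<br />

```octave
## Small stand-in for a large, very sparse matrix
a = sparse ([1 2 3 3], [1 2 1 3], [0.5 2 -1 3], 4, 4);

## Look only at the stored nonzeros instead of building the mask a < 1
[i, j, v] = find (a);
keep = v >= 1;                 # elements that a(a<1) = 0 would leave untouched

## Rebuild the matrix from the surviving elements
b = sparse (i(keep), j(keep), v(keep), size (a, 1), size (a, 2));
disp (full (b))
```

The work and memory needed scale with the number of nonzeros rather than with the full matrix size.<br />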
<br />
=Strings=<br />
<br />
*Improve performance of string functions, particularly for searching and replacing.<br />
<br />
*Make find work for strings.<br />
<br />
*Consider making octave_print_internal() print some sort of text representation for unprintable characters instead of sending them directly to the terminal. (But don't do this for fprintf!)<br />
<br />
*Consider changing the default value of `string_fill_char' from SPC to NUL.<br />
<br />
=Other Data Types=<br />
<br />
*Template functions for mixed-type ops.<br />
<br />
*Convert other functions for use with the floating point type including quad, dasrt, daspk, etc.<br />
<br />
=Input/Output=<br />
<br />
*Make fread and fwrite work for complex data. Iostreams based versions of these functions would also be nice, and if you are working on them, it would be good to support other size specifications (integer*2, etc.).<br />
<br />
*Move some pr-output stuff to liboctave.<br />
<br />
*Make the cutoff point for changing to packed storage a user-preference variable with default value 8192.<br />
<br />
*Complain if there is not enough disk space available (I think there is simply not enough error checking in the code that handles writing data).<br />
<br />
*Make it possible to tie arbitrary input and output streams together, similar to the way iostreams can be tied together.<br />
<br />
=Interpreter=<br />
<br />
*Allow customization of the debug prompt.<br />
<br />
*Fix the parser so that<br />
<br />
if (expr) 'this is a string' end<br />
<br />
is parsed as IF expr STRING END.<br />
<br />
*Clean up functions in input.cc that handle user input (there currently seems to be some unnecessary duplication of code and it seems overly complex).<br />
<br />
*Consider allowing an arbitrary property list to be attached to any variable. This could be a more general way to handle the help string that can currently be added with `document'.<br />
<br />
*Allow more command line options to be accessible as built-in variables (--echo-commands, etc.).<br />
<br />
*Make the interpreter run faster.<br />
<br />
*Allow arbitrary lower bounds for array indexing.<br />
<br />
*Improve performance of recursive function calls.<br />
<br />
*Improve the way ignore_function_time_stamp works to allow selecting by individual directories or functions.<br />
<br />
*Add a command-line option to tell Octave to just do syntax checking and not execute statements.<br />
<br />
*Clean up symtab and variable stuff.<br />
<br />
*Input stream class for parser files -- must manage buffers for flex and context for global variable settings.<br />
<br />
*Make the parser do more semantic checking, continue after errors when compiling functions, etc.<br />
<br />
*Make LEXICAL_ERROR have a value that is the error message for parse_error() to print?<br />
<br />
*Add a run-time alias mechanism that would allow things like alias fun function_with_a_very_long_name so that `function_with_a_very_long_name' could be invoked as `fun'.<br />
<br />
*Allow local changes to variables to be written more compactly than is currently possible with unwind_protect. For example, <br />
<br />
function f ()<br />
local prefer_column_vectors = something;<br />
...<br />
endfunction<br />
<br />
<br />
would be equivalent to<br />
<br />
function f ()<br />
save_prefer_column_vectors = prefer_column_vectors;<br />
unwind_protect<br />
prefer_column_vectors = something;<br />
...<br />
unwind_protect_cleanup<br />
prefer_column_vectors = save_prefer_column_vectors;<br />
end_unwind_protect<br />
endfunction<br />
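<br />
The save-and-restore pattern can be tried with a setting that exists today. This is a minimal sketch that uses output_precision as a stand-in for the variable being changed; the function name show_with_precision is hypothetical, and the leading <code>1;</code> only marks the file as a script so the function can be defined inline.<br />

```octave
1;  % script-file marker, so the function below can be defined in a script

% Hypothetical helper: format x with a temporary output precision p.
function s = show_with_precision (x, p)
  saved = output_precision ();       % remember the caller's setting
  unwind_protect
    output_precision (p);            % local change
    s = disp (x);                    % disp returns the text when an output is requested
  unwind_protect_cleanup
    output_precision (saved);        % restored even if the body raises an error
  end_unwind_protect
endfunction

txt = show_with_precision (pi, 3);
```

A <code>local</code> declaration as proposed above would replace the three bookkeeping lines with one.<br />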
<br />
<br />
*Fix all function files to check for bogus inputs (wrong number or types of input arguments, wrong number of output arguments).<br />
<br />
*Handle options for built-in functions more consistently.<br />
<br />
*Too much time is spent allocating and freeing memory. What can be done to improve performance?<br />
<br />
*Error output from Fortran code is ugly. Something should be done to make it look better.<br />
<br />
*It would be nice if output from the Fortran routines could be passed through the pager.<br />
<br />
*Attempt to recognize common subexpressions in the parser.<br />
<br />
*Consider making it possible to specify an empty matrix with a syntax like [](e1, e2). Of course at least one of the expressions must be zero...<br />
<br />
*Is Matrix::fortran_vec() really necessary?<br />
<br />
*Rewrite whos and the symbol_record_info class. Write a built-in function that gives all the basic information, then write who and whos as M-files.<br />
<br />
*On systems that support matherr(), make it possible for users to enable the printing of warning messages.<br />
<br />
*Make it possible to mark variables and functions as read-only.<br />
<br />
*Make it possible to write a function that gets a reference to a matrix in memory and change one or more elements without generating a second copy of the data.<br />
<br />
*Use nanosleep instead of usleep if it is available? Apparently nanosleep is to be preferred over usleep on Solaris systems.<br />
<br />
*<strike>Per the following discussion, allow bsxfun style singleton dimension expansion as the default behavior for the builtin element-wise operators: http://octave.1599824.n4.nabble.com/Vector-approach-to-row-margin-frequencies-tp1636361p1636367.html</strike> This is done. <strike>Now [[User:JordiGH|I]] just have to document it.</strike> This is done too!<br />
<br />
* Start the development of classdef (already underway)<br />
<br />
=Graphics=<br />
<br />
*Correctly handle case where DISPLAY is unset. Provide --no-window-system or --nodisplay (?) option. Provide --display=DISPLAY option? How will this work with gnuplot (i.e., how do we know whether gnuplot requires an X display to display graphics)?<br />
<br />
* Implement transparency and lighting in OpenGL backend(s). A basic implementation was available in [http://octave.svn.sourceforge.net/viewvc/octave/trunk/octave-forge/extra/jhandles/ JHandles]. This needs to be ported/re-implemented/re-engineered/optimized in the C++ OpenGL renderer of Octave.<br />
<br />
* Implement a Cairo-based renderer for 2D-only graphics, with support for PS/PDF/SVG output (for printing).<br />
<br />
* On 'imagesc' plots, also report the matrix value under the mouse pointer, updating as the mouse moves.<br />
<br />
* Create a "getframe" function that receives a graphics handle and returns a 3D matrix from the graphics window associated with that handle.<br />
<br />
=History=<br />
<br />
*Add an option to allow saving input from script files in the history list.<br />
<br />
*The history command should accept two numeric arguments to indicate a range of history entries to display, save or read.<br />
<br />
*Avoid writing the history file if the history list has not changed.<br />
<br />
*Avoid permission errors if the history file cannot be opened for writing.<br />
<br />
*Fix history problems — core dump if multiple processes are writing to the same history file?<br />
<br />
=Configuration and Installation=<br />
<br />
*Split config.h into a part for Octave-specific configuration things (this part can be installed) and the generic HAVE_X type of configure information that should not be installed.<br />
<br />
*Makefile changes:<br />
**eliminate for loops<br />
**define shell commands or eliminate them<br />
**consolidate targets<br />
<br />
*Make it possible to configure so that installed binaries and shared libraries are stripped.<br />
<br />
*Create a docs-only distribution?<br />
<br />
*Better binary packaging and distribution, especially on Windows.<br />
<br />
*Octave Emacs mode needs maintenance.<br />
<br />
=Documentation and On-Line Help=<br />
<br />
*Document new features.<br />
<br />
*Improve the Texinfo Documentation for the interpreter. It would be useful to have lots more examples, to not have so many forward references, and to not have very many simple lists of functions.<br />
<br />
*The docs should mention something about efficiency and that using array operations is almost always a good idea for speed.<br />
<br />
*Doxygen documentation for the C++ classes.<br />
<br />
*Make index entries more consistent to improve behavior of `help -i'.<br />
<br />
*Make `help -i' try to find a whole word match first.<br />
<br />
*Clean up help stuff.<br />
<br />
*Demo files.<br />
<br />
*Document C++ sources, to make it easier for newcomers to get into writing code.<br />
<br />
*Flesh out this wiki<br />
<br />
=Tests=<br />
*Improved set of tests:<br />
**Tests for various functions. Would be nice to have a test file corresponding to every function.<br />
**Tests for element by element operators: + - .* ./ .\ .^ | & < <= == >= > != !<br />
**Tests for boolean operators: && ||<br />
**Tests for other operators: * / \ ' .'<br />
**Tests from bug reports.<br />
**Tests for indexed assignment. Need to consider the following:<br />
***fortran-style indexing<br />
***zero-one indexing<br />
***assignment of empty matrix as well as values resizing<br />
**Tests for all internal functions.<br />
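<br />
As a rough illustration of the kind of checks listed above, here is a short sketch using plain assert calls. The particular values are arbitrary examples; in Octave's own test files such checks are normally written as <code>%!assert</code> lines.<br />

```octave
% element-by-element operators
assert ([1 2 3] .* [4 5 6], [4 10 18]);
assert ([1 2 3] .^ 2, [1 4 9]);
assert ([1 2 3] != [1 0 3], logical ([0 1 0]));

% boolean operators
assert (true || false);
assert (! (true && false));

% other operators
assert ([1 2; 3 4]', [1 3; 2 4]);

% indexed assignment that resizes the array
a = [1 2 3];
a(5) = 7;
assert (a, [1 2 3 0 7]);

% zero-one (logical) indexing combined with assignment of an empty matrix
a(logical ([1 0 0 0 1])) = [];
assert (a, [2 3 0]);
```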
<br />
=Programming=<br />
<br />
*Add support for listeners (addlistener, dellistener, etc) on the C++ side.<br />
<br />
*C++ namespace for Octave library functions.<br />
<br />
*Better error messages for missing operators?<br />
<br />
*Eliminate duplicate enums in pt-exp.cc, pt-const.cc, and ov.cc.<br />
<br />
*Handle octave_print_internal() stuff at the liboctave level. Then the octave_value classes could just call on the print() methods for the underlying classes.<br />
<br />
*As much as possible, eliminate explicit checks for the types of octave_value objects so that user-defined types will automatically do the right thing in more cases.<br />
<br />
*Only include config.h in files that actually need it, instead of including it in every .cc file. Unfortunately, this might not be so easy to figure out.<br />
<br />
*GNU coding standards:<br />
**Add a `Makefile' target to the Makefiles.<br />
**Comments on #else and #endif preprocessor commands.<br />
**Change error message format to match standards everywhere.<br />
<br />
*Eliminate more global variables.<br />
<br />
*Move procstream to liboctave.<br />
<br />
*Use references and classes in more places.<br />
<br />
*Share more code among the various _options functions.<br />
<br />
=Miscellaneous=<br />
<br />
*Implement some functions for interprocess communication: bind, accept, connect, gethostbyname, etc.<br />
<br />
*The ability to transparently handle very large files: Juhana K Kouhia <kouhia@nic.funet.fi> wrote:<br />
*: If I have a one-dimensional signal data with the size 400 Mbytes, then what are my choices to operate with it:<br />
*:*I have to split the data<br />
*:*Octave has a virtual memory on its own and I don't have to worry about the splitting.<br />
*:If I split the data, then my easily programmed processing programs will become hard to program.<br />
*:If possible, I would like to have the virtual memory system in Octave i.e., the all big files, the user see as one big array or such. There could be several user selectable models to do the virtual memory depending on what kind of data the user have (1d, 2d) and in what order they are processed (stream or random access).<br />
<br />
Perhaps this can be done entirely with a library of M-files.<br />
<br />
*An interface to gdb. Michael Smolsky <fnsiguc@weizmann.weizmann.ac.il> wrote:<br />
*:I was thinking about a tool, which could be very useful for me in my numerical simulation work. It is an interconnection between gdb and octave. We are often managing very large arrays of data in our fortran or c codes, which might be studied with the help of octave at the algorithm development stages. Assume you're coding, say, wave equation. And want to debug the code. It would be great to pick some array from the memory of the code you're developing, fft it and see the image as a log-log plot of the spectral density. I'm facing similar problems now. To avoid high c-development cost, I develop in matlab/octave, and then rewrite into c. It might be so much easier, if I could off-load a c array right from the debugger into octave, study it, and, perhaps, change some [many] values with a convenient matlab/octave syntax, similar to <code>a(:,50:250)=zeros(100,200)</code>, and then store it back into the memory of my c code.<br />
<br />
*Add a definition to lgrind so that it supports Octave. (See http://www.tex.ac.uk/tex-archive/support/lgrind/ for more information about lgrind.)<br />
<br />
*Make the website prettier. Maybe a new design, maybe a more "corporate" design (if we're heading down the "paid support for Octave" path).<br />
<br />
*Agora -- website for rapid collaboration related to GNU Octave. Talk to [[User:JordiGH|Jordi]]<br />
<br />
*Move [http://octave.sourceforge.net/ Octave-Forge] to [http://savannah.gnu.org/projects/octave/ Savannah] so everything is hosted in the same place.<br />
<br />
=Performance=<br />
<br />
*A profiler for Octave would be a very useful tool. And now we have one! But it really needs a better interface.<br />
<br />
=Packaging=<br />
<br />
* create a system that allows packages to deprecate functions as in core. Possibilities are:<br />
** get pkg to accept a deprecated directory inside the package and add it to the search path. Functions in those directories would have to be treated the same as the ones inside the core deprecated<br />
** PKG_ADD can be used to hack this. Package developers would still have to actually write the warnings in the function code, but this would allow keeping the functions in a separate directory so they don't forget to remove them in the next release<br />
** the package developer can also use something like Make to create a ''normal'' package from something that actually had a more complex structure, including deprecated directories<br />
* get pkg to resolve dependencies automatically by downloading and installing them too<br />
* allow downloading and installing multiple versions of the same package<br />
* make the package just a bit more verbose by default<br />
* make pkg a little more like apt-get<br />
* make pkg support more than one src directory<br />
<br />
=Always=<br />
<br />
*Squash bugs.</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Java_package&diff=1604Java package2012-07-20T20:10:21Z<p>83.163.225.168: /* Make sure that the build environment is configured properly */</p>
<hr />
<div>Octave is an easy to use but powerful environment for mathematical calculations, which can easily be extended by packages. Its features are close to the commercial tool Matlab so that it can often be used as a replacement. Java on the other hand offers a rich, object oriented and platform independent environment for many applications. The core Java classes can be easily extended by many freely available libraries. This document refers to the package <code>java</code>, which is part of the GNU Octave project. This package allows you to access Java classes from inside Octave. Thus it is possible to use existing class files or complete Java libraries directly from Octave.<br />
<br />
This description is based on the Octave package {{Codeline|java-1.2.8}}. The {{Forge|java}} package usually installs its script files (.m) in the directory {{Path|.../share/Octave/packages/java-1.2.8}} and its binary (.oct) files in {{Path|.../libexec/Octave/packages/java-1.2.8}}. You can get help on specific functions in Octave by executing the help command<br />
with the name of a function from this package:<br />
octave> help javaObject<br />
<br />
You can view the package's whole doc file in Octave by executing the doc command with just the word java:<br />
octave> doc java<br />
<br />
Note on calling Octave from Java: the java package is designed for calling Java from Octave. If you want to call Octave from Java, you might want to use a library like [http://kenai.com/projects/javaOctave javaOctave] or [http://jopas.sourceforge.net joPas]. <br />
<br />
=FAQ=<br />
==How to distinguish between Octave and Matlab?==<br />
Octave and Matlab are very similar, but handle Java slightly differently. Therefore it may be necessary to [[Compatibility#Are_we_running_octave.3F|detect the environment]] and use the appropriate functions.<br />
<br />
==How to make Java classes available to Octave?==<br />
Java finds classes by searching a {{Codeline|classpath}}. This is a list of Java archive files and/or directories containing class files. In Octave and Matlab the {{Codeline|classpath}} is composed of two parts:<br />
*the static {{Codeline|classpath}}, which is initialized once at startup of the JVM, and<br />
*the dynamic {{Codeline|classpath}}, which can be modified at runtime.<br />
<br />
Octave searches the static {{Codeline|classpath}} first, then the dynamic {{Codeline|classpath}}. Classes appearing in the static as well as in the dynamic {{Codeline|classpath}} will therefore be found in the static {{Codeline|classpath}} and loaded from this location.<br />
<br />
Classes which shall be used regularly or must be available to all users should be added to the static {{Codeline|classpath}}. The static {{Codeline|classpath}} is populated once from the contents of a plain text file named {{Path|classpath.txt}} when the Java Virtual Machine starts. This file contains one line for each individual {{Codeline|classpath}} to be added to the static {{Codeline|classpath}}. These lines can identify single class files, directories containing class files or Java archives with complete class file hierarchies. Comment lines starting with a {{Codeline|#}} or a {{Codeline|%}} character are ignored.<br />
<br />
The search rules for the file {{Path|classpath.txt}} are:<br />
*First, Octave searches for the file {{Path|classpath.txt}} in your home directory. If such a file is found, it is read and defines the initial static {{Codeline|classpath}}. Thus it is possible to build an initial static {{Codeline|classpath}} on a "per user" basis.<br />
*Next, Octave looks for another file {{Path|classpath.txt}} in the package installation directory. This is where {{Path|javaclasspath.m}} resides, usually something like<br />
:<pre>...\share\Octave\packages\java-1.2.8.</pre><br />
:You can find this directory by executing the command {{Codeline|pkg list}}. If this file exists, its contents are also appended to the static {{Codeline|classpath}}. Note that the archives and class directories defined in this file will affect all users.<br />
<br />
Classes which are used only by a specific script should be placed in the dynamic {{Codeline|classpath}}. This portion of the {{Codeline|classpath}} can be modified at runtime using the {{Codeline|javaaddpath}} and {{Codeline|javarmpath}} functions. Example:<br />
octave> base_path = "C:/Octave/java_files";<br />
octave> % add two JARchives to the dynamic classpath<br />
octave> javaaddpath ([base_path, "/someclasses.jar"]);<br />
octave> javaaddpath ([base_path, "/moreclasses.jar"]);<br />
octave> % check the dynamic classpath<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/someclasses.jar<br />
octave> disp (p{2});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % remove the first element from the classpath<br />
octave> javarmpath ([base_path, "/someclasses.jar"]);<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % provoke an error<br />
octave> disp (p{2});<br />
error: A(I): Index exceeds matrix dimension.<br />
<br />
Another way to add files to the dynamic {{Codeline|classpath}} exclusively for your user account is to use the file {{Path|.octaverc}} which is stored in your home directory. All Octave commands in this file are executed each time you start a new instance of Octave. The following example adds the directory {{Path|~/octave}} to Octave’s search path and the archive {{Path|myclasses.jar}} in this directory to the Java search path.<br />
<br />
{{File|octaverc|<pre><br />
addpath ("~/octave");<br />
javaaddpath ("~/octave/myclasses.jar");</pre>}}<br />
<br />
== How to create an instance of a Java class? ==<br />
If your code must work under Octave as well as Matlab you should use the function {{Codeline|javaObject}} to create Java objects. The function {{Codeline|java_new}} is Octave specific and does not exist in the Matlab environment. The following example uses [[Compatibility#Are_we_running_octave.3F|{{Codeline|is_octave()}}]] to distinguish between the environments:<br />
<br />
if (is_octave)<br />
Passenger = java_new ('package.FirstClass', row, seat); % works only in Octave <br />
else<br />
Passenger = javaObject ('package.FirstClass', row, seat); % actually works in both Octave and matlab<br />
end<br />
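<br />
{{Codeline|is_octave}} is not a built-in function. A minimal sketch of the underlying check follows, assuming the common idiom of testing for the built-in OCTAVE_VERSION; here the result is stored in a variable, and the name env_name is illustrative only.<br />

```octave
% exist returns a nonzero code for the built-in OCTAVE_VERSION in Octave;
% in Matlab no such built-in exists, so the result is 0.
is_octave = exist ('OCTAVE_VERSION', 'builtin') ~= 0;

if (is_octave)
  env_name = 'Octave';
else
  env_name = 'Matlab';
end
disp (env_name);
```

The syntax is deliberately restricted to what both environments accept (single-quoted strings, <code>~=</code>, <code>end</code>).<br />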
<br />
==How can I handle memory limitations?==<br />
In order to execute Java code Octave creates a Java Virtual Machine (JVM). Such a JVM allocates a fixed amount of initial memory and may expand this pool up to a fixed maximum memory limit. The default values depend on the Java version (see {{Codeline|help javamem}}). The memory pool is shared by all Java objects running in the JVM. This strict memory limit is intended mainly to prevent runaway applications inside web browsers or in enterprise servers from consuming all memory and crashing the system. When the maximum memory limit is hit, Java code will throw exceptions so that applications will fail or behave unexpectedly.<br />
<br />
In Octave as well as in Matlab, you can specify options for the creation of the JVM inside a file named {{Path|java.opts}}. This is a text file where you can enter lines containing {{Codeline|-X}} and {{Codeline|-D}} options handed to the JVM during initialization.<br />
<br />
In Octave, the Java options file must be located in the directory where {{Path|javaclasspath.m}} resides, i.e. the package installation directory, usually something like {{Path|...\share\Octave\packages\java-1.2.8}}. You can find this directory by executing {{Codeline|pkg list}}.<br />
<br />
In Matlab, the options file goes into the {{Path|MATLABROOT/bin/ARCH}} directory or in your personal Matlab startup directory (can be determined by a {{Codeline|pwd}} command). MATLABROOT is the Matlab root directory and ARCH is your system architecture, which you find by issuing the commands {{Codeline|matlabroot}} and {{Codeline|computer('arch')}}, respectively.<br />
<br />
The {{Codeline|-X}} options allow you to increase the maximum amount of memory available to the JVM. For example, adding the following line to the {{Path|java.opts}} file raises the limit to 256 Megabytes:{{File|java.opts|<pre>-Xmx256m</pre>}}<br />
<br />
The maximum possible amount of memory depends on your system. On a Windows system with 2 Gigabytes main memory you should be able to set this maximum to about 1 Gigabyte.<br />
<br />
If your application requires a large amount of memory from the beginning, you can also specify the initial amount of memory allocated to the JVM. Adding the following line to the {{Path|java.opts}} file starts the JVM with 64 Megabytes of initial memory:{{File|java.opts|<pre>-Xms64m</pre>}}<br />
<br />
For more details on the available {{Codeline|-X}} options of your Java Virtual Machine issue the command {{Codeline|java -X}} at the operating system command prompt and consult the Java documentation.<br />
<br />
The {{Codeline|-D}} options can be used to define system properties which can then be used by Java<br />
classes inside Octave. System properties can be retrieved by using the {{Codeline|getProperty()}}<br />
methods of the {{Codeline|java.lang.System}} class. The following example line defines the property<br />
{{Codeline|MyProperty}} and assigns it the string {{Codeline|12.34}}.<br />
-DMyProperty=12.34<br />
The value of this property can then be retrieved as a string by a Java object or in Octave:<br />
octave> javaMethod ("getProperty", "java.lang.System", "MyProperty")<br />
ans = 12.34<br />
<br />
==How to install the java package in Octave?==<br />
===Uninstall the currently installed package java===<br />
Check whether the java package is already installed by issuing the {{Codeline|pkg list}} command:<br />
octave> pkg list<br />
Package Name | Version | Installation directory<br />
--------------+---------+-----------------------<br />
java *| 1.2.8 | /home/octavio/octave/java-1.2.8<br />
<br />
If the java package appears in the list you must uninstall it first by issuing the command<br />
octave> pkg uninstall java<br />
octave> pkg list<br />
<br />
Now the java package should not be listed anymore. If you have used the java package during the current session of Octave, you have to exit and restart Octave before you can uninstall the package. This is because the system keeps certain libraries in memory after they have been loaded once.<br />
<br />
===Make sure that the build environment is configured properly===<br />
The installation process requires that the environment variable {{Codeline|JAVA_HOME}} points to the Java Development Kit (JDK) on your computer.<br />
*Note that JDK is not equal to JRE (Java Runtime Environment). The JDK home directory contains subdirectories with include, library and executable files which are required to compile the java package. These files are not part of the JRE, so you definitely need the JDK.<br />
*Do not use backslashes but ordinary slashes in the path. Set the environment variable {{Codeline|JAVA_HOME}} according to your local JDK installation. Please adapt the path in the following examples according to the JDK installation on your system. If you are using a Windows system that might be:<br />
:<pre>octave> setenv ("JAVA_HOME", "C:/Java/jdk1.6.0_33");</pre><br />
:If you are using a Linux system this would probably look more like:<br />
:<pre>octave> setenv ("JAVA_HOME", "/usr/local/jdk1.6.0_33");</pre><br />
:Note that on all systems you must use the forward slash {{Codeline|/}} as the separator, not the backslash {{Codeline|\}}. If on a Windows system the environment variable {{Codeline|JAVA_HOME}} is already defined using the backslash, you can easily change this by issuing the following Octave command before starting the installation:<br />
:<pre>octave> setenv ("JAVA_HOME", strrep (getenv ("JAVA_HOME"), '\', '/'))</pre><br />
*The Java executables (especially the Java compiler, javac) should be in the PATH. On Linux they're usually symlinked from /usr/bin but on Windows that is usually not the case. If the Octave command:<br />
:<pre>octave> system ('javac -version 2> nul')</pre><br />
:doesn't return zero (i.e., the command "javac -version" doesn't return normally), the command:<br />
:<pre>octave> setenv ("PATH", [getenv("JAVA_HOME"), filesep, "bin", pathsep, getenv("PATH")])</pre><br />
:should do the trick. Inside the square brackets, make sure there is no space between {{Codeline|getenv}} and its opening parenthesis; otherwise Octave parses them as two separate matrix elements.<br />
<br />
===Compile and install the package in Octave===<br />
[[Octave_Forge#installing_packages|Install the package]] from octave-forge.<br />
<br />
Note:<br />
On Windows (MinGW) systems the Java package can be (slightly) miscompiled; until now errors have only been reported when using Java Swing stuff. To fix this, the following compiler flags have to be added:<br />
<br />
-Wl,--kill-at to the $(MKOCTFILE) in the Makefile<br />
<br />
see:<br />
[http://sourceforge.net/mailarchive/forum.php?thread_name=CAB-99LuCbL3u0LmZuAmneRb_0G8pjmyha9viFpncMnjC%3DeBxJA%40mail.gmail.com&forum_name=octave-dev]<br />
(scroll down a bit for the relevant postings)<br />
<br />
<br />
On 64-bit systems, some files needed for compilation might reside in <JAVA_HOME>/jre/bin/server/ rather than in <JAVA_HOME>/jre/bin/client/ <br />
<br />
This will lead to build errors as the install script can't find needed files where it expects them to be.<br />
<br />
To fix this, a suitable symlink to client/ should suffice.<br />
<br />
===Test the java package installation===<br />
The following code creates a Java string object, which however is automatically converted<br />
to an Octave string:<br />
octave> s = javaObject ("java.lang.String", "Hello OctaveString")<br />
s = Hello OctaveString<br />
<br />
Note that the java package automatically transforms the Java String object to an Octave<br />
string. This means that you cannot apply Java String methods to the result.<br />
<br />
This "auto-boxing" scheme seems to be implemented for the following Java classes:<br />
*{{Codeline|java.lang.Integer}}<br />
*{{Codeline|java.lang.Double}}<br />
*{{Codeline|java.lang.Boolean}}<br />
*{{Codeline|java.lang.String}}<br />
<br />
If you instead create an object for which no "auto-boxing" is implemented, {{Codeline|javaObject}}<br />
returns the genuine Java object:<br />
octave> v = javaObject ("java.util.Vector")<br />
v =<br />
<Java object: java.util.Vector><br />
octave> v.add(12);<br />
octave> v.get(0)<br />
ans = 12<br />
<br />
If you have created such a Java object, you can apply all methods of the Java class to<br />
the returned object. Note also that for some objects you must specify an initializer:<br />
% not:<br />
octave> d = javaObject ("java.lang.Double")<br />
error: [java] java.lang.NoSuchMethodException: java.lang.Double<br />
% but:<br />
octave> d = javaObject ("java.lang.Double", 12.34)<br />
d = 12.340<br />
<br />
==Which TEX symbols are implemented in the dialog functions?==<br />
The dialog functions contain a translation table for TEX like symbol codes. Thus messages<br />
and labels can be tailored to show some common mathematical symbols or Greek characters.<br />
No further TEX formatting codes are supported. The characters are translated to their<br />
Unicode equivalent. However, not all characters may be displayable on your system. This<br />
depends on the font used by the Java system on your computer.<br />
<br />
Each TEX symbol code must be terminated by a space character to make it distinguishable from<br />
the surrounding text. Therefore the string {{Codeline|\alpha &#61;12.0}} will produce the<br />
desired result, whereas {{Codeline|\alpha&#61;12.0}} would produce the literal text {{Codeline|\alpha&#61;12.0}}.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1208IO package2012-06-15T19:55:43Z<p>83.163.225.168: /* Comparison of interfaces & usage */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input from and output to external formats.<br />
<br />
== ODS support ==<br />
(ODS = OpenDocument Spreadsheet, the spreadsheet data format used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and, if data have been transferred to a spreadsheet, save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but also works for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5 and 0.8.6+ work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is supported in io-1.0.19 and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
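<br />
A minimal sketch of such octaverc statements follows. The directory /usr/share/java and the exact .jar file names are assumptions for illustration only; substitute the full pathnames of the class libs actually installed on your system.<br />
 # Hypothetical locations; adapt them to your own installation<br />
 javaaddpath ('/usr/share/java/odfdom.jar');<br />
 javaaddpath ('/usr/share/java/xercesImpl.jar');<br />
 # Show the resulting class path to verify that the entries were added<br />
 javaclasspath<br />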
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore odsfiles with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data in limited-size chunks, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
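<br />
As an illustration, the sequence described above could look as follows. The file name 'test.ods', the sheet names and the cell range are hypothetical, and the argument order of ods2oct is assumed to mirror that of its xls counterpart; check “help ods2oct” for the exact calling syntax.<br />
 # Open the file once, read-only (hypothetical file name)<br />
 ods = odsopen ('test.ods', 0);<br />
 # Read a range from one sheet, then all data from another sheet<br />
 [raw1, ods] = ods2oct (ods, 'Sheet1', 'A1:D20');<br />
 [raw2, ods] = ods2oct (ods, 'Sheet2');<br />
 # Split the raw cell arrays into numeric and text arrays<br />
 [num1, txt1] = parsecell (raw1);<br />
 [num2, txt2] = parsecell (raw2);<br />
 # Always close the file pointer to release memory<br />
 ods = odsclose (ods);<br />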
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
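<br />
A minimal sketch of using this option, assuming the OTK interface is available. The file name, sheet name, cell address and formula string are hypothetical, and how a given interface stores the formula text is not shown here; check “help oct2ods” and “help ods2oct” for the exact calling syntax.<br />
 # Write a formula as a formula (the default behaviour)<br />
 opts.formulas_as_text = 0;<br />
 ods = odsopen ('test.ods', 1, 'otk');<br />
 ods = oct2ods ({'=SUM(A1:A5)'}, ods, 'Sheet1', 'A6', opts);<br />
 ods = odsclose (ods);<br />
 # Later on, read back the literal formula text rather than its result<br />
 opts.formulas_as_text = 1;<br />
 ods = odsopen ('test.ods', 0, 'otk');<br />
 raw = ods2oct (ods, 'Sheet1', 'A6', opts);<br />
 ods = odsclose (ods);<br />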
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (and times). A less obvious one is the Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregory XIII's intervention in 1582, when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
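<br />
Should date cells reach Octave as raw values of either kind, converting them by hand can be sketched as follows, using only core Octave functions. The example values are made up; the offset datenum (1899, 12, 30) is only valid for Excel serial numbers from 1 March 1900 onward, because of the spurious leap day mentioned above.<br />
 # Hypothetical raw date cells: an OOo date string and an Excel serial number<br />
 raw = {'2012-06-15', 41075};<br />
 offset = datenum (1899, 12, 30);   # Excel day 0 in Octave's date numbering<br />
 dn = zeros (size (raw));<br />
 for k = 1:numel (raw)<br />
   if (ischar (raw{k}))<br />
     dn(k) = datenum (raw{k}, 'yyyy-mm-dd');   # date string<br />
   else<br />
     dn(k) = raw{k} + offset;                  # Excel serial number<br />
   endif<br />
 endfor<br />
 # Both entries now denote the same day, 15 June 2012<br />
 datestr (dn, 'yyyy-mm-dd')<br />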
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODF Toolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; ODF Toolkit versions up to 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
Unfortunately, however, the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), .dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild-auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), w/o the outer quotes. See above under [[#Required support software]] which classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
** unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory;<br />
** juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
** The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
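<br />
The first checks above can be run in one go from the Octave prompt. This is only a sketch: it assumes the io and java packages are installed, and the return value of chk_spreadsheet_support is not interpreted here.<br />
 pkg list                           # is the java package listed, with an asterisk?<br />
 jcp = javaclasspath ('-all')       # are all needed .jar files in the class path?<br />
 chk_spreadsheet_support ('', 3);   # echo detailed diagnostics (debug level 3)<br />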
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5·10<sup>5</sup> cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (June 2012 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: the OpenXLS class library (.jar) in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
 # by xlsopen you can request that. But xlsopen still checks<br />
 # proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
 # Be sure to specify xlh as output argument as xls2oct keeps<br />
 # track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
 # the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
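<br />
Filled in with concrete values, the scenario above might read as follows. The file name, worksheet names, ranges and the analysis step are all hypothetical, and the file is assumed to exist already.<br />
 # Open an existing workbook for reading and writing<br />
 xlh = xlsopen ('results.xls', 1);<br />
 # Read raw data and split it into numeric and text arrays<br />
 [raw, xlh] = xls2oct (xlh, 'Input', 'A1:C10');<br />
 [num, txt, lim] = parsecell (raw);<br />
 # Hypothetical analysis: column sums of the numeric data, as a cell array<br />
 newdata = num2cell (sum (num, 1));<br />
 # Write the results to another worksheet, then save and close<br />
 xlh = oct2xls (newdata, xlh, 'Summary', 'A1:C1');<br />
 xlh = xlsclose (xlh);<br />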
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
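<br />
For example, reading from an OOXML file through Apache POI might look like this. The file name and range are hypothetical, and the position of the REQINTF argument is an assumption to be checked with “help xlsread” and “help xlsfinfo”.<br />
 # Explore the sheet names first, explicitly requesting the POI interface<br />
 [filetype, sh_names] = xlsfinfo ('results.xlsx', 'POI');<br />
 # Read a range from the first worksheet through the same interface<br />
 [num, txt, raw] = xlsread ('results.xlsx', 1, 'A1:C10', 'POI');<br />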
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.....<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (Excel serial day numbers, where 1 = 1/1/1900 – you can easily convert them into proper octave date values yourself and display them using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
** When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well, but still slower than Excel/COM. Two things are worrying. First, upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and you only get the contents back when writing out all of the changes. Second, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.8) not all Excel functions have been implemented. Obviously, as new functions are added in every new Excel release it's hard to catch up for Apache POI. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that xlsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), both w/o the outer quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a broken Octave installation or some other problem outside Octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
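<br />
The two checks in the last step can be scripted. The following is a minimal sketch, assuming the io package is loaded and that 'test.xls' (a placeholder name) may be opened or created in the working directory. It only uses the xlsopen and xlsclose calls shown above.<br />
```octave
% Probe the Java-based xls interfaces one by one (needs the io package).
% 'test.xls' is a placeholder file name.
intfs = {'poi', 'jxl'};
for ii = 1:numel (intfs)
  works = false;
  try
    xls = xlsopen ('test.xls', 1, intfs{ii});
    works = isstruct (xls);          % a usable file pointer is a struct
    if works
      xls = xlsclose (xls);          % always release the file pointer
    end
  catch
    works = false;                   % any error counts as "interface unavailable"
  end
  printf ('%s interface works: %d\n', upper (intfs{ii}), works);
end
```
If an interface reports 0, go back to the classpath and Java memory checks above before suspecting the spreadsheet file itself.<br />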
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
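<br />
The date conversion mentioned in the list above can be sketched in a few lines. This is an illustration only, not code from the io package: the helper names excel2datenum and datenum2excel are made up here, and 693960 is the usual difference between Excel's 1900-based serial numbers and Octave's datenum values. Because of Excel's leap year bug the offset only holds for dates from 1 March 1900 onward.<br />
```octave
% Offset between Excel serial dates (1900 date system) and Octave datenums.
% Only valid from 1 March 1900 onward (Excel wrongly treats 1900 as a leap year).
XLS_OFFSET = 693960;

% Hypothetical helper names, not part of the io package
excel2datenum = @(x) x + XLS_OFFSET;
datenum2excel = @(d) d - XLS_OFFSET;

serial = 36526;                        % Excel's serial number for 1 January 2000
dn = excel2datenum (serial);
disp (datestr (dn, 'yyyy-mm-dd'))      % show the converted date
disp (datenum2excel (dn))              % and convert back to the Excel serial
```
Dates before 1900 are returned as text strings, so they would need separate handling.<br />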
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1207IO package2012-06-15T19:52:56Z<p>83.163.225.168: /* Matlab compatibility */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and, if data have been transferred to a spreadsheet, save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5 and 0.8.6+ work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is supported in io-1.0.19 and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
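<br />
A sketch of such javaaddpath statements for octaverc follows. All paths are placeholders, not defaults of any package; substitute the full pathnames of the jars on your own system.<br />
```octave
% Placeholder locations; replace with the full pathnames on your system
jars = {'/path/to/odfdom.jar', ...
        '/path/to/xercesImpl.jar', ...
        '/path/to/jOpenDocument-1.2.jar'};
for ii = 1:numel (jars)
  if exist (jars{ii}, 'file')
    javaaddpath (jars{ii});            % add the jar to the dynamic Java classpath
  else
    warning ('jar file not found: %s', jars{ii});
  end
end
```
Checking for the file first keeps a mistyped path from failing silently at startup.<br />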
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ods files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
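<br />
A minimal sketch of such a sequence, assuming the io package is loaded and that 'data.ods' (a placeholder name) holds at least two sheets. The unwind_protect block is just one way to make sure odsclose is always reached, also when a read fails halfway.<br />
```octave
% Read from two sheets with a single file pointer (needs the io package).
% 'data.ods', the sheet numbers and the range are placeholders.
ods = odsopen ('data.ods');                    % read-only file pointer
unwind_protect
  [raw1, ods] = ods2oct (ods, 1);              % all data on the first sheet
  [raw2, ods] = ods2oct (ods, 2, 'A1:C10');    % a limited range on the second
  [num1, txt1] = parsecell (raw1);             % split into numeric and text parts
  [num2, txt2] = parsecell (raw2);
unwind_protect_cleanup
  ods = odsclose (ods);                        % always release the pointer
end_unwind_protect
```
The file is parsed once by odsopen; each ods2oct call then works on the copy held in memory.<br />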
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
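<br />
As an illustration of the option structure, the sketch below reads the same sheet twice, once for formula results and once for the formula text. It assumes the io package is loaded, the OTK interface is available, and 'formulas.ods' is a placeholder for a file that contains formulas.<br />
```octave
% Read formula results and formula text from the same sheet (needs the io package).
% 'formulas.ods' is a placeholder file name.
ods = odsopen ('formulas.ods', 0, 'otk');
unwind_protect
  opts.formulas_as_text = 0;                   % default: formula results
  raw_values = ods2oct (ods, 1, '', opts);
  opts.formulas_as_text = 1;                   % literal formula text strings
  raw_formulas = ods2oct (ods, 1, '', opts);
unwind_protect_cleanup
  ods = odsclose (ods);
end_unwind_protect
```
An empty range argument is used here to request all data on the sheet.<br />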
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: i.e. reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
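<br />
One way to cope with such mixed date cells is sketched below. It assumes, for illustration only, that a date arrives either as a “yyyy-mm-dd” string or as a leftover Excel serial number; the sample values and the variable names are made up, and 693960 is the usual Excel-to-datenum offset for dates from 1 March 1900 onward.<br />
```octave
XLS_OFFSET = 693960;                   % Excel serial -> Octave datenum
cells = {'2010-08-10', 40400};         % sample data: ODS date string, Excel serial
dn = zeros (size (cells));
for ii = 1:numel (cells)
  if ischar (cells{ii})
    dn(ii) = datenum (cells{ii}, 'yyyy-mm-dd');   % OOo-style date string
  else
    dn(ii) = cells{ii} + XLS_OFFSET;              % numeric value left over from Excel
  end
end
disp (datestr (dn, 'yyyy-mm-dd'))      % one row per input cell
```
Strings that also carry a time part would need a longer format string.<br />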
<br />
As octave has no ”date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
The same goes for odswrite (versus xlswrite) and odsfinfo (versus xlsfinfo); however, odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODF Toolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; the ODF Toolkit up to 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read much more spreadsheet formats than just ODS; .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), both w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar, usually found in subdirectory Basis<version>/program/classes/ (or the like) of the OpenOffice.org (<OOo>) installation directory;<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
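<br />
The classpath check from the steps above can be sketched as follows. A made-up sample classpath is used so the snippet runs anywhere; on a real system you would take it from javaclasspath instead. The name fragments searched for are assumptions based on the jar names listed under the required support software.<br />
```octave
% Sample classpath for illustration only; on a real system use
%   jcp = javaclasspath ('-all');
jcp = {'/opt/jars/odfdom.jar', '/opt/jars/xercesImpl.jar'};

% Some setups return one long string instead of a cell array
if ischar (jcp)
  jcp = strsplit (jcp, pathsep ());
end

% Name fragments to look for (lower case); adapt to the interface you need
needed = {'odfdom', 'xercesimpl', 'jopendocument'};
found = false (size (needed));
for ii = 1:numel (needed)
  hits = strfind (lower (jcp), needed{ii});    % one (possibly empty) hit list per entry
  found(ii) = any (~cellfun (@isempty, hits));
end
missing = needed(~found);
printf ('Missing from classpath: %s\n', strjoin (missing, ', '));
```
Anything reported as missing can then be added with javaaddpath() using its full pathname.<br />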
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (June 2012 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) are shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: OpenXLS.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
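<br />
As a minimal sketch, an octaverc fragment that adds the class libs with full pathnames could look like the one below. The directory and the exact .jar file names are assumptions for illustration only; substitute the names of the versions you actually downloaded.<br />
<pre><nowiki><br />
## Hypothetical location of the downloaded class libs; adapt to your system<br />
jardir = '/usr/lib/java';<br />
## Example file names only; the real names carry version and date tags<br />
jars = {'poi-3.5-FINAL.jar', 'poi-ooxml-3.5-FINAL.jar', 'jxl.jar'};<br />
for ii = 1:numel (jars)<br />
  jarfile = fullfile (jardir, jars{ii});<br />
  if (exist (jarfile, 'file'))<br />
    javaaddpath (jarfile);      # add with fully qualified pathname<br />
  else<br />
    warning ('class lib %s not found', jarfile);<br />
  endif<br />
endfor<br />
</nowiki></pre><br />
Checking for each file first avoids silently adding a non-existent .jar to the javaclasspath.<br />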
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
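<br />
As an illustration, the worksheet names of an OOXML file could be listed as shown below. This is a sketch only: the file name is hypothetical, and it assumes that REQINTF is the second input argument of xlsfinfo and that the sheet names are returned as the second output.<br />
<pre><nowiki><br />
## 'test.xlsx' is a hypothetical file; 'poi' requests the Apache POI interface<br />
[filetype, sh_names] = xlsfinfo ('test.xlsx', 'poi');<br />
disp (filetype);     # file type as reported by the interface<br />
disp (sh_names);     # worksheet names<br />
</nowiki></pre><br />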
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
 # by xlsopen you can request that. But xlsopen still checks<br />
 # proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
 # Be sure to specify xlh as output argument as xls2oct keeps<br />
 # track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
 # the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI. <br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
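<br />
A minimal sketch of how the option could be used when reading is given below. It assumes that the options struct is passed as the last (fourth) argument of xls2oct; the file name, sheet number and range are made up for illustration.<br />
<pre><nowiki><br />
xlh = xlsopen ('test.xls');                 # hypothetical file name<br />
<br />
## Default behaviour: formula results are returned<br />
opts.formulas_as_text = 0;<br />
[results, xlh] = xls2oct (xlh, 1, 'A1:D20', opts);<br />
<br />
## Same range again, now returning the literal formula strings<br />
opts.formulas_as_text = 1;<br />
[formulas, xlh] = xls2oct (xlh, 1, 'A1:D20', opts);<br />
<br />
xlh = xlsclose (xlh);<br />
</nowiki></pre><br />
Reading the same range twice on one file handle keeps the two cell arrays aligned, so a formula string can be matched to its result cell by cell.<br />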
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.....<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
** Octave can read either formula results (evaluated formulas) or the formula text strings; Matlab can't.<br />
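<br />
The date conversion mentioned above can be sketched as follows. This is a minimal example, assuming the numeric values are Excel date serials of 1 March 1900 or later; for those, the offset between an Excel serial and an Octave datenum is 693960. The sample serials are made up for illustration.<br />
<pre><nowiki><br />
xl_serial = [36526; 40179];       # example Excel date serials<br />
## Offset between Excel's 1900-based serials and Octave datenums;<br />
## only valid from 1 March 1900 on (Excel counts a non-existent 29 Feb 1900)<br />
oct_dn = xl_serial + 693960;<br />
datestr (oct_dn, 'yyyy-mm-dd')    # 2000-01-01 and 2010-01-01<br />
</nowiki></pre><br />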
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does that well, though still slower than Excel/COM. Two things are worrying. First, upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and you only get the contents back when writing out all of the changes. Second, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (Excel 95; reading only) and BIFF8 (Excel 97-2003). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e., results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) these cached results may be out of date.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that xlsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JexcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")', w/o quotes). See above under "Required support software" which classes should be mentioned.<br />
** If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
** Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a broken Octave installation or some other problem outside Octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
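<br />
The automated check and the final step above can be combined in a short session like the sketch below. It assumes that the first argument of chk_spreadsheet_support is the directory holding the .jar files (the path shown is hypothetical) and that xlsopen returns a struct only when the requested interface works.<br />
<pre><nowiki><br />
## Second argument = debug level; 3 prints the most diagnostic output<br />
chk_spreadsheet_support ('/usr/lib/java', 3);<br />
<br />
## Smoke test of the Apache POI interface on a scratch file<br />
xls = xlsopen ('test.xls', 1, 'poi');<br />
if (isstruct (xls))<br />
  disp ('POI interface works');<br />
  xls = xlsclose (xls);           # always close the file handle again<br />
else<br />
  disp ('POI interface not available');<br />
endif<br />
</nowiki></pre><br />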
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. Xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1206IO package2012-06-15T19:49:54Z<p>83.163.225.168: /* Spreadsheet formula support */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = OpenDocument Spreadsheet, the spreadsheet data format used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5 and 0.8.6+ work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is supported in io-1.0.19 and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ods files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
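<br />
One way to make sure odsclose is always invoked is to wrap the reading in an unwind_protect block, as in the sketch below. The file name, sheet names and ranges are hypothetical; the call sequence follows the odsopen / ods2oct / parsecell / odsclose pattern described above.<br />
<pre><nowiki><br />
ods = odsopen ('test.ods');                 # hypothetical file name<br />
unwind_protect<br />
  ## Read two ranges from two sheets with the same file pointer<br />
  [raw1, ods] = ods2oct (ods, 'Sheet1', 'A1:C10');<br />
  [num1, txt1] = parsecell (raw1);<br />
  [raw2, ods] = ods2oct (ods, 'Sheet2', 'B2:D5');<br />
  [num2, txt2] = parsecell (raw2);<br />
unwind_protect_cleanup<br />
  ## Runs even if one of the reads above throws an error<br />
  ods = odsclose (ods);<br />
end_unwind_protect<br />
</nowiki></pre><br />
The cleanup part releases the file pointer (and, with the UNO interface, the hidden OpenOffice.org invocation) regardless of whether reading succeeded.<br />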
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (and times). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregory's intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based xls routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
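<br />
After restarting Octave, the effective maximum pool size can be checked with a few lines like the sketch below. It relies on the java_invoke function of the octave-forge java package; with other Java setups the calls may need adapting.<br />
<pre><nowiki><br />
rt = java_invoke ('java.lang.Runtime', 'getRuntime');<br />
rt.gc;                                      # request a garbage collection<br />
## Maximum size of the Java memory pool, converted from bytes to MiB<br />
max_mib = rt.maxMemory ().doubleValue () / 1024 / 1024;<br />
printf ('Java max memory: %d MiB\n', round (max_mib));<br />
</nowiki></pre><br />
With the java.opts example above the reported value should be in the neighbourhood of 512 MiB.<br />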
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing errors are transferred to Octave as NaNs, but not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in either ODF Toolkit or jOpenDocument (unlike Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODF Toolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; ODF Toolkit up to version 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
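<br />
To illustrate why implicitly stored cells complicate matters, here is a minimal sketch that assumes a much simplified model of one table row: a list of cell values, each with a repeat count standing in for the number-columns-repeated attribute. The function name expand_row and the example data are made up for illustration; this is not ODF Toolkit code.<br />
<pre>
1;  # mark this file as a script, not a function file

# Simplified model of one ODS row: values{k} is the content of a
# table-cell node and repeats(k) its repeat count. Expand the
# run-length encoded row into one entry per spreadsheet column.
function cells = expand_row (values, repeats)
  cells = {};
  for k = 1:numel (values)
    cells = [cells, repmat(values(k), 1, repeats(k))];
  endfor
endfunction

values  = {1.5, [], "abc"};   # [] models an empty cell
repeats = [1, 3, 2];          # three nodes describe six columns
row = expand_row (values, repeats);
printf ("%d nodes expand to %d cells\n", numel (values), numel (row));
</pre><br />
Writing a single value into, say, the third column of such a row means splitting the run of empty cells into up to three separate nodes, which is the kind of tree surgery described above.<br />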
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
Unfortunately, the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing, but many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), .dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a jcp = javaclasspath ('-all') (under unix/linux, do jcp = javaclasspath; strsplit (jcp, ":")). See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
** unoil.jar, usually found in subdirectory Basis<version>/program/classes/ (or the like) of the OpenOffice.org (<OOo>) installation directory;<br />
** juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
** The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
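<br />
The class path check in the steps above can be sketched as follows. This is only an illustration: the helper name missing_jars, the example class path and the jar names are all made up, and chk_spreadsheet_support does a far more thorough job.<br />
<pre>
1;  # mark this file as a script, not a function file

# Hypothetical helper: return those names from "required" that do not
# occur (case-insensitively) in any entry of the class path list.
function missing = missing_jars (classpath, required)
  missing = {};
  for k = 1:numel (required)
    hits = strfind (lower (classpath), lower (required{k}));
    if (all (cellfun (@isempty, hits)))
      missing{end+1} = required{k};
    endif
  endfor
endfunction

# Made-up class path for demonstration; on a real system use the list
# obtained from javaclasspath as described in the steps above.
jcp = {"/usr/lib/java/odfdom.jar", "/usr/lib/java/xercesImpl.jar"};
missing = missing_jars (jcp, {"odfdom", "xercesImpl", "jopendocument"});
printf ("missing: %s\n", strjoin (missing, ", "));
</pre><br />
Any name reported as missing can then be added with javaaddpath() using the fully qualified path of the corresponding .jar file.<br />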
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
At the end of August 2010 I implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving into the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (June 2012 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
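<br />
Several of these support files deal with translating between row/column numbers and spreadsheet-style addresses. As a minimal sketch of that kind of conversion (the function name a1_address is hypothetical and this is not the code of calccelladdress.m), a row and column number can be turned into an A1-style address like this:<br />
<pre>
1;  # mark this file as a script, not a function file

# Hypothetical helper (illustration only): build an A1-style cell
# address from a 1-based row number and a 1-based column number.
function addr = a1_address (row, col)
  letters = "";
  while (col > 0)
    r = mod (col - 1, 26);               # 0..25 maps to A..Z
    letters = [char(r + "A"), letters];  # prepend the next letter
    col = (col - 1 - r) / 26;
  endwhile
  addr = sprintf ("%s%d", letters, row);
endfunction

disp (a1_address (1, 1));     # A1
disp (a1_address (10, 27));   # AA10
disp (a1_address (3, 256));   # IV3
</pre><br />
Column 256 (IV) is the last column of a BIFF8 .xls worksheet, which makes it a handy test value.<br />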
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
# If you really want an other interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
# in the xlhstruct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
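The parsecell step in the scenario above separates numeric from text data. A minimal sketch of that idea in plain Octave, using made-up example data, could look as follows. It only illustrates the principle; it is not the code of parsecell.m and, unlike that function, it does not trim empty outer rows and columns.<br />
<pre>
# Raw data as a reading function might return them: a 2D cell array
# holding numbers, text and empty cells (made-up example values).
rawarr = {"Name", "Value"; "a", 1.5; "b", []; "c", 3};

isnum = cellfun (@(c) isnumeric (c) && isscalar (c), rawarr);
istxt = cellfun (@ischar, rawarr);

numarr = NaN (size (rawarr));   # non-numeric positions become NaN
numarr(isnum) = [rawarr{isnum}];

txtarr = cell (size (rawarr));
txtarr(:) = {""};               # non-text positions become ""
txtarr(istxt) = rawarr(istxt);

disp (numarr);
disp (txtarr);
</pre><br />
Both output arrays keep the size of the raw array, so a value and its row or column header can still be matched by position.<br />
<br />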
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI. <br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the COM, POI, JXL, and UNO interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL) nor OpenXLS (OXS). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL' or 'OXS', do not expect meaningful results when reading those files later on, unless you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 1 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
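<br />
The point made above about returning the actual rather than the requested cell range can be illustrated with a small sketch. It assumes the raw data are already available as a cell array covering the requested range; the helper name trim_empty_border is hypothetical and the real scripts determine the range differently for each interface.<br />
<pre>
1;  # mark this file as a script, not a function file

# Hypothetical helper: strip empty outer rows and columns from a raw
# cell array and report which rows and columns were kept.
function [trimmed, rows, cols] = trim_empty_border (rawarr)
  filled = ! cellfun (@isempty, rawarr);
  rows = find (any (filled, 2));
  cols = find (any (filled, 1));
  if (isempty (rows))
    trimmed = {}; rows = []; cols = [];
    return;
  endif
  rows = rows(1):rows(end);
  cols = cols(1):cols(end);
  trimmed = rawarr(rows, cols);
endfunction

raw = cell (4, 5);   # a requested range of 4 rows by 5 columns
raw(2:3, 2:4) = {"x", 1, 2; "y", [], 4};
[data, rows, cols] = trim_empty_border (raw);
printf ("kept rows %d-%d, columns %d-%d\n", ...
        rows(1), rows(end), cols(1), cols(end));
</pre><br />
Empty cells inside the occupied block are kept; only the empty border is removed, and the kept row and column numbers tell you where in the worksheet the data came from.<br />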
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. It is worrying that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you only get the contents back when writing out all of the changes. Moreover, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield the expected results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that xlsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But JExcelAPI is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a javaclasspath (under unix/linux, do tmp = javaclasspath; strsplit (tmp, ":")). See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
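<br />
The last step above can be scripted. The sketch below tries both Java-based interfaces in turn and simply records whether opening a workbook pointer succeeded. It uses only the xlsopen and xlsclose calls shown above; the scratch file name and the variable names are arbitrary, and any failure (io package not installed, missing jars, no write permission) is counted as "not working" without further diagnosis.<br />
<pre>
# Load the io package if available; carry on regardless so that the
# loop below reports "not working" rather than aborting.
try
  pkg load io
catch
  warning ("could not load the io package");
end_try_catch

intfs = {"poi", "jxl"};
works = false (size (intfs));
fname = [tempname() ".xls"];   # scratch file name, nothing is written

for k = 1:numel (intfs)
  try
    xls = xlsopen (fname, 1, intfs{k});
    works(k) = isstruct (xls);
    if (works(k))
      xls = xlsclose (xls);
    endif
  catch
    works(k) = false;
  end_try_catch
  if (works(k))
    printf ("%s: works\n", upper (intfs{k}));
  else
    printf ("%s: not working\n", upper (intfs{k}));
  endif
endfor
</pre><br />
For a proper diagnosis of a failing interface, chk_spreadsheet_support with debug level 3 remains the better tool.<br />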
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1205IO package2012-06-15T19:46:37Z<p>83.163.225.168: /* ODFDOM versions */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5 and 0.8.6+ work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is supported in io-1.0.19 and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
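<br />
As a minimal sketch of such a startup snippet (the directory /opt/java-libs and the exact .jar file names are assumptions; substitute the locations and names on your own system):<br />
<pre><br />
## Hypothetical directory holding the downloaded class libs; adapt to your system<br />
jardir = "/opt/java-libs";<br />
jars = {"odfdom.jar", "xercesImpl.jar"};<br />
for ii = 1:numel (jars)<br />
  ## javaaddpath needs the fully qualified path name of each .jar file<br />
  javaaddpath (fullfile (jardir, jars{ii}));<br />
endfor<br />
</pre><br />
Afterwards, javaclasspath should list both files among the dynamic class path entries.<br />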
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similar to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, a.o. oct2ods.<br />
<br />
odsfinfo can be used to explore odsfiles with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
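<br />
A possible sequence could look like the sketch below. It assumes an existing file test.ods with sheets named "Sheet1" and "Sheet2"; the file name, sheet names and cell ranges are only placeholders:<br />
<pre><br />
## Open the file once; 1 = open for reading and writing<br />
ods = odsopen ("test.ods", 1);<br />
<br />
## Read two sheets through the same file pointer<br />
[raw1, ods] = ods2oct (ods, "Sheet1", "A1:D20");<br />
[num1, txt1] = parsecell (raw1);<br />
[raw2, ods] = ods2oct (ods, "Sheet2", "A1:B10");<br />
<br />
## Prepare some new data; text cells show up as NaN in num1, so skip those<br />
newdata = {"Sum of Sheet1", sum(num1(! isnan (num1)))};<br />
<br />
## Write to a third sheet, then close the pointer to save the file and free memory<br />
ods = oct2ods (newdata, ods, "Results", "A1:B1");<br />
ods = odsclose (ods);<br />
</pre><br />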
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
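<br />
The read-back-and-re-enter cycle described above might look as follows. This is only a sketch: the file test.ods, the sheet name and the range are assumed, and the OTK interface is requested explicitly:<br />
<pre><br />
ods = odsopen ("test.ods", 1, "otk");<br />
<br />
## Read the formulas themselves rather than their results<br />
opts.formulas_as_text = 1;<br />
[raw, ods] = ods2oct (ods, "Sheet1", "A1:C5", opts);<br />
<br />
## ... inspect or edit the formula strings in the cell array raw here ...<br />
<br />
## Write them back as real formulas (the default behaviour)<br />
opts.formulas_as_text = 0;<br />
ods = oct2ods (raw, ods, "Sheet1", "A1:C5", opts);<br />
ods = odsclose (ods);<br />
</pre><br />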
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: i.e. reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
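<br />
To illustrate the two cases, here is a small sketch using only core Octave functions. The sample values are made up; the offset 693960 equals datenum (1899, 12, 30) and only holds for Excel dates after 28 February 1900 because of Excel's leap year bug:<br />
<pre><br />
## Case 1: a leftover Excel serial date number<br />
xl_serial = 41075;<br />
oct_from_xl = xl_serial + 693960;<br />
<br />
## Case 2: an OOo date string of the form yyyy-mm-dd<br />
oct_from_str = datenum ("2012-06-15", "yyyy-mm-dd");<br />
<br />
## Both should denote the same Octave date number<br />
disp (datestr (oct_from_xl, "yyyy-mm-dd"));<br />
disp (oct_from_xl == oct_from_str);<br />
</pre><br />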
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based xls routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in either ODF Toolkit or jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), w/o the outer quotes. See above under [[#Required support software]] which classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
** unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
** juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
** The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
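<br />
The first steps above can be strung together in a short session like the sketch below. It assumes a file test.ods in the working directory and that odsopen either raises an error or returns something other than a struct on failure; both are assumptions, so treat it as a starting point only:<br />
<pre><br />
pkg list                                # is the java package listed and flagged with an asterisk?<br />
jcp = javaclasspath ("-all")            # are all required .jar files mentioned?<br />
chk_spreadsheet_support ("", 3);        # echo diagnostics at debug level 3<br />
<br />
try<br />
  ods1 = odsopen ("test.ods", 1, "otk");<br />
  if (isstruct (ods1))<br />
    disp ("OTK interface works");<br />
    ods1 = odsclose (ods1);             # never leave the file pointer open<br />
  else<br />
    disp ("OTK interface did not return a file pointer struct");<br />
  endif<br />
catch<br />
  disp (lasterr ());<br />
end_try_catch<br />
</pre><br />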
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5·10<sup>5</sup> cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on versions 0.8.7 and 0.8.8 have been tested too - these needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (June 2012 = last I looked) only odfdom versions 0.7.5, 0.8.6, 0.8.7 and 0.8.8 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: OpenXLS.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation is required. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no more possible after the first call to oct2xls(). This is a limitation of JExcelAPI. <br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion. <br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option), and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well, but still slower than Excel/COM. It is worrying that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you can only get the contents back when writing out all of the changes. In addition, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5·10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), both w/o the outer quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions from the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1204IO package2012-06-15T19:43:28Z<p>83.163.225.168: /* Required support software */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and, if data have been transferred to a spreadsheet, save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MinGW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5 and 0.8.6+ work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 has been confirmed to work with odfdom 0.8.7 and earlier. odfdom-0.8.8 hasn't been tested with other xercesImpl.jar releases yet). Get them here:<br />
** http://incubator.apache.org/odftoolkit/downloads.html (contains odfdom-0.8.8)<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is supported in io-1.0.19 and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
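<br />
As an illustration, the relevant lines in octaverc could look like the sketch below. The directory and jar file names are placeholders; substitute the actual full pathnames on your system:<br />
<pre><br />
## Placeholder paths: point these at the real locations of the jar files<br />
javaaddpath ('/full/path/to/odfdom.jar');<br />
javaaddpath ('/full/path/to/xercesImpl.jar');<br />
</pre><br />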
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similar to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, a.o. oct2ods.<br />
<br />
odsfinfo can be used to explore odsfiles with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data in limited-size chunks, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
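<br />
The sequence described above can be sketched as follows. This is only an outline under a few assumptions: 'test.ods' is a placeholder file name, odsfinfo is assumed to return the sheet names in the first column of its second output, and 0 as second argument of odsopen is assumed to mean read-only:<br />
<pre><br />
fname = 'test.ods';                          # placeholder file name<br />
<br />
## Explore the sheet names first (assumed: one sheet name per row, column 1)<br />
[filetype, sh_names] = odsfinfo (fname);<br />
<br />
ods = odsopen (fname, 0);                    # open the file just once<br />
unwind_protect<br />
  for ii = 1:size (sh_names, 1)<br />
    ## Read raw data from each sheet through the same file pointer<br />
    [raw, ods] = ods2oct (ods, sh_names{ii, 1});<br />
    ## Split raw data into a numeric array and a text cell array<br />
    [num, txt] = parsecell (raw);<br />
    printf ('%s: %d x %d raw cells\n', sh_names{ii, 1}, rows (raw), columns (raw));<br />
  endfor<br />
unwind_protect_cleanup<br />
  ## Always release the file pointer, also after an error<br />
  ods = odsclose (ods);<br />
end_unwind_protect<br />
</pre><br />
Putting odsclose in the cleanup part ensures the file pointer is closed even if one of the reads fails.<br />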
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: i.e. reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores a.o. Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
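<br />
A minimal sketch of such a check, assuming the date cells arrive in Octave either as “yyyy-mm-dd” strings or as Excel-style serial numbers. The cell array below is made up for illustration, and the offset 693960 (= datenum (1899, 12, 30)) is only valid for Excel dates after February 1900 because of Excel's leap year error:<br />
<pre><br />
## Made-up raw data: the same day once as ODS date string, once as Excel serial number<br />
raw = {'2009-07-06', 40000};<br />
<br />
dn = zeros (size (raw));<br />
for ii = 1:numel (raw)<br />
  if (ischar (raw{ii}))<br />
    ## ODS-style date string -> Octave datenum<br />
    dn(ii) = datenum (raw{ii}, 'yyyy-mm-dd');<br />
  else<br />
    ## Excel-style serial number -> Octave datenum<br />
    dn(ii) = raw{ii} + 693960;<br />
  endif<br />
endfor<br />
<br />
## Both entries should now show the same date<br />
disp (datestr (dn, 'yyyy-mm-dd'))<br />
</pre><br />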
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor in jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODF Toolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; the ODF Toolkit up to 0.7.5 did little to hide the gory details from developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
Unfortunately, however, the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
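<br />
For example, the diagnostic call mentioned above looks like this (the meaning of the return value is not described here; see the help text of the function):<br />
<pre><br />
## Empty string as first argument, debug level 3 for maximum diagnostic output<br />
retval = chk_spreadsheet_support ('', 3);<br />
</pre><br />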
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), both w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
## unoil.jar, usually found in subdirectory Basis<version>/program/classes/ (or the like) of the OpenOffice.org (<OOo>) installation directory;<br />
## juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
## The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
## The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
At the end of August 2010 I implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving into the older TableTable API. Later on (early 2011) version 0.8.7 was tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
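<br />
To give an idea of what "separating the data in raw arrays" means, here is a minimal sketch in plain Octave. It is not the code of parsecell.m (which, for one thing, also reports the limits of the data); it only mimics the idea on a made-up raw array and assumes that every cell holds either a numeric scalar or a text string.<br />
<br />
```octave
# Raw heterogeneous cell array, shaped like the output of xls2oct
# (the values are made up for illustration).
rawarr = {"Name", "Value"; "a", 1.5; "b", 2};

# Logical mask of the cells that hold numeric data.
isnum = cellfun (@isnumeric, rawarr);

# Numeric array: NaN wherever the raw cell was not numeric.
numarr = NaN (size (rawarr));
numarr(isnum) = [rawarr{isnum}];

# Text array: empty string wherever the raw cell was not text.
txtarr = repmat ({""}, size (rawarr));
txtarr(! isnum) = rawarr(! isnum);
```
<br />
Empty or logical cells would need extra handling; that is left out here to keep the sketch short.<br />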
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! This requires a working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath.<br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.....<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back when writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")') (w/o quotes). See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
** If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
** Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
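<br />
The classpath check in the steps above can be sketched in a few lines of plain Octave. This is only an illustration under stated assumptions, not what chk_spreadsheet_support.m does internally: the classpath entries below are hypothetical (in a live session they would come from the javaclasspath output), and the name fragments are my own shorthand for the class libs listed under "Required support software".<br />
<br />
```octave
# Hypothetical classpath entries; in a live session these would come
# from the javaclasspath output instead.
cp = {"/usr/lib/java/poi-3.5-FINAL.jar", "/usr/lib/java/jxl.jar"};

# Name fragments of the class libs needed for POI with OOXML, plus JXL.
required = {"poi-3", "poi-ooxml-3", "poi-ooxml-schemas", "xbean", "dom4j", "jxl"};

# For each fragment, check whether any classpath entry contains it.
found = cellfun (@(name) any (! cellfun (@isempty, strfind (cp, name))), required);
missing = required(! found);

printf ("missing: %s\n", strjoin (missing, ", "));
```
<br />
Anything reported as missing would then be added with javaaddpath(), using the full pathname of the .jar file.<br />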
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
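<br />
As for the date conversion mentioned in the list above, a minimal sketch of what it could look like is given below. It is not part of the io package. It assumes Excel's 1900 date system, uses the offset 693960 (Octave's datenum of 30 December 1899) and adds one day for serial numbers before Excel's fictitious 29 February 1900. The sample serial numbers are made up.<br />
<br />
```octave
# Excel date serials (sample values): 1 = 1900-01-01, 59 = 1900-02-28,
# 61 = 1900-03-01, 36526 = 2000-01-01.
serial = [1, 59, 61, 36526];

# Excel counts a nonexistent 29 Feb 1900 (serial 60), so serials below 60
# need an offset that is one day larger than for later dates.
offset = 693960 + (serial < 60);
dn = serial + offset;

# Show the converted values as dates.
disp (datestr (dn, "yyyy-mm-dd"));
```
<br />
Going the other way (Octave to Excel) is the same subtraction; dates before 1900 have no Excel serial number at all, which is why they come back as text strings.<br />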
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1040IO package2012-04-23T20:28:42Z<p>83.163.225.168: /* Required support software */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, and 0.8.7 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 works with odfdom). Get them here:<br />
** http://odftoolkit.org/projects/odfdom/pages/Home<br />
** http://odftoolkit.org/projects/odfdom/downloads/directory/current-version<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is now also supported (but in svn only, to be released in io-1.0.19) and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ODS files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
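<br />
A possible odsopen / ods2oct / odsclose sequence is sketched below. The file name and the sheet names are hypothetical, and the guard at the top is only there so that the snippet does nothing harmful when the io package is not loaded or the file does not exist. The calls follow the same pattern as the xlsopen scenario for Excel files.<br />
<br />
```octave
fname = "data.ods";    # hypothetical file name

# Only touch the file if the io package is on the path and the file exists.
have_io = exist ("odsopen") > 0 && exist (fname, "file") == 2;

if (have_io)
  ods = odsopen (fname);                             # read-only file pointer
  [raw1, ods] = ods2oct (ods, "Sheet1");             # hypothetical sheet name
  [raw2, ods] = ods2oct (ods, "Sheet2", "A1:C10");   # second sheet, same pointer
  [num1, txt1] = parsecell (raw1);                   # split numeric and text data
  ods = odsclose (ods);                              # do not forget this one
else
  disp ("io package not loaded or file missing; nothing read");
endif
```
<br />
The file is opened only once, however many sheets or ranges are read in between.<br />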
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: i.e. reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can be either OOo date values (i.e., strings) or old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
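<br />
The two cases can be told apart and converted as sketched below. The cell values are made up for illustration, and the offset 693960 assumes Excel's 1900 date system and dates on or after 1 March 1900 (so that Excel's fictitious 29 February 1900 is already counted):<br />
<pre><br />
cells = {'2010-08-10', 40400};            # hypothetical date cell values<br />
dn = zeros (size (cells));<br />
for k = 1:numel (cells)<br />
  if ischar (cells{k})<br />
    dn(k) = datenum (cells{k}, 'yyyy-mm-dd');   # OOo date string<br />
  else<br />
    dn(k) = cells{k} + 693960;                  # Excel serial day number<br />
  end<br />
end<br />
disp (datestr (dn, 'yyyy-mm-dd'))         # both as Octave dates<br />
</pre><br />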
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODF Toolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
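<br />
The class path check in the steps above can be sketched in plain Octave as follows. The class path entries and jar names are hypothetical placeholders; in practice the entries would come from javaclasspath and the names from the list of required support software:<br />
<pre><br />
# Hypothetical class path entries and required class lib names<br />
classpath_entries = {'/usr/lib/java/odfdom.jar', '/usr/lib/java/xercesImpl.jar'};<br />
required = {'odfdom', 'xercesImpl', 'jOpenDocument'};<br />
<br />
# A name counts as present if it occurs (case-insensitively) in any entry<br />
is_present = @(name) any (~cellfun (@isempty, strfind (lower (classpath_entries), lower (name))));<br />
missing = required(~cellfun (is_present, required));<br />
disp (missing)                            # candidates for javaaddpath()<br />
</pre><br />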
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on (early 2011) version 0.8.7 has been tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
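<br />
To illustrate what the split into numeric/logical and text arrays amounts to, here is a simplified sketch in plain Octave. It is not the actual parsecell.m code (it ignores the limits output, for instance), and rawarr is just a hand-made stand-in for data returned by xls2oct:<br />
<pre><br />
rawarr = {'Name', 'Value'; 'a', 1.5; 'b', true};   # stand-in for xls2oct output<br />
is_num = cellfun (@(c) (isnumeric (c) || islogical (c)) && isscalar (c), rawarr);<br />
is_txt = cellfun (@ischar, rawarr);<br />
<br />
numarr = NaN (size (rawarr));                # non-numeric cells become NaN<br />
numarr(is_num) = cellfun (@double, rawarr(is_num));<br />
txtarr = repmat ({''}, size (rawarr));       # non-text cells become ''<br />
txtarr(is_txt) = rawarr(is_txt);<br />
</pre><br />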
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: OpenXLS.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
# If you really want an other interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI. <br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion. <br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
** When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
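<br />
The function handle feature mentioned above for xlsread can be simulated by applying the function after reading. A minimal sketch, in which numarr is a stand-in for the numeric array returned by xlsread and process is a hypothetical processing function:<br />
<pre><br />
numarr = [1 2; 3 4];                      # stand-in for data returned by xlsread<br />
process = @(data) data .* 2;              # hypothetical processing function<br />
numarr = process (numarr);                # apply it after reading<br />
</pre><br />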
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. It is worrying that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you can only get the contents back when writing out all of the changes. Moreover, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that xlsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")') (w/o quotes). See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
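<br />
The date conversion mentioned in the list above can be sketched as follows. This is only an illustration, not code from the io package; the variable names are made up, and the fixed offset is valid only for dates from 1 March 1900 onwards because of Excel's non-existent 29 February 1900:<br />
<pre><br />
# Offset between Excel's 1900 date system and Octave's datenum<br />
offset = datenum (1899, 12, 30);            # 693960<br />
excel_serial = 40179;                       # 1 January 2010 as an Excel serial number<br />
octave_dn = excel_serial + offset;          # Excel -> Octave<br />
disp (datestr (octave_dn, 'yyyy-mm-dd'))    # prints 2010-01-01<br />
back = datenum (2010, 1, 1) - offset;       # Octave -> Excel, gives 40179 again<br />
</pre><br />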
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1039IO package2012-04-23T20:20:55Z<p>83.163.225.168: /* Files content */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = OpenDocument Spreadsheet, the spreadsheet data format used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, and 0.8.7 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 works with odfdom). Get them here:<br />
** http://odftoolkit.org/projects/odfdom/pages/Home<br />
** http://odftoolkit.org/projects/odfdom/downloads/directory/current-version<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is now also supported by the I/O package and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
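<br />
As a minimal sketch of such javaaddpath statements, assuming the jars were put in a hypothetical directory /home/me/java (adapt the paths and file names to your own system):<br />
<pre><br />
# Hypothetical locations; use the full path names on your own system<br />
javaaddpath ('/home/me/java/odfdom.jar');<br />
javaaddpath ('/home/me/java/xercesImpl.jar');<br />
# List the resulting class path to verify that both entries are present<br />
jcp = javaclasspath ('-all')<br />
</pre><br />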
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ODS files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
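<br />
One way to make sure odsclose is always invoked, even when an error occurs halfway, is Octave's unwind_protect block. The sketch below assumes a hypothetical file test.ods containing worksheets named Sheet1 and Sheet2; it is an illustration rather than a prescribed recipe:<br />
<pre><br />
ods = odsopen ('test.ods');                 # hypothetical file, opened read-only<br />
unwind_protect<br />
  # Several reads reuse the same file pointer<br />
  [raw1, ods] = ods2oct (ods, 'Sheet1');<br />
  [num1, txt1] = parsecell (raw1);<br />
  [raw2, ods] = ods2oct (ods, 'Sheet2', 'A1:C10');<br />
  [num2, txt2] = parsecell (raw2);<br />
unwind_protect_cleanup<br />
  # Always executed, so the file pointer and its memory are released<br />
  ods = odsclose (ods);<br />
end_unwind_protect<br />
</pre><br />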
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
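<br />
The read-back / re-enter cycle described in this section might look roughly like the sketch below. The file name, worksheet number and range are hypothetical, and the way the formula strings are edited is left open:<br />
<pre><br />
opts.formulas_as_text = 1;<br />
ods = odsopen ('test.ods', 1, 'otk');           # hypothetical file, opened for writing<br />
[raw, ods] = ods2oct (ods, 1, 'A1:B5', opts);   # formula cells are read as text strings<br />
# ... inspect or repair the formula strings in raw here ...<br />
opts.formulas_as_text = 0;<br />
ods = oct2ods (raw, ods, 1, 'A1:B5', opts);     # re-enter them as formulas<br />
ods = odsclose (ods);                           # write the file to disk<br />
</pre><br />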
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
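<br />
Such a check could be sketched as follows. The cell array raw is made-up example data standing in for date cells of both kinds; it is not actual output of ods2oct:<br />
<pre><br />
# Hypothetical date cells: an OOo date string and a leftover numerical value<br />
raw = {'2010-01-01', 40179};<br />
is_str = cellfun (@ischar, raw);<br />
dn = nan (size (raw));<br />
# Date strings can be converted to Octave date numbers directly<br />
dn(is_str) = cellfun (@(s) datenum (s, 'yyyy-mm-dd'), raw(is_str));<br />
# Numerical cells need a closer look: they may be old Excel date values<br />
printf ('%d date cell(s) are numerical and need checking\n', sum (! is_str));<br />
</pre><br />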
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based xls routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in either ODF Toolkit or jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), .dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")') (w/o the outer quotes). See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and -clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
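<br />
The interface test in the step "Try opening an ods file" above can also be scripted. The sketch below assumes an existing hypothetical file test.ods in the working directory and opens it read-only instead of with write access; it merely reports which interfaces manage to open the file:<br />
<pre><br />
intfs = {'otk', 'jod'};<br />
for k = 1:numel (intfs)<br />
  works = false;<br />
  try<br />
    ods = odsopen ('test.ods', 0, intfs{k});<br />
    works = isstruct (ods);<br />
    if (works)<br />
      ods = odsclose (ods);               # do not leave the file pointer open<br />
    endif<br />
  catch<br />
    works = false;                        # odsopen raised an error for this interface<br />
  end_try_catch<br />
  printf ('%s interface works: %d\n', intfs{k}, works);<br />
endfor<br />
</pre><br />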
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5.10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on (early 2011) version 0.8.7 has been tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users.<br />
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: openxls.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
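<br />
A minimal sketch of how the option struct might be used. It assumes a hypothetical workbook budget.xls with a worksheet "Sheet1", the Apache POI jars on the javaclasspath, that the struct is passed as the last argument to xls2oct and oct2xls, and that a string starting with "=" is what oct2xls treats as a formula:<br />
<pre><br />
pkg load io<br />
opts.formulas_as_text = 1;   # read formulas as literal text strings<br />
xlh = xlsopen ('budget.xls', 1, 'poi');   # hypothetical file; 1 = read/write<br />
[raw, xlh] = xls2oct (xlh, 'Sheet1', 'A1:C10', opts);<br />
# Formula cells in raw now hold the formula text rather than computed results<br />
<br />
opts.formulas_as_text = 0;   # write strings starting with "=" as real formulas<br />
xlh = oct2xls ({'=SUM(A1:A9)'}, xlh, 'Sheet1', 'A10', opts);<br />
xlh = xlsclose (xlh);        # write the workbook to disk<br />
</pre><br />
Reading with formulas_as_text = 1 and writing back with formulas_as_text = 0 is what allows formulas to be edited from within Octave.<br />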
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 1 = 1/1/1900 – you can easily convert them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does that well, though still slower than Excel/COM. Two things are worrying. First, upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and you only get the contents back when writing out all of the changes. Second, any change after the first write() is lost, as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (Excel 95, reading only) and BIFF8 (Excel 97-2003). Upon overwriting, BIFF5 spreadsheets are silently converted to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield the expected results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), w/o the outer quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
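<br />
The automated check mentioned above might be run roughly as follows. This is only a sketch, assuming the io package is installed and that chk_spreadsheet_support accepts an empty first argument together with debug level 3; the handling of a string result from javaclasspath is an assumption to cover older Java package versions:<br />
<pre><br />
pkg load io<br />
# Print diagnostics at debug level 3 (second argument)<br />
chk_spreadsheet_support ('', 3);<br />
<br />
# List all class path entries, one per line<br />
jcp = javaclasspath ('-all');<br />
if (ischar (jcp))<br />
  # Older versions may return one long string; split it on the path separator<br />
  jcp = strsplit (jcp, pathsep ());<br />
endif<br />
printf ("%s\n", jcp{:});<br />
</pre><br />
Comparing the printed entries with the required .jar files shows quickly which classes are still missing.<br />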
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. Xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=1038IO package2012-04-23T20:19:12Z<p>83.163.225.168: /* Required support software */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, and 0.8.7 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 works with odfdom). Get them here:<br />
** http://odftoolkit.org/projects/odfdom/pages/Home<br />
** http://odftoolkit.org/projects/odfdom/downloads/directory/current-version<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave). jOpenDocument 1.3 beta 1 is now also supported by the I/O package and is much faster with reading.<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
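<br />
As an illustration, the javaaddpath statements in octaverc could look like the sketch below. The directory /opt/java is purely hypothetical; substitute the full pathnames where the jars actually live:<br />
<pre><br />
# Hypothetical jar locations; adjust to your own installation<br />
javaaddpath ('/opt/java/odfdom.jar');<br />
javaaddpath ('/opt/java/xercesImpl.jar');<br />
</pre><br />
With chk_spreadsheet_support.m the same directory would presumably be passed as its first argument (the path to the jars), e.g. chk_spreadsheet_support ('/opt/java').<br />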
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ODS files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
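<br />
A sketch of such a sequence, reading two sheets through one file pointer. The file name results.ods and the sheet names are hypothetical; the unwind_protect block is merely one way to make sure odsclose is always reached:<br />
<pre><br />
pkg load io<br />
ods = odsopen ('results.ods');   # hypothetical file, opened for reading<br />
unwind_protect<br />
  # The file is read once; each ods2oct call only extracts a sheet or range<br />
  [raw1, ods] = ods2oct (ods, 'Sheet1');<br />
  [raw2, ods] = ods2oct (ods, 'Sheet2', 'A1:D20');<br />
unwind_protect_cleanup<br />
  # Always release the file pointer, even if reading failed<br />
  ods = odsclose (ods);<br />
end_unwind_protect<br />
<br />
# Separate numeric and text data afterwards<br />
[num1, txt1] = parsecell (raw1);<br />
[num2, txt2] = parsecell (raw2);<br />
</pre><br />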
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregory's intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
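<br />
The two kinds of date cell described above could be converted by hand along these lines. This is a sketch only: the function name spreadsheet_date_to_datenum is hypothetical, and it assumes a date cell arrives either as a 'yyyy-mm-dd' string or as an Excel-style serial number. The offset 693960 equals datenum (1899, 12, 30) and only holds for serials from 61 (1 March 1900) onwards, because of Excel's spurious 29 February 1900:<br />
<pre><br />
1;  # mark this file as a script, so the function below can be defined inline<br />
<br />
function dn = spreadsheet_date_to_datenum (val)<br />
  # Convert one date cell value into an Octave datenum<br />
  if (ischar (val))<br />
    dn = datenum (val, 'yyyy-mm-dd');   # OOo-style date string<br />
  else<br />
    dn = val + 693960;                  # Excel-style serial number (>= 61)<br />
  endif<br />
endfunction<br />
<br />
# Excel serial 25569 corresponds to 1 January 1970<br />
datestr (spreadsheet_date_to_datenum (25569), 'yyyy-mm-dd')   # 1970-01-01<br />
datestr (spreadsheet_date_to_datenum ('1970-01-01'), 'yyyy-mm-dd')   # 1970-01-01<br />
</pre><br />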
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based xls routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details for the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ('-all')' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), w/o the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
** unoil.jar (usually found in subdirectory Basis<version>/program/classes/ or the like of the OpenOffice.org (<OOo>) installation directory);<br />
** juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
** The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
** The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
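<br />
The class path check in the steps above can be sketched as follows. This is only an illustration: missing_jars is a hypothetical helper (it is not part of the io package) and the class path entries are made up. In a real session the entries would come from javaclasspath, converted to a cell array of strings if necessary.<br />

```octave
1;  # mark this file as a script, so the function below can be defined inline

## Hypothetical helper (not part of the io package): return the required
## jar names that do not occur in any entry of the class path.
function missing = missing_jars (classpath_entries, required)
  missing = {};
  for k = 1:numel (required)
    ## An entry matches when it contains the required name (case-insensitive)
    hits = strfind (lower (classpath_entries), lower (required{k}));
    if (all (cellfun (@isempty, hits)))
      missing{end+1} = required{k};
    end
  end
end

## Made-up class path for demonstration purposes only
jcp = {"/usr/lib/java/odfdom.jar", "/usr/lib/java/xercesImpl.jar"};
missing = missing_jars (jcp, {"odfdom", "xercesImpl", "jopendocument"})
```

Matching on a substring rather than on the full file name is a deliberate choice here, because the jar names usually carry a version suffix.<br />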
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5·10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on (early 2011) version 0.8.7 has been tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and any interfaces used for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users. <br />
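<br />
To give an impression of the kind of separation parsecell performs on the raw cell array returned by xls2oct, here is a simplified stand-in. It is not the package's implementation: split_raw is a hypothetical name, and unlike the description of parsecell above it makes no attempt to report limits or to trim anything.<br />

```octave
1;  # mark this file as a script, so the function below can be defined inline

## Hypothetical, simplified stand-in for the idea behind parsecell:
## numeric/logical scalars go into a numeric array (NaN elsewhere),
## strings go into a cell array (empty string elsewhere).
function [numarr, txtarr] = split_raw (rawarr)
  isnum = cellfun (@(c) (isnumeric (c) || islogical (c)) && isscalar (c), rawarr);
  istxt = cellfun (@ischar, rawarr);
  numarr = NaN (size (rawarr));
  if (any (isnum(:)))
    numarr(isnum) = cellfun (@double, rawarr(isnum));
  end
  txtarr = cell (size (rawarr));
  txtarr(:) = {""};
  if (any (istxt(:)))
    txtarr(istxt) = rawarr(istxt);
  end
end

## A small heterogeneous array such as a worksheet with headers might yield
raw = {"Name", "Value"; "a", 1.5; "b", true};
[num, txt] = split_raw (raw)
```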
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: openxls.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
# If you really want an other interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
# in the xlhstruct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI. <br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion. <br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.....<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
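<br />
The date conversion mentioned above can be sketched like this. It assumes Excel's usual 1900 date system and dates from 1 March 1900 onwards (Excel wrongly treats 1900 as a leap year, so earlier serial numbers are off by one). The variable names are illustrative only.<br />

```octave
## Offset between Excel serial dates and Octave's datenum values
xl_offset = datenum (1899, 12, 30);

## An Excel serial date as it might come back from a worksheet cell
xl_serial = 36526;

## Convert to an Octave date number and format it
oct_date = xl_serial + xl_offset;
datestr (oct_date, "yyyy-mm-dd")
```

Going the other way (Octave to Excel) is simply a matter of subtracting the same offset.<br />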
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. It is worrying that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and that you can only get the contents back when writing out all of the changes. Any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI ? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield the expected results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that xlsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JexcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5·10^5 cells should not be kept in spreadsheets anyway. It is better to use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), w/o the outer quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
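<br />
The last step above (trying each interface in turn) can be wrapped in a small loop. The sketch below is an illustration only: probe_interfaces is a hypothetical helper, and the two fake_* functions are stand-ins so that the example runs without the io package. With the io package installed one would pass @xlsopen and @xlsclose instead.<br />

```octave
1;  # mark this file as a script, so the functions below can be defined inline

## Hypothetical helper: try to open FNAME with each interface in INTFS and
## record whether a pointer struct came back; close it again if so.
function ok = probe_interfaces (opener, closer, fname, intfs)
  ok = false (1, numel (intfs));
  for k = 1:numel (intfs)
    try
      h = opener (fname, 1, intfs{k});
      ok(k) = isstruct (h);
      if (ok(k))
        closer (h);
      end
    catch
      ok(k) = false;
    end
  end
end

## Stand-ins for xlsopen / xlsclose: only the 'poi' interface "works" here
function h = fake_open (fname, rw, intf)
  if (strcmpi (intf, "poi"))
    h = struct ("filename", fname, "xtype", upper (intf));
  else
    error ("fake_open: interface %s not available", intf);
  end
end

function h = fake_close (h)
  h = [];
end

ok = probe_interfaces (@fake_open, @fake_close, "test.xls", {"poi", "jxl"})
```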
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. Xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=520IO package2012-01-17T23:11:49Z<p>83.163.225.168: /* Matlab compatibility */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and -if data have been transferred to a spreadsheet- save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, and 0.8.7 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 works with odfdom). Get them here:<br />
** http://odftoolkit.org/projects/odfdom/pages/Home<br />
** http://odftoolkit.org/projects/odfdom/downloads/directory/current-version<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave)<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ODS files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
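<br />
As a sketch of the multi-sheet scenario described above, the sequence below opens the file once, reads two sheets and closes the file pointer again. The file name, sheet names and ranges are made up for illustration, and the argument order of ods2oct is assumed to mirror that of xls2oct (pointer struct, worksheet, range); check the help text of ods2oct before relying on it.<br />
<pre><br />
# Open the file once for reading (0 = no writing); 'otk' requests the ODF Toolkit interface<br />
ods = odsopen ('measurements.ods', 0, 'otk');<br />
<br />
# Read two sheets through the same file pointer<br />
[raw1, ods] = ods2oct (ods, 'Sheet1', 'A1:D100');<br />
[raw2, ods] = ods2oct (ods, 'Sheet2', 'A1:B20');<br />
<br />
# Split the raw cell arrays into numeric and text parts<br />
[num1, txt1] = parsecell (raw1);<br />
[num2, txt2] = parsecell (raw2);<br />
<br />
# Always close the file pointer to release the memory it holds<br />
ods = odsclose (ods);<br />
</pre><br />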
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave.<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
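<br />
To illustrate the option structure, the sketch below reads a range with formulas returned as literal text strings. The file name, sheet name and range are hypothetical; the only things taken from the description above are the field name formulas_as_text and the fact that the options struct goes in as last argument to ods2oct.<br />
<pre><br />
# Ask for the literal formula strings rather than the formula results<br />
opts.formulas_as_text = 1;<br />
<br />
ods = odsopen ('budget.ods', 0, 'otk');<br />
[raw, ods] = ods2oct (ods, 'Sheet1', 'A1:C10', opts);<br />
ods = odsclose (ods);<br />
<br />
# Cells that hold formulas now contain text strings in raw<br />
</pre><br />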
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregory XIII's intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
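<br />
The two kinds of date cells mentioned above can be brought to the same Octave date number as sketched below. The sample values are made up for illustration. The offset datenum (1899, 12, 30) is an assumption that only holds for Excel serial numbers from March 1, 1900 onward, because of Excel's fictitious February 29, 1900.<br />
<pre><br />
# Offset between Excel serial dates and Octave date numbers<br />
# (valid from March 1, 1900 onward only)<br />
excel_offset = datenum (1899, 12, 30);<br />
<br />
# A numeric date cell as left over from an Excel spreadsheet<br />
excel_serial = 40680;<br />
dn1 = excel_serial + excel_offset;<br />
<br />
# A date cell stored by OpenOffice.org as a text string<br />
ooo_date = '2011-05-17';<br />
dn2 = datenum (ooo_date, 'yyyy-mm-dd');<br />
<br />
# Both denote the same day<br />
disp (datestr (dn1, 'yyyy-mm-dd'));<br />
disp (datestr (dn2, 'yyyy-mm-dd'));<br />
</pre><br />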
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based xls routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), .dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild-auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in class path. Do a jcp = javaclasspath ('-all') (under unix/linux, do jcp = javaclasspath; strsplit (jcp, ":")). See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar, usually found in subdirectory Basis<version>/program/classes/ (or the like) of the OpenOffice.org (<OOo>) installation directory;<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and -clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
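<br />
The checks from the steps above can be run in one short session, sketched below. It only strings together the commands already mentioned in this section; test.ods is assumed to be a writable ODS file in the working directory.<br />
<pre><br />
# Step 1: is the Java package listed, and loaded (asterisk)?<br />
pkg list<br />
<br />
# Step 3: which class libs are in the classpath?<br />
jcp = javaclasspath ('-all')<br />
<br />
# Automated diagnostics at debug level 3<br />
chk_spreadsheet_support ('', 3);<br />
<br />
# Step 4: try the ODF Toolkit and jOpenDocument interfaces in turn<br />
ods1 = odsopen ('test.ods', 1, 'otk');<br />
ods1 = odsclose (ods1);<br />
ods2 = odsopen ('test.ods', 1, 'jod');<br />
ods2 = odsclose (ods2);<br />
</pre><br />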
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5×10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on (early 2011) version 0.8.7 has been tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users. <br />
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A Windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: OpenXLS.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI.<br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion.<br />
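<br />
A sketch of the read, change and re-enter cycle described above follows. The file name, sheet name and range are hypothetical, and passing the options struct as the last argument of xls2oct and oct2xls is an assumption made by analogy with the ODS functions; check the help texts of both functions first.<br />
<pre><br />
# Open an existing workbook for reading and writing through Apache POI<br />
xlh = xlsopen ('budget.xls', 1, 'poi');<br />
<br />
# Read the literal formula strings rather than the formula results<br />
opts.formulas_as_text = 1;<br />
[raw, xlh] = xls2oct (xlh, 'Sheet1', 'A1:C10', opts);<br />
<br />
# ... inspect or edit the formula strings in raw here ...<br />
<br />
# Write them back, this time entered as real formulas<br />
opts.formulas_as_text = 0;<br />
xlh = oct2xls (raw, xlh, 'Sheet1', 'A1:C10', opts);<br />
<br />
# Write the workbook to disk and release the handle<br />
xlh = xlsclose (xlh);<br />
</pre><br />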
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's doesn't. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
** Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 0 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info). For dates before 1/1/1900, Octave returns dates as text strings.<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
** When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet).<br />
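<br />
Regarding the missing "passing function handle" option of xlsread, one way to get a comparable effect is to post-process the returned raw cell array, as sketched below. This does not replicate Matlab's mechanism; the cell array is a stand-in for real xlsread output and process_fcn is a hypothetical processing step.<br />
<pre><br />
# Stand-in for the raw cell array returned by xlsread or xls2oct<br />
raw = {'Name', 'Value'; 'a', 1; 'b', 2};<br />
<br />
# Hypothetical processing step one would otherwise pass to Matlab's xlsread<br />
process_fcn = @(x) 2 * x;<br />
<br />
# Apply it to the numeric cells only, leaving text cells untouched<br />
isnum = cellfun (@isnumeric, raw);<br />
raw(isnum) = cellfun (process_fcn, raw(isnum), 'UniformOutput', false);<br />
</pre><br />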
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back when writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI over OpenXLS or JExcelAPI, but the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred, as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...).<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5·10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), without the outer quotes. See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download them, put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a broken Octave installation or some other problem outside Octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
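<br />
The last step can also be scripted. What follows is merely a sketch, assuming a small workbook test.xls in a writable working directory; the variable names (intfs, works) are made up for illustration. Each interface is tried in turn and errors are caught, so one broken interface does not abort the check:<br />
<pre><br />
intfs = {'poi', 'jxl'};            ## interface names as used in the step above<br />
works = false (size (intfs));<br />
for ii = 1:numel (intfs)<br />
  try<br />
    xls = xlsopen ('test.xls', 1, intfs{ii});<br />
    works(ii) = isstruct (xls);    ## a pointer struct means the interface works<br />
    if (works(ii))<br />
      xls = xlsclose (xls);        ## close the file again<br />
    endif<br />
  catch err<br />
    printf ('%s interface failed: %s\n', intfs{ii}, err.message);<br />
  end_try_catch<br />
endfor<br />
</pre><br />
Afterwards works(1) and works(2) tell whether POI and JXL, respectively, could open the file.<br />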
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the quality and documentation of the interface support libraries. Separating the file access functions from the actual reading/writing from/to the workbook in memory has made the developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into Octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (Octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
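<br />
As to the date conversion mentioned above: for dates from 1 March 1900 onward, an Excel serial date number and an Octave date number differ by the constant 693960 (Excel's serial 1 is 1 January 1900, Octave's date number 1 is 1 January of year 0, and Excel counts the non-existent 29 February 1900). A small sketch with made-up variable names, not code from the io package:<br />
<pre><br />
XLS_OFFSET = 693960;                        ## valid for dates >= 1 March 1900<br />
excel_serial = 40179;                       ## Excel serial number of 2010-01-01<br />
oct_date = excel_serial + XLS_OFFSET;       ## Excel -> Octave<br />
datestr (oct_date, 'yyyy-mm-dd')            ## display as a check<br />
back = datenum (2010, 1, 1) - XLS_OFFSET;   ## Octave -> Excel<br />
</pre><br />
Dates before March 1900 need separate treatment, which is one reason to make such a conversion optional.<br />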
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=519IO package2012-01-17T23:09:09Z<p>83.163.225.168: /* Comparison of interfaces & usage */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = OpenDocument Spreadsheet, the spreadsheet data format used by e.g. LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and, if data have been transferred to a spreadsheet, save data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MingW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, and 0.8.7 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 works with odfdom). Get them here:<br />
** http://odftoolkit.org/projects/odfdom/pages/Home<br />
** http://odftoolkit.org/projects/odfdom/downloads/directory/current-version<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave)<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath.<br />
Hint: add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
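<br />
Such javaaddpath statements could look like the lines below. The directory and file names are purely hypothetical; substitute the actual locations and versions of the jars on your own system:<br />
<pre><br />
## in octaverc; the paths are examples only<br />
javaaddpath ('/usr/lib/java/odfdom.jar');<br />
javaaddpath ('/usr/lib/java/xercesImpl.jar');<br />
</pre><br />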
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore ODS files with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data in limited-size chunks, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
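<br />
A sketch of such a sequence, reading two sheets through one file pointer. It assumes a file test.ods with at least two sheets (a made-up name); the exist() guard only keeps the snippet harmless when the io package is not loaded. Octave's unwind_protect makes sure odsclose is invoked even if reading fails halfway:<br />
<pre><br />
if (exist ('odsopen', 'file'))<br />
  ods = odsopen ('test.ods');             ## get the file pointer once<br />
  unwind_protect<br />
    raw1 = ods2oct (ods, 1);              ## raw cell array from sheet 1<br />
    raw2 = ods2oct (ods, 2);              ## sheet 2, same file pointer<br />
  unwind_protect_cleanup<br />
    ods = odsclose (ods);                 ## always executed<br />
  end_unwind_protect<br />
endif<br />
</pre><br />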
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) means enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
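<br />
For example, reading the same sheet once as formula results and once as formula text might look as follows. This is a sketch only: test.ods is a made-up file name, and I assume here that an empty range string means all data in the sheet and that the io package is loaded (hence the exist() guard):<br />
<pre><br />
opts_val = struct ('formulas_as_text', 0);   ## the default: formula results<br />
opts_txt = struct ('formulas_as_text', 1);   ## literal formula strings<br />
if (exist ('odsopen', 'file'))<br />
  ods = odsopen ('test.ods', 0, 'otk');<br />
  results  = ods2oct (ods, 1, '', opts_val);<br />
  formulas = ods2oct (ods, 1, '', opts_txt);<br />
  ods = odsclose (ods);<br />
endif<br />
</pre><br />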
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kinds of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try reading such faulty spreadsheets with OpenOffice.org Calc, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using Octave.<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (& times). A less obvious one is the Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (like Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregory's intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
<br />
As Octave has no “date” or “time” data type, Octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in Octave cell arrays.<br />
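<br />
As a rough illustration of the checking meant above, the sketch below normalizes a made-up cell array in which the same date occurs once as an OOo-style string and once as an Excel-style serial number. That date cells reach Octave in exactly these two forms is an assumption for the sake of the example; 693960 is the offset between Excel serial dates (from March 1900 on) and Octave date numbers:<br />
<pre><br />
raw = {'2011-05-17', 40680};                  ## same date, two representations<br />
dn = zeros (size (raw));<br />
for ii = 1:numel (raw)<br />
  if (ischar (raw{ii}))<br />
    dn(ii) = datenum (raw{ii}, 'yyyy-mm-dd'); ## OOo date string<br />
  else<br />
    dn(ii) = raw{ii} + 693960;                ## Excel serial number<br />
  endif<br />
endfor<br />
</pre><br />
Both elements of dn then hold the same Octave date number.<br />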
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2 MB and the maximum size is 64 MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based xls routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in either ODF Toolkit or jOpenDocument (like there is in Apache POI for Excel).<br />
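<br />
Cells that did arrive as NaN can at least be located afterwards. A minimal sketch on a made-up raw cell array (of the kind ods2oct returns); note that a NaN only marks a candidate, it does not prove that the cell held an error:<br />
<pre><br />
raw = {1, 'text', NaN; [], 4, NaN};       ## made-up example data<br />
is_nan = cellfun (@(c) isnumeric (c) && isscalar (c) && isnan (c), raw);<br />
[rows, cols] = find (is_nan);             ## positions of the NaN cells<br />
</pre><br />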
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
Unfortunately, however, the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing, but OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for Octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do a 'jcp = javaclasspath ("-all")' (under unix/linux, do 'jcp = javaclasspath; strsplit (jcp, ":")'), without the outer quotes. See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar, usually found in subdirectory Basis<version>/program/classes/ (or the like) of the OpenOffice.org (<OOo>) installation directory;<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) vary across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5·10^5 cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on (early 2011) version 0.8.7 has been tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users. <br />
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
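As a minimal sketch of the javaaddpath route mentioned above, lines like the following could go into octaverc. The directory and the exact .jar file names are assumptions for illustration only; substitute the files you actually downloaded.<br />

```octave
## Hypothetical location of the downloaded class libs; adapt to your system.
jardir = "/usr/lib/java";
## Hypothetical file names; use the versions you actually have.
jars = {"poi-3.5-FINAL.jar", "poi-ooxml-3.5-FINAL.jar", "jxl.jar"};
for ii = 1:numel (jars)
  jarfile = fullfile (jardir, jars{ii});
  if (exist (jarfile, "file"))
    ## javaaddpath needs the full pathname of each class lib
    javaaddpath (jarfile);
  else
    warning ("class lib not found: %s", jarfile);
  endif
endfor
```

Afterwards the output of javaclasspath should list each added .jar with its full pathname.<br />
<br />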
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
 # If you really want another interface than the one auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
 # in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
<br />
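A concrete instance of this scenario might look as follows. This is only a sketch: the file name, worksheet names and cell range are made up, and it assumes oct2xls accepts a call without a range argument (in which case the data would start at the top left of the worksheet).<br />

```octave
## Hypothetical file, worksheet names and range; adapt them to your workbook.
xlh = xlsopen ("measurements.xls", 1);           # rw = 1: open for reading and writing
[raw, xlh] = xls2oct (xlh, "Sheet1", "A1:C20");  # raw cell array; xlh is updated too
[num, txt, lim] = parsecell (raw);               # separate numeric and text data
## Example "analysis": column means of the numeric part
newdata = num2cell (mean (num, 1));
xlh = oct2xls (newdata, xlh, "Summary");         # add results to another worksheet
xlh = xlsclose (xlh);                            # write the workbook to disk
clear xlh
```

<br />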
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI. <br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
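A minimal sketch of reading formulas as text, assuming the option structure is passed as the fourth argument of xls2oct (check "help xls2oct" for your io package version). The file name, worksheet name and range are hypothetical.<br />

```octave
## Hypothetical file and worksheet; requires the POI or JXL interface.
opts.formulas_as_text = 1;                 # read formulas as literal text strings
xlh = xlsopen ("budget.xls", 0, "poi");    # rw = 0: read only
[raw, xlh] = xls2oct (xlh, "Sheet1", "A1:B10", opts);
xlh = xlsclose (xlh);
## Cells holding formulas should now contain the formula text
## rather than the computed results.
```

<br />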
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion. <br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
*:Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 1 = 1/1/1900 – you can easily convert them into proper Octave date values yourself using e.g. datestr(), see bottom of this document for more info).<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet). <br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology, but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does that well, although still slower than Excel/COM. Two things are worrying. First, upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one, and you only get the contents back when writing out all of the changes. Second, any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (Excel 95, read-only) and BIFF8 (Excel 97-2003). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield the expected results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5 it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL let alone Excel & COM but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that odsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5·10^5 cells should not be kept in spreadsheets anyway. Better use real databases for such data sets.<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")'), both w/o quotes. See above under "Required support software" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a broken Octave installation or some other problem outside Octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
<br />
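Most of the checks above can be run in one go with chk_spreadsheet_support. The call below is a sketch that assumes the first argument is the directory holding the .jar files (the path shown is hypothetical) and the second the debug level, as described at the top of this section.<br />

```octave
## First argument: directory with the class libs (hypothetical path);
## second argument: debug level, 3 printing the most diagnostic output.
retval = chk_spreadsheet_support ("/usr/lib/java", 3);
```

The diagnostic output should show which interfaces were found and which class libs are still missing.<br />
<br />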
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. Xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into Octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (Octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
<br />
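Until such automatic conversion exists, Excel date values can be converted by hand. The sketch below uses the offset datenum (1899, 12, 30) = 693960, which holds for Excel serial numbers larger than 60 (i.e. dates after 28 Feb 1900, because of Excel's leap year bug); the variable names are made up for illustration.<br />

```octave
## Offset between Excel's day numbering and Octave's datenum,
## valid for Excel serial numbers larger than 60 (after 28 Feb 1900).
xls_offset = datenum (1899, 12, 30);         # 693960

xlsdate = 25569;                             # Excel serial number of 1 Jan 1970
octdate = xlsdate + xls_offset;              # Octave datenum
disp (datestr (octdate, "yyyy-mm-dd"))       # 1970-01-01

## ... and back again
xlsback = datenum (1970, 1, 1) - xls_offset; # 25569
```

<br />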
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Java_package&diff=518Java package2012-01-17T22:58:25Z<p>83.163.225.168: </p>
<hr />
<div>Octave is an easy to use but powerful environment for mathematical calculations, which can easily be extended by packages. Its features are close to the commercial tool Matlab so that it can often be used as a replacement. Java on the other hand offers a rich, object oriented and platform independent environment for many applications. The core Java classes can be easily extended by many freely available libraries. This document refers to the package <code>java</code>, which is part of the GNU Octave project. This package allows you to access Java classes from inside Octave. Thus it is possible to use existing class files or complete Java libraries directly from Octave.<br />
<br />
This description is based on the Octave package {{Codeline|java-1.2.8}}. The {{Forge|java}} package usually installs its script files (.m) in the directory {{Path|.../share/Octave/packages/java-1.2.8}} and its binary (.oct) files in {{Path|.../libexec/Octave/packages/java-1.2.8}}. You can get help on specific functions in Octave by executing the help command<br />
with the name of a function from this package:<br />
octave> help javaObject<br />
<br />
You can view the whole doc file in Octave by executing the doc command with just the word java:<br />
octave> doc java<br />
<br />
Note on calling Octave from Java: the java package is designed for calling Java from Octave. If you want to call Octave from Java, you might want to use a library like [http://kenai.com/projects/javaOctave javaOctave] or [http://jopas.sourceforge.net joPas]. <br />
<br />
=FAQ=<br />
==How to distinguish between Octave and Matlab?==<br />
Octave and Matlab are very similar, but handle Java slightly differently. Therefore it may be necessary to [[Compatibility#Are_we_running_octave.3F|detect the environment]] and use the appropriate functions.<br />
<br />
==How to make Java classes available to Octave?==<br />
Java finds classes by searching a {{Codeline|classpath}}. This is a list of Java archive files and/or directories containing class files. In Octave and Matlab the {{Codeline|classpath}} is composed of two parts:<br />
*the static {{Codeline|classpath}} is initialized once at startup of the JVM, and;<br />
*the dynamic {{Codeline|classpath}} which can be modified at runtime.<br />
<br />
Octave searches the static {{Codeline|classpath}} first, then the dynamic {{Codeline|classpath}}. Classes appearing in the static as well as in the dynamic {{Codeline|classpath}} will therefore be found in the static {{Codeline|classpath}} and loaded from this location.<br />
<br />
Classes which shall be used regularly or must be available to all users should be added to the static {{Codeline|classpath}}. The static {{Codeline|classpath}} is populated once from the contents of a plain text file named {{Path|classpath.txt}} when the Java Virtual Machine starts. This file contains one line for each individual {{Codeline|classpath}} to be added to the static {{Codeline|classpath}}. These lines can identify single class files, directories containing class files or Java archives with complete class file hierarchies. Comment lines starting with a {{Codeline|#}} or a {{Codeline|%}} character are ignored.<br />
<br />
The search rules for the file {{Path|classpath.txt}} are:<br />
*First, Octave searches for the file {{Path|classpath.txt}} in your home directory. If such a file is found, it is read and defines the initial static {{Codeline|classpath}}. Thus it is possible to build an initial static {{Codeline|classpath}} on a "per user" basis.<br />
*Next, Octave looks for another file {{Path|classpath.txt}} in the package installation directory. This is where {{Path|javaclasspath.m}} resides, usually something like<br />
:<pre>...\share\Octave\packages\java-1.2.8.</pre><br />
:You can find this directory by executing the command {{Codeline|pkg list}}. If this file exists, its contents are also appended to the static {{Codeline|classpath}}. Note that the archives and class directories defined in this file will affect all users.<br />
<br />
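For illustration, a {{Path|classpath.txt}} following the rules above might look like this. All paths are hypothetical.<br />

```text
# Hypothetical entries: one classpath element per line.
% Lines starting with # or % are ignored.
/usr/share/java/someclasses.jar
/home/octavio/java/classes
```

<br />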
Classes which are used only by a specific script should be placed in the dynamic {{Codeline|classpath}}. This portion of the {{Codeline|classpath}} can be modified at runtime using the {{Codeline|javaaddpath}} and {{Codeline|javarmpath}} functions. Example:<br />
octave> base_path = "C:/Octave/java_files";<br />
octave> % add two JARchives to the dynamic classpath<br />
octave> javaaddpath ([base_path, "/someclasses.jar"]);<br />
octave> javaaddpath ([base_path, "/moreclasses.jar"]);<br />
octave> % check the dynamic classpath<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/someclasses.jar<br />
octave> disp (p{2});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % remove the first element from the classpath<br />
octave> javarmpath ([base_path, "/someclasses.jar"]);<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % provoke an error<br />
octave> disp (p{2});<br />
error: A(I): Index exceeds matrix dimension.<br />
<br />
Another way to add files to the dynamic {{Codeline|classpath}} exclusively for your user account is to use the file {{Path|.octaverc}} which is stored in your home directory. All Octave commands in this file are executed each time you start a new instance of Octave. The following example adds the directory {{Path|~/octave}} to Octave’s search path and the archive {{Path|myclasses.jar}} in this directory to the Java search path.<br />
<br />
{{File|octaverc|<pre><br />
addpath ("~/octave");<br />
javaaddpath ("~/octave/myclasses.jar");</pre>}}<br />
<br />
== How to create an instance of a Java class? ==<br />
If your code is meant to work under Octave as well as Matlab you should use the function {{Codeline|javaObject}} to create Java objects. The function {{Codeline|java_new}} is Octave specific and does not exist in the Matlab environment. You can use [[Compatibility#Are_we_running_octave.3F|{{Codeline|is_octave()}}]] to distinguish between the environments:<br />
<br />
if (is_octave)<br />
Passenger = java_new ('package.FirstClass', row, seat); % works only in Octave <br />
else<br />
Passenger = javaObject ('package.FirstClass', row, seat); % actually works in both Octave and matlab<br />
end<br />
<br />
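Note that {{Codeline|is_octave}} is not a built-in function. A minimal sketch, assuming the common approach of testing for the built-in function OCTAVE_VERSION (which exists in Octave only), could be saved as {{Path|is_octave.m}}:<br />

```octave
function r = is_octave ()
  ## Cache the result; the environment cannot change during a session.
  persistent x;
  if (isempty (x))
    ## OCTAVE_VERSION is a built-in function in Octave and unknown to Matlab
    x = exist ('OCTAVE_VERSION', 'builtin') ~= 0;
  end
  r = x;
end
```

<br />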
==How can I handle memory limitations?==<br />
In order to execute Java code Octave creates a Java Virtual Machine (JVM). Such a JVM allocates a fixed amount of initial memory and may expand this pool up to a fixed maximum memory limit. The default values depend on the Java version (see {{Codeline|help javamem}}). The memory pool is shared by all Java objects running in the JVM. This strict memory limit is intended mainly to prevent runaway applications inside web browsers or on enterprise servers from consuming all memory and crashing the system. When the maximum memory limit is hit, Java code will throw exceptions so that applications will fail or behave unexpectedly.<br />
<br />
In Octave as well as in Matlab, you can specify options for the creation of the JVM inside a file named {{Path|java.opts}}. This is a text file where you can enter lines containing {{Codeline|-X}} and {{Codeline|-D}} options handed to the JVM during initialization.<br />
<br />
In Octave, the Java options file must be located in the directory where {{Path|javaclasspath.m}} resides, i.e. the package installation directory, usually something like {{Path|...\share\Octave\packages\java-1.2.8}}. You can find this directory by executing {{Codeline|pkg list}}.<br />
<br />
In Matlab, the options file goes into the {{Path|MATLABROOT/bin/ARCH}} directory or in your personal Matlab startup directory (can be determined by a {{Codeline|pwd}} command). MATLABROOT is the Matlab root directory and ARCH is your system architecture, which you find by issuing the commands {{Codeline|matlabroot}} respectively {{Codeline|computer('arch')}}.<br />
<br />
The {{Codeline|-X}} options allow you to increase the maximum amount of memory available to the JVM, for example to 256 Megabytes by adding the following line to the {{Path|java.opts}} file:{{File|java.opts|<pre>-Xmx256m</pre>}}<br />
<br />
The maximum possible amount of memory depends on your system. On a Windows system with 2 Gigabytes main memory you should be able to set this maximum to about 1 Gigabyte.<br />
<br />
If your application requires a large amount of memory from the beginning, you can also specify the initial amount of memory allocated to the JVM. Adding the following line to the {{Path|java.opts}} file starts the JVM with 64 Megabytes of initial memory:{{File|java.opts|<pre>-Xms64m</pre>}}<br />
<br />
For more details on the available {{Codeline|-X}} options of your Java Virtual Machine issue the command {{Codeline|java -X}} at the operating system command prompt and consult the Java documentation.<br />
<br />
The {{Codeline|-D}} options can be used to define system properties which can then be used by Java<br />
classes inside Octave. System properties can be retrieved by using the {{Codeline|getProperty()}}<br />
methods of the {{Codeline|java.lang.System}} class. The following example line defines the property<br />
{{Codeline|MyProperty}} and assigns it the string {{Codeline|12.34}}.<br />
-DMyProperty=12.34<br />
The value of this property can then be retrieved as a string by a Java object or in Octave:<br />
 octave> javaMethod ("getProperty", "java.lang.System", "MyProperty")<br />
ans = 12.34<br />
<br />
==How to install the java package in Octave?==<br />
===Uninstall the currently installed package java===<br />
Check whether the java package is already installed by issuing the {{Codeline|pkg list}} command:<br />
octave> pkg list<br />
Package Name | Version | Installation directory<br />
--------------+---------+-----------------------<br />
java *| 1.2.8 | /home/octavio/octave/java-1.2.8<br />
<br />
If the java package appears in the list you must uninstall it first by issuing the command<br />
octave> pkg uninstall java<br />
octave> pkg list<br />
<br />
Now the java package should not be listed anymore. If you have used the java package during the current session of Octave, you have to exit and restart Octave before you can uninstall the package. This is because the system keeps certain libraries in memory after they have been loaded once.<br />
<br />
===Make sure that the build environment is configured properly===<br />
The installation process requires that the environment variable {{Codeline|JAVA_HOME}} points to the<br />
Java Development Kit (JDK) on your computer.<br />
*Note that JDK is not equal to JRE (Java Runtime Environment). The JDK home directory contains subdirectories with include, library and executable files which are required to compile the java package. These files are not part of the JRE, so you definitely need the JDK.<br />
*Do not use backslashes but ordinary slashes in the path. Set the environment variable {{Codeline|JAVA_HOME}} according to your local JDK installation. Please adapt the path in the following examples according to the JDK installation on your system. If you are using a Windows system that might be:<br />
:<pre>octave> setenv ("JAVA_HOME", "C:/Java/jdk1.6.0_21");</pre><br />
:If you are using a Linux system this would look probably more like:<br />
:<pre>octave> setenv ("JAVA_HOME", "/usr/local/jdk1.6.0_21");</pre><br />
:Note that on all systems you must use the forward slash {{Codeline|/}} as the separator, not the backslash {{Codeline|\}}. If on a Windows system the environment variable {{Codeline|JAVA_HOME}} is already defined using the backslash, you can easily change this by issuing the following Octave command before starting the installation:<br />
:<pre>octave> setenv ("JAVA_HOME", strrep (getenv ("JAVA_HOME"), '\', '/'))</pre><br />
<br />
===Compile and install the package in Octave===<br />
[[Octave_Forge#installing_packages|Install the package]] from octave-forge.<br />
<br />
Note:<br />
On Windows (MinGW) systems the Java package can be (slightly) miscompiled; until now errors have only been reported when using Java Swing stuff. To fix this, the following compiler flags have to be added:<br />
<br />
-Wl,--kill-at to the $(MKOCTFILE) in the Makefile<br />
<br />
see:<br />
[http://sourceforge.net/mailarchive/forum.php?thread_name=CAB-99LuCbL3u0LmZuAmneRb_0G8pjmyha9viFpncMnjC%3DeBxJA%40mail.gmail.com&forum_name=octave-dev]<br />
(scroll down a bit for the relevant postings)<br />
<br />
<br />
On 64-bit systems, some files needed for compilation might reside in <JAVA_HOME>/jre/bin/server/ rather than in <JAVA_HOME>/jre/bin/client/ <br />
<br />
This will lead to build errors as the install script can't find needed files where it expects them to be.<br />
<br />
To fix this, a suitable symlink to client/ should suffice.<br />
<br />
===Test the java package installation===<br />
The following code creates a Java string object, which however is automatically converted<br />
to an Octave string:<br />
octave> s = javaObject ("java.lang.String", "Hello OctaveString")<br />
s = Hello OctaveString<br />
<br />
Note that the java package automatically transforms the Java String object to an Octave<br />
string. This means that you cannot apply Java String methods to the result.<br />
<br />
This "auto boxing" scheme seems to be implemented for the following Java classes:<br />
*{{Codeline|java.lang.Integer}}<br />
*{{Codeline|java.lang.Double}}<br />
*{{Codeline|java.lang.Boolean}}<br />
*{{Codeline|java.lang.String}}<br />
<br />
If you instead create an object for which no "auto-boxing" is implemented, {{Codeline|javaObject}}<br />
returns the genuine Java object:<br />
octave> v = javaObject ("java.util.Vector")<br />
v =<br />
<Java object: java.util.Vector><br />
octave> v.add(12);<br />
octave> v.get(0)<br />
ans = 12<br />
<br />
If you have created such a Java object, you can apply all methods of the Java class to<br />
the returned object. Note also that for some objects you must specify an initializer:<br />
% not:<br />
octave> d = javaObject ("java.lang.Double")<br />
error: [java] java.lang.NoSuchMethodException: java.lang.Double<br />
% but:<br />
octave> d = javaObject ("java.lang.Double", 12.34)<br />
d = 12.340<br />
<br />
==Which TEX symbols are implemented in the dialog functions?==<br />
The dialog functions contain a translation table for TEX like symbol codes. Thus messages<br />
and labels can be tailored to show some common mathematical symbols or Greek characters.<br />
No further TEX formatting codes are supported. The characters are translated to their<br />
Unicode equivalent. However, not all characters may be displayable on your system. This<br />
depends on the font used by the Java system on your computer.<br />
<br />
Each TEX symbol code must be terminated by a space character to make it distinguishable from<br />
the surrounding text. Therefore the string {{Codeline|\alpha &#61;12.0}} will produce the<br />
desired result, whereas {{Codeline|\alpha&#61;12.0}} would produce the literal text {{Codeline|\alpha&#61;12.0}}.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Creating_packages&diff=294Creating packages2011-12-11T10:40:54Z<p>83.163.225.168: /* See also */</p>
<hr />
<div>== Package structure ==<br />
=== Single package ===<br />
=== Multi package ===<br />
<br />
==== PKG_ADD ====<br />
<br />
==== PKG_DEL ====<br />
<br />
==== pre_install.m ====<br />
==== post_install.m ====<br />
==== on_uninstall.m ====<br />
<br />
== See also ==<br />
http://octave.sourceforge.net/developers.html<br />
* [[Licensing in Octave]]<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=Java_package&diff=293Java package2011-12-11T10:36:47Z<p>83.163.225.168: /* Compile and install the package in Octave */</p>
<hr />
<div>Octave is an easy to use but powerful environment for mathematical calculations, which can easily be extended by packages. Its features are close to the commercial tool Matlab so that it can often be used as a replacement. Java on the other hand offers a rich, object oriented and platform independent environment for many applications. The core Java classes can be easily extended by many freely available libraries. This document refers to the package <code>java</code>, which is part of the GNU Octave project. This package allows you to access Java classes from inside Octave. Thus it is possible to use existing class files or complete Java libraries directly from Octave.<br />
<br />
This description is based on the Octave package {{Codeline|java-1.2.8}}. The {{Forge|java}} package usually installs its script files (.m) in the directory {{Path|.../share/Octave/packages/java-1.2.8}} and its binary (.oct) files in {{Path|.../libexec/Octave/packages/java-1.2.8}}. You can get help on specific functions in Octave by executing the help command<br />
with the name of a function from this package:<br />
octave> help javaObject<br />
<br />
You can view the whole doc file in Octave by executing the doc command with just the word java:<br />
octave> doc java<br />
<br />
Note on calling Octave from Java: the java package is designed for calling Java from Octave. If you want to call Octave from Java, you might want to use a library like [http://kenai.com/projects/javaOctave javaOctave] or [http://jopas.sourceforge.net joPas]. <br />
<br />
=FAQ=<br />
==How to distinguish between Octave and Matlab?==<br />
Octave and Matlab are very similar, but handle Java slightly differently. Therefore it may be necessary to [[Compatibility#Are_we_running_octave.3F|detect the environment]] and use the appropriate functions.<br />
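<br />
A minimal sketch of such a check, assuming a helper function named {{Codeline|is_octave}} (the name is a convention, not a built-in function). It relies on {{Codeline|OCTAVE_VERSION}} being a built-in function in Octave that does not exist in Matlab:<br />
<pre><br />
function r = is_octave ()<br />
  % OCTAVE_VERSION exists as a built-in function only in Octave<br />
  r = exist ('OCTAVE_VERSION', 'builtin') ~= 0;<br />
end<br />
</pre><br />
Single-quoted strings are used so that the same file can be parsed by both environments.<br />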
<br />
==How to make Java classes available to Octave?==<br />
Java finds classes by searching a {{Codeline|classpath}}. This is a list of Java archive files and/or directories containing class files. In Octave and Matlab the {{Codeline|classpath}} is composed of two parts:<br />
*the static {{Codeline|classpath}}, which is initialized once at startup of the JVM, and<br />
*the dynamic {{Codeline|classpath}}, which can be modified at runtime.<br />
<br />
Octave searches the static {{Codeline|classpath}} first, then the dynamic {{Codeline|classpath}}. Classes appearing in the static as well as in the dynamic {{Codeline|classpath}} will therefore be found in the static {{Codeline|classpath}} and loaded from this location.<br />
<br />
Classes which shall be used regularly or must be available to all users should be added to the static {{Codeline|classpath}}. The static {{Codeline|classpath}} is populated once from the contents of a plain text file named {{Path|classpath.txt}} when the Java Virtual Machine starts. This file contains one line for each individual {{Codeline|classpath}} to be added to the static {{Codeline|classpath}}. These lines can identify single class files, directories containing class files or Java archives with complete class file hierarchies. Comment lines starting with a {{Codeline|#}} or a {{Codeline|%}} character are ignored.<br />
<br />
The search rules for the file {{Path|classpath.txt}} are:<br />
*First, Octave searches for the file {{Path|classpath.txt}} in your home directory. If such a file is found, it is read and defines the initial static {{Codeline|classpath}}. Thus it is possible to build an initial static {{Codeline|classpath}} on a "per user" basis.<br />
*Next, Octave looks for another file {{Path|classpath.txt}} in the package installation directory. This is where {{Path|javaclasspath.m}} resides, usually something like<br />
:<pre>...\share\Octave\packages\java-1.2.8.</pre><br />
:You can find this directory by executing the command {{Codeline|pkg list}}. If this file exists, its contents are also appended to the static {{Codeline|classpath}}. Note that the archives and class directories defined in this file will affect all users.<br />
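<br />
As an illustration, a per-user {{Path|classpath.txt}} might look like the following sketch. All paths are hypothetical placeholders; only the comment syntax and the one-entry-per-line layout are taken from the description above.<br />
<pre><br />
# classpath.txt: one classpath entry per line<br />
% lines starting with # or % are ignored<br />
/home/octavio/java/someclasses.jar<br />
/home/octavio/java/classes<br />
</pre><br />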
<br />
Classes which are used only by a specific script should be placed in the dynamic {{Codeline|classpath}}. This portion of the {{Codeline|classpath}} can be modified at runtime using the {{Codeline|javaaddpath}} and {{Codeline|javarmpath}} functions. Example:<br />
octave> base_path = "C:/Octave/java_files";<br />
octave> % add two JARchives to the dynamic classpath<br />
octave> javaaddpath ([base_path, "/someclasses.jar"]);<br />
octave> javaaddpath ([base_path, "/moreclasses.jar"]);<br />
octave> % check the dynamic classpath<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/someclasses.jar<br />
octave> disp (p{2});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % remove the first element from the classpath<br />
octave> javarmpath ([base_path, "/someclasses.jar"]);<br />
octave> p = javaclasspath;<br />
octave> disp (p{1});<br />
C:/Octave/java_files/moreclasses.jar<br />
octave> % provoke an error<br />
octave> disp (p{2});<br />
error: A(I): Index exceeds matrix dimension.<br />
<br />
Another way to add files to the dynamic {{Codeline|classpath}} exclusively for your user account is to use the file {{Path|.octaverc}} which is stored in your home directory. All Octave commands in this file are executed each time you start a new instance of Octave. The following example adds the directory {{Path|~/octave}} to Octave’s search path and the archive {{Path|myclasses.jar}} in this directory to the Java search path.<br />
<br />
{{File|octaverc|<pre><br />
addpath ("~/octave");<br />
javaaddpath ("~/octave/myclasses.jar");</pre>}}<br />
<br />
== How to create an instance of a Java class? ==<br />
If your code shall work under Octave as well as Matlab you should use the function {{Codeline|javaObject}} to create Java objects. The function {{Codeline|java_new}} is Octave specific and does not exist in the Matlab environment. You can use [[Compatibility#Are_we_running_octave.3F|{{Codeline|is_octave()}}]] to distinguish between the environments:<br />
<br />
if (is_octave)<br />
Passenger = java_new ('package.FirstClass', row, seat); % works only in Octave <br />
else<br />
Passenger = javaObject ('package.FirstClass', row, seat); % actually works in both Octave and matlab<br />
end<br />
<br />
==How can I handle memory limitations?==<br />
In order to execute Java code Octave creates a Java Virtual Machine (JVM). Such a JVM allocates a fixed amount of initial memory and may expand this pool up to a fixed maximum memory limit. The default values depend on the Java version (see {{Codeline|help javamem}}). The memory pool is shared by all Java objects running in the JVM. This strict memory limit is intended mainly to prevent runaway applications inside web browsers or enterprise servers from consuming all memory and crashing the system. When the maximum memory limit is hit, Java code will throw exceptions so that applications will fail or behave unexpectedly.<br />
<br />
In Octave as well as in Matlab, you can specify options for the creation of the JVM inside a file named {{Path|java.opts}}. This is a text file where you can enter lines containing {{Codeline|-X}} and {{Codeline|-D}} options handed to the JVM during initialization.<br />
<br />
In Octave, the Java options file must be located in the directory where {{Path|javaclasspath.m}} resides, i.e. the package installation directory, usually something like {{Path|...\share\Octave\packages\java-1.2.8}}. You can find this directory by executing {{Codeline|pkg list}}.<br />
<br />
In Matlab, the options file goes into the {{Path|MATLABROOT/bin/ARCH}} directory or in your personal Matlab startup directory (can be determined by a {{Codeline|pwd}} command). MATLABROOT is the Matlab root directory and ARCH is your system architecture, which you find by issuing the commands {{Codeline|matlabroot}} and {{Codeline|computer('arch')}}, respectively.<br />
<br />
The {{Codeline|-X}} options allow you, for example, to increase the maximum amount of memory available to the JVM to 256 Megabytes by adding the following line to the {{Path|java.opts}} file:{{File|java.opts|<pre>-Xmx256m</pre>}}<br />
<br />
The maximum possible amount of memory depends on your system. On a Windows system with 2 Gigabytes main memory you should be able to set this maximum to about 1 Gigabyte.<br />
<br />
If your application requires a large amount of memory from the beginning, you can also specify the initial amount of memory allocated to the JVM. Adding the following line to the {{Path|java.opts}} file starts the JVM with 64 Megabytes of initial memory:{{File|java.opts|<pre>-Xms64m</pre>}}<br />
<br />
For more details on the available {{Codeline|-X}} options of your Java Virtual Machine issue the command {{Codeline|java -X}} at the operating system command prompt and consult the Java documentation.<br />
<br />
The {{Codeline|-D}} options can be used to define system properties which can then be used by Java<br />
classes inside Octave. System properties can be retrieved by using the {{Codeline|getProperty()}}<br />
methods of the {{Codeline|java.lang.System}} class. The following example line defines the property<br />
{{Codeline|MyProperty}} and assigns it the string {{Codeline|12.34}}.<br />
-DMyProperty=12.34<br />
The value of this property can then be retrieved as a string by a Java object or in Octave:<br />
octave> javaMethod ("getProperty", "java.lang.System", "MyProperty")<br />
ans = 12.34<br />
<br />
==How to install the java package in Octave?==<br />
===Uninstall the currently installed package java===<br />
Check whether the java package is already installed by issuing the {{Codeline|pkg list}} command:<br />
octave> pkg list<br />
Package Name | Version | Installation directory<br />
--------------+---------+-----------------------<br />
java *| 1.2.8 | /home/octavio/octave/java-1.2.8<br />
<br />
If the java package appears in the list you must uninstall it first by issuing the command<br />
octave> pkg uninstall java<br />
octave> pkg list<br />
<br />
Now the java package should not be listed anymore. If you have used the java package during the current session of Octave, you have to exit and restart Octave before you can uninstall the package. This is because the system keeps certain libraries in memory after they have been loaded once.<br />
<br />
===Make sure that the build environment is configured properly===<br />
The installation process requires that the environment variable {{Codeline|JAVA_HOME}} points to the<br />
Java Development Kit (JDK) on your computer.<br />
*Note that JDK is not equal to JRE (Java Runtime Environment). The JDK home directory contains subdirectories with include, library and executable files which are required to compile the java package. These files are not part of the JRE, so you definitely need the JDK.<br />
*Do not use backslashes but ordinary slashes in the path. Set the environment variable {{Codeline|JAVA_HOME}} according to your local JDK installation. Please adapt the path in the following examples according to the JDK installation on your system. If you are using a Windows system that might be:<br />
:<pre>octave> setenv ("JAVA_HOME", "C:/Java/jdk1.6.0_21");</pre><br />
:If you are using a Linux system this would probably look more like:<br />
:<pre>octave> setenv ("JAVA_HOME", "/usr/local/jdk1.6.0_21");</pre><br />
:Note that on all systems you must use the forward slash {{Codeline|/}} as the separator, not the backslash {{Codeline|\}}. If on a Windows system the environment variable {{Codeline|JAVA_HOME}} is already defined using the backslash, you can easily change this by issuing the following Octave command before starting the installation:<br />
:<pre>octave> setenv ("JAVA_HOME", strrep (getenv ("JAVA_HOME"), '\', '/'))</pre><br />
<br />
===Compile and install the package in Octave===<br />
[[Octave_Forge#installing_packages|Install the package]] from octave-forge.<br />
<br />
Note:<br />
On Windows (MinGW) systems the Java package can be (slightly) miscompiled; until now errors have only been reported when using Java Swing stuff. To fix this, the following compiler flags have to be added:<br />
<br />
-Wl,--kill-at to the $(MKOCTFILE) in the Makefile<br />
<br />
see:<br />
[http://sourceforge.net/mailarchive/forum.php?thread_name=CAB-99LuCbL3u0LmZuAmneRb_0G8pjmyha9viFpncMnjC%3DeBxJA%40mail.gmail.com&forum_name=octave-dev]<br />
(scroll down a bit for the relevant postings)<br />
<br />
===Test the java package installation===<br />
The following code creates a Java string object, which however is automatically converted<br />
to an Octave string:<br />
octave> s = javaObject ("java.lang.String", "Hello OctaveString")<br />
s = Hello OctaveString<br />
<br />
Note that the java package automatically transforms the Java String object to an Octave<br />
string. This means that you cannot apply Java String methods to the result.<br />
<br />
This "auto boxing" scheme seems to be implemented for the following Java classes:<br />
*{{Codeline|java.lang.Integer}}<br />
*{{Codeline|java.lang.Double}}<br />
*{{Codeline|java.lang.Boolean}}<br />
*{{Codeline|java.lang.String}}<br />
<br />
If you instead create an object for which no "auto-boxing" is implemented, {{Codeline|javaObject}}<br />
returns the genuine Java object:<br />
octave> v = javaObject ("java.util.Vector")<br />
v =<br />
<Java object: java.util.Vector><br />
octave> v.add(12);<br />
octave> v.get(0)<br />
ans = 12<br />
<br />
If you have created such a Java object, you can apply all methods of the Java class to<br />
the returned object. Note also that for some objects you must specify an initializer:<br />
% not:<br />
octave> d = javaObject ("java.lang.Double")<br />
error: [java] java.lang.NoSuchMethodException: java.lang.Double<br />
% but:<br />
octave> d = javaObject ("java.lang.Double", 12.34)<br />
d = 12.340<br />
<br />
==Which TEX symbols are implemented in the dialog functions?==<br />
The dialog functions contain a translation table for TEX like symbol codes. Thus messages<br />
and labels can be tailored to show some common mathematical symbols or Greek characters.<br />
No further TEX formatting codes are supported. The characters are translated to their<br />
Unicode equivalent. However, not all characters may be displayable on your system. This<br />
depends on the font used by the Java system on your computer.<br />
<br />
Each TEX symbol code must be terminated by a space character to make it distinguishable from<br />
the surrounding text. Therefore the string {{Codeline|\alpha &#61;12.0}} will produce the<br />
desired result, whereas {{Codeline|\alpha&#61;12.0}} would produce the literal text {{Codeline|\alpha&#61;12.0}}.<br />
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168https://wiki.octave.org/wiki/index.php?title=IO_package&diff=292IO package2011-12-11T10:16:20Z<p>83.163.225.168: /* Required support software */</p>
<hr />
<div>The IO package is part of the octave-forge project and provides input/output from/in external formats.<br />
<br />
== ODS support ==<br />
(ODS = Open Document Format spreadsheet data format, used by e.g., LibreOffice and OpenOffice.org)<br />
<br />
=== Files content ===<br />
* '''odsread.m''' &mdash; no-hassle read script for reading from an ODS file and parsing the numeric and text data into separate arrays.<br />
* '''odswrite.m''' &mdash; no-hassle write script for writing to an ODS file.<br />
* '''odsopen.m''' &mdash; get a file pointer to an ODS spreadsheet file.<br />
* '''ods2oct.m''' &mdash; read raw data from an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''oct2ods.m''' &mdash; write data to an ODS spreadsheet file using the file pointer handed by odsopen.<br />
* '''odsclose.m''' &mdash; close file handle made by odsopen and, if data have been transferred to a spreadsheet, save the data.<br />
* '''odsfinfo.m''' &mdash; explore sheet names and optionally estimated data size of ods files with unknown content.<br />
* '''calccelladdress.m''' &mdash; utility function needed for jOpenDocument class.<br />
* '''parsecell.m''' &mdash; (contained in Excel xlsread scripts, but works also for ods support) parse raw data (cell array) into separate numeric array and text (cell) array.<br />
* '''chk_spreadsheet_support.m''' &mdash; internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
<br />
The following are support files called by the scripts and not meant for direct invocation by users:<br />
* spsh_chkrange.m<br />
* spsh_prstype.m<br />
* getusedrange.m<br />
* calccelladdress.m<br />
* parse_sp_range.m<br />
<br />
<br />
=== Required support software ===<br />
<br />
For Windows (MinGW):<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.6 will do for most functionality)<br />
<br />
For Linux:<br />
* Octave with Java package (preferably >= 1.2.8, although 1.2.5 will do for most functionality)<br />
<br />
For ODS access, you'll need to choose at least one of the following java class files collections:<br />
* (currently the preferred option) odfdom.jar (only versions 0.7.5, 0.8.6, and 0.8.7 work OK!) & xercesImpl.jar (NOTE: only version 2.9.1 dated 14 Sep 2007 works with odfdom). Get them here:<br />
** http://odftoolkit.org/projects/odfdom/pages/Home<br />
** http://odftoolkit.org/projects/odfdom/downloads/directory/current-version<br />
** http://www.google.com/search?ie=UTF-8&oe=utf-8&q=xerces-2.9.1+download<br />
* jopendocument<version>.jar. Get it from http://www.jopendocument.org (jOpenDocument 1.2 (final) is the most recent one and recommended for Octave)<br />
* OpenOffice.org (or clones like LibreOffice, Go-Office, ...). Get it from http://www.openoffice.org. The relevant Java class libs are unoil.jar, unoloader.jar, jurt.jar, juh.jar and ridl.jar (which are scattered around the OOo installation directory), while also the <OOo>/program/ directory needs to be in the classpath.<br />
<br />
These must be referenced with full pathnames in your javaclasspath. <br />
Hint: add it in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements.<br />
Alternatively, the io package contains a function script file "chk_spreadsheet_support.m" which can set up the java classpath.<br />
<br />
=== Usage ===<br />
<br />
(see “help ods<function_filename>” in octave terminal.)<br />
<br />
odsread is a sort of analog to xlsread and works more or less the same. odsread is a mere wrapper for the functions odsopen, ods2oct, and odsclose that do file access and the actual reading, plus parsecell for post-processing.<br />
<br />
odswrite works similarly to xlswrite. It too is a wrapper for scripts which do the actual work and invoke other scripts, among others oct2ods.<br />
<br />
odsfinfo can be used to explore odsfiles with unknown content for sheet names and to get an impression of the data content sizes.<br />
When you need data from just one sheet, odsread is for you. But when you need data from multiple sheets in the same spreadsheet file, or if you want to process spreadsheet data by limited-size chunks at a time, odsopen / ods2oct [/parsecell] / … / odsclose sequences provide much more speed and flexibility, as the spreadsheet needs to be read just once rather than repeatedly for each call to odsread.<br />
<br />
Same reasoning goes for odswrite.<br />
<br />
Also, if you use odsopen / …../, you can process multiple spreadsheets simultaneously – just use odsopen repeatedly to get multiple spreadsheet file pointers.<br />
<br />
Moreover, after adding data to an existing spreadsheet file, you can fiddle with the filename in the ods file pointer struct to save the data into another, possibly new spreadsheet file.<br />
<br />
If you use odsopen / ods2oct / … / oct2ods / …. / odsclose, DO NOT FORGET to invoke odsclose in the end. The file pointers can contain an enormous amount of data and may needlessly keep precious memory allocated. In case of the UNO interface, the hidden OpenOffice.org invocation (soffice.bin) can even block proper closing of Octave.<br />
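<br />
The sequence described above could look like the following sketch. The file name and sheet names are hypothetical, and the argument lists are assumed to follow the help texts of the io package functions; check {{Codeline|help odsopen}} and {{Codeline|help ods2oct}} for your installed version.<br />
<pre><br />
ods = odsopen ('measurements.ods');   % hypothetical file, opened for reading<br />
unwind_protect<br />
  % read two sheets with a single file access<br />
  [raw1, ods] = ods2oct (ods, 'Sheet1');<br />
  [raw2, ods] = ods2oct (ods, 'Sheet2');<br />
  % split the raw cell arrays into numeric and text parts<br />
  [num1, txt1] = parsecell (raw1);<br />
  [num2, txt2] = parsecell (raw2);<br />
unwind_protect_cleanup<br />
  % always release the file pointer, even if reading failed<br />
  ods = odsclose (ods);<br />
end_unwind_protect<br />
</pre><br />
The {{Codeline|unwind_protect}} block is Octave specific; it is used here only to make sure odsclose runs even when one of the read calls raises an error.<br />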
<br />
=== Spreadsheet formula support ===<br />
<br />
When using the OTK or UNO interface you can:<br />
* (When reading, ods2oct) either read spreadsheet formula results, or the literal formula text strings;<br />
* (When writing, oct2ods) either enter formulas in the worksheet as formulas, or enter them as literal text strings.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. The behaviour is controlled by an option structure options (as last argument to oct2ods.m and ods2oct.m) which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text =1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
<br />
Be aware that there's no formula evaluator in ODS java, not even a formula validator. So if you create formulas in your spreadsheet using oct2ods or odswrite, do not expect meaningful results when reading those files later on unless you open them in OpenOffice.org Calc and write them back to disk.<br />
You can write all kind of junk as a formula into a spreadsheet cell. There's not much validity checking built into odfdom.jar. I didn't bother to try OpenOffice.org Calc to read such faulty spreadsheets, so I don't know what will happen with spreadsheets containing invalid formulas. But using the above options, you can at least repair them using octave....<br />
<br />
The only exception is if you select the UNO interface, as that invokes OpenOffice.org behind the scenes, and OOo obviously has a validator and evaluator built-in.<br />
<br />
=== Gotchas ===<br />
I know of one big gotcha: reading dates (& time). A less obvious one is Java memory pool allocation size.<br />
<br />
==== Date and time in ODS ====<br />
Octave (as does Matlab) stores dates as a number representing the number of days since January 1, 0 (and as an aside ignores, among other things, Pope Gregorius' intervention in 1582 when 10 days were simply skipped).<br />
<br />
OpenOffice.org stores dates as text strings like “yyyy-mm-dd”.<br />
<br />
MS-Excel stores dates as a number representing the number of days since January 1, 1900 (and as an aside, erroneously assumes 1900 to be a leap year).<br />
<br />
Now, converting OpenOffice.org date cell values (actually, character strings flagged by “date” attributes) into Octave looks pretty straightforward. But when the ODS spreadsheet was originally an Excel spreadsheet converted by OpenOffice.org, the date cells can either be OOo date values (i.e., strings) OR old numerical values from the Excel spreadsheet.<br />
<br />
So: you should carefully check what happens to date cells.<br />
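<br />
To illustrate the two cases, here is a small sketch that converts both kinds of date cell to an Octave date number. The sample values are made up; the offset assumes Excel's 1900 date system and is only valid for dates from March 1, 1900 onward, because of the non-existent leap day mentioned above.<br />
<pre><br />
% case 1: an OOo date cell, delivered as text<br />
d1 = datenum ('2009-07-06', 'yyyy-mm-dd');<br />
% case 2: a leftover Excel serial date number for the same day<br />
xls_serial = 40000;<br />
d2 = xls_serial + datenum (1899, 12, 30);<br />
% both should now denote the same day<br />
disp (datestr (d1, 29));<br />
disp (datestr (d2, 29));<br />
</pre><br />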
<br />
As octave has no “date” or “time” data type, octave date values (usually numerical data) are simply transferred as “floats” to ODS spreadsheets. You'll have to convert the values into dates yourself from within OpenOffice.org.<br />
<br />
While adding date and time values has been implemented in the write scripts, the wait is for clever solutions to distinguish dates from floats in octave cell arrays.<br />
<br />
==== Java memory pool allocation size ====<br />
The Java virtual machine (JVM) initializes one big chunk of your computer's RAM in which all Java classes and methods etc. are to be loaded: the Java memory pool. It does this because Java has a very sophisticated “garbage collection” system. At least on Windows, the initial size is 2MB and the maximum size is 64MB. On Linux this allocated size is much bigger. This part of memory is where the Java-based ODS octave routines (and the Java-based ods routines) live and keep their variables etc.<br />
<br />
For transferring large pieces of information to and from spreadsheets you might hit the limits of this pool. E.g. to be able to handle I/O of an array of around 50,000 cells I needed a memory pool size of 512 MB.<br />
<br />
The memory size can be increased by inserting a file called “java.opts” (without quotes) in the directory ./share/octave/packages/java-<version> (where the script file javaclasspath.m is located), containing just the following lines:<br />
<pre><nowiki><br />
-Xms16m<br />
-Xmx512m<br />
</nowiki></pre><br />
(where 16 = initial size, 512 = maximum size (in this example), m stands for Megabyte. This number is system-dependent).<br />
<br />
After processing a large chunk of spreadsheet information you might notice that octave's memory footprint does not shrink so it looks like Java's memory pool does not shrink back; but rest assured, the memory footprint is the allocated (reserved) memory size, not the actual used size. After the JVM has done its garbage collection, only the so-called “working set” of the memory allocation is really in use and that is a trimmed-down part of the memory allocation pool. On Windows systems it often suffices to minimize the octave terminal for a few seconds to get a more reasonable memory footprint.<br />
<br />
==== Reading cells containing errors ====<br />
Spreadsheet cells containing erroneous stuff are transferred to Octave as NaNs. But not all errors can be caught. Cells showing #Value# in OpenOffice.org Calc often contain invalid formulas but may have a 0 (null) value stored in the value fields. It is impossible to catch this as there is no run-time formula evaluator (yet) in ODF Toolkit nor jOpenDocument (like there is in Apache POI for Excel).<br />
<br />
Smaller gotchas (only with jOpenDocument 1.2b2, fixed in 1.2b3+ and 1.2 final):<br />
* while reading, empty cells are sometimes not skipped but interpreted with numerical value 0 (zero).<br />
* a valid range MUST be specified, I haven't found a way to discover the actual occupied rows and columns (jOpenDocument can give the physical ones (= capacity) but that doesn't help).<br />
<br />
NOT fixed in version 1.2 final:<br />
* jOpenDocument doesn't set the so-called <office:value-type='string'> attribute in cells containing text; as a consequence ODF Toolkit will treat them as empty cells. OOo will read them OK.<br />
<br />
=== Matlab compatibility ===<br />
AFAIK there's no similar functionality in Matlab (yet?), only for reading and then very limited.<br />
odsread is fairly function-compatible to xlsread, however.<br />
<br />
Same goes for odswrite, odsfinfo and xlsfinfo – however odsfinfo has better functionality IMO.<br />
<br />
=== Comparison of interfaces ===<br />
The ODFtoolkit is the one that gives the best (but slow) results at present. However, parsing xml trees into rectangular arrays is not quite straightforward and the other way round is a real nightmare; odftoolkit up to 0.7.5 did little to hide the gory details from the developers.<br />
<br />
While reading ODS is still OK, writing implies checking whether cells already exist explicitly (in table:table-cells) or implicitly (in number-columns-repeated or number-rows-repeated nodes) or not at all yet in which case you'll need to add various types of parent nodes. Inserting new cells (“nodes”) or deleting nodes implies rebuilding possibly large parts of the tree in memory - nothing for the faint-of-heart. Only with ODFToolkit (odfdom) 0.8.6 and 0.8.7 things have been simplified for developers.<br />
<br />
The jOpenDocument interface is more promising, as it does shield the xml tree details and presents developers something which looks like a spreadsheet model.<br />
<br />
However, unfortunately the developers decided to shield essential methods by making them 'protected' (e.g. the vital getCellType). jOpenDocument does support writing. But OTOH many obvious methods are still lacking and formula support is absent.<br />
And last (but not least) the jOpenDocument developers state that their development is primarily driven by requests from customers who pay for support. I do sympathize with this business model but for octave needs this may hamper progress for a while.<br />
<br />
The (still experimental) UNO interface, based on a Java/UNO bridge linking a hidden OpenOffice.org invocation to Octave, is the most promising:<br />
* admittedly OOo needs some tens of seconds to start for the first time, but once OOo is in the operating system's disk cache, it operates much faster than ODF or JOD;<br />
* it has built-in formula validator and evaluator;<br />
* it has a much more reliable data parser;<br />
* it can read many more spreadsheet formats than just ODS: .sxc (older OOo and StarOffice), but also .xls, .xlsx (Excel), .wk1 (Lotus 123), .dbf, etc.<br />
* it consumes only a fraction of the JVM heap memory that the other Java ODS spreadsheet solutions need because OOo reads the spreadsheet in its own memory chunk in RAM. The other solutions read, expand, parse and manipulate all data in the JVM. In addition, OOo's code is outside the JVM (and Octave) while the ODF Toolkit and jOpenDocument classes also reside in the JVM. <br />
<br />
However, UNO is not stable yet (see below).<br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting ODS support are given here.<br />
Since April 2011 the function chk_spreadsheet_support() has been included in the io package. Calling it with arguments ('', 3) (empty string and debug level 3) will echo a lot of diagnostics to the screen. Large parts of the steps outlined below have been automated in this script.<br />
Problems with UNO are too complicated to treat them here; most of the troubleshooting has been implemented in chk_spreadsheet_support.m, only some general guidelines are given below.<br />
# Check if Java works. Do a pkg list and see<br />
## If there's a Java package mentioned (then it's installed). If not, install it.<br />
## If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
# Check Java memory settings. Try javamem<br />
## If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
## If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 # show MaxMemory in MiB.</pre><br />
## In case you have insufficient memory, see in [[#Gotchas]], [[#Java memory pool allocation size]], how to increase java's memory pre-reservation.<br />
# Check if all classes (.jar files) are in the class path. Do jcp = javaclasspath ('-all') (under unix/linux, do jcp = javaclasspath; strsplit (jcp, ":")). See above under [[#Required support software]] what classes should be mentioned.<br />
## If classes (.jar files) are missing, download and put them somewhere and add them to the javaclass path with their fully qualified pathname (in quotes) using javaaddpath().<br />
## Once all classes are present and in the javaclasspath, the ods interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problems outside octave.<br />
# Try opening an ods file:<br />
## ods1 = odsopen ('test.ods', 1, 'otk'). If this works and ods1 is a struct with various fields containing objects, ODF toolkit interface (OTK) works. Do an ods1 = odsclose (ods1) to close the file.<br />
## ods2 = odsopen ('test.ods', 1, 'jod'). If this works and ods2 is a struct with various fields containing objects, jOpenDocument interface (JOD) works as well. Do ods2 = odsclose (ods2) to close the file.<br />
# For the UNO interface, at least version 1.2.8 of the Java package is needed plus the following Java class libs (jars) and directory:<br />
#* unoil.jar, usually found in subdirectory Basis<version>/program/classes/ (or the like) of the OpenOffice.org (<OOo>) installation directory;<br />
#* juh.jar, jurt.jar, unoloader.jar and ridl.jar, usually found in the subdirectory URE/share/java/ (or the like) of OOo's installation directory;<br />
#* The subdirectory program/ (where soffice[.exe] (or ooffice) resides).<br />
#* The exact case (URE or ure, Basis or basis), name ("Basis3.2" or just "basis") and subdirectory tree (URE/java or URE/share/java) varies across OOo versions and clones, so chk_spreadsheet_support.m can have a hard time finding all needed classes. In particularly bad cases, when chk_spreadsheet_support cannot find them, you might need to add one or more of these classes manually to the javaclasspath.<br />
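<br />
The class path check from the steps above can be sketched as follows. This is only an illustration and not part of the io package: the name fragments in "wanted" are just the jars mentioned on this page, and the fallback for a class path returned as one string is an assumption about older Java package versions. The final call uses the chk_spreadsheet_support arguments described above and is skipped if that function is not installed.<br />
<pre><br />
# Name fragments of the jars to look for (illustrative selection)<br />
wanted = {'odfdom', 'juh', 'jurt', 'unoil', 'ridl', 'unoloader'};<br />
labels = {'missing', 'found'};<br />
try<br />
  jcp = javaclasspath ('-all');          # needs working Java support<br />
  if (ischar (jcp))<br />
    jcp = strsplit (jcp, pathsep ());    # assumption: some versions return one string<br />
  endif<br />
catch<br />
  jcp = {};                              # no Java: nothing on the class path<br />
end_try_catch<br />
for ii = 1:numel (wanted)<br />
  found = any (! cellfun (@isempty, strfind (lower (jcp), wanted{ii})));<br />
  printf ('%-10s : %s\n', wanted{ii}, labels{found + 1});<br />
endfor<br />
# Let the io package do its own, much more thorough, check (debug level 3)<br />
if (exist ('chk_spreadsheet_support'))<br />
  chk_spreadsheet_support ('', 3);<br />
endif<br />
</pre><br />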
<br />
=== Development ===<br />
As with the Excel r/w stuff, adding new interfaces should be easy and straightforward. Add relevant stanzas in odsopen, odsclose, odsfinfo & getusedrange and add new subfunctions (for the real work) to getusedrange_<INTF>, oct2ods and ods2oct.<br />
<br />
Suggestions for future development:<br />
* Reliable and easy ODS write support (maybe when jOpenDocument is more mature)<br />
* Speeding up (ODS is 10 X slower than e.g. OOXML !!!). jOpenDocument is much faster but still immature. UNO ''is'' MUCH faster than jOpenDocument but starting up OpenOffice.org for the first time can take tens of seconds... Note that UNO is still experimental. The issue is that odsclose() will simply kill ALL other OpenOffice.org invocations, also those that were not opened through Octave! This is related to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
* ''Passing function handle'' a la Matlab's xlsread<br />
* Adding styles (borders, cell lay-out, font, etc.)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent (''portable'').<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed).<br />
But IMO data sets larger than 5·10<sup>5</sup> cells should not be kept in spreadsheets anyway. Use real databases for such data sets.<br />
<br />
=== ODFDOM versions ===<br />
I have tried various odfdom versions. As to 0.8 & 0.8.5, while the API has been simplified enormously (finally one can address cells by spreadsheet address rather than find out yourself by parsing the table-column/-row/-cell structure), many irrecoverable bugs have been introduced :-((<br />
In addition processing ODS files became significantly slower (up to 7 times!).<br />
<br />
End of August 2010 I have implemented support for odfdom-0.8.6.jar – that version is at last sufficiently reliable to use. The few remaining bugs and limitations could easily be worked around by diving in the older TableTable API. Later on (early 2011) version 0.8.7 has been tested too - this needed a few adjustments; clearly the odfdom API (currently at main version 0) is not stable yet.<br />
So at the moment (May 2011 = last I looked) only odfdom versions 0.7.5, 0.8.6 and 0.8.7 are supported.<br />
<br />
If you want to experiment with odfdom 0.8 & 0.8.5, you can try:<br />
* odsopen.m (revision 7157)<br />
* ods2oct.m (revision 7158)<br />
* oct2ods.m (revision 7159)<br />
<br />
== XLS support ==<br />
=== Files content ===<br />
* '''xlsread.m''' &mdash; All-in-one function for reading data from one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlswrite.m''' &mdash; All-in-one function for writing data to one specific worksheet in an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsfinfo.m''' &mdash; All-in-one function for exploring basic properties of an Excel spreadsheet file. This script has Matlab-compatible functionality.<br />
* '''xlsopen.m''' &mdash; Function for "opening" (= providing a handle to) an Excel spreadsheet file ("workbook"). This function sorts out which interface to use for .xls access (i.e., COM; Java & Apache POI; JExcelAPI; OpenXLS; etc.), but its choice can be overridden.<br />
* '''xls2oct.m''' &mdash; Function for reading data from a specific worksheet pointed to in a struct created by xlsopen.m. xls2oct can be called multiple times consecutively using the same pointer struct, each time allowing to read data from different ranges and/or worksheets. Data are returned in the form of a 2D heterogeneous cell array that can be parsed by parsecell.m. xls2oct is a mere wrapper for interface-dependent scripts that do the actual low-level reading.<br />
* '''oct2xls.m''' &mdash; Function for writing data to a specific worksheet pointed to in a struct created by xlsopen.m. oct2xls can be called multiple times consecutively using the same pointer struct, each time allowing to write data to different ranges and/or worksheets. oct2xls is a mere wrapper for interface-dependent scripts that do the actual low-level writing.<br />
* '''xlsclose.m''' &mdash; Function for closing (the handle to) an Excel workbook. When data have been written to the workbook, xlsclose will write the workbook to disk. Otherwise, the file pointer is simply closed and possibly used interfaces for Excel access (COM/ActiveX/Excel.exe) will be shut down properly.<br />
* '''parsecell.m''' &mdash; Function for separating the data in raw arrays returned by xls2oct, into numerical/logical and text (cell) arrays.<br />
* '''chk_spreadsheet_support.m''' &mdash; Internal function for (1) checking, (2) setting up, (3) debugging spreadsheet support. While not specifically meant for direct invocation from the Octave prompt (it is more useful during initialization of Octave itself) it can be very helpful when hunting down issues with spreadsheet support in Octave.<br />
* '''spsh_chkrange.m''', '''spsh_prstype.m''', '''getusedrange.m''', '''calccelladdress.m''', '''parse_sp_range.m''' &mdash; Support files called by the scripts and not meant for direct invocation by users. <br />
<br />
=== Required support software ===<br />
For the Excel/COM interface:<br />
* A windows computer with Excel installed<br />
* Octave-forge Windows-1.0.8 or later package WITH LATEST SVN PATCHES APPLIED<br />
<br />
For the Java / Apache POI / JExcelAPI interfaces (general):<br />
* octave-forge java-1.2.8 package or later version on Linux<br />
* octave-forge java-1.2.8 with latest svn fixes on Windows/MingW<br />
* Java jre or jdk > 1.6.0 (hasn't been tested with earlier versions)<br />
<br />
Apache POI specific:<br />
* class .jars: '''poi-3.5-FINAL-<date>.jar & poi-ooxml-3.5-FINAL-<date>.jar''' (or later versions) in classpath<br />
** Get it here: http://poi.apache.org/download.html<br />
* for OOXML support (only available with Apache POI): '''poi-ooxml-schemas-<version>.jar''', '''xbean.jar''', '''dom4j-1.6.1.jar''' in javaclasspath.<br />
** Get them here: http://poi.apache.org/download.html ("xmlbeans" and poi-ooxml-schemas) or http://sourceforge.net/projects/dom4j/files (dom4j-<version>)<br />
<br />
JExcelAPI specific:<br />
* class .jar: jxl.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/jexcelapi/files/<br />
<br />
OpenXLS specific:<br />
* class .jar: OpenXLS.jar in classpath<br />
** Get it here: http://sourceforge.net/projects/openxls/<br />
<br />
These class libs must be referenced with full pathnames in your javaclasspath.<br />
<br />
They had best be put in /<libdir>/java where <libdir> on Linux is usually /usr/lib; on MinGW it is usually /lib. The PKG_ADD command expects the class libs there; if they are elsewhere, add them in ./share/octave/<version>/m/startup/octaverc using appropriate javaaddpath statements or a chk_spreadsheet_support() call.<br />
<br />
UNO specific (invoking OpenOffice.org (or clones) behind the scenes):<br />
<br />
NOTE: EXPERIMENTAL!! A working OpenOffice.org installation. The utility function chk_spreadsheet_support can be used to add the needed entries to the javaclasspath. <br />
<br />
=== Usage ===<br />
'''xlsread''' and '''xlswrite''' are mere wrappers for '''xlsopen-xls2oct-xlsclose-parsecell''' and '''xlsopen-oct2xls-xlsclose''' sequences, resp. They exist for the sake of Matlab compatibility.<br />
<br />
'''xlsfinfo''' can be used for finding out what worksheet names exist in the file. For OOXML files you either need MS-Excel 2007 for Windows (or later version) installed, and/or the input parameter REQINTF should be specified with a value of 'poi' (case-insensitive) and -obviously- the complete POI interface must have been installed.<br />
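<br />
A minimal sketch of the all-in-one calls with an explicitly requested interface. The file name, sheet name and range are made up, and the argument order (REQINTF last) is how I read the io package help texts, so check "help xlsread" and "help xlswrite" for your version. The calls are skipped when the io package is not there and any failure (e.g. missing POI jars) is only reported:<br />
<pre><br />
fname = 'test.xlsx';               # hypothetical OOXML file<br />
data  = {'x', 'y'; 1, 2; 3, 4};    # small mixed cell array to write<br />
if (exist ('xlswrite') && exist ('xlsread'))<br />
  try<br />
    # Request the Apache POI interface explicitly for OOXML<br />
    xlswrite (fname, data, 'Sheet1', 'A1:B3', 'poi');<br />
    [num, txt, raw] = xlsread (fname, 'Sheet1', 'A1:B3', 'poi');<br />
  catch err<br />
    printf ('POI interface not usable here: %s\n', err.message);<br />
  end_try_catch<br />
endif<br />
</pre><br />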
<br />
Invoking '''xlsopen'''/..../'''xlsclose''' directly provides for much more flexibility, speed, and robustness than '''xlsread''' / '''xlswrite'''. Indeed, using the same file handle (pointer struct) you can mix reading & writing before writing the workbook out to disk using xlsclose.<br />
<br />
And: xlsopen / xlsclose hide the gory interface details from the user.<br />
<br />
Currently only .xls files (BIFF8) can be read/written; using JExcelAPI BIFF5 can be read as well. For OOXML files either Excel 2007 for Windows (or higher) and/or the complete Apache POI interface must be installed (and probably the REQINTF parameter specified with a value of 'poi').<br />
<br />
When using '''xlsopen'''/.../'''xlsclose''' be sure to keep track of the file handle struct.<br />
<br />
A possible scenario:<br />
<br />
xlh = xlsopen (<excel_filename> , [rw], [<requested interface>])<br />
# Set rw to 1 if you want to write to a workbook immediately.<br />
# In that case the check for file existence is skipped and<br />
# -if needed- a new workbook created.<br />
# If you really want another interface than auto-selected<br />
# by xlsopen you can request that. But xlsopen still checks<br />
# proper support for your choice.<br />
<br />
# Read some data<br />
[ rawarr1, xlh ] = xls2oct (xlh, <SomeWorksheet>, <Range>)<br />
# Be sure to specify xlh as output argument as xls2oct keeps<br />
# track of changes and the need to write the workbook to disk<br />
# in the xlh struct. And the origin range is conveyed through<br />
# the xlh pointer struct.<br />
<br />
# Separate data into numeric and text data<br />
[ numarr1, txtarr1, lim1 ] = parsecell (rawarr1)<br />
<br />
# Get more data from another worksheet in the same workbook<br />
[ rawarr2, xlh ] = xls2oct (xlh, <SomeOtherWorksheet>, <Range>)<br />
[ numarr2, txtarr2, lim2 ] = parsecell (rawarr2)<br />
<br />
# <... Analysis and preparation of new data in cell array Newdata....><br />
<br />
# Add new data to spreadsheet<br />
xlh = oct2xls (Newdata, xlh, <AnotherWorksheet>, <Range>)<br />
<br />
# Close the workbook and write it to disk; then clear the handle<br />
xlh = xlsclose (xlh)<br />
clear xlh<br />
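<br />
To get a feeling for what the parsecell step in this scenario does with the raw array, here is a rough sketch in plain Octave. It is not parsecell's actual code (the real function does more, e.g. it also returns the range limits) and the sample data are made up:<br />
<pre><br />
raw = {'Name', 'Value'; 'a', 1.5; 'b', 2};   # made-up raw array as returned by xls2oct<br />
isnum = cellfun (@isnumeric, raw);           # cells holding numbers<br />
istxt = cellfun (@ischar, raw);              # cells holding text<br />
numarr = NaN (size (raw));                   # numeric part, NaN where there is no number<br />
numarr(isnum) = [raw{isnum}];<br />
txtarr = repmat ({''}, size (raw));          # text part, empty string elsewhere<br />
txtarr(istxt) = raw(istxt);<br />
</pre><br />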
<br />
When not using the COM interface, specify a value of 'POI' for parameter REQINTF when accessing OOXML files in xlsread, xlswrite, xlsopen, xlsfinfo (and be sure the complete Apache POI interface is installed). If you haven't got ActiveX installed (i.e., not having MS-Excel under Windows) specifying 'POI' may not be needed as in such cases Apache POI is the next default interface.<br />
<br />
When using JExcelAPI (JXL), after writing into a worksheet you MUST save the file – adding data to the same or another worksheet is no longer possible after the first call to oct2xls(). This is a limitation of JExcelAPI. <br />
<br />
<br />
=== Spreadsheet formula support ===<br />
When using the POI and JXL interfaces you can:<br />
* (When reading, xls2oct) either read spreadsheet formula results (like in COM interface), or the literal formula text strings;<br />
* (When writing, oct2xls) either enter formulas in the worksheet as formulas, or enter them as literal text strings. The former is also like in COM.<br />
<br />
In short, you can enter spreadsheet formulas and in a later stage read them back, change them and re-enter them in the worksheet. <br />
<br />
The behaviour is controlled by an option structure options which for now has only one (logical) field:<br />
* options.formulas_as_text = 0 (the default) implies enter formulas as formulas and read back formula results<br />
* options.formulas_as_text = 1 (or any positive integer) means enter formulas as text strings and read them back as text strings.<br />
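<br />
A sketch of how the options struct could be passed along. The file name, sheet name and cell address are made up, and passing the struct as the last argument of oct2xls and xls2oct is my reading of the help texts, so treat it as an assumption. Everything is skipped when the io package is absent and failures are only reported:<br />
<pre><br />
opts.formulas_as_text = 1;       # treat formulas as literal text strings<br />
if (exist ('xlsopen'))<br />
  try<br />
    xlh = xlsopen ('test.xls', 1, 'poi');                 # hypothetical file<br />
    xlh = oct2xls ({'=SUM(A1:A3)'}, xlh, 'Sheet1', 'B1', opts);<br />
    [raw, xlh] = xls2oct (xlh, 'Sheet1', 'B1', opts);     # formulas come back as text<br />
    xlh = xlsclose (xlh);<br />
  catch err<br />
    printf ('Formula example skipped: %s\n', err.message);<br />
  end_try_catch<br />
endif<br />
</pre><br />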
<br />
Be aware that there's no formula evaluator in JExcelAPI (JXL). So if you create formulas in your spreadsheet using oct2xls or xlswrite with 'JXL', do not expect meaningful results when reading those files later on ''unless'' you open them in Excel and write them back to disk.<br />
<br />
While both Apache POI and JExcelAPI feature a formula validator, not all spreadsheet functions present in Excel have been implemented (yet).<br />
Worse, older Excel versions feature fewer functions than newer versions. So be wary as this may make for interesting confusion. <br />
<br />
=== Matlab compatibility ===<br />
<br />
'''xlsread''', '''xlswrite''' and '''xlsfinfo''' are for the most part Matlab-compatible. Some small differences are mentioned below. When using the Java interfaces octave supplies some formula manipulation support.<br />
<br />
* xlsread<br />
** Matlab's xlsread supports invoking extra functions while reading ("passing function handle"); Octave's does not. But this can be simulated outside xlsread.<br />
** Matlab's xlsread flags some spreadsheet errors, octave-forge just returns blank cells.<br />
** Octave-forge returns info about the actual (rather than the requested) cell range where the data came from. Personally I find it very useful to know from what part of a worksheet the data originate so I've put quite some effort in it :-) Matlab can't, due to Excel automatically trimming returned arrays from empty outer columns and rows. Octave is more clever but the Visual Basic call used for determining the actually used range has some limitations:<br />
**# it relies on cached range values and thus may be out-of-date;<br />
**# it counts empty formatted cells too. When using ActiveX/COM, if octave's xlsfinfo.m returns wrong data ranges it is most often an overestimation.<br />
** Matlab's xlsread ignores all non-numeric data values outside the smallest rectangle encompassing all numerical values. Octave's xlsread doesn't. This means that Matlab ignores all row/column headers, not very user-friendly IMO.<br />
** When using the Java interface, reading and writing xls-files by octave-forge is platform-independent. On systems w/o installed Excel, Matlab can only read Excel 95 formatted .xls files (written using ML xlswrite's 'Basic' option) – and then differently than under Windows.<br />
** Matlab's xlsread returns strings for cells containing date values. This makes for endless if-then-elseif-else-end constructs to catch all expected date formats. Octave returns numerical data (where 1 = 1/1/1900 – you can easily transfer them into proper octave date values yourself using e.g. datestr(), see bottom of this document for more info).<br />
** Matlab's xlsread invokes csvread if no Excel interface is present. Octave-forge's xlsread doesn't.<br />
<br />
* xlswrite<br />
** Octave-forge's xlswrite works on systems w/o Excel support, Matlab's doesn't (properly).<br />
**When specifying a sheet number larger than the number of existing sheets in an .xls file, Matlab's xlswrite adds empty sheets until the new sheet number is created; Octave's xlswrite only adds one sheet called "Sheet<number>" where <number> is the specified sheet number.<br />
** Even better (IMO) while M's xlswrite always creates Sheet1/Sheet2/Sheet3 when creating a new spreadsheet, octave's xlswrite only creates the requested worksheet. (Did you know that you can instruct Excel to create spreadsheets with just one, or any number of, worksheets? Look in Tools | Options, General tab.)<br />
** Oh and octave doesn't touch the "active sheet" - but that's not automatically an advantage.<br />
** If the specified write range is larger than the actual data array, Matlab's xlswrite adds #N/A cells to fill up the lowermost rows and rightmost columns; octave-forge's xlswrite doesn't.<br />
<br />
* xlsfinfo<br />
** When invoking Excel/COM interface, octave-forge's xlsfinfo also echoes the type of sheet (worksheet, chart), not just the sheet names. Using Java I haven't found similar functionality (yet). <br />
<br />
=== Comparison of interfaces & usage ===<br />
Using Excel itself (through '''COM''' / '''ActiveX''' on Windows systems) is probably the most robust and versatile and especially FAST option. There's one gotcha: in case of some type of COM errors Excel will keep running invisibly; you can only end it through Task Manager. A tiny problem is that one cannot find out easily through COM what file types are supported; xls, wks, wk1, xlsx, etc. Another -obvious- limitation is that COM Excel access only works on Windows systems where Excel is installed.<br />
<br />
'''JExcelAPI''' (Java-based and therefore platform-independent) is proven technology but switching between reading and writing is quite involved and memory-hungry when processing large spreadsheets. As the docs state, JExcelAPI is optimized for reading and it does do that well - but still slower than Excel/COM. The fact that upon a switch from reading to writing the existing spreadsheet is overwritten in place by a blank one and that you can only get the contents back when writing out all of the changes is worrying - and any change after the first write() is lost as a next write() doesn't seem to work; worse yet, you may completely lose the spreadsheet in question. The first is by JExcelAPI design, the second is probably a bug (in octave-forge/Java or JExcelAPI? I don't know). Adding data to existing spreadsheets does work, but IMO undue user confidence is needed. JExcelAPI supports BIFF5 (only reading) and BIFF8 (Excel 95 and Excel 97-2003, respectively). Upon overwriting, BIFF5 spreadsheets are converted silently to BIFF8. JExcelAPI, unlike Apache POI, doesn't evaluate functions while reading but instead relies on cached results (i.e. results computed by Excel itself). Depending on Excel settings ("Automatic calculation" ON or OFF) this may or may not yield incorrect (or expected) results.<br />
<br />
'''Apache POI''' (Java-based and platform-independent too) is based on the OpenOffice.org I/O Excel r/w routines. It is more versatile than JExcelAPI: while it doesn't support BIFF5, it does support BIFF8 (Excel 97 – 2003) and OOXML (Excel 2007). It is slower than native JXL, let alone Excel & COM, but it features active formula evaluation, although at the moment (v. 3.7) not all Excel functions have been implemented. I've made the relevant subfunction (xls2jpoi2oct) fall back to cached formula results (and yield a suitable warning) for non-implemented Excel functions while reading Excel files.<br />
<br />
'''OpenXLS''' (an open source version of Extentech's commercial Java-xls product) is still experimental. It seems to work faster than JExcelAPI, but it has other issues - i.e., it locks the .xls file and the unlocking mechanism is a bit wonky. Sometimes xls files keep being locked until Octave is shut down. Currently OXS write support is disabled (but the code is there).<br />
<br />
'''UNO''' (invoking OpenOffice.org or clones behind the scenes, a la ActiveX) is experimental. It works FAST (i.e., once OOo itself is loaded which can take some time) and can process much larger spreadsheets than the other Java-based interfaces because the data are not entered in the JVM but in OOo's memory. A big stumbling block is that xlsclose() on a UNO xls struct will kill ALL OpenOffice.org invocations, also those that were not related to Octave! This is due to UNO-Java limitations. The underlying issue is that when Octave starts an OpenOffice.org invocation, OpenOffice.org must be closed for Octave to be able to exit; otherwise Octave will wait for OOo to shut down before it can terminate itself. So Octave must kill OOo to be able to terminate. A way out hasn't been found yet.<br />
<br />
All in all, of the three Java options I'd prefer Apache POI rather than OpenXLS or JExcelAPI. But the latter is indispensable for BIFF5 formats. Once UNO is stable it is to be preferred as it can read ALL file formats supported by OOo (viz. wk1, ods, xlsx, sxc, ...)<br />
<br />
Some notes on the choice for Java:<br />
# It saves a LOT of development time to use ready-baked Java classes rather than developing your own routines and thus effectively reinvent the wheel.<br />
# A BIG advantage is that a Java-based solution is platform-independent ("portable").<br />
# But Java is known to be not very conservative with resources, especially not when processing XML-based formats.<br />
<br />
So Java is a compromise between portability and rapid development time versus capacity (and speed). But IMO data sets larger than 5·10<sup>5</sup> cells should not be kept in spreadsheets anyway. Better use real databases for such data sets. <br />
<br />
=== Troubleshooting ===<br />
Some hints for troubleshooting Excel support are contained in this thread: http://sourceforge.net/mailarchive/forum.php?thread_name=4C61B649.9090802%40hccnet.nl&forum_name=octave-dev dated August 10, 2010. A more structured approach is below.<br />
<br />
Since April 2011 a special purpose setup file has been included in the io package (chk_spreadsheet_support.m). Large parts of the approach below (starting at Step 2) have been automated in this script. When running it with the second input argument (debug level) set to 3 a lot of useful diagnostic output will be printed to screen.<br />
#Check if COM / ActiveX works (only under Windows OS). Do a ''pkg list'' and see:<br />
##If there's a windows package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the windows package line (then the package is loaded). If not, do a pkg load windows<br />
#Check if the ActiveX server works. Do:<br />
#:exl = actxserver ('Excel.Application') ## Note the period between 'Excel' and 'Application'<br />
##If a COM object is returned, ActiveX / COM / Excel works. Do:<br />
##:<pre>exl.Quit(); delete (exl) ## to shut down the (hidden) Excel invocation.</pre><br />
##If you get an error message, your last resort is re-installing the windows package, or trying the Java-based interfaces.<br />
#Check if java works. Do a ''pkg list'' and see:<br />
##If there's a java package mentioned (then it's installed). If not, install it.<br />
##If there's an asterisk on the java package line (then the package is loaded). If not, do a pkg rebuild -auto java<br />
#Check Java memory settings. Try<br />
#:<pre>javamem</pre><br />
##If it works, check if it reports sufficiently large max memory (had better be 200 MiB, the bigger the better)<br />
##If it doesn't work, do:<br />
##:<pre><br />
##:rt = java_invoke ('java.lang.Runtime', 'getRuntime')<br />
##:rt.gc<br />
##:rt.maxMemory ().doubleValue () / 1024 / 1024 ## show MaxMemory in MiB.</pre><br />
##In case you have insufficient memory, see in "GOTCHAS", "Java memory pool allocation size", how to increase java's memory pre-reservation.<br />
#Check if all classes (.jar files) are in the class path. Do a 'javaclasspath' (under unix/linux, do 'tmp = javaclasspath; strsplit (tmp,":")', w/o quotes). See above under "REQUIRED SUPPORT SOFTWARE" what classes should be mentioned.<br />
##If classes (.jar files) are missing, download and put them somewhere and add them to the javaclasspath with their fully qualified pathname (in quotes) using javaaddpath().<br />
##Once all classes are present and in the javaclasspath, the xls interfaces should just work. The only remaining showstoppers are insufficient write privileges for the working directory, a wrecked up octave or some other problem outside octave.<br />
#Try opening an xls file: <br />
#: xls1 = xlsopen ('test.xls', 1, 'poi'). If this works and xls1 is a struct with various fields containing objects, the Apache POI interface (POI) works. Do an xls1 = xlsclose (xls1) to close the file.<br />
#: xls2 = xlsopen ('test.xls', 1, 'jxl'). If this works and xls2 is a struct with various fields containing objects, the JExcelAPI interface (JXL) works as well. Don't forget to do xls2 = xlsclose (xls2) to close the file.<br />
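<br />
The last step can be sketched as a small loop. The file name test.xls is the same placeholder as above, only the two interface names mentioned in that step are tried, and the loop is skipped when xlsopen is not installed. This is merely an illustration, chk_spreadsheet_support does a far more thorough job:<br />
<pre><br />
intfs = {'poi', 'jxl'};               # interfaces to probe<br />
works = false (size (intfs));<br />
if (exist ('xlsopen'))<br />
  for ii = 1:numel (intfs)<br />
    try<br />
      xls = xlsopen ('test.xls', 1, intfs{ii});<br />
      works(ii) = isstruct (xls);     # a pointer struct means the interface works<br />
      xls = xlsclose (xls);<br />
    catch<br />
      works(ii) = false;              # any error counts as "not working"<br />
    end_try_catch<br />
  endfor<br />
endif<br />
for ii = 1:numel (intfs)<br />
  printf ('%s works: %d\n', intfs{ii}, works(ii));<br />
endfor<br />
</pre><br />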
<br />
=== Development ===<br />
xlsopen/xlsclose and friends have been written so that adding other interfaces (Perl? native octave? ...?) should be very easily accomplished. xlsopen.m merely needs two stanzas, xlsfinfo.m and getusedrange.m each need an additional elseif stanza, and xlsclose.m needs a small stanza for closing the pointer struct and writing to disk. The real work lies in creating the relevant xls2<...>2oct & oct2<...>2xls & <getusedrange_...> subfunction scripts in xls2oct.m, oct2xls.m and getusedrange.m, resp., but that shouldn't be really hard, depending on the interface support libraries' quality and documentation. Separating the file access functions and the actual reading/writing from/to the workbook in memory has made developer's life (I mean: my time developing this stuff) much easier.<br />
<br />
Some other options for development (who?):<br />
*Speeding up, especially Java worksheet/cell access. For cracks, not me.<br />
*Automatic conversion of Excel date/time values into octave ones and vice versa (adding or subtracting 693960). But then again Excel's dates are 01-01-1900 based (octave's 0-0-0000) and buggy (Excel thinks 1900 is a leap year), and I sometimes have to use dates from before 1900. Maybe as an option?<br />
*Creating Excel graphs (a significant enterprise to write from scratch).<br />
*Support for "passing function handle" in xlsread.<br />
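<br />
Until such a conversion is built in, Excel day numbers can be converted by hand. The sketch below assumes Excel's default 1900 date system; the offset is 693960 for dates from 1 March 1900 onwards and one more for earlier dates, because Excel counts a non-existent 29 February 1900 (serial number 60, which has no Octave counterpart). The sample serial numbers are just examples:<br />
<pre><br />
serial = [59; 61; 25569; 44927];       # Excel serial day numbers<br />
offset = 693961 - (serial >= 61);      # 693960 from 1 March 1900 onwards<br />
dn = serial + offset;                  # Octave datenum values<br />
disp (datestr (dn, 'yyyy-mm-dd'))      # 1900-02-28, 1900-03-01, 1970-01-01, 2023-01-01<br />
</pre><br />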
<br />
[[Category:OctaveForge]]<br />
[[Category:Packages]]</div>83.163.225.168