Latest revision |
Your text |
Line 1: |
Line 1: |
| The {{Forge|dataframe}} package is part of the [[Octave Forge]] project. It is a data manipulation toolbox similar to R data.frame and is maintained by Pascal Dupuis.
| | Dataframe, Data manipulation toolbox similar to R data.frame |
|
| |
|
| == Introduction ==
| | At a mature development stage. [http://hg.code.sf.net/p/octave/dataframe hg] |
| | *Maintainer: Pascal Dupuis |
| | *Contributors: |
|
| |
|
| This package makes it possible to handle complex (both in the sense of complex numbers and of high complexity) data as if they were ordinary arrays, except that each column MAY possess a different type. It also provides a fairly complete interface to CSV files that copes with a number of oddities, e.g. CSV files starting with a header spread over a few lines. The resulting object tries as far as it can to mimic an array, in such a way that binary operators and usual functions work as expected. | | *Package: [http://octave.sourceforge.net/dataframe/ dataframe] |
| | |
| | This package makes it possible to handle complex (both in the sense of complex numbers and of high complexity) data as if they were ordinary arrays, except that each column MAY possess a different type. It also provides a fairly complete interface to CSV files that copes with a number of oddities, e.g. CSV files starting with a header spread over a few lines. The resulting object tries as far as it can to mimic an array, in such a way that binary operators and usual functions work as expected. |
|
| |
|
| Meta-information is also handled. Rows and columns may have a name, and this name is searchable. If for whatever reason the ordering of a CSV file changes, searching by column names will return the expected information. | | Meta-information is also handled. Rows and columns may have a name, and this name is searchable. If for whatever reason the ordering of a CSV file changes, searching by column names will return the expected information. |
|
| |
| == Example ==
| |
|
| |
| To get a first taste, let's load the test csv file coming with the package:
| |
| >> experiment = dataframe('data_test.csv')
| |
| warning: load: '/home/padupuis/matlab/dataframe/inst/data_test.csv' found by searching load path
| |
| warning: fopen: '/home/padupuis/matlab/dataframe/inst/data_test.csv' found by searching load path
| |
| ans = dataframe with 10 rows and 7 columns
| |
| Src: data_test.csv
| |
| Comment: #notice there is a extra separator
| |
| Comment: # a comment line and an empty one
| |
| Comment: # the next lines use \r\n \r and \f as linefeed
| |
| Comment: # one empty input field
| |
| _1 DataName VBIAS Freq x_IBIAS_ C GOUT OK_
| |
| Nr char double double double double double char
| |
| 1 DataValue -6.0000 300000 1.6272e-11 7.0215e-13 1.6044e-07 A
| |
| 2 DataValue -5.8000 300000 1.5990e-11 6.9607e-13 1.5728e-07 E
| |
| 3 DataValue -5.6000 300000 1.3790e-11 6.9048e-13 1.5489e-07 !
| |
| 4 DataValue -5.4000 300000 1.4420e-11 6.8517e-13 1.5478e-07 ?
| |
| 5 DataValue -5.2000 300000 1.2930e-11 6.7965e-13 1.5189e-07 C
| |
| 6 DataValue -5.0000 300000 1.2610e-11 6.7444e-13 1.4931e-07 B
| |
| 7 DataValue -4.8000 300000 1.4390e-11 6.7011e-13 1.4876e-07 A
| |
| 8 DataValue -4.6000 300000 1.0890e-11 6.6416e-13 1.4890e-07 3
| |
| 9 DataValue -4.4000 300000 NA 6.5859e-13 1.4558e-07 C
| |
| 10 DataValue -4.2000 300000 1.0610e-11 6.5355e-13 1.4431e-07 B
| |
|
| |
| These data were produced by performing a voltage sweep on a sensor while an impedance bridge measured
| |
| the parallel capacitance and conductance at a given frequency.
| |
|
| |
| The first lines contain some meta-information: the name of the source file and a few comments found in the
| |
| CSV file. Their purpose is to annotate the results.
| |
|
| |
| Then comes the content. Each column starts with a name, followed by its type. Next come the content lines, each
| |
| of them with an index. The content consists of control values (polarization voltage, applied frequency),
| |
| then measured values: DC current, capacitance, conductance. The last column is categorical: the user entered
| |
| a code indicating whether the result makes sense or not.
| |
|
| |
| Let us now select the control values:
| |
| cv = experiment(1:3, ["Vbias"; "Freq"])
| |
| cv = dataframe with 3 rows and 1 columns
| |
| Src: data_test.csv
| |
| Comment: #notice there is a extra separator
| |
| Comment: # a comment line and an empty one
| |
| Comment: # the next lines use \r\n \r and \f as linefeed
| |
| Comment: # one empty input field
| |
| _1 Freq
| |
| Nr double
| |
| 1 300000
| |
| 2 300000
| |
| 3 300000
| |
|
| |
| Rows were selected by a range, columns by name. The search criterion here is a
| |
| string array. All columns whose names match are returned.
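| One plausible reading of why only Freq comes back above is that name matching is case-sensitive, so "Vbias" does not match the column "VBIAS". The sketch below illustrates that idea in plain Octave on a cell array of the column names from the listing. It is not the package's own code, and the exact-match rule and the variable names are assumptions made for illustration.

```octave
% Column names copied from the dataframe listing above.
colnames = {"DataName", "VBIAS", "Freq", "x_IBIAS_", "C", "GOUT", "OK_"};

% Search criteria from the example, written as a cell array (hypothetical
% stand-in for the string array passed to the dataframe).
wanted = {"Vbias", "Freq"};

% Assumption: exact, case-sensitive comparison of names.
hits = ismember (colnames, wanted);   % logical mask over the columns
idx = find (hits);                    % positions of the matching columns
selected = colnames(idx);             % names of the matching columns
```

| Under this assumption only the third column matches, which agrees with the one-column result shown in the example.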
| |
|
| |
| The result is returned as a dataframe. This can be changed:
| |
| >> experiment.array(6, "OK_")
| |
| ans = B
| |
| >> class(ans)
| |
| ans = char
| |
|
| |
| When selecting vectors, this conversion to an array is automatic. The DC current is contained in elements
| |
| 31 to 40 (fourth column):
| |
| >> experiment(31:40)
| |
| ans =
| |
| Columns 1 through 9:
| |
| 1.6272e-11 1.5990e-11 1.3790e-11 1.4420e-11 1.2930e-11 1.2610e-11 1.4390e-11 1.0890e-11 NA
| |
| Column 10:
| |
| 1.0610e-11
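| The range 31:40 follows from column-major linear indexing, which the example suggests the dataframe shares with ordinary arrays. The sketch below checks the arithmetic with the standard Octave functions sub2ind and ind2sub on the dimensions reported above (10 rows, 7 columns). The variable names are illustrative only.

```octave
nrows = 10;  ncols = 7;   % dimensions reported for the dataframe above
col = 4;                  % x_IBIAS_ is the fourth column

% Linear index of the first and last element of that column.
first_idx = sub2ind ([nrows, ncols], 1, col);
last_idx  = sub2ind ([nrows, ncols], nrows, col);

% Going the other way: which rows and columns do elements 31..40 map to?
[r, c] = ind2sub ([nrows, ncols], 31:40);
```

| Here first_idx and last_idx bound the range used in the example, and every entry of c equals 4.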
| |
| Note that the access 'experiment("x_IBIAS")' is illegal, because it is ambiguous whether it refers to row or column names.
| |
|
| |
| ;Accessing in this pseudo-structure way is valid in the following cases:
| |
| ;choosing the output format: array, cell, dataframe (may be abbreviated as 'df')
| |
| ;attribute selection: rownames, colnames, rowcnt, colcnt, rowidx, types, source, header, comment
| |
| ;constructor call: new (no other dereferencing may occur)
| |
| ;column selection: just provide one valid column name
| |
| To be similar to the R implementation, constructs such as x.as.array are also allowed.
| |
|
| |
|
| A simple example: | | A simple example: |
Line 118: |
Line 46: |
| 2 | | 2 |
| When the output is a vector and can be simplified to something simple ... it is. | | When the output is a vector and can be simplified to something simple ... it is. |
|
| |
| [[Category:Octave Forge]]
| |