Dataframe package: Difference between revisions

Revision as of 10:12, 14 June 2019

5,118 bytes added, 14 June 2019

Overhaul introduction. Add subsections.

Siko1056

@@ Line 1: / Line 1: @@
-Dataframe, Data manipulation toolbox similar to R data.frame
+The {{Forge|dataframe}} package is part of the [[Octave Forge]] project. It is a data manipulation toolbox similar to R's data.frame and is maintained by Pascal Dupuis.
-At an mature development stage. [http://octave.svn.sourceforge.net/viewvc/octave/trunk/octave-forge/extra/dicom/ octave-forge svn]
+== Introduction ==
-*Maintainer: Pascal Dupuis [http://sourceforge.net/sendmessage.php?touser=1760416contact]
-*Contributors:
-*Package: http://octave.sourceforge.net/dataframe/
+This package lets you handle complex data (complex both in the sense of complex numbers and of high complexity) as if they were ordinary arrays, except that each column may have a different type. It also provides a fairly complete interface to CSV files and copes with a number of oddities, such as a header spread over several lines. The resulting object mimics an ordinary array as closely as it can, so that binary operators and the usual functions work as expected.
-Auto-generated docs from the current package: {{Forge|dataframe}}
+Meta-information is also handled. Rows and columns may have a name, and this name is searchable. If for whatever reason the ordering of a CSV file changes, searching by column names will return the expected information.
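+Why name-based lookup survives a change in column order can be sketched in plain Octave, without the package. This is only an illustration of the idea, not the package's implementation: the two headers, the data rows and the helper find_col are hypothetical, with values borrowed from the first row of the test file.

```octave
% Two hypothetical CSV headers holding the same columns in a different order.
header_a = {"VBIAS", "Freq", "GOUT"};
header_b = {"GOUT", "VBIAS", "Freq"};

% One data row for each layout.
row_a = [-6.0, 300000, 1.6044e-07];
row_b = [1.6044e-07, -6.0, 300000];

% Name-based lookup: find the position of the named column, then read it.
% find_col is a hypothetical helper, not part of the dataframe package.
find_col = @(header, name) find(strcmp(header, name));

freq_a = row_a(find_col(header_a, "Freq"));
freq_b = row_b(find_col(header_b, "Freq"));

printf("%d %d\n", freq_a, freq_b);
```

+Both lookups return the same value although "Freq" sits at position 2 in one layout and position 3 in the other; a positional index would have had to change.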
+== Example ==
+To get a first taste, let's load the test CSV file that comes with the package:
+  >> experiment = dataframe('data_test.csv')
+  warning: load: '/home/padupuis/matlab/dataframe/inst/data_test.csv' found by searching load path
+  warning: fopen: '/home/padupuis/matlab/dataframe/inst/data_test.csv' found by searching load path
+  ans = dataframe with 10 rows and 7 columns
+  Src: data_test.csv
+  Comment: #notice there is a extra separator
+  Comment: # a comment line and an empty one
+  Comment: # the next lines use \r\n \r and \f as linefeed
+  Comment: # one empty input field
+  _1  DataName   VBIAS   Freq   x_IBIAS_          C       GOUT  OK_
+  Nr      char  double double     double     double     double char
+   1 DataValue -6.0000 300000 1.6272e-11 7.0215e-13 1.6044e-07    A
+   2 DataValue -5.8000 300000 1.5990e-11 6.9607e-13 1.5728e-07    E
+   3 DataValue -5.6000 300000 1.3790e-11 6.9048e-13 1.5489e-07    !
+   4 DataValue -5.4000 300000 1.4420e-11 6.8517e-13 1.5478e-07    ?
+   5 DataValue -5.2000 300000 1.2930e-11 6.7965e-13 1.5189e-07    C
+   6 DataValue -5.0000 300000 1.2610e-11 6.7444e-13 1.4931e-07    B
+   7 DataValue -4.8000 300000 1.4390e-11 6.7011e-13 1.4876e-07    A
+   8 DataValue -4.6000 300000 1.0890e-11 6.6416e-13 1.4890e-07    3
+   9 DataValue -4.4000 300000         NA 6.5859e-13 1.4558e-07    C
+  10 DataValue -4.2000 300000 1.0610e-11 6.5355e-13 1.4431e-07    B
+These data were produced by performing a voltage sweep on a sensor and measuring, with an impedance bridge,
+the parallel capacitance and conductance at a given frequency.
+The first lines contain some meta-information: the name of the source file and a few comments found in the
+CSV file. Their purpose is to annotate the results.
+Then comes the content. Each column starts with a name, then a type. Next come the content lines, each
+with an index. The columns hold first the control values (polarization voltage, applied frequency),
+then the measured values: DC current, capacitance, conductance. The last column is categorical: the user entered
+a code telling whether the result makes sense or not.
+Let us now select the control values:
+  cv = experiment(1:3, ["Vbias"; "Freq"])
+  cv = dataframe with 3 rows and 1 columns
+  Src: data_test.csv
+  Comment: #notice there is a extra separator
+  Comment: # a comment line and an empty one
+  Comment: # the next lines use \r\n \r and \f as linefeed
+  Comment: # one empty input field
+  _1   Freq
+  Nr double
+   1 300000
+   2 300000
+   3 300000
+The selection was made by range on the rows and by name on the columns. The search criterion here is a
+string array. All columns whose name matches are returned.
+The result is returned as a dataframe. This can be changed:
+ >> experiment.array(6, "OK_")
+ ans = B
+ >> class(ans)
+ ans = char
+When selecting vectors, this conversion to an array is automatic. The DC current is contained in elements
+31 to 40 (fourth column):
+  >> experiment(31:40)
+ ans =
+ Columns 1 through 9:
+   1.6272e-11   1.5990e-11   1.3790e-11   1.4420e-11   1.2930e-11   1.2610e-11   1.4390e-11   1.0890e-11           NA
+ Column 10:
+   1.0610e-11
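+The range 31:40 follows from column-major linear indexing, the same rule ordinary Octave arrays use. A minimal sketch in plain Octave, assuming only the 10 rows and 7 columns reported for the example dataframe (the variable names are illustrative):

```octave
nrows = 10;   % rows in the example dataframe
ncols = 7;    % columns in the example dataframe
col   = 4;    % x_IBIAS_ is the fourth column

% In column-major order, column k occupies elements (k-1)*nrows+1 to k*nrows.
first = (col - 1) * nrows + 1;
last  = col * nrows;
printf("%d:%d\n", first, last);

% Cross-check with the built-in conversion from linear indices to subscripts.
[r, c] = ind2sub([nrows ncols], first:last);
disp(all(c == col));
```

+Every linear index in the range maps to column 4, with the row subscripts running from 1 to 10.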
+Note that the access 'experiment("x_IBIAS")' is illegal because it is ambiguous: does it refer to row names or to column names?
+;Access in this pseudo-structure way is valid in the following cases:
+;choosing the output format: array, cell, dataframe (may be abbreviated as 'df')
+;attribute selection: rownames, colnames, rowcnt, colcnt, rowidx, types, source, header, comment
+;constructor call: new (no other dereferencing may occur)
+;column selection: just provide one valid column name
+To be similar to the R implementation, constructs such as x.as.array are also allowed.
+A simple example:
+ truc={"Id", "Name", "Type";1, "onestring", "bla"; 2, "somestring", "foobar";}
+ truc =
+ {
+   [1,1] = Id
+   [2,1] =  1
+   [3,1] =  2
+   [1,2] = Name
+   [2,2] = onestring
+   [3,2] = somestring
+   [1,3] = Type
+   [2,3] = bla
+   [3,3] = foobar
+ }
+ >> tt=dataframe(truc)
+ tt = dataframe with 2 rows and 3 columns
+ _1     Id       Name   Type
+ Nr double       char   char
+  1      1  onestring    bla
+  2      2 somestring foobar
+The first row of the cell array is expected to contain the column names; the remaining rows hold the column content. The type is automatically inferred from the cell content. Now let us select one column by its name:
+ >> tt(:, 'Name')
+ ans = dataframe with 2 rows and 1 columns
+ _1       Name
+ Nr       char
+  1  onestring
+  2 somestring
+In this case, a sub-dataframe is returned. Struct-like indexing is also implemented:
+ >> tt.Id
+ ans =
+    1
+    2
+When the output is a vector that can be reduced to a plain array, it is.
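+The access forms described on this page can be tried side by side on the small dataframe from the example. This is a sketch only, assuming the dataframe package is installed and can be loaded with pkg load; the variable names ids, names, sub and one are illustrative, and no output is shown because it may differ between package versions.

```octave
pkg load dataframe   % assumption: the dataframe package is installed

% Same cell array as in the example above: first row holds the column names.
tt = dataframe({"Id", "Name", "Type"; 1, "onestring", "bla"; 2, "somestring", "foobar"});

ids   = tt.Id;               % column selection by name, reduced to a plain vector
names = tt.colnames;         % attribute selection
sub   = tt(:, "Name");       % name-based indexing returns a sub-dataframe
one   = tt.array(1, "Id");   % explicit choice of the output format

class(ids)
class(sub)
```

+Comparing class(ids) with class(sub) shows the difference between a result that has been reduced to an ordinary array and one that is still a dataframe.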
+[[Category:Octave Forge]]
