Difference between revisions of "Dataframe package"

Revision as of 09:19, 27 February 2015

Dataframe, Data manipulation toolbox similar to R data.frame

At a mature development stage. hg

  • Maintainer: Pascal Dupuis
  • Contributors:

This package lets you handle complex data (complex both in the sense of complex numbers and of high complexity) as if they were ordinary arrays, except that each column may have a different type. It also provides a fairly complete interface to CSV files and copes with a number of oddities, for instance CSV files that start with a header spread over several lines. The resulting object tries as far as it can to mimic an array, so that binary operators and the usual functions work as expected.

Meta-information is also handled. Rows and columns may be named, and these names are searchable. If the column order of a CSV file changes for whatever reason, searching by column name still returns the expected information.
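
The point about column ordering can be illustrated with a minimal sketch. It assumes the package is installed and can be loaded with pkg load dataframe; the variable names a and b, the Score column and its values are made up for illustration. Both dataframes hold the same records with the columns in a different order, and the lookup by name is expected to give the same values in both cases.

```octave
pkg load dataframe   % assumption: the dataframe package is already installed

% Same records, with the last two columns swapped (illustrative data)
a = dataframe({"Id", "Score", "Name"; 1, 10, "onestring"; 2, 20, "somestring"});
b = dataframe({"Id", "Name", "Score"; 1, "onestring", 10; 2, "somestring", 20});

% Look the column up by name rather than by position
sa = a.Score;
sb = b.Score;

% Should display 1 if both lookups return the same values
disp(isequal(sa, sb))
```

Addressing the column as a.Score rather than by its numeric position is what keeps the code valid when the layout of the input changes.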

A simple example:

>> truc={"Id", "Name", "Type";1, "onestring", "bla"; 2, "somestring", "foobar";}
truc =
{
  [1,1] = Id
  [2,1] =  1
  [3,1] =  2
  [1,2] = Name
  [2,2] = onestring
  [3,2] = somestring
  [1,3] = Type
  [2,3] = bla
  [3,3] = foobar
}
>> tt=dataframe(truc)
tt = dataframe with 2 rows and 3 columns
_1     Id       Name   Type
Nr double       char   char
 1      1  onestring    bla
 2      2 somestring foobar

The first row of the cell array is expected to contain the column names; the remaining rows hold the column contents. The type of each column is inferred automatically from the cell content. Now let us select one column by its name:

>> tt(:, 'Name')
ans = dataframe with 2 rows and 1 columns
_1       Name
Nr       char
1  onestring
2 somestring

In this case, a sub-dataframe is returned. Struct-like indexing is also implemented:

>> tt.Id
ans =
  1
  2

When the output is a vector that can be reduced to a simpler type, it is: here tt.Id is returned as a plain vector rather than as a dataframe.
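
The difference between the two indexing styles can be checked with class. This is only a sketch under the same assumption that the package is installed; the names sub and ids are illustrative. Going by the transcripts above, the first call should report dataframe and the second a plain numeric type.

```octave
pkg load dataframe   % assumption: the dataframe package is already installed

tt = dataframe({"Id", "Name", "Type"; 1, "onestring", "bla"; 2, "somestring", "foobar"});

sub = tt(:, 'Name');   % indexing with parentheses: a sub-dataframe
ids = tt.Id;           % struct-like indexing: simplified to a vector

disp(class(sub))
disp(class(ids))
```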