Statistics package: Difference between revisions
Pr0m1th3as (talk | contribs) |
(→TODO list: remove procrustes from missing fn list) |
(40 intermediate revisions by 2 users not shown) | |||
Line 3: | Line 3: | ||
<code>pkg install -forge statistics</code> | <code>pkg install -forge statistics</code> | ||
The following sections provide an overview of the functions available in the statistics package sorted alphabetically and arranged in groups similarly to the package's INDEX file. | The following sections provide an overview of the functions available in the statistics package sorted alphabetically and arranged in groups similarly to the package's INDEX file. The '''TODO''' subsections only describe the current development plans for forthcoming releases; they are not intended for reporting bugs, missing features or incompatibilities. Please report these in the [https://github.com/gnu-octave/statistics statistics repository] at GitHub. | ||
== Clustering == | == Clustering == | ||
Line 9: | Line 9: | ||
=== Available functions === | === Available functions === | ||
The following table lists the functions | The following table lists the available functions for clustering data. | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 15: | Line 15: | ||
! Description | ! Description | ||
|- | |- | ||
| cluster | | [https://gnu-octave.github.io/statistics/cluster.html cluster] | ||
| Define clusters from an agglomerative hierarchical cluster tree. | | Define clusters from an agglomerative hierarchical cluster tree. | ||
|- | |- | ||
| cmdscale | | [https://gnu-octave.github.io/statistics/clusterdata.html clusterdata] | ||
| Wrapper function for 'linkage' and 'cluster'. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/cmdscale.html cmdscale] | |||
| Classical multidimensional scaling of a matrix. | | Classical multidimensional scaling of a matrix. | ||
|- | |- | ||
| confusionmat | | [https://gnu-octave.github.io/statistics/confusionmat.html confusionmat] | ||
| Compute a confusion matrix for classification problems. | | Compute a confusion matrix for classification problems. | ||
|- | |- | ||
| cophenet | | [https://gnu-octave.github.io/statistics/ConfusionMatrixChart.html ConfusionMatrixChart] | ||
| Compute a ConfusionMatrixChart class object. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/cophenet.html cophenet] | |||
| Compute the cophenetic correlation coefficient. | | Compute the cophenetic correlation coefficient. | ||
|- | |- | ||
| evalclusters | | [https://gnu-octave.github.io/statistics/evalclusters.html evalclusters] | ||
| Create a clustering evaluation object to find the optimal number of clusters. | | Create a clustering evaluation object to find the optimal number of clusters. | ||
|- | |- | ||
| inconsistent | | [https://gnu-octave.github.io/statistics/inconsistent.html inconsistent] | ||
| Compute the inconsistency coefficient for each link of a hierarchical cluster tree. | | Compute the inconsistency coefficient for each link of a hierarchical cluster tree. | ||
|- | |- | ||
| kmeans | | [https://gnu-octave.github.io/statistics/kmeans.html kmeans] | ||
| Perform a K-means clustering of an NxD matrix. | | Perform a K-means clustering of an NxD matrix. | ||
|- | |- | ||
| linkage | | [https://gnu-octave.github.io/statistics/linkage.html linkage] | ||
| Produce a hierarchical clustering dendrogram. | | Produce a hierarchical clustering dendrogram. | ||
|- | |- | ||
| mahal | | [https://gnu-octave.github.io/statistics/mahal.html mahal] | ||
| Mahalanobis' D-square distance. | | Mahalanobis' D-square distance. | ||
|- | |- | ||
| optimalleaforder | | [https://gnu-octave.github.io/statistics/mhsample.html mhsample] | ||
| Draws NSAMPLES samples from a target stationary distribution PDF using Metropolis-Hastings algorithm. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/optimalleaforder.html optimalleaforder] | |||
| Compute the optimal leaf ordering of a hierarchical binary cluster tree. | | Compute the optimal leaf ordering of a hierarchical binary cluster tree. | ||
|- | |- | ||
| pdist | | [https://gnu-octave.github.io/statistics/pdist.html pdist] | ||
| Return the distance between any two rows in X. | | Return the distance between any two rows in X. | ||
|- | |- | ||
| pdist2 | | [https://gnu-octave.github.io/statistics/pdist2.html pdist2] | ||
| Compute pairwise distance between two sets of vectors. | | Compute pairwise distance between two sets of vectors. | ||
|- | |- | ||
| squareform | | [https://gnu-octave.github.io/statistics/procrustes.html procrustes] | ||
| Procrustes Analysis. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/slicesample.html slicesample] | |||
| Draws NSAMPLES samples from a target stationary distribution PDF using slice sampling of Radford M. Neal. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/squareform.html squareform] | |||
| Interchange between distance matrix and distance vector formats. | | Interchange between distance matrix and distance vector formats. | ||
|} | |} | ||
Line 58: | Line 73: | ||
Missing functions: | Missing functions: | ||
== Data Manipulation == | == Data Manipulation == | ||
Line 67: | Line 78: | ||
=== Available functions === | === Available functions === | ||
The following table lists the functions | The following table lists the available functions for data manipulation. | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 73: | Line 84: | ||
! Description | ! Description | ||
|- | |- | ||
| combnk | | [https://gnu-octave.github.io/statistics/combnk.html combnk] | ||
| Return all combinations of K elements in DATA. | | Return all combinations of K elements in DATA. | ||
|- | |- | ||
| crosstab | | [https://gnu-octave.github.io/statistics/crosstab.html crosstab] | ||
| Create a cross-tabulation (contingency table) T from data vectors. | | Create a cross-tabulation (contingency table) T from data vectors. | ||
|- | |- | ||
| datasample | | [https://gnu-octave.github.io/statistics/datasample.html datasample] | ||
| Randomly sample data. | | Randomly sample data. | ||
|- | |- | ||
| grp2idx | | [https://gnu-octave.github.io/statistics/fillmissing.html fillmissing] | ||
| Replace missing entries of array A either with values in v or as determined by other specified methods. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/grp2idx.html grp2idx] | |||
| Get index for group variables. | | Get index for group variables. | ||
|- | |- | ||
| tabulate | | [https://gnu-octave.github.io/statistics/ismissing.html ismissing] | ||
| Find missing data in a numeric or string array. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/normalise_distribution.html normalise_distribution] | |||
| Transform a set of data so as to be N(0,1) distributed according to an idea by van Albada and Robinson. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/rmmissing.html rmmissing] | |||
| Remove missing or incomplete data from an array. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/standardizeMissing.html standardizeMissing] | |||
| Replace data values specified by indicator in A by the standard ’missing’ data value for that data type. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/tabulate.html tabulate] | |||
| Compute a frequency table. | | Compute a frequency table. | ||
|} | |} | ||
Line 93: | Line 119: | ||
=== Available functions === | === Available functions === | ||
The following table lists the functions | The following table lists the available functions for descriptive statistics. | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 99: | Line 125: | ||
! Description | ! Description | ||
|- | |- | ||
| geomean | | [https://gnu-octave.github.io/statistics/cl_multinom.html cl_multinom] | ||
| Confidence level of multinomial portions. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/geomean.html geomean] | |||
| Compute the geometric mean. | | Compute the geometric mean. | ||
|- | |- | ||
| grpstats | | [https://gnu-octave.github.io/statistics/grpstats.html grpstats] | ||
| Compute summary statistics by group. Fully MATLAB compatible. | | Compute summary statistics by group. Fully MATLAB compatible. | ||
|- | |- | ||
| harmmean | | [https://gnu-octave.github.io/statistics/harmmean.html harmmean] | ||
| Compute the harmonic mean. | | Compute the harmonic mean. | ||
|- | |- | ||
| jackknife | | [https://gnu-octave.github.io/statistics/jackknife.html jackknife] | ||
| Compute jackknife estimates of a parameter taking one or more given samples as parameters. | | Compute jackknife estimates of a parameter taking one or more given samples as parameters. | ||
|- | |- | ||
| mean | | [https://gnu-octave.github.io/statistics/mean.html mean] | ||
| Compute the mean. Fully MATLAB compatible. | | Compute the mean. Fully MATLAB compatible. | ||
|- | |- | ||
| median | | [https://gnu-octave.github.io/statistics/median.html median] | ||
| Compute the median. Fully MATLAB compatible. | | Compute the median. Fully MATLAB compatible. | ||
|- | |- | ||
| nanmax | | [https://gnu-octave.github.io/statistics/nanmax.html nanmax] | ||
| Find the maximal element while ignoring NaN values. | | Find the maximal element while ignoring NaN values. | ||
|- | |- | ||
| nanmin | | [https://gnu-octave.github.io/statistics/nanmin.html nanmin] | ||
| Find the minimal element while ignoring NaN values. | | Find the minimal element while ignoring NaN values. | ||
|- | |- | ||
| nansum | | [https://gnu-octave.github.io/statistics/nansum.html nansum] | ||
| Compute the sum while ignoring NaN values. | | Compute the sum while ignoring NaN values. | ||
|- | |- | ||
| std | | [https://gnu-octave.github.io/statistics/std.html std] | ||
| Compute the standard deviation. Fully MATLAB compatible. | | Compute the standard deviation. Fully MATLAB compatible. | ||
|- | |- | ||
| trimmean | | [https://gnu-octave.github.io/statistics/trimmean.html trimmean] | ||
| Compute the trimmed mean. | | Compute the trimmed mean. | ||
|- | |- | ||
| std | | [https://gnu-octave.github.io/statistics/var.html var] | ||
| Compute the variance. Fully MATLAB compatible. | | Compute the variance. Fully MATLAB compatible. | ||
|} | |} | ||
Line 138: | Line 167: | ||
=== In external packages === | === In external packages === | ||
<code>bootci</code>, <code>bootstrp</code> are implemented in the [https://gnu-octave.github.io/packages/statistics- | <code>bootci</code>, <code>bootstrp</code> are implemented in the [https://gnu-octave.github.io/packages/statistics-resampling statistics-resampling] package. | ||
=== Shadowing Octave core functions === | === Shadowing Octave core functions === | ||
Line 153: | Line 182: | ||
=== TODO list === | === TODO list === | ||
Update | Update <code>trimmean</code> function to be fully MATLAB compatible. | ||
Re-introduce the <code>nan*</code> functions implemented in C++ with the <code>"all"</code> and <code>"vecdim"</code> options. | Re-introduce the <code>nan*</code> functions implemented in C++ with the <code>"all"</code> and <code>"vecdim"</code> options. | ||
Line 200: | Line 229: | ||
| binornd | | binornd | ||
|- | |- | ||
| Bivariate | | [https://en.wikipedia.org/wiki/Joint_probability_distribution Bivariate Normal] | ||
| bvncdf | | bvncdf | ||
| | |||
| | |||
| | |||
|- | |||
| [https://en.wikipedia.org/wiki/Joint_probability_distribution Bivariate Student's <i>t</i>] | |||
| bvtcdf | |||
| | | | ||
| | | | ||
Line 224: | Line 259: | ||
| chi2rnd | | chi2rnd | ||
|- | |- | ||
| Copula Family | | [https://en.wikipedia.org/wiki/Copula_(probability_theory) Copula Family] | ||
| copulacdf | | copulacdf | ||
| copulainv | | copulainv | ||
Line 230: | Line 265: | ||
| copularnd | | copularnd | ||
|- | |- | ||
| Extreme Value | | [https://en.wikipedia.org/wiki/Gumbel_distribution Extreme Value] | ||
| evcdf | | evcdf | ||
| evinv | | evinv | ||
Line 320: | Line 355: | ||
| mvnrnd | | mvnrnd | ||
|- | |- | ||
| [https://en.wikipedia.org/wiki/Multivariate_t-distribution Multivariate Student's | | [https://en.wikipedia.org/wiki/Multivariate_t-distribution Multivariate Student's <i>t</i>] | ||
| mvtcdf mvtcdfqmc | | mvtcdf mvtcdfqmc | ||
| mvtinv | | mvtinv | ||
Line 344: | Line 379: | ||
| ncfrnd | | ncfrnd | ||
|- | |- | ||
| [https://en.wikipedia.org/wiki/Noncentral_t-distribution Noncentral Student's | | [https://en.wikipedia.org/wiki/Noncentral_t-distribution Noncentral Student's <i>t</i>] | ||
| nctcdf | | nctcdf | ||
| nctinv | | nctinv | ||
Line 380: | Line 415: | ||
| stdnormal_rnd | | stdnormal_rnd | ||
|- | |- | ||
| [https://en.wikipedia.org/wiki/Student%27s_t-distribution Student's | | [https://en.wikipedia.org/wiki/Student%27s_t-distribution Student's <i>t</i>] | ||
| tcdf | | tcdf | ||
| tinv | | tinv | ||
Line 428: | Line 463: | ||
| wishrnd | | wishrnd | ||
|} | |} | ||
=== Distribution Fitting === | === Distribution Fitting === | ||
Line 496: | Line 530: | ||
== Experimental Design == | == Experimental Design == | ||
=== Available functions === | |||
Functions available for computing design matrices. | Functions available for computing design matrices. | ||
{| class="wikitable" | |||
! Function | |||
! Description | |||
|- | |||
| [https://gnu-octave.github.io/statistics/fullfact.html fullfact] | |||
| Full factorial design. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/ff2n.html ff2n] | |||
| Two-level full factorial design. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/sigma_pts.html sigma_pts] | |||
| Calculates 2*N+1 sigma points in N dimensions. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/x2fx.html x2fx] | |||
| Convert predictors to design matrix. | |||
|} | |||
== Machine Learning == | |||
=== Available functions === | |||
The following table lists the available functions. | |||
{| class="wikitable" | |||
! Function | |||
! Description | |||
|- | |||
| [https://gnu-octave.github.io/statistics/hmmestimate.html hmmestimate] | |||
| Estimation of a hidden Markov model for a given sequence. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/hmmgenerate.html hmmgenerate] | |||
| Output sequence and hidden states of a hidden Markov model. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/hmmviterbi.html hmmviterbi] | |||
| Viterbi path of a hidden Markov model. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/svmpredict.html svmpredict] | |||
| Predict labels for new data using a trained support vector machine (SVM) model. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/svmtrain.html svmtrain] | |||
| Train a support vector machine (SVM) model. | |||
|} | |||
=== TODO list === | |||
Update <code>svmpredict</code> and <code>svmtrain</code> to libsvm 3.0. | |||
Missing functions: | |||
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1"> | <div style="column-count:1;-moz-column-count:1;-webkit-column-count:1"> | ||
* | * <code>hmmdecode</code> | ||
* | * <code>hmmtrain</code> | ||
</div> | </div> | ||
== Model Fitting == | == Model Fitting == | ||
Functions available for | === Available functions === | ||
Functions available for fitting or evaluating statistical models. | |||
{| class="wikitable" | |||
! Function | |||
! Description | |||
|- | |||
| [https://gnu-octave.github.io/statistics/crossval.html crossval] | |||
| Perform cross validation on given data. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/fitgmdist.html fitgmdist] | |||
| Fit a Gaussian mixture model with K components to DATA. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/fitlm.html fitlm] | |||
| Regress the continuous outcome (i.e. dependent variable) Y on continuous or categorical predictors (i.e. independent variables) X by minimizing the sum-of-squared residuals. | |||
|} | |||
=== Cross Validation === | === Cross Validation === | ||
Line 527: | Line 621: | ||
* @cvpartition/test | * @cvpartition/test | ||
* @cvpartition/training | * @cvpartition/training | ||
</div> | |||
=== TODO list === | |||
Missing functions: | |||
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1"> | |||
* <code>anova</code> | |||
* <code>manova</code> | |||
</div> | </div> | ||
== Hypothesis Testing == | == Hypothesis Testing == | ||
=== Available functions === | |||
Functions available for hypothesis testing. | Functions available for hypothesis testing. | ||
Line 537: | Line 642: | ||
! Description | ! Description | ||
|- | |- | ||
| adtest | | [https://gnu-octave.github.io/statistics/adtest.html adtest] | ||
| Anderson-Darling goodness-of-fit hypothesis test. | | Anderson-Darling goodness-of-fit hypothesis test. | ||
|- | |- | ||
| anova1 | | [https://gnu-octave.github.io/statistics/anova1.html anova1] | ||
| Perform a one-way analysis of variance (ANOVA). | | Perform a one-way analysis of variance (ANOVA). | ||
|- | |- | ||
| anova2 | | [https://gnu-octave.github.io/statistics/anova2.html anova2] | ||
| Performs two-way factorial (crossed) or a nested analysis of variance (ANOVA) for balanced designs. | | Performs two-way factorial (crossed) or a nested analysis of variance (ANOVA) for balanced designs. | ||
|- | |- | ||
| anovan | | [https://gnu-octave.github.io/statistics/anovan.html anovan] | ||
| Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e. independent variables) on a continuous outcome (i.e. dependent variable). | | Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e. independent variables) on a continuous outcome (i.e. dependent variable). | ||
|- | |- | ||
| bartlett_test | | [https://gnu-octave.github.io/statistics/bartlett_test.html bartlett_test] | ||
| Perform a Bartlett test for the homogeneity of variances. | | Perform a Bartlett test for the homogeneity of variances. | ||
|- | |- | ||
| barttest | | [https://gnu-octave.github.io/statistics/barttest.html barttest] | ||
| Bartlett's test of sphericity for correlation. | | Bartlett's test of sphericity for correlation. | ||
|- | |- | ||
| binotest | | [https://gnu-octave.github.io/statistics/binotest.html binotest] | ||
| Test for probability P of a binomial sample. | | Test for probability P of a binomial sample. | ||
|- | |- | ||
| chi2gof | | [https://gnu-octave.github.io/statistics/chi2gof.html chi2gof] | ||
| Chi-square goodness-of-fit test. | | Chi-square goodness-of-fit test. | ||
|- | |- | ||
| chi2test | | [https://gnu-octave.github.io/statistics/chi2test.html chi2test] | ||
| Perform a chi-squared test (for independence or homogeneity). | | Perform a chi-squared test (for independence or homogeneity). | ||
|- | |- | ||
| friedman | | [https://gnu-octave.github.io/statistics/correlation_test.html correlation_test] | ||
| Perform a correlation coefficient test whether two samples x and y come from uncorrelated populations. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/fishertest.html fishertest] | |||
| Fisher’s exact test. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/friedman.html friedman] | |||
| Performs the nonparametric Friedman's test to compare column effects in a two-way layout. | | Performs the nonparametric Friedman's test to compare column effects in a two-way layout. | ||
|- | |- | ||
| hotelling_t2test | | [https://gnu-octave.github.io/statistics/hotelling_t2test.html hotelling_t2test] | ||
| Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples). | | Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples). | ||
|- | |- | ||
| hotelling_t2test2 | | [https://gnu-octave.github.io/statistics/hotelling_t2test2.html hotelling_t2test2] | ||
| Compute Hotelling's T^2 ("T-squared") test for two independent samples. | | Compute Hotelling's T^2 ("T-squared") test for two independent samples. | ||
|- | |- | ||
| kruskalwallis | | [https://gnu-octave.github.io/statistics/kruskalwallis.html kruskalwallis] | ||
| Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA). | | Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA). | ||
|- | |- | ||
| kstest | | [https://gnu-octave.github.io/statistics/kstest.html kstest] | ||
| Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test. | | Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test. | ||
|- | |- | ||
| kstest2 | | [https://gnu-octave.github.io/statistics/kstest2.html kstest2] | ||
| Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test. | | Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test. | ||
|- | |- | ||
| levene_test | | [https://gnu-octave.github.io/statistics/levene_test.html levene_test] | ||
| Perform a Levene's test for the homogeneity of variances. | | Perform a Levene's test for the homogeneity of variances. | ||
|- | |- | ||
| manova1 | | [https://gnu-octave.github.io/statistics/manova1.html manova1] | ||
| One-way multivariate analysis of variance (MANOVA). | | One-way multivariate analysis of variance (MANOVA). | ||
|- | |- | ||
| ranksum | | [https://gnu-octave.github.io/statistics/multcompare.html multcompare] | ||
| Perform posthoc multiple comparison tests or p-value adjustments to control the family-wise error rate (FWER) or false discovery rate (FDR). | |||
|- | |||
| [https://gnu-octave.github.io/statistics/ranksum.html ranksum] | |||
| Wilcoxon rank sum test for equal medians. This test is equivalent to a Mann-Whitney U-test. | | Wilcoxon rank sum test for equal medians. This test is equivalent to a Mann-Whitney U-test. | ||
|- | |- | ||
| regression_ftest | | [https://gnu-octave.github.io/statistics/regression_ftest.html regression_ftest] | ||
| F-test for General Linear Regression Analysis. | | F-test for General Linear Regression Analysis. | ||
|- | |- | ||
| regression_ttest | | [https://gnu-octave.github.io/statistics/regression_ttest.html regression_ttest] | ||
| Perform a linear regression t-test | | Perform a linear regression t-test. | ||
|- | |- | ||
| runstest | | [https://gnu-octave.github.io/statistics/runstest.html runstest] | ||
| Runs test for detecting serial correlation in the vector X. | | Runs test for detecting serial correlation in the vector X. | ||
|- | |- | ||
| sampsizepwr | | [https://gnu-octave.github.io/statistics/sampsizepwr.html sampsizepwr] | ||
| Sample size and power calculation for hypothesis test. | | Sample size and power calculation for hypothesis test. | ||
|- | |- | ||
| signtest | | [https://gnu-octave.github.io/statistics/signtest.html signtest] | ||
| Test for median. | | Test for median. | ||
|- | |- | ||
| ttest | | [https://gnu-octave.github.io/statistics/ttest.html ttest] | ||
| Test for mean of a normal sample with unknown variance or a paired-sample t-test. | | Test for mean of a normal sample with unknown variance or a paired-sample t-test. | ||
|- | |- | ||
| ttest2 | | [https://gnu-octave.github.io/statistics/ttest2.html ttest2] | ||
| Perform a two independent samples t-test. | | Perform a two independent samples t-test. | ||
|- | |- | ||
| vartest | | [https://gnu-octave.github.io/statistics/vartest.html vartest] | ||
| One-sample test of variance. | | One-sample test of variance. | ||
|- | |- | ||
| vartest2 | | [https://gnu-octave.github.io/statistics/vartest2.html vartest2] | ||
| Two-sample F test for equal variances. | | Two-sample F test for equal variances. | ||
|- | |- | ||
| vartestn | | [https://gnu-octave.github.io/statistics/vartestn.html vartestn] | ||
| Test for equal variances across multiple groups. | | Test for equal variances across multiple groups. | ||
|- | |- | ||
| ztest | | [https://gnu-octave.github.io/statistics/ztest.html ztest] | ||
| One-sample Z-test. | | One-sample Z-test. | ||
|- | |||
| [https://gnu-octave.github.io/statistics/ztest2.html ztest2] | |||
| Two proportions Z-test. | |||
|} | |} | ||
=== TODO list === | |||
Missing functions: | |||
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1"> | |||
* <code>fishertest</code> | |||
* <code>meanEffectSize</code> | |||
</div> | |||
== Plotting == | |||
=== Available functions === | |||
The following table lists the available functions for plotting data. | |||
{| class="wikitable" | |||
! Function | |||
! Description | |||
|- | |||
| [https://gnu-octave.github.io/statistics/boxplot.html boxplot] | |||
| Produce a box plot. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/cdfplot.html cdfplot] | |||
| Display an empirical cumulative distribution function. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/confusionchart.html confusionchart] | |||
| Display a chart of a confusion matrix. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/dendrogram.html dendrogram] | |||
| Plot a dendrogram of a hierarchical binary cluster tree. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/ecdf.html ecdf] | |||
| Empirical (Kaplan-Meier) cumulative distribution function. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/gscatter.html gscatter] | |||
| Draw a scatter plot with grouped data. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/histfit.html histfit] | |||
| Plot histogram with superimposed fitted normal density. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/hist3.html hist3] | |||
| Produce bivariate (2D) histogram counts or plots. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/manovacluster.html manovacluster] | |||
| Cluster group means using manova1 output. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/normplot.html normplot] | |||
| Produce normal probability plot of the data. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/ppplot.html ppplot] | |||
| Perform a PP-plot (probability plot). | |||
|- | |||
| [https://gnu-octave.github.io/statistics/qqplot.html qqplot] | |||
| Perform a QQ-plot (quantile plot). | |||
|- | |||
| [https://gnu-octave.github.io/statistics/silhouette.html silhouette] | |||
| Compute the silhouette values of clustered data and show them on a plot. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/violin.html violin] | |||
| Produce a Violin plot of the data. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/wblplot.html wblplot] | |||
| Plot a column vector DATA on a Weibull probability plot using rank regression. | |||
|} | |||
=== TODO list === | |||
Missing functions: | |||
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1"> | |||
* <code>andrewsplot</code> | |||
* <code>bar3</code> | |||
* <code>bar3h</code> | |||
* <code>glyphplot</code> | |||
* <code>gplotmatrix</code> | |||
* <code>parallelcoords</code> | |||
</div> | |||
== Regression == | |||
=== Available functions === | |||
The following table lists the available functions for regression analysis. | |||
{| class="wikitable" | |||
! Function | |||
! Description | |||
|- | |||
| [https://gnu-octave.github.io/statistics/canoncorr.html canoncorr] | |||
| Canonical correlation analysis. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/cholcov.html cholcov] | |||
| Cholesky-like decomposition for covariance matrix. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/dcov.html dcov] | |||
| Distance correlation, covariance and correlation statistics. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/logistic_regression.html logistic_regression] | |||
| Perform ordinal logistic regression. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/monotone_smooth.html monotone_smooth] | |||
| Produce a smooth monotone increasing approximation to a sampled functional dependence. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/pca.html pca] | |||
| Performs a principal component analysis on a data matrix. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/pcacov.html pcacov] | |||
| Perform principal component analysis on the NxN covariance matrix X. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/pcares.html pcares] | |||
| Calculate residuals from principal component analysis. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/plsregress.html plsregress] | |||
| Calculate partial least squares regression using SIMPLS algorithm. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/princomp.html princomp] | |||
| Performs a principal component analysis on a NxP data matrix. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/regress.html regress] | |||
| Multiple Linear Regression using Least Squares Fit. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/regress_gp.html regress_gp] | |||
| Linear scalar regression using Gaussian processes. | |||
|- | |||
| [https://gnu-octave.github.io/statistics/stepwisefit.html stepwisefit] | |||
| Linear regression with stepwise variable selection. | |||
|} | |||
=== TODO list === | |||
Missing functions: | |||
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1"> | |||
* <code>glmfit</code> | |||
* <code>glmval</code> | |||
* <code>mnrfit</code> | |||
* <code>mnrval</code> | |||
</div> | |||
== Wrappers == | == Wrappers == | ||
=== Available functions === | |||
Functions available for wrapping other functions or groups of functions. | Functions available for wrapping other functions or groups of functions. | ||
Line 634: | Line 891: | ||
! Description | ! Description | ||
|- | |- | ||
| cdf | | [https://gnu-octave.github.io/statistics/cdf.html cdf] | ||
| This is a wrapper | | This is a wrapper for the NAMEcdf and NAME_cdf functions available in the statistics package. | ||
|- | |- | ||
| | | [https://gnu-octave.github.io/statistics/icdf.html icdf] | ||
| | | This is a wrapper for the NAMEinv and NAME_inv functions available in the statistics package. | ||
|- | |- | ||
| pdf | | [https://gnu-octave.github.io/statistics/pdf.html pdf] | ||
| This is a wrapper | | This is a wrapper for the NAMEpdf and NAME_pdf functions available in the statistics package. | ||
|- | |- | ||
| | | [https://gnu-octave.github.io/statistics/random.html random] | ||
| Generates pseudo-random numbers from a given one-, two-, or three-parameter distribution. | | Generates pseudo-random numbers from a given one-, two-, or three-parameter distribution. | ||
|} | |} |
Latest revision as of 01:31, 24 July 2024
The statistics package is part of the Octave Packages. Since version 1.5.0, the statistics package requires Octave version 6.1 or higher. With Octave v7.2 or later, you can install the latest statistics package (currently 1.5.3) with the following command:
pkg install -forge statistics
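Installing a package does not load it. As a minimal sketch, assuming the package has already been installed with the command above, a session could load it and check that one of its functions is visible (the choice of `pdist` as the probe is only illustrative):

```octave
% Load the statistics package for the current session
% (assumes it has already been installed).
pkg load statistics

% exist returns a nonzero code when the name is found on the load path.
found = exist ("pdist") > 0;
if (found)
  disp ("statistics package functions are available");
else
  disp ("pdist was not found; check the installation");
end
```

The `pkg load` call has to be repeated in every new Octave session unless it is placed in a startup file.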
The following sections provide an overview of the functions available in the statistics package sorted alphabetically and arranged in groups similarly to the package's INDEX file. The TODO subsections only describe the current development plans for forthcoming releases; they are not intended for reporting bugs, missing features or incompatibilities. Please report these in the statistics repository at GitHub.
Clustering
Available functions
The following table lists the available functions for clustering data.
Function | Description |
---|---|
cluster | Define clusters from an agglomerative hierarchical cluster tree. |
clusterdata | Wrapper function for 'linkage' and 'cluster'. |
cmdscale | Classical multidimensional scaling of a matrix. |
confusionmat | Compute a confusion matrix for classification problems. |
ConfusionMatrixChart | Compute a ConfusionMatrixChart class object. |
cophenet | Compute the cophenetic correlation coefficient. |
evalclusters | Create a clustering evaluation object to find the optimal number of clusters. |
inconsistent | Compute the inconsistency coefficient for each link of a hierarchical cluster tree. |
kmeans | Perform a K-means clustering of an NxD matrix. |
linkage | Produce a hierarchical clustering dendrogram. |
mahal | Mahalanobis' D-square distance. |
mhsample | Draws NSAMPLES samples from a target stationary distribution PDF using the Metropolis-Hastings algorithm. |
optimalleaforder | Compute the optimal leaf ordering of a hierarchical binary cluster tree. |
pdist | Return the distance between any two rows in X. |
pdist2 | Compute pairwise distance between two sets of vectors. |
procrustes | Procrustes Analysis. |
slicesample | Draws NSAMPLES samples from a target stationary distribution PDF using slice sampling of Radford M. Neal. |
squareform | Interchange between distance matrix and distance vector formats. |
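As the table notes, clusterdata wraps linkage and cluster. The individual steps can be sketched as follows, assuming the package is installed; the data are made up for illustration, and the "average" linkage method and the "MaxClust" criterion are just one possible choice.

```octave
pkg load statistics

% Two well-separated groups of 2-D points (made-up data).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];

D = pdist (X);                    % pairwise distances as a vector (15 entries for 6 points)
M = squareform (D);               % the same distances as a 6x6 symmetric matrix
Z = linkage (D, "average");       % agglomerative hierarchical cluster tree
T = cluster (Z, "MaxClust", 2);   % cut the tree into at most two clusters

disp (T');                        % cluster index assigned to each row of X
```

The first three rows of X should end up in one cluster and the last three in the other; which group receives index 1 is not something the sketch relies on.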
TODO list
Missing functions: none currently listed.
Data Manipulation
Available functions
The following table lists the available functions for data manipulation.
Function | Description |
---|---|
combnk | Return all combinations of K elements in DATA. |
crosstab | Create a cross-tabulation (contingency table) T from data vectors. |
datasample | Randomly sample data. |
fillmissing | Replace missing entries of array A either with values in v or as determined by other specified methods. |
grp2idx | Get index for group variables. |
ismissing | Find missing data in a numeric or string array. |
normalise_distribution | Transform a set of data so as to be N(0,1) distributed according to an idea by van Albada and Robinson. |
rmmissing | Remove missing or incomplete data from an array. |
standardizeMissing | Replace data values specified by indicator in A by the standard ’missing’ data value for that data type. |
tabulate | Compute a frequency table. |
Descriptive Statistics
Available functions
The following table lists the available functions for descriptive statistics.
Function | Description |
---|---|
cl_multinom | Confidence level of multinomial portions. |
geomean | Compute the geometric mean. |
grpstats | Compute summary statistics by group. Fully MATLAB compatible. |
harmmean | Compute the harmonic mean. |
jackknife | Compute jackknife estimates of a parameter taking one or more given samples as parameters. |
mean | Compute the mean. Fully MATLAB compatible. |
median | Compute the median. Fully MATLAB compatible. |
nanmax | Find the maximal element while ignoring NaN values. |
nanmin | Find the minimal element while ignoring NaN values. |
nansum | Compute the sum while ignoring NaN values. |
std | Compute the standard deviation. Fully MATLAB compatible. |
trimmean | Compute the trimmed mean. |
var | Compute the variance. Fully MATLAB compatible. |
In external packages
bootci and bootstrp are implemented in the statistics-resampling package.
Shadowing Octave core functions
The following functions will shadow the respective core functions until Octave 9.
mean
median
std
var
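A small sketch of what the shadowing versions offer, assuming the package is loaded. That mean and median accept the "omitnan" option is inferred here from their "Fully MATLAB compatible" description rather than stated in this page; the data are made up.

```octave
pkg load statistics

x = [1 NaN 3 5];

m_default = mean (x);               % NaN propagates by default
m_omit    = mean (x, "omitnan");    % mean of the non-NaN values [1 3 5]
med_omit  = median (x, "omitnan");  % median of the non-NaN values [1 3 5]
s_nan     = nansum (x);             % sum while ignoring NaN values
```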
TODO list
Update the trimmean function to be fully MATLAB compatible.
Re-introduce the nan* functions implemented in C++ with the "all" and "vecdim" options.
Re-implement the following functions from core Octave as shadowing functions with updated functionality for the "all", "omitnan", and "vecdim" options, with the intent of including them in Octave 9:
cov
mad
meansq
mode
moment
Distributions
Available functions
The following table lists the cdf, icdf, pdf, and random functions available in the statistics package. Since version 1.5.3, all CDFs support the "upper" option for evaluating the complement of the respective CDF.
Note! The icdf wrapper for the quantile functions is not implemented yet.
Distribution Name | Cumulative Distribution Function | Quantile Function | Probability Density Function | Random Generator |
---|---|---|---|---|
Birnbaum–Saunders | bbscdf | bbsinv | bbspdf | bbsrnd |
Beta | betacdf | betainv | betapdf | betarnd |
Binomial | binocdf | binoinv | binopdf | binornd |
Bivariate Normal | bvncdf | | | |
Bivariate Student's t | bvtcdf | | | |
Burr Type XII | burrcdf | burrinv | burrpdf | burrrnd |
Cauchy | cauchy_cdf | cauchy_inv | cauchy_pdf | cauchy_rnd |
Chi-squared | chi2cdf | chi2inv | chi2pdf | chi2rnd |
Copula Family | copulacdf | copulainv | copulapdf | copularnd |
Extreme Value | evcdf | evinv | evpdf | evrnd |
Exponential | expcdf | expinv | exppdf | exprnd |
F | fcdf | finv | fpdf | frnd |
Gamma | gamcdf | gaminv | gampdf | gamrnd |
Geometric | geocdf | geoinv | geopdf | geornd |
Generalized Extreme Value | gevcdf | gevinv | gevpdf | gevrnd |
Generalized Pareto | gpcdf | gpinv | gppdf | gprnd |
Hypergeometric | hygecdf | hygeinv | hygepdf | hygernd |
Inverse-Wishart | | | iwishpdf | iwishrnd |
Johnson's SU | jsucdf | | jsupdf | |
Laplace | laplace_cdf | laplace_inv | laplace_pdf | laplace_rnd |
Logistic | logistic_cdf | logistic_inv | logistic_pdf | logistic_rnd |
Log-normal | logncdf | logninv | lognpdf | lognrnd |
Multinomial | | | mnpdf | mnrnd |
Multivariate Normal | mvncdf | mvninv | mvnpdf | mvnrnd |
Multivariate Student's t | mvtcdf, mvtcdfqmc | mvtinv | mvtpdf | mvtrnd |
Nakagami | nakacdf | nakainv | nakapdf | nakarnd |
Negative Binomial | nbincdf | nbininv | nbinpdf | nbinrnd |
Noncentral F | ncfcdf | ncfinv | ncfpdf | ncfrnd |
Noncentral Student's t | nctcdf | nctinv | nctpdf | nctrnd |
Noncentral Chi-squared | ncx2cdf | ncx2inv | ncx2pdf | ncx2rnd |
Normal | normcdf | norminv | normpdf | normrnd |
Poisson | poisscdf | poissinv | poisspdf | poissrnd |
Rayleigh | raylcdf | raylinv | raylpdf | raylrnd |
Standard Normal | stdnormal_cdf | stdnormal_inv | stdnormal_pdf | stdnormal_rnd |
Student's t | tcdf | tinv | tpdf | trnd |
Triangular | tricdf | triinv | tripdf | trirnd |
Discrete Uniform | unidcdf | unidinv | unidpdf | unidrnd |
Continuous Uniform | unifcdf | unifinv | unifpdf | unifrnd |
von Mises | vmcdf | | vmpdf | vmrnd |
Weibull | wblcdf | wblinv | wblpdf | wblrnd |
Wiener process | | | | wienrnd |
Wishart | | | wishpdf | wishrnd |
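The "upper" option mentioned above can be illustrated with the normal distribution. This is a minimal sketch assuming the package (version 1.5.3 or later) is loaded; the evaluation point is arbitrary.

```octave
pkg load statistics

x = 1.5;
p_lower = normcdf (x, 0, 1);            % P(X <= x) for the standard normal
p_upper = normcdf (x, 0, 1, "upper");   % complement P(X > x), requested directly
x_back  = norminv (p_lower, 0, 1);      % the quantile function inverts the CDF

printf ("lower = %.6f, upper = %.6f, sum = %.6f\n", p_lower, p_upper, p_lower + p_upper);
```

Requesting the complement directly avoids computing 1 - p by hand, which loses precision when p is very close to 1.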
Distribution Fitting
Functions available for estimating parameters and the negative log-likelihood for certain distributions.
Distribution Name | Parameter Estimation | Negative Log-likelihood |
---|---|---|
Extreme Value | evfit | evlike |
Exponential | expfit | explike |
Gamma | gamfit | gamlike |
Generalized Extreme Value | gevfit_lmom, gevfit | gevlike |
Generalized Pareto | gpfit | gplike |
Normal | | normlike |
Distribution Statistics
Functions available for computing mean and variance from distribution parameters.
betastat
binostat
chi2stat
evstat
expstat
fstat
gamstat
geostat
gevstat
gpstat
hygestat
lognstat
nbinstat
ncfstat
nctstat
ncx2stat
normstat
poisstat
raylstat
fitgmdist
tstat
unidstat
unifstat
wblstat
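For illustration, a few of these functions can be called as below, assuming the package is loaded. The parameter values are arbitrary, and the comments state the parameterization assumed here (mean and standard deviation for the normal, interval endpoints for the uniform, mean for the exponential).

```octave
pkg load statistics

[m_norm, v_norm] = normstat (2, 3);     % normal with mu = 2, sigma = 3
[m_unif, v_unif] = unifstat (0, 12);    % continuous uniform on [0, 12]
[m_exp,  v_exp]  = expstat (4);         % exponential with mean 4

printf ("normal:      mean = %g, variance = %g\n", m_norm, v_norm);
printf ("uniform:     mean = %g, variance = %g\n", m_unif, v_unif);
printf ("exponential: mean = %g, variance = %g\n", m_exp, v_exp);
```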
Experimental Design
Available functions
Functions available for computing design matrices.
Function | Description |
---|---|
fullfact | Full factorial design. |
ff2n | Two-level full factorial design. |
sigma_pts | Calculates 2*N+1 sigma points in N dimensions. |
x2fx | Convert predictors to design matrix. |
Machine Learning
Available functions
The following table lists the available functions for machine learning.
Function | Description |
---|---|
hmmestimate | Estimation of a hidden Markov model for a given sequence. |
hmmgenerate | Output sequence and hidden states of a hidden Markov model. |
hmmviterbi | Viterbi path of a hidden Markov model. |
svmpredict | Predict labels for new data using an SVM model created with svmtrain. |
svmtrain | Train an SVM model from known labels and their corresponding data. |
TODO list
Update svmpredict and svmtrain to libsvm 3.0.
Missing functions:
hmmdecode
hmmtrain
Model Fitting
Available functions
Functions available for fitting or evaluating statistical models.
Function | Description |
---|---|
crossval | Perform cross validation on given data. |
fitgmdist | Fit a Gaussian mixture model with K components to DATA. |
fitlm | Regress the continuous outcome (i.e. dependent variable) Y on continuous or categorical predictors (i.e. independent variables) X by minimizing the sum-of-squared residuals. |
Cross Validation
Class of set partitions for cross-validation, used in crossval:
- @cvpartition/cvpartition
- @cvpartition/display
- @cvpartition/get
- @cvpartition/repartition
- @cvpartition/set
- @cvpartition/test
- @cvpartition/training
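A minimal sketch of how the partition class might be used, assuming the package is loaded. The sample count and the number of folds are arbitrary, and the partition is random, so the rows that land in each fold differ between runs.

```octave
pkg load statistics

n = 10;                              % number of observations (illustrative)
c = cvpartition (n, "KFold", 5);     % random 5-fold partition of n observations

train_idx = training (c, 1);         % logical index of the training rows for fold 1
test_idx  = test (c, 1);             % logical index of the test rows for fold 1

printf ("fold 1: %d training rows, %d test rows\n", sum (train_idx), sum (test_idx));
```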
TODO list
Missing functions:
anova
manova
Hypothesis Testing
Available functions
Functions available for hypothesis testing.
Function | Description |
---|---|
adtest | Anderson-Darling goodness-of-fit hypothesis test. |
anova1 | Perform a one-way analysis of variance (ANOVA). |
anova2 | Perform a two-way factorial (crossed) or nested analysis of variance (ANOVA) for balanced designs. |
anovan | Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e. independent variables) on a continuous outcome (i.e. dependent variable). |
bartlett_test | Perform a Bartlett test for the homogeneity of variances. |
barttest | Bartlett's test of sphericity for correlation. |
binotest | Test for probability P of a binomial sample. |
chi2gof | Chi-square goodness-of-fit test. |
chi2test | Perform a chi-squared test (for independence or homogeneity). |
correlation_test | Perform a correlation coefficient test of whether two samples x and y come from uncorrelated populations. |
fishertest | Fisher’s exact test. |
friedman | Performs the nonparametric Friedman's test to compare column effects in a two-way layout. |
hotelling_t2test | Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples). |
hotelling_t2test2 | Compute Hotelling's T^2 ("T-squared") test for two independent samples. |
kruskalwallis | Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA). |
kstest | Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test. |
kstest2 | Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test. |
levene_test | Perform a Levene's test for the homogeneity of variances. |
manova1 | One-way multivariate analysis of variance (MANOVA). |
multcompare | Perform posthoc multiple comparison tests or p-value adjustments to control the family-wise error rate (FWER) or false discovery rate (FDR). |
ranksum | Wilcoxon rank sum test for equal medians. This test is equivalent to a Mann-Whitney U-test. |
regression_ftest | F-test for General Linear Regression Analysis. |
regression_ttest | Perform a linear regression t-test. |
runstest | Runs test for detecting serial correlation in the vector X. |
sampsizepwr | Sample size and power calculation for hypothesis test. |
signtest | Test for median. |
ttest | Test for mean of a normal sample with unknown variance or a paired-sample t-test. |
ttest2 | Perform a two independent samples t-test. |
vartest | One-sample test of variance. |
vartest2 | Two-sample F test for equal variances. |
vartestn | Test for equal variances across multiple groups. |
ztest | One-sample Z-test. |
ztest2 | Two proportions Z-test. |
TODO list
Missing functions:
meanEffectSize
Plotting
Available functions
The following table lists the available functions for plotting data.
Function | Description |
---|---|
boxplot | Produce a box plot. |
cdfplot | Display an empirical cumulative distribution function. |
confusionchart | Display a chart of a confusion matrix. |
dendrogram | Plot a dendrogram of a hierarchical binary cluster tree. |
ecdf | Empirical (Kaplan-Meier) cumulative distribution function. |
gscatter | Draw a scatter plot with grouped data. |
histfit | Plot histogram with superimposed fitted normal density. |
hist3 | Produce bivariate (2D) histogram counts or plots. |
manovacluster | Cluster group means using manova1 output. |
normplot | Produce normal probability plot of the data. |
ppplot | Perform a PP-plot (probability plot). |
qqplot | Perform a QQ-plot (quantile plot). |
silhouette | Compute the silhouette values of clustered data and show them on a plot. |
violin | Produce a Violin plot of the data. |
wblplot | Plot a column vector DATA on a Weibull probability plot using rank regression. |
TODO list
Missing functions:
andrewsplot
bar3
bar3h
glyphplot
gplotmatrix
parallelcoords
Regression
Available functions
The following table lists the available functions for regression analysis.
Function | Description |
---|---|
canoncorr | Canonical correlation analysis. |
cholcov | Cholesky-like decomposition for covariance matrix. |
dcov | Distance correlation, covariance and correlation statistics. |
logistic_regression | Perform ordinal logistic regression. |
monotone_smooth | Produce a smooth monotone increasing approximation to a sampled functional dependence. |
pca | Performs a principal component analysis on a data matrix. |
pcacov | Perform principal component analysis on the NxN covariance matrix X. |
pcares | Calculate residuals from principal component analysis. |
plsregress | Calculate partial least squares regression using SIMPLS algorithm. |
princomp | Performs a principal component analysis on a NxP data matrix. |
regress | Multiple Linear Regression using Least Squares Fit. |
regress_gp | Linear scalar regression using Gaussian processes. |
stepwisefit | Linear regression with stepwise variable selection. |
TODO list
Missing functions:
glmfit
glmval
mnrfit
mnrval
Wrappers
Available functions
Functions available for wrapping other functions or groups of functions.
Function | Description |
---|---|
cdf | This is a wrapper for the NAMEcdf and NAME_cdf functions available in the statistics package. |
icdf | This is a wrapper for the NAMEinv and NAME_inv functions available in the statistics package. |
pdf | This is a wrapper for the NAMEpdf and NAME_pdf functions available in the statistics package. |
random | Generates pseudo-random numbers from a given one-, two-, or three-parameter distribution. |
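The relationship between a wrapper and the function it dispatches to can be sketched as follows, assuming the package is loaded. The distribution name "norm" is assumed here to be accepted by the wrappers because it matches the NAMEcdf naming pattern described in the table; the evaluation point is arbitrary.

```octave
pkg load statistics

x = 1.96;
p_wrapped = cdf ("norm", x, 0, 1);      % wrapper selects the function by distribution name
p_direct  = normcdf (x, 0, 1);          % the specific function called directly
d_wrapped = pdf ("norm", x, 0, 1);
d_direct  = normpdf (x, 0, 1);

printf ("cdf: %.6f vs %.6f\n", p_wrapped, p_direct);
printf ("pdf: %.6f vs %.6f\n", d_wrapped, d_direct);
```

The wrappers are convenient when the distribution is chosen at run time; calling the specific function directly is otherwise equivalent.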