Statistics package: Difference between revisions

(→‎TODO list: remove procrustes from missing fn list)
 
(79 intermediate revisions by 4 users not shown)
The [https://github.com/gnu-octave/statistics/ statistics package] is part of the [https://gnu-octave.github.io/packages/ Octave Packages] collection. Since version [https://github.com/gnu-octave/statistics/releases/tag/release-1.5.0 1.5.0], the statistics package requires Octave 6.1 or later. With Octave 7.2 or later, you can install the latest release of the statistics package (currently 1.5.3) with the following command:


<code>pkg install -forge statistics</code>
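As a minimal sketch, assuming the installation above succeeded, the package can then be loaded and checked in the current session. The version printed depends on what is actually installed.

<syntaxhighlight lang="octave">
pkg load statistics      % make the package functions available in this session
pkg list statistics      % print the installed version and its location
</syntaxhighlight>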


The following sections provide an overview of the functions available in the statistics package, sorted alphabetically and arranged in groups similar to the package's INDEX file. The '''TODO''' subsections only describe the current development plans for forthcoming releases; they are not intended for reporting bugs, missing features, or incompatibilities. Please report these in the [https://github.com/gnu-octave/statistics statistics repository] on GitHub.


== Clustering ==
 
=== Available functions ===
 
The following table lists the available functions for clustering data.
 
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/cluster.html cluster]
| Define clusters from an agglomerative hierarchical cluster tree.
|-
| [https://gnu-octave.github.io/statistics/clusterdata.html clusterdata]
| Wrapper function for 'linkage' and 'cluster'.
|-
| [https://gnu-octave.github.io/statistics/cmdscale.html cmdscale]
| Classical multidimensional scaling of a matrix.
|-
| [https://gnu-octave.github.io/statistics/confusionmat.html confusionmat]
| Compute a confusion matrix for classification problems.
|-
| [https://gnu-octave.github.io/statistics/ConfusionMatrixChart.html ConfusionMatrixChart]
| Compute a ConfusionMatrixChart class object.
|-
| [https://gnu-octave.github.io/statistics/cophenet.html cophenet]
| Compute the cophenetic correlation coefficient.
|-
| [https://gnu-octave.github.io/statistics/evalclusters.html evalclusters]
| Create a clustering evaluation object to find the optimal number of clusters.
|-
| [https://gnu-octave.github.io/statistics/inconsistent.html inconsistent]
| Compute the inconsistency coefficient for each link of a hierarchical cluster tree.
|-
| [https://gnu-octave.github.io/statistics/kmeans.html kmeans]
| Perform a K-means clustering of an NxD matrix.
|-
| [https://gnu-octave.github.io/statistics/linkage.html linkage]
| Produce a hierarchical clustering dendrogram.
|-
| [https://gnu-octave.github.io/statistics/mahal.html mahal]
| Mahalanobis' D-square distance.
|-
| [https://gnu-octave.github.io/statistics/mhsample.html mhsample]
| Draws NSAMPLES samples from a target stationary distribution PDF using Metropolis-Hastings algorithm.
|-
| [https://gnu-octave.github.io/statistics/optimalleaforder.html optimalleaforder]
| Compute the optimal leaf ordering of a hierarchical binary cluster tree.
|-
| [https://gnu-octave.github.io/statistics/pdist.html pdist]
| Return the distance between any two rows in X.
|-
| [https://gnu-octave.github.io/statistics/pdist2.html pdist2]
| Compute pairwise distance between two sets of vectors.
|-
| [https://gnu-octave.github.io/statistics/procrustes.html procrustes]
| Procrustes Analysis.
|-
| [https://gnu-octave.github.io/statistics/slicesample.html slicesample]
| Draws NSAMPLES samples from a target stationary distribution PDF using slice sampling of Radford M. Neal.
|-
| [https://gnu-octave.github.io/statistics/squareform.html squareform]
| Interchange between distance matrix and distance vector formats.
|}
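A minimal sketch of how several of these functions can be combined for hierarchical clustering. The data matrix <code>X</code> is made up for illustration, and the <code>"MaxClust"</code> option is assumed to behave as described in the function's documentation.

<syntaxhighlight lang="octave">
pkg load statistics
X = [0 0; 0 1; 10 10; 10 11];      % four points forming two well-separated pairs
d = pdist (X);                     % pairwise Euclidean distances as a vector
D = squareform (d);                % the same distances as a symmetric 4x4 matrix
Z = linkage (d);                   % agglomerative cluster tree (single linkage by default)
T = cluster (Z, "MaxClust", 2);    % cut the tree into at most two clusters
</syntaxhighlight>

With this data, the first two points should share one cluster label and the last two the other.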
 
=== TODO list ===
 
Missing functions:
 
== Data Manipulation ==
 
=== Available functions ===
 
The following table lists the available functions for data manipulation.
 
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/combnk.html combnk]
| Return all combinations of K elements in DATA.
|-
| [https://gnu-octave.github.io/statistics/crosstab.html crosstab]
| Create a cross-tabulation (contingency table) T from data vectors.
|-
| [https://gnu-octave.github.io/statistics/datasample.html datasample]
| Randomly sample data.
|-
| [https://gnu-octave.github.io/statistics/fillmissing.html fillmissing]
| Replace missing entries of array A either with values in v or as determined by other specified methods.
|-
| [https://gnu-octave.github.io/statistics/grp2idx.html grp2idx]
| Get index for group variables.
|-
| [https://gnu-octave.github.io/statistics/ismissing.html ismissing]
| Find missing data in a numeric or string array.
|-
| [https://gnu-octave.github.io/statistics/normalise_distribution.html normalise_distribution]
|  Transform a set of data so as to be N(0,1) distributed according to an idea by van Albada and Robinson.
|-
| [https://gnu-octave.github.io/statistics/rmmissing.html rmmissing]
| Remove missing or incomplete data from an array.
|-
| [https://gnu-octave.github.io/statistics/standardizeMissing.html standardizeMissing]
| Replace data values specified by indicator in A by the standard ’missing’ data value for that data type.
|-
| [https://gnu-octave.github.io/statistics/tabulate.html tabulate]
| Compute a frequency table.
|}
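The following sketch illustrates the missing-data helpers and <code>grp2idx</code> on made-up inputs; it assumes the default behaviour of each function with no extra options.

<syntaxhighlight lang="octave">
pkg load statistics
x = [1 NaN 3 NaN 5];
tf = ismissing (x);                         % logical mask of the missing entries
y = rmmissing (x);                          % the vector with missing entries removed
[idx, names] = grp2idx ({"a"; "b"; "a"});   % integer group index and group names
</syntaxhighlight>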
 
== Descriptive Statistics ==
 
=== Available functions ===
 
The following table lists the available functions for descriptive statistics.


{| class="wikitable"
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/cl_multinom.html cl_multinom]
| Confidence level of multinomial portions.
|-
| [https://gnu-octave.github.io/statistics/geomean.html geomean]
| Compute the geometric mean.
|-
| [https://gnu-octave.github.io/statistics/grpstats.html grpstats]
| Compute summary statistics by group. Fully MATLAB compatible.
|-
| [https://gnu-octave.github.io/statistics/harmmean.html harmmean]
| Compute the harmonic mean.
|-
| [https://gnu-octave.github.io/statistics/jackknife.html jackknife]
| Compute jackknife estimates of a parameter taking one or more given samples as parameters.
|-
| [https://gnu-octave.github.io/statistics/mean.html mean]
| Compute the mean. Fully MATLAB compatible.
|-
| [https://gnu-octave.github.io/statistics/median.html median]
| Compute the median. Fully MATLAB compatible.
|-
| [https://gnu-octave.github.io/statistics/nanmax.html nanmax]
| Find the maximal element while ignoring NaN values.
|-
| [https://gnu-octave.github.io/statistics/nanmin.html nanmin]
| Find the minimal element while ignoring NaN values.
|-
| [https://gnu-octave.github.io/statistics/nansum.html nansum]
| Compute the sum while ignoring NaN values.
|-
| [https://gnu-octave.github.io/statistics/std.html std]
| Compute the standard deviation. Fully MATLAB compatible.
|-
| [https://gnu-octave.github.io/statistics/trimmean.html trimmean]
| Compute the trimmed mean.
|-
| [https://gnu-octave.github.io/statistics/var.html var]
| Compute the variance. Fully MATLAB compatible.
|}
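A short sketch of a few of these functions on made-up vectors; the values in the comments follow directly from the definitions of the geometric and harmonic means.

<syntaxhighlight lang="octave">
pkg load statistics
x = [1 2 4];
g = geomean (x);          % geometric mean: (1*2*4)^(1/3) = 2
h = harmmean (x);         % harmonic mean: 3 / (1 + 1/2 + 1/4) = 12/7
m = nanmax ([1 NaN 3]);   % largest element, ignoring the NaN: 3
</syntaxhighlight>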
 
=== In external packages ===
 
<code>bootci</code>, <code>bootstrp</code> are implemented in the [https://gnu-octave.github.io/packages/statistics-resampling statistics-resampling] package.
 
=== Shadowing Octave core functions ===
 
The following functions will shadow the respective core functions until Octave 9.
 
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>mean</code>
* <code>median</code>
* <code>std</code>
* <code>var</code>
</div>
 
=== TODO list ===
 
Update <code>trimmean</code> function to be fully MATLAB compatible.
 
Re-introduce the <code>nan*</code> functions implemented in C++ with the <code>"all"</code> and <code>"vecdim"</code> options.
 
Re-implement the following functions from core Octave as shadowing functions with updated functionality regarding the <code>"all"</code>, <code>"omitnan"</code>, and <code>"vecdim"</code> options, with the intent of including them in Octave 9.
 
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>cov</code>
* <code>mad</code>
* <code>meansq</code>
* <code>mode</code>
* <code>moment</code>
</div>
 
== Distributions ==
 
=== Available functions ===
 
The following table lists the '''cdf''', '''icdf''', '''pdf''', and '''random''' functions available in the statistics package. Since version [https://github.com/gnu-octave/statistics/releases/tag/release-1.5.3 1.5.3], all CDFs support the "upper" option for evaluating the complement of the respective CDF.
 
Note! The '''icdf''' wrapper for the quantile functions is not implemented yet.
 
{| class="wikitable"
! Distribution Name
! Cumulative Distribution Function
! Quantile Function
! Probability Density Function
! Random Generator
|-
| [https://en.wikipedia.org/wiki/Birnbaum%E2%80%93Saunders_distribution Birnbaum–Saunders]
| bbscdf
| bbsinv
| bbspdf
| bbsrnd
|-
| [https://en.wikipedia.org/wiki/Beta_distribution Beta]
| betacdf
| betainv
| betapdf
| betarnd
|-
| [https://en.wikipedia.org/wiki/Binomial_distribution Binomial]
| binocdf
| binoinv
| binopdf
| binornd
|-
| [https://en.wikipedia.org/wiki/Joint_probability_distribution Bivariate Normal]
| bvncdf
|
|
|
|-
| [https://en.wikipedia.org/wiki/Joint_probability_distribution Bivariate Student's <i>t</i>]
| bvtcdf
|
|
|
|-
| [https://www.mathworks.com/help/stats/burr-type-xii-distribution.html Burr Type XII]
| burrcdf
| burrinv
| burrpdf
| burrrnd
|-
| [https://en.wikipedia.org/wiki/Cauchy_distribution Cauchy]
| cauchy_cdf
| cauchy_inv
| cauchy_pdf
| cauchy_rnd
|-
| [https://en.wikipedia.org/wiki/Chi-squared_distribution Chi-squared]
| chi2cdf
| chi2inv
| chi2pdf
| chi2rnd
|-
| [https://en.wikipedia.org/wiki/Copula_(probability_theory) Copula Family]
| copulacdf
| copulainv
| copulapdf
| copularnd
|-
| [https://en.wikipedia.org/wiki/Gumbel_distribution Extreme Value]
| evcdf
| evinv
| evpdf
| evrnd
|-
| [https://en.wikipedia.org/wiki/Exponential_distribution Exponential]
| expcdf
| expinv
| exppdf
| exprnd
|-
| [https://en.wikipedia.org/wiki/F-distribution F]
| fcdf
| finv
| fpdf
| frnd
|-
| [https://en.wikipedia.org/wiki/Gamma_distribution Gamma]
| gamcdf
| gaminv
| gampdf
| gamrnd
|-
| [https://en.wikipedia.org/wiki/Geometric_distribution Geometric]
| geocdf
| geoinv
| geopdf
| geornd
|-
| [https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution Generalized Extreme Value]
| gevcdf
| gevinv
| gevpdf
| gevrnd
|-
| [https://en.wikipedia.org/wiki/Generalized_Pareto_distribution Generalized Pareto]
| gpcdf
| gpinv
| gppdf
| gprnd
|-
| [https://en.wikipedia.org/wiki/Hypergeometric_distribution Hypergeometric]
| hygecdf
| hygeinv
| hygepdf
| hygernd
|-
| [https://en.wikipedia.org/wiki/Inverse-Wishart_distribution Inverse-Wishart]
|
|
| iwishpdf
| iwishrnd
|-
| [https://en.wikipedia.org/wiki/Johnson%27s_SU-distribution Johnson's SU]
| jsucdf
|
| jsupdf
|
|-
| [https://en.wikipedia.org/wiki/Laplace_distribution Laplace]
| laplace_cdf
| laplace_inv
| laplace_pdf
| laplace_rnd
|-
| [https://en.wikipedia.org/wiki/Logistic_distribution Logistic]
| logistic_cdf
| logistic_inv
| logistic_pdf
| logistic_rnd
|-
| [https://en.wikipedia.org/wiki/Log-normal_distribution Log-normal]
| logncdf
| logninv
| lognpdf
| lognrnd
|-
| [https://en.wikipedia.org/wiki/Multinomial_distribution Multinomial]
|
|
| mnpdf
| mnrnd
|-
| [https://en.wikipedia.org/wiki/Multivariate_normal_distribution Multivariate Normal]
| mvncdf
| mvninv
| mvnpdf
| mvnrnd
|-
| [https://en.wikipedia.org/wiki/Multivariate_t-distribution Multivariate Student's <i>t</i>]
| mvtcdf mvtcdfqmc
| mvtinv
| mvtpdf
| mvtrnd
|-
| [https://en.wikipedia.org/wiki/Nakagami_distribution Nakagami]
| nakacdf
| nakainv
| nakapdf
| nakarnd
|-
| [https://en.wikipedia.org/wiki/Negative_binomial_distribution Negative Binomial]
| nbincdf
| nbininv
| nbinpdf
| nbinrnd
|-
| [https://en.wikipedia.org/wiki/Noncentral_F-distribution Noncentral F]
| ncfcdf
| ncfinv
| ncfpdf
| ncfrnd
|-
| [https://en.wikipedia.org/wiki/Noncentral_t-distribution Noncentral Student's <i>t</i>]
| nctcdf
| nctinv
| nctpdf
| nctrnd
|-
| [https://en.wikipedia.org/wiki/Noncentral_chi-squared_distribution Noncentral Chi-squared]
| ncx2cdf
| ncx2inv
| ncx2pdf
| ncx2rnd
|-
| [https://en.wikipedia.org/wiki/Normal_distribution Normal]
| normcdf
| norminv
| normpdf
| normrnd
|-
| [https://en.wikipedia.org/wiki/Poisson_distribution Poisson]
| poisscdf
| poissinv
| poisspdf
| poissrnd
|-
| [https://en.wikipedia.org/wiki/Rayleigh_distribution Rayleigh]
| raylcdf
| raylinv
| raylpdf
| raylrnd
|-
| [https://en.wikipedia.org/wiki/Normal_distribution#Standard_normal_distribution Standard Normal]
| stdnormal_cdf
| stdnormal_inv
| stdnormal_pdf
| stdnormal_rnd
|-
| [https://en.wikipedia.org/wiki/Student%27s_t-distribution Student's <i>t</i>]
| tcdf
| tinv
| tpdf
| trnd
|-
| [https://en.wikipedia.org/wiki/Triangular_distribution Triangular]
| tricdf
| triinv
| tripdf
| trirnd
|-
| [https://en.wikipedia.org/wiki/Discrete_uniform_distribution Discrete Uniform]
| unidcdf
| unidinv
| unidpdf
| unidrnd
|-
| [https://en.wikipedia.org/wiki/Continuous_uniform_distribution Continuous Uniform]
| unifcdf
| unifinv
| unifpdf
| unifrnd
|-
| [https://en.wikipedia.org/wiki/Von_Mises_distribution von Mises]
| vmcdf
|
| vmpdf
| vmrnd
|-
| [https://en.wikipedia.org/wiki/Weibull_distribution Weibull]
| wblcdf
| wblinv
| wblpdf
| wblrnd
|-
| [https://en.wikipedia.org/wiki/Wiener_process Wiener process]
|
|
|
| wienrnd
|-
| [https://en.wikipedia.org/wiki/Wishart_distribution Wishart]
|
|
| wishpdf
| wishrnd
|}
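As a minimal sketch of the naming scheme in the table, using the normal distribution: the <code>"upper"</code> call assumes version 1.5.3 or later, as stated above, and the parameters are passed explicitly to keep the argument order unambiguous.

<syntaxhighlight lang="octave">
pkg load statistics
p  = normcdf (1.96, 0, 1);            % P(X <= 1.96) for the standard normal, about 0.975
pu = normcdf (1.96, 0, 1, "upper");   % complement P(X > 1.96), about 0.025
x  = norminv (0.975, 0, 1);           % quantile function, about 1.96
f  = normpdf (0, 0, 1);               % density at the mean, 1/sqrt(2*pi)
</syntaxhighlight>

The <code>"upper"</code> form is mainly useful for tail probabilities that are too small to be represented accurately as <code>1 - p</code>.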
=== Distribution Fitting ===
Functions available for estimating parameters and the negative log-likelihood for certain distributions.
{| class="wikitable"
! Distribution Name
! Parameter Estimation
! Negative Log-likelihood
|-
| Extreme Value
| evfit
| evlike
|-
| Exponential
| expfit
| explike
|-
| Gamma
| gamfit
| gamlike
|-
| Generalized Extreme Value
| gevfit_lmom gevfit
| gevlike
|-
| Generalized Pareto
| gpfit
| gplike
|-
| Normal
|
| normlike
|}
|}


=== Distribution Statistics ===
 
Functions available for computing ''mean'' and ''variance'' from distribution parameters.
 
<div style="column-count:4;-moz-column-count:4;-webkit-column-count:4">
* <code>betastat</code>
* <code>binostat</code>
* <code>chi2stat</code>
* <code>evstat</code>
* <code>expstat</code>
* <code>fstat</code>
* <code>gamstat</code>
* <code>geostat</code>
* <code>gevstat</code>
* <code>gpstat</code>
* <code>hygestat</code>
* <code>lognstat</code>
* <code>nbinstat</code>
* <code>ncfstat</code>
* <code>nctstat</code>
* <code>ncx2stat</code>
* <code>normstat</code>
* <code>poisstat</code>
* <code>raylstat</code>
* <code>fitgmdist</code>
* <code>tstat</code>
* <code>unidstat</code>
* <code>unifstat</code>
* <code>wblstat</code>
</div>
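A minimal sketch of the <code>*stat</code> functions, each returning the mean and variance implied by the given distribution parameters. The parameter values are arbitrary examples.

<syntaxhighlight lang="octave">
pkg load statistics
[m, v]   = binostat (10, 0.5);   % binomial, n = 10, p = 0.5: mean n*p, variance n*p*(1-p)
[m2, v2] = normstat (2, 3);      % normal, mu = 2, sigma = 3: mean mu, variance sigma^2
[m3, v3] = expstat (4);          % exponential with mean 4: variance is the mean squared
</syntaxhighlight>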
 
== Experimental Design ==
 
=== Available functions ===
 
Functions available for computing design matrices.
 
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/fullfact.html fullfact]
| Full factorial design.
|-
| [https://gnu-octave.github.io/statistics/ff2n.html ff2n]
| Two-level full factorial design.
|-
| [https://gnu-octave.github.io/statistics/sigma_pts.html sigma_pts]
| Calculates 2*N+1 sigma points in N dimensions.
|-
| [https://gnu-octave.github.io/statistics/x2fx.html x2fx]
| Convert predictors to design matrix.
|}
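A brief sketch of the two factorial design functions, assuming a vector argument to <code>fullfact</code> gives the number of levels per factor. Only the sizes are noted here, since the row order is an implementation detail.

<syntaxhighlight lang="octave">
pkg load statistics
A = fullfact ([2 3]);   % one factor with 2 levels, one with 3: 6 runs, 2 columns
B = ff2n (3);           % three two-level factors: 8 runs, entries are 0 or 1
</syntaxhighlight>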


== Machine Learning ==

=== Available functions ===

The following table lists the available functions for machine learning.


{| class="wikitable"
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/hmmestimate.html hmmestimate]
| Estimation of a hidden Markov model for a given sequence.
|-
| [https://gnu-octave.github.io/statistics/hmmgenerate.html hmmgenerate]
| Output sequence and hidden states of a hidden Markov model.
|-
| [https://gnu-octave.github.io/statistics/hmmviterbi.html hmmviterbi]
| Viterbi path of a hidden Markov model.
|-
| [https://gnu-octave.github.io/statistics/svmpredict.html svmpredict]
| Predict labels or values with a support vector machine (SVM) model produced by svmtrain.
|-
| [https://gnu-octave.github.io/statistics/svmtrain.html svmtrain]
| Train a support vector machine (SVM) model.
|}
 
=== TODO list ===
 
Update <code>svmpredict</code> and <code>svmtrain</code> to libsvm 3.0.
 
Missing functions:
 
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>hmmdecode</code>
* <code>hmmtrain</code>
</div>
 
== Model Fitting ==
 
=== Available functions ===
 
Functions available for fitting or evaluating statistical models.
 
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/crossval.html crossval]
| Perform cross validation on given data.
|-
| [https://gnu-octave.github.io/statistics/fitgmdist.html fitgmdist]
| Fit a Gaussian mixture model with K components to DATA.
|-
| [https://gnu-octave.github.io/statistics/fitlm.html fitlm]
| Regress the continuous outcome (i.e. dependent variable) Y on continuous or categorical predictors (i.e. independent variables) X by minimizing the sum-of-squared residuals.
|}
 
=== Cross Validation ===
 
Class of set partitions for cross-validation, used in <code>crossval</code>.
 
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* @cvpartition/cvpartition
* @cvpartition/display
* @cvpartition/get
* @cvpartition/repartition
* @cvpartition/set
* @cvpartition/test
* @cvpartition/training
</div>
 
=== TODO list ===
 
Missing functions:
 
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>anova</code>
* <code>manova</code>
</div>
 
== Hypothesis Testing ==
 
=== Available functions ===
 
Functions available for hypothesis testing.
 
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/adtest.html adtest]
| Anderson-Darling goodness-of-fit hypothesis test.
|-
| [https://gnu-octave.github.io/statistics/anova1.html anova1]
| Perform a one-way analysis of variance (ANOVA).
|-
| [https://gnu-octave.github.io/statistics/anova2.html anova2]
| Perform a two-way factorial (crossed) or a nested analysis of variance (ANOVA) for balanced designs.
|-
| [https://gnu-octave.github.io/statistics/anovan.html anovan]
| Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e. independent variables) on a continuous outcome (i.e. dependent variable).
|-
| [https://gnu-octave.github.io/statistics/bartlett_test.html bartlett_test]
| Perform a Bartlett test for the homogeneity of variances.
|-
| [https://gnu-octave.github.io/statistics/barttest.html barttest]
| Bartlett's test of sphericity for correlation.
|-
| [https://gnu-octave.github.io/statistics/binotest.html binotest]
| Test for probability P of a binomial sample.
|-
| [https://gnu-octave.github.io/statistics/chi2gof.html chi2gof]
| Chi-square goodness-of-fit test.
|-
| [https://gnu-octave.github.io/statistics/chi2test.html chi2test]
| Perform a chi-squared test (for independence or homogeneity).
|-
| [https://gnu-octave.github.io/statistics/correlation_test.html correlation_test]
| Perform a correlation coefficient test of whether two samples x and y come from uncorrelated populations.
|-
| [https://gnu-octave.github.io/statistics/fishertest.html fishertest]
| Fisher’s exact test.
|-
| [https://gnu-octave.github.io/statistics/friedman.html friedman]
| Perform the nonparametric Friedman's test to compare column effects in a two-way layout.
|-
| [https://gnu-octave.github.io/statistics/hotelling_t2test.html hotelling_t2test]
| Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples).
|-
| [https://gnu-octave.github.io/statistics/hotelling_t2test2.html hotelling_t2test2]
| Compute Hotelling's T^2 ("T-squared") test for two independent samples.
|-
| [https://gnu-octave.github.io/statistics/kruskalwallis.html kruskalwallis]
| Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA).
|-
| [https://gnu-octave.github.io/statistics/kstest.html kstest]
| Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.
|-
| [https://gnu-octave.github.io/statistics/kstest2.html kstest2]
| Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
|-
| [https://gnu-octave.github.io/statistics/levene_test.html levene_test]
| Perform a Levene's test for the homogeneity of variances.
|-
| [https://gnu-octave.github.io/statistics/manova1.html manova1]
| One-way multivariate analysis of variance (MANOVA).
|-
| [https://gnu-octave.github.io/statistics/multcompare.html multcompare]
| Perform posthoc multiple comparison tests or p-value adjustments to control the family-wise error rate (FWER) or false discovery rate (FDR).
|-
| [https://gnu-octave.github.io/statistics/ranksum.html ranksum]
| Wilcoxon rank sum test for equal medians. This test is equivalent to a Mann-Whitney U-test.
|-
| [https://gnu-octave.github.io/statistics/regression_ftest.html regression_ftest]
| F-test for General Linear Regression Analysis.
|-
| [https://gnu-octave.github.io/statistics/regression_ttest.html regression_ttest]
| Perform a linear regression t-test.
|-
| [https://gnu-octave.github.io/statistics/runstest.html runstest]
| Runs test for detecting serial correlation in the vector X.
|-
| [https://gnu-octave.github.io/statistics/sampsizepwr.html sampsizepwr]
| Sample size and power calculation for hypothesis test.
|-
| [https://gnu-octave.github.io/statistics/signtest.html signtest]
| Test for median.
|-
| [https://gnu-octave.github.io/statistics/ttest.html ttest]
| Test for mean of a normal sample with unknown variance or a paired-sample t-test.
|-
| [https://gnu-octave.github.io/statistics/ttest2.html ttest2]
| Perform a two independent samples t-test.
|-
| [https://gnu-octave.github.io/statistics/vartest.html vartest]
| One-sample test of variance.
|-
| [https://gnu-octave.github.io/statistics/vartest2.html vartest2]
| Two-sample F test for equal variances.
|-
| [https://gnu-octave.github.io/statistics/vartestn.html vartestn]
| Test for equal variances across multiple groups.
|-
| [https://gnu-octave.github.io/statistics/ztest.html ztest]
| One-sample Z-test.
|-
| [https://gnu-octave.github.io/statistics/ztest2.html ztest2]
| Two proportions Z-test.
|}
|}
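A minimal sketch of a one-sample t-test with <code>ttest</code>, assuming the Matlab-compatible call form in which a scalar second argument is the hypothesized mean. The sample <code>x</code> is made up for illustration.

<syntaxhighlight lang="octave">
pkg load statistics
x = [5.1 4.9 5.0 5.2 4.8];    % made-up sample whose mean is 5
[h0, p0] = ttest (x, 5);      % H0: the mean is 5; not rejected at the default 5% level
[h1, p1] = ttest (x, 0);      % H0: the mean is 0; rejected
</syntaxhighlight>

The first output is the test decision (1 if the null hypothesis is rejected), and the second is the p-value.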


=== TODO list ===
 
Missing functions:
 
<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>fishertest</code>
* <code>meanEffectSize</code>
</div>
 
== Plotting ==


=== Available functions ===

The following table lists the available functions for plotting data.

{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/boxplot.html boxplot]
| Produce a box plot.
|-
| [https://gnu-octave.github.io/statistics/cdfplot.html cdfplot]
| Display an empirical cumulative distribution function.
|-
| [https://gnu-octave.github.io/statistics/confusionchart.html confusionchart]
| Display a chart of a confusion matrix.
|-
| [https://gnu-octave.github.io/statistics/dendrogram.html dendrogram]
| Plot a dendrogram of a hierarchical binary cluster tree.
|-
| [https://gnu-octave.github.io/statistics/ecdf.html ecdf]
| Empirical (Kaplan-Meier) cumulative distribution function.
|-
| [https://gnu-octave.github.io/statistics/gscatter.html gscatter]
| Draw a scatter plot with grouped data.
|-
| [https://gnu-octave.github.io/statistics/histfit.html histfit]
| Plot histogram with superimposed fitted normal density.
|-
| [https://gnu-octave.github.io/statistics/hist3.html hist3]
| Produce bivariate (2D) histogram counts or plots.
|-
| [https://gnu-octave.github.io/statistics/manovacluster.html manovacluster]
| Cluster group means using manova1 output.
|-
| [https://gnu-octave.github.io/statistics/normplot.html normplot]
| Produce normal probability plot of the data.
|-
| [https://gnu-octave.github.io/statistics/ppplot.html ppplot]
| Perform a PP-plot (probability plot).
|-
| [https://gnu-octave.github.io/statistics/qqplot.html qqplot]
| Perform a QQ-plot (quantile plot).
|-
| [https://gnu-octave.github.io/statistics/silhouette.html silhouette]
| Compute the silhouette values of clustered data and show them on a plot.
|-
| [https://gnu-octave.github.io/statistics/violin.html violin]
| Produce a Violin plot of the data.
|-
| [https://gnu-octave.github.io/statistics/wblplot.html wblplot]
| Plot a column vector DATA on a Weibull probability plot using rank regression.
|}

=== TODO list ===

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>andrewsplot</code>
* <code>bar3</code>
* <code>bar3h</code>
* <code>glyphplot</code>
* <code>gplotmatrix</code>
* <code>parallelcoords</code>
</div>


== Regression ==

=== Available functions ===

The following table lists the available functions for regression analysis.

{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/canoncorr.html canoncorr]
| Canonical correlation analysis.
|-
| [https://gnu-octave.github.io/statistics/cholcov.html cholcov]
| Cholesky-like decomposition for covariance matrix.
|-
| [https://gnu-octave.github.io/statistics/dcov.html dcov]
| Distance correlation, covariance and correlation statistics.
|-
| [https://gnu-octave.github.io/statistics/logistic_regression.html logistic_regression]
| Perform ordinal logistic regression.
|-
| [https://gnu-octave.github.io/statistics/monotone_smooth.html monotone_smooth]
| Produce a smooth monotone increasing approximation to a sampled functional dependence.
|-
| [https://gnu-octave.github.io/statistics/pca.html pca]
| Perform a principal component analysis on a data matrix.
|-
| [https://gnu-octave.github.io/statistics/pcacov.html pcacov]
| Perform principal component analysis on the NxN covariance matrix X.
|-
| [https://gnu-octave.github.io/statistics/pcares.html pcares]
| Calculate residuals from principal component analysis.
|-
| [https://gnu-octave.github.io/statistics/plsregress.html plsregress]
| Calculate partial least squares regression using SIMPLS algorithm.
|-
| [https://gnu-octave.github.io/statistics/princomp.html princomp]
| Perform a principal component analysis on an NxP data matrix.
|-
| [https://gnu-octave.github.io/statistics/regress.html regress]
| Multiple Linear Regression using Least Squares Fit.
|-
| [https://gnu-octave.github.io/statistics/regress_gp.html regress_gp]
| Linear scalar regression using Gaussian processes.
|-
| [https://gnu-octave.github.io/statistics/stepwisefit.html stepwisefit]
| Linear regression with stepwise variable selection.
|}

=== TODO list ===

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>glmfit</code>
* <code>glmval</code>
* <code>mnrfit</code>
* <code>mnrval</code>

== Development ==

The following is an incomplete list of what the statistics package is missing to be Matlab compatible. Bugs are not listed here; [https://savannah.gnu.org/bugs/?func=search&group=octave search] for them and [https://savannah.gnu.org/bugs/?func=additem&group=octave report] them on the bug tracker instead.

{{Note|This entire section is about the current development version. If a Matlab function is missing from the list and does not appear in the current release of the package, confirm that it is also missing in the [https://sourceforge.net/p/octave/statistics/ development sources] before adding it.}}

=== Missing functions ===
<div style="column-count:4;-moz-column-count:4;-webkit-column-count:4">
* ClassificationBaggedEnsemble
* ClassificationDiscriminant
* ClassificationDiscriminant.fit
* ClassificationEnsemble
* ClassificationKNN
* ClassificationKNN.fit
* ClassificationPartitionedEnsemble
* ClassificationPartitionedModel
* ClassificationTree
* ClassificationTree.fit
* CompactClassificationDiscriminant
* CompactClassificationEnsemble
* CompactClassificationTree
* CompactRegressionEnsemble
* CompactRegressionTree
* CompactTreeBagger
* ExhaustiveSearcher
* GeneralizedLinearModel
* GeneralizedLinearModel.fit
* GeneralizedLinearModel.stepwise
* KDTreeSearcher
* LinearMixedModel
* LinearMixedModel.fit
* LinearMixedModel.fitmatrix
* LinearModel
* LinearModel.fit
* LinearModel.stepwise
* NaiveBayes
* NaiveBayes.fit
* NonLinearModel
* NonLinearModel.fit
* ProbDistUnivKernel
* ProbDistUnivParam
* RegressionBaggedEnsemble
* RegressionEnsemble
* RegressionPartitionedEnsemble
* RegressionPartitionedModel
* RegressionTree
* RegressionTree.fit
* TreeBagger
* addTerms
* addedvarplot
* addlevels
* adtest
* andrewsplot
* anova1
* anova2
* ansaribradley
* aoctool
* barttest
* bbdesign
* betafit
* betalike
* binofit
* biplot
* bootci
* bootstrp
* candexch
* candgen
* capability
* capaplot
* ccdesign
* cdfplot
* cell2dataset
* chi2gof
* cholcov
* classify
* classregtree
* cluster
* clusterdata
* clustering.evaluation.CalinskiHarabaszEvaluation
* clustering.evaluation.DaviesBouldinEvaluation
* clustering.evaluation.GapEvaluation
* clustering.evaluation.SilhouetteEvaluation
* coefCI
* coefTest
* compact
* compare
* confusionmat
* controlchart
* controlrules
* cophenet
* copulafit
* copulaparam
* copulastat
* cordexch
* corrcov
* covarianceParameters
* coxphfit
* createns
* crosstab
* datasample
* dataset
* dataset2cell
* dataset2struct
* dataset2table
* datasetfun
* daugment
* dcovary
* designMatrix
* devianceTest
* dfittool
* disttool
* droplevels
* dummyvar
* dwtest
* ecdf
* ecdfhist
* evalclusters
* evcdf
* evfit
* evinv
* evlike
* evpdf
* evrnd
* evstat
* expfit
* explike
* export
* factoran
* fitdist
* fitensemble
* fitglm
* fitlm
* fitlme
* fitlmematrix
* fitnlm
* fitted
* fixedEffects
* fracfact
* fracfactgen
* friedman
* fsurfht
* gagerr
* getlabels
* getlevels
* gline
* glmfit
* glmval
* glyphplot
* gname
* gpcdf
* gpfit
* gpinv
* gplike
* gplotmatrix
* gppdf
* gprnd
* gpstat
* grpstats
* gscatter
* haltonset
* hmmdecode
* hmmtrain
* hougen
* icdf
* inconsistent
* interactionplot
* invpred
* islevel
* ismissing
* isundefined
* jbtest
* johnsrnd
* join
* knnsearch
* kruskalwallis
* ksdensity
* kstest
* kstest2
* labels
* lasso
* lassoPlot
* lassoglm
* levelcounts
* leverage
* lhsdesign
* lhsnorm
* lillietest
* linhyptest
* lognfit
* lognlike
* lsline
* mahal
* maineffectsplot
* makedist
* manova1
* manovacluster
* mat2dataset
* mdscale
* mergelevels
* mhsample
* mle
* mlecov
* mnrfit
* mnrval
* multcompare
* multivarichart
* mvregress
* mvregresslike
* nancov
* nbinfit
* ncfcdf
* ncfinv
* ncfpdf
* ncfrnd
* ncfstat
* nctcdf
* nctinv
* nctpdf
* nctrnd
* nctstat
* ncx2cdf
* ncx2inv
* ncx2pdf
* ncx2rnd
* ncx2stat
* negloglik
* nlinfit
* nlintool
* nlmefit
* nlmefitsa
* nlparci
* nlpredci
* nnmf
* nominal
* normfit
* normlike
* normspec
* optimalleaforder
* ordinal
* parallelcoords
* paramci
* paretotails
* partialcorr
* partialcorri
* pca
* pdf
* pearsrnd
* perfcurve
* plotAdded
* plotAdjustedResponse
* plotDiagnostics
* plotEffects
* plotInteraction
* plotResiduals
* plotSlice
* poissfit
* polytool
* ppca
* predict
* prob.BetaDistribution
* prob.BinomialDistribution
* prob.BirnbaumSaundersDistribution
* prob.BurrDistribution
* prob.ExponentialDistribution
* prob.ExtremeValueDistribution
* prob.GammaDistribution
* prob.GeneralizedExtremeValueDistribution
* prob.GeneralizedParetoDistribution
* prob.InverseGaussianDistribution
* prob.KernelDistribution
* prob.LogisticDistribution
* prob.LoglogisticDistribution
* prob.LognormalDistribution
* prob.MultinomialDistribution
* prob.NakagamiDistribution
* prob.NegativeBinomialDistribution
* prob.NormalDistribution
* prob.PiecewiseLinearDistribution
* prob.PoissonDistribution
* prob.RayleighDistribution
* prob.RicianDistribution
* prob.TriangularDistribution
* prob.UniformDistribution
* prob.WeibullDistribution
* prob.tLocationScaleDistribution
* probplot
* procrustes
* proflik
* qrandset
* qrandstream
* randomEffects
* randtool
* rangesearch
* ranksum
* raylfit
* rcoplot
* refcurve
* refline
* regstats
* relieff
* removeTerms
* residuals
* response
* ridge
* robustdemo
* robustfit
* rotatefactors
* rowexch
* rsmdemo
* rstool
* sampsizepwr
* scatterhist
* sequentialfs
* setlabels
* signrank
* silhouette
* slicesample
* sobolset
* statget
* statset
* step
* stepwise
* stepwiseglm
* stepwiselm
* struct2dataset
* surfht
* svmclassify
* svmtrain
* table2dataset
* tabulate
* tdfread
* tiedrank
* truncate
* unifit
* vartestn
* wblfit
* wbllike
* wblplot
* x2fx
* xptread
</div>
</div>


=== Missing options ===
== Wrappers ==
 
=== Available functions ===
 
Functions available for wrapping other functions or group of functions.
 
{| class="wikitable"
! Function
! Description
|-
| [https://gnu-octave.github.io/statistics/cdf.html cdf]
| This is a wrapper for the NAMEcdf and NAME_cdf functions available in the statistics package.
|-
| [https://gnu-octave.github.io/statistics/icdf.html icdf]
| This is a wrapper for the NAMEinv and NAME_inv functions available in the statistics package.
|-
| [https://gnu-octave.github.io/statistics/pdf.html pdf]
| This is a wrapper for the NAMEpdf and NAME_pdf functions available in the statistics package.
|-
| [https://gnu-octave.github.io/statistics/random.html random]
| Generates pseudo-random numbers from a given one-, two-, or three-parameter distribution.
|}


[[Category:Octave Forge]]
[[Category:Packages]]
[[Category:Missing functions]]

Latest revision as of 01:31, 24 July 2024

The statistics package is part of the Octave Packages. Since version 1.5.0, the statistics package requires Octave version 6.1 or higher. From Octave v7.2 or later, you can install the latest statistics package (currently 1.5.3) with the following command:

pkg install -forge statistics
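
Once installed, the package has to be loaded in each session before its functions become available. The following is a minimal sketch that loads it and compares the installed version against the 1.5.0 release mentioned above; the variable names are illustrative only.

```octave
% Load the statistics package for the current session.
pkg load statistics

% pkg ("describe", ...) returns one description struct and one status
% string per requested package.
[desc, flag] = pkg ("describe", "statistics");
installed_version = desc{1}.version;

% compare_versions is a core Octave function.
if (compare_versions (installed_version, "1.5.0", ">="))
  printf ("statistics %s is available (%s)\n", installed_version, flag{1});
else
  printf ("statistics %s predates the 1.5.0 release\n", installed_version);
endif
```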

The following sections provide an overview of the functions available in the statistics package, sorted alphabetically and arranged in groups similar to those in the package's INDEX file. The TODO subsections only describe the current development plans for forthcoming releases; they are not intended for reporting bugs, missing features, or incompatibilities. Please report these in the statistics repository at GitHub.

Clustering

Available functions

The following table lists the available functions for clustering data.

Function Description
cluster Define clusters from an agglomerative hierarchical cluster tree.
clusterdata Wrapper function for 'linkage' and 'cluster'.
cmdscale Classical multidimensional scaling of a matrix.
confusionmat Compute a confusion matrix for classification problems.
ConfusionMatrixChart Compute a ConfusionMatrixChart class object.
cophenet Compute the cophenetic correlation coefficient.
evalclusters Create a clustering evaluation object to find the optimal number of clusters.
inconsistent Compute the inconsistency coefficient for each link of a hierarchical cluster tree.
kmeans Perform a K-means clustering of an NxD matrix.
linkage Produce a hierarchical clustering dendrogram.
mahal Mahalanobis' D-square distance.
mhsample Draws NSAMPLES samples from a target stationary distribution PDF using Metropolis-Hastings algorithm.
optimalleaforder Compute the optimal leaf ordering of a hierarchical binary cluster tree.
pdist Return the distance between any two rows in X.
pdist2 Compute pairwise distance between two sets of vectors.
procrustes Procrustes Analysis.
slicesample Draws NSAMPLES samples from a target stationary distribution PDF using slice sampling of Radford M. Neal.
squareform Interchange between distance matrix and distance vector formats.
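
As an illustration of how several of these functions fit together, the sketch below runs a small hierarchical clustering on made-up data. The data matrix, the "average" linkage method and the choice of two clusters are assumptions made for the example only.

```octave
pkg load statistics

% Six points forming two well-separated groups of three (made-up data).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];

D = pdist (X);                    % pairwise Euclidean distances as a vector
S = squareform (D);               % the same distances as a 6-by-6 matrix
Z = linkage (D, "average");       % agglomerative hierarchical cluster tree
T = cluster (Z, "MaxClust", 2);   % cut the tree into two clusters

disp (T');                        % cluster index assigned to each row of X
```

The first three rows should end up in one cluster and the last three in the other; which cluster receives which label is left to the implementation.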

TODO list

Missing functions:

Data Manipulation

Available functions

The following table lists the available functions for data manipulation.

Function Description
combnk Return all combinations of K elements in DATA.
crosstab Create a cross-tabulation (contingency table) T from data vectors.
datasample Randomly sample data.
fillmissing Replace missing entries of array A either with values in v or as determined by other specified methods.
grp2idx Get index for group variables.
ismissing Find missing data in a numeric or string array.
normalise_distribution Transform a set of data so as to be N(0,1) distributed according to an idea by van Albada and Robinson.
rmmissing Remove missing or incomplete data from an array.
standardizeMissing Replace data values specified by indicator in A by the standard ’missing’ data value for that data type.
tabulate Compute a frequency table.
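
A short sketch of the missing-data helpers and grp2idx on made-up input; the vector A and the group labels are illustrative assumptions.

```octave
pkg load statistics

A = [1 NaN 3 NaN 5];

mask   = ismissing (A);                   % logical mask of missing entries
clean  = rmmissing (A);                   % drop the missing entries
filled = fillmissing (A, "constant", 0);  % replace them with a constant

% Map group labels to integer indices and recover the distinct names.
[idx, names] = grp2idx ({"a"; "b"; "a"; "c"});
```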

Descriptive Statistics

Available functions

The following table lists the available functions for descriptive statistics.

Function Description
cl_multinom Confidence level of multinomial portions.
geomean Compute the geometric mean.
grpstats Compute summary statistics by group. Fully MATLAB compatible.
harmmean Compute the harmonic mean.
jackknife Compute jackknife estimates of a parameter taking one or more given samples as parameters.
mean Compute the mean. Fully MATLAB compatible.
median Compute the median. Fully MATLAB compatible.
nanmax Find the maximal element while ignoring NaN values.
nanmin Find the minimal element while ignoring NaN values.
nansum Compute the sum while ignoring NaN values.
std Compute the standard deviation. Fully MATLAB compatible.
trimmean Compute the trimmed mean.
var Compute the variance. Fully MATLAB compatible.
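
For illustration, a few of these functions applied to small made-up vectors; the values in the comments follow directly from the definitions of the geometric and harmonic means.

```octave
pkg load statistics

x = [1 2 4];
g = geomean (x);    % (1 * 2 * 4)^(1/3) = 2
h = harmmean (x);   % 3 / (1/1 + 1/2 + 1/4) = 12/7

y = [1 NaN 3];
s = nansum (y);     % sum of the non-NaN entries
m = nanmax (y);     % largest non-NaN entry
```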

In external packages

bootci, bootstrp are implemented in the statistics-resampling package.

Shadowing Octave core functions

The following functions will shadow the respective core functions until Octave 9.

  • mean
  • median
  • std
  • var

TODO list

Update trimmean function to be fully MATLAB compatible.

Re-introduce the nan* functions implemented in C++ with the "all" and "vecdim" options.

Re-implement the following functions from core Octave as shadowing functions with updated functionality regarding the "all", "omitnan", and "vecdim" options, with the intent that they be included in Octave 9.

  • cov
  • mad
  • meansq
  • mode
  • moment

Distributions

Available functions

The following table lists the cdf, icdf, pdf, and random functions available in the statistics package. Since version 1.5.3, all CDFs support the "upper" option for evaluating the complement of the respective CDF.

Note! The icdf wrapper for the quantile functions is not implemented yet.

Distribution Name Cumulative Distribution Function Quantile Function Probability Density Function Random Generator
Birnbaum–Saunders bbscdf bbsinv bbspdf bbsrnd
Beta betacdf betainv betapdf betarnd
Binomial binocdf binoinv binopdf binornd
Bivariate Normal bvncdf
Bivariate Student's t bvtcdf
Burr Type XII burrcdf burrinv burrpdf burrrnd
Cauchy cauchy_cdf cauchy_inv cauchy_pdf cauchy_rnd
Chi-squared chi2cdf chi2inv chi2pdf chi2rnd
Copula Family copulacdf copulainv copulapdf copularnd
Extreme Value evcdf evinv evpdf evrnd
Exponential expcdf expinv exppdf exprnd
F fcdf finv fpdf frnd
Gamma gamcdf gaminv gampdf gamrnd
Geometric geocdf geoinv geopdf geornd
Generalized Extreme Value gevcdf gevinv gevpdf gevrnd
Generalized Pareto gpcdf gpinv gppdf gprnd
Hypergeometric hygecdf hygeinv hygepdf hygernd
Inverse-Wishart iwishpdf iwishrnd
Johnson's SU jsucdf jsupdf
Laplace laplace_cdf laplace_inv laplace_pdf laplace_rnd
Logistic logistic_cdf logistic_inv logistic_pdf logistic_rnd
Log-normal logncdf logninv lognpdf lognrnd
Multinomial mnpdf mnrnd
Multivariate Normal mvncdf mvninv mvnpdf mvnrnd
Multivariate Student's t mvtcdf mvtcdfqmc mvtinv mvtpdf mvtrnd
Nakagami nakacdf nakainv nakapdf nakarnd
Negative Binomial nbincdf nbininv nbinpdf nbinrnd
Noncentral F ncfcdf ncfinv ncfpdf ncfrnd
Noncentral Student's t nctcdf nctinv nctpdf nctrnd
Noncentral Chi-squared ncx2cdf ncx2inv ncx2pdf ncx2rnd
Normal normcdf norminv normpdf normrnd
Poisson poisscdf poissinv poisspdf poissrnd
Rayleigh raylcdf raylinv raylpdf raylrnd
Standard Normal stdnormal_cdf stdnormal_inv stdnormal_pdf stdnormal_rnd
Student's t tcdf tinv tpdf trnd
Triangular tricdf triinv tripdf trirnd
Discrete Uniform unidcdf unidinv unidpdf unidrnd
Continuous Uniform unifcdf unifinv unifpdf unifrnd
von Mises vmcdf vmpdf vmrnd
Weibull wblcdf wblinv wblpdf wblrnd
Wiener process wienrnd
Wishart wishpdf wishrnd
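
The four function families follow the same naming pattern for each distribution. The sketch below uses the normal distribution and assumes statistics 1.5.3 or later for the "upper" option; the evaluation point 1.96 is an arbitrary choice.

```octave
pkg load statistics

x = 1.96;
p_lower = normcdf (x, 0, 1);            % P(X <= x) for the standard normal
p_upper = normcdf (x, 0, 1, "upper");   % complement of the CDF, P(X > x)
x_back  = norminv (p_lower, 0, 1);      % the quantile function inverts the CDF
d       = normpdf (0, 0, 1);            % density at the mode, 1/sqrt(2*pi)
r       = normrnd (0, 1);               % one pseudo-random draw
```

Requesting the upper tail directly, rather than computing 1 - p_lower, is generally preferable when the tail probability is very small.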

Distribution Fitting

Functions available for estimating parameters and the negative log-likelihood for certain distributions.

Distribution Name Parameter Estimation Negative Log-likelihood
Extreme Value evfit evlike
Exponential expfit explike
Gamma gamfit gamlike
Generalized Extreme Value gevfit_lmom gevfit gevlike
Generalized Pareto gpfit gplike
Normal normlike
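
A minimal sketch of the fit/likelihood pairing for the exponential distribution, on made-up data. It assumes the MATLAB-style argument order explike (param, data); for the exponential distribution the negative log-likelihood at mean mu is n*log(mu) + sum(data)/mu.

```octave
pkg load statistics

data = [1; 2; 3; 4; 5];             % made-up sample

mu_hat = expfit (data);             % maximum likelihood estimate of the mean
nlogL  = explike (mu_hat, data);    % negative log-likelihood at the estimate

printf ("mu_hat = %g, nlogL = %g\n", mu_hat, nlogL);
```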

Distribution Statistics

Functions available for computing mean and variance from distribution parameters.

  • betastat
  • binostat
  • chi2stat
  • evstat
  • expstat
  • fstat
  • gamstat
  • geostat
  • gevstat
  • gpstat
  • hygestat
  • lognstat
  • nbinstat
  • ncfstat
  • nctstat
  • ncx2stat
  • normstat
  • poisstat
  • raylstat
  • fitgmdist
  • tstat
  • unidstat
  • unifstat
  • wblstat
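
Each of these functions returns the mean and variance implied by the distribution parameters. A brief sketch with arbitrary parameter values:

```octave
pkg load statistics

[m_n, v_n] = normstat (2, 3);      % normal: mean mu, variance sigma^2
[m_b, v_b] = binostat (10, 0.5);   % binomial: n*p and n*p*(1-p)
[m_p, v_p] = poisstat (4);         % Poisson: both equal lambda
```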

Experimental Design

Available functions

Functions available for computing design matrices.

Function Description
fullfact Full factorial design.
ff2n Two-level full factorial design.
sigma_pts Calculates 2*N+1 sigma points in N dimensions.
x2fx Convert predictors to design matrix.
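
As a sketch of what the factorial design functions return, the example below builds one design with 2 and 3 levels and one two-level design with three factors; the level counts are arbitrary.

```octave
pkg load statistics

% Every combination of a 2-level and a 3-level factor: 2 * 3 = 6 runs.
A = fullfact ([2 3]);

% Two-level full factorial design with 3 factors: 2^3 = 8 runs of 0/1.
B = ff2n (3);

disp (size (A));
disp (size (B));
```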

Machine Learning

Available functions

The following table lists the available functions.

Function Description
hmmestimate Estimation of a hidden Markov model for a given sequence.
hmmgenerate Output sequence and hidden states of a hidden Markov model.
hmmviterbi Viterbi path of a hidden Markov model.
svmpredict Predict labels for new data using an SVM model trained with svmtrain.
svmtrain Train a support vector machine (SVM) model using libsvm.

TODO list

Update svmpredict and svmtrain to libsvm 3.0.

Missing functions:

  • hmmdecode
  • hmmtrain

Model Fitting

Available functions

Functions available for fitting or evaluating statistical models.

Function Description
crossval Perform cross validation on given data.
fitgmdist Fit a Gaussian mixture model with K components to DATA.
fitlm Regress the continuous outcome (i.e. dependent variable) Y on continuous or categorical predictors (i.e. independent variables) X by minimizing the sum-of-squared residuals.

Cross Validation

Class of set partitions for cross-validation, used in crossval

  • @cvpartition/cvpartition
  • @cvpartition/display
  • @cvpartition/get
  • @cvpartition/repartition
  • @cvpartition/set
  • @cvpartition/test
  • @cvpartition/training
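
A minimal sketch of a k-fold partition, assuming the constructor accepts a sample count followed by the "KFold" option; the sample count of 10 and the 5 folds are arbitrary.

```octave
pkg load statistics

n = 10;                             % number of observations (made up)
c = cvpartition (n, "KFold", 5);    % random partition into 5 folds

train_idx = training (c, 1);        % logical mask of training samples, fold 1
test_idx  = test (c, 1);            % logical mask of held-out samples, fold 1

printf ("%d training, %d test samples\n", sum (train_idx), sum (test_idx));
```

Within a fold, every observation belongs to exactly one of the two masks.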

TODO list

Missing functions:

  • anova
  • manova

Hypothesis Testing

Available functions

Functions available for hypothesis testing

Function Description
adtest Anderson-Darling goodness-of-fit hypothesis test.
anova1 Perform a one-way analysis of variance (ANOVA)
anova2 Performs two-way factorial (crossed) or a nested analysis of variance (ANOVA) for balanced designs.
anovan Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e. independent variables) on a continuous outcome (i.e. dependent variable).
bartlett_test Perform a Bartlett test for the homogeneity of variances.
barttest Bartlett's test of sphericity for correlation.
binotest Test for probability P of a binomial sample
chi2gof Chi-square goodness-of-fit test.
chi2test Perform a chi-squared test (for independence or homogeneity).
correlation_test Perform a correlation coefficient test whether two samples x and y come from uncorrelated populations.
fishertest Fisher’s exact test.
friedman Performs the nonparametric Friedman's test to compare column effects in a two-way layout.
hotelling_t2test Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples).
hotelling_t2test2 Compute Hotelling's T^2 ("T-squared") test for two independent samples.
kruskalwallis Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA).
kstest Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.
kstest2 Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
levene_test Perform a Levene's test for the homogeneity of variances.
manova1 One-way multivariate analysis of variance (MANOVA).
multcompare Perform posthoc multiple comparison tests or p-value adjustments to control the family-wise error rate (FWER) or false discovery rate (FDR).
ranksum Wilcoxon rank sum test for equal medians. This test is equivalent to a Mann-Whitney U-test.
regression_ftest F-test for General Linear Regression Analysis
regression_ttest Perform a linear regression t-test.
runstest Runs test for detecting serial correlation in the vector X.
sampsizepwr Sample size and power calculation for hypothesis test.
signtest Test for median.
ttest Test for mean of a normal sample with unknown variance or a paired-sample t-test.
ttest2 Perform a two independent samples t-test.
vartest One-sample test of variance.
vartest2 Two-sample F test for equal variances.
vartestn Test for equal variances across multiple groups.
ztest One-sample Z-test.
ztest2 Two proportions Z-test.
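
Most of these tests return a decision flag and a p-value. The sketch below applies a one-sample t-test to made-up data and assumes the MATLAB-style output order [h, pval, ci, stats].

```octave
pkg load statistics

x = [5.1; 4.9; 5.0; 5.2; 4.8];        % made-up sample with mean 5

% Null hypothesis: the population mean is 5 (should not be rejected).
[h0, p0] = ttest (x, 5);

% Null hypothesis: the population mean is 4 (should be rejected).
[h1, p1, ci, stats] = ttest (x, 4);

printf ("mean = 5: h = %d, p = %g\n", h0, p0);
printf ("mean = 4: h = %d, p = %g\n", h1, p1);
```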

TODO list

Missing functions:

  • fishertest
  • meanEffectSize

Plotting

Available functions

The following table lists the available functions for plotting data.

Function Description
boxplot Produce a box plot.
cdfplot Display an empirical cumulative distribution function.
confusionchart Display a chart of a confusion matrix.
dendrogram Plot a dendrogram of a hierarchical binary cluster tree.
ecdf Empirical (Kaplan-Meier) cumulative distribution function.
gscatter Draw a scatter plot with grouped data.
histfit Plot histogram with superimposed fitted normal density.
hist3 Produce bivariate (2D) histogram counts or plots.
manovacluster Cluster group means using manova1 output.
normplot Produce normal probability plot of the data.
ppplot Perform a PP-plot (probability plot).
qqplot Perform a QQ-plot (quantile plot).
silhouette Compute the silhouette values of clustered data and show them on a plot.
violin Produce a Violin plot of the data.
wblplot Plot a column vector DATA on a Weibull probability plot using rank regression.

TODO list

Missing functions:

  • andrewsplot
  • bar3
  • bar3h
  • glyphplot
  • gplotmatrix
  • parallelcoords

Regression

Available functions

The following table lists the available functions for regression analysis.

Function Description
canoncorr Canonical correlation analysis.
cholcov Cholesky-like decomposition for covariance matrix.
dcov Distance correlation, covariance and correlation statistics.
logistic_regression Perform ordinal logistic regression.
monotone_smooth Produce a smooth monotone increasing approximation to a sampled functional dependence.
pca Performs a principal component analysis on a data matrix.
pcacov Perform principal component analysis on the NxN covariance matrix X
pcares Calculate residuals from principal component analysis.
plsregress Calculate partial least squares regression using SIMPLS algorithm.
princomp Performs a principal component analysis on a NxP data matrix.
regress Multiple Linear Regression using Least Squares Fit.
regress_gp Linear scalar regression using gaussian processes.
stepwisefit Linear regression with stepwise variable selection.
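
A short sketch of a least-squares fit with regress on made-up, nearly linear data. The design matrix needs an explicit column of ones when an intercept is wanted.

```octave
pkg load statistics

x = (1:6)';
noise = [0.1; -0.1; 0.05; -0.05; 0.1; -0.1];   % small fixed perturbation
y = 1 + 2 * x + noise;                         % roughly y = 1 + 2x

X = [ones(size (x)), x];    % column of ones for the intercept term
b = regress (y, X);         % estimates ordered as [intercept; slope]

printf ("intercept = %g, slope = %g\n", b(1), b(2));
```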

TODO list

Missing functions:

  • glmfit
  • glmval
  • mnrfit
  • mnrval

Wrappers

Available functions

Functions available for wrapping other functions or group of functions.

Function Description
cdf This is a wrapper for the NAMEcdf and NAME_cdf functions available in the statistics package.
icdf This is a wrapper for the NAMEinv and NAME_inv functions available in the statistics package.
pdf This is a wrapper for the NAMEpdf and NAME_pdf functions available in the statistics package.
random Generates pseudo-random numbers from a given one-, two-, or three-parameter distribution.
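
The wrappers take the distribution name as their first argument and forward the remaining arguments. The sketch below assumes the short name "norm" is accepted for the normal distribution.

```octave
pkg load statistics

p = cdf ("norm", 0, 0, 1);      % same result as normcdf (0, 0, 1)
d = pdf ("norm", 0, 0, 1);      % same result as normpdf (0, 0, 1)
r = random ("norm", 0, 1);      % one pseudo-random draw from N(0, 1)
```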