Statistics package

The statistics package is part of the Octave Packages. Since version 1.5.0, the statistics package requires Octave version 6.1 or higher. With Octave 7.2 or later, you can install the latest statistics package (currently 1.5.3) with the following command:

pkg install -forge statistics
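
Installing a package does not load it. As a minimal sketch, assuming the installation above succeeded, the package can be loaded into the current session and the installed version listed as follows (the reported version depends on the local installation):

```octave
% Load the installed statistics package into the current session.
pkg load statistics

% Show name, version and installation directory of the package.
pkg list statistics
```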

The following sections provide an overview of the functions available in the statistics package, sorted alphabetically and arranged in groups similar to the package's INDEX file. The TODO subsections only describe the current development plans for forthcoming releases; they are not intended for reporting bugs, missing features, or incompatibilities. Please report these in the statistics repository on GitHub.

Clustering

Available functions

The following table lists the available functions for clustering data.

Function	Description
cluster	Define clusters from an agglomerative hierarchical cluster tree.
clusterdata	Wrapper function for 'linkage' and 'cluster'.
cmdscale	Classical multidimensional scaling of a matrix.
confusionmat	Compute a confusion matrix for classification problems.
ConfusionMatrixChart	Compute a ConfusionMatrixChart class object.
cophenet	Compute the cophenetic correlation coefficient.
evalclusters	Create a clustering evaluation object to find the optimal number of clusters.
inconsistent	Compute the inconsistency coefficient for each link of a hierarchical cluster tree.
kmeans	Perform a K-means clustering of an NxD matrix.
linkage	Produce a hierarchical clustering dendrogram.
mahal	Mahalanobis' D-square distance.
mhsample	Draws NSAMPLES samples from a target stationary distribution PDF using Metropolis-Hastings algorithm.
optimalleaforder	Compute the optimal leaf ordering of a hierarchical binary cluster tree.
pdist	Return the distance between any two rows in X.
pdist2	Compute pairwise distance between two sets of vectors.
procrustes	Procrustes Analysis.
slicesample	Draws NSAMPLES samples from a target stationary distribution PDF using slice sampling of Radford M. Neal.
squareform	Interchange between distance matrix and distance vector formats.
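
As an illustration of how several of these functions fit together, the following sketch builds a hierarchical cluster tree from pairwise distances and cuts it into two clusters. The data matrix X is made up for the example, and the "average" linkage method is an arbitrary choice.

```octave
pkg load statistics

% Two well-separated groups of three 2-D points (illustrative data).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];

d = pdist (X);                    % pairwise distances as a vector
D = squareform (d);               % the same distances as a 6x6 matrix
Z = linkage (d, "average");       % agglomerative hierarchical cluster tree
T = cluster (Z, "MaxClust", 2);   % assign each row of X to one of 2 clusters
```

The cluster labels in T are arbitrary integers; only the grouping of the rows is meaningful.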

TODO list

Missing functions:

Data Manipulation

Available functions

The following table lists the available functions for data manipulation.

Function	Description
combnk	Return all combinations of K elements in DATA.
crosstab	Create a cross-tabulation (contingency table) T from data vectors.
datasample	Randomly sample data.
fillmissing	Replace missing entries of array A either with values in v or as determined by other specified methods.
grp2idx	Get index for group variables.
ismissing	Find missing data in a numeric or string array.
normalise_distribution	Transform a set of data so as to be N(0,1) distributed according to an idea by van Albada and Robinson.
rmmissing	Remove missing or incomplete data from an array.
standardizeMissing	Replace data values specified by indicator in A by the standard 'missing' data value for that data type.
tabulate	Compute a frequency table.
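
A short sketch of the missing-data helpers and grp2idx follows. The vector A and the group labels are invented for the example, and the "constant" fill method is just one of the methods fillmissing offers.

```octave
pkg load statistics

A = [1 NaN 3 NaN 5];

mask   = ismissing (A);                   % logical mask of missing entries
clean  = rmmissing (A);                   % missing entries removed
filled = fillmissing (A, "constant", 0);  % missing entries replaced by 0

% Convert group labels to integer indices plus the list of group names.
[g, names] = grp2idx ({"low"; "high"; "low"; "high"; "low"});
```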

Descriptive Statistics

Available functions

The following table lists the available functions for descriptive statistics.

Function	Description
cl_multinom	Confidence level of multinomial portions.
geomean	Compute the geometric mean.
grpstats	Compute summary statistics by group. Fully MATLAB compatible.
harmmean	Compute the harmonic mean.
jackknife	Compute jackknife estimates of a parameter taking one or more given samples as parameters.
mean	Compute the mean. Fully MATLAB compatible.
median	Compute the median. Fully MATLAB compatible.
nanmax	Find the maximal element while ignoring NaN values.
nanmin	Find the minimal element while ignoring NaN values.
nansum	Compute the sum while ignoring NaN values.
std	Compute the standard deviation. Fully MATLAB compatible.
trimmean	Compute the trimmed mean.
var	Compute the variance. Fully MATLAB compatible.
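
The following sketch shows a few of these functions on small made-up vectors. It only uses the default calling forms.

```octave
pkg load statistics

x = [1 2 4];
g = geomean (x);     % geometric mean: (1*2*4)^(1/3)
h = harmmean (x);    % harmonic mean: 3 / (1/1 + 1/2 + 1/4)

y = [1 NaN 3];
m = nanmax (y);      % largest element, NaN ignored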

In external packages

bootci, bootstrp are implemented in the statistics-resampling package.

Shadowing Octave core functions

The following functions will shadow the respective core functions until Octave 9.

mean
median
std
var
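
As a hedged illustration of what the shadowing versions add, the sketch below assumes that mean and median accept the MATLAB-style "omitnan" and "all" options and a vector of dimensions (vecdim). The input data are invented for the example.

```octave
pkg load statistics

x = [1 NaN 3];
m1 = mean (x, "omitnan");     % NaN values are ignored
m2 = median (x, "omitnan");

A = [1 2; 3 4];
m3 = mean (A, "all");         % mean over all elements
m4 = mean (A, [1 2]);         % same result using a vector of dimensions
```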

TODO list

Update trimmean function to be fully MATLAB compatible.

Re-introduce the nan* functions implemented in C++ with the "all" and "vecdim" options.

Re-implement the following functions from core Octave as shadowing functions with updated functionality regarding the "all", "omitnan", and "vecdim" options, with the intent of including them in Octave 9.

cov
mad
meansq
mode
moment

Distributions

Available functions

The following table lists the cdf, icdf, pdf, and random functions available in the statistics package. Since version 1.5.3, all CDFs support the "upper" option for evaluating the complement of the respective CDF.
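
A minimal sketch of the "upper" option, using normcdf as an example and assuming version 1.5.3 or later is loaded. The evaluation point 1.96 is arbitrary.

```octave
pkg load statistics

x = 1.96;
p_lower = normcdf (x, 0, 1);            % P(X <= x) for the standard normal
p_upper = normcdf (x, 0, 1, "upper");   % complement, P(X > x)
```

Requesting the upper tail directly is typically more accurate for very small tail probabilities than computing 1 - p_lower.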

Note! The icdf wrapper for the quantile functions is not implemented yet.

Distribution Name	Cumulative Distribution Function	Quantile Function	Probability Density Function	Random Generator
Birnbaum–Saunders	bbscdf	bbsinv	bbspdf	bbsrnd
Beta	betacdf	betainv	betapdf	betarnd
Binomial	binocdf	binoinv	binopdf	binornd
Bivariate Normal	bvncdf
Bivariate Student's t	bvtcdf
Burr Type XII	burrcdf	burrinv	burrpdf	burrrnd
Cauchy	cauchy_cdf	cauchy_inv	cauchy_pdf	cauchy_rnd
Chi-squared	chi2cdf	chi2inv	chi2pdf	chi2rnd
Copula Family	copulacdf	copulainv	copulapdf	copularnd
Extreme Value	evcdf	evinv	evpdf	evrnd
Exponential	expcdf	expinv	exppdf	exprnd
F	fcdf	finv	fpdf	frnd
Gamma	gamcdf	gaminv	gampdf	gamrnd
Geometric	geocdf	geoinv	geopdf	geornd
Generalized Extreme Value	gevcdf	gevinv	gevpdf	gevrnd
Generalized Pareto	gpcdf	gpinv	gppdf	gprnd
Hypergeometric	hygecdf	hygeinv	hygepdf	hygernd
Inverse-Wishart			iwishpdf	iwishrnd
Johnson's SU	jsucdf		jsupdf
Laplace	laplace_cdf	laplace_inv	laplace_pdf	laplace_rnd
Logistic	logistic_cdf	logistic_inv	logistic_pdf	logistic_rnd
Log-normal	logncdf	logninv	lognpdf	lognrnd
Multinomial			mnpdf	mnrnd
Multivariate Normal	mvncdf	mvninv	mvnpdf	mvnrnd
Multivariate Student's t	mvtcdf mvtcdfqmc	mvtinv	mvtpdf	mvtrnd
Nakagami	nakacdf	nakainv	nakapdf	nakarnd
Negative Binomial	nbincdf	nbininv	nbinpdf	nbinrnd
Noncentral F	ncfcdf	ncfinv	ncfpdf	ncfrnd
Noncentral Student's t	nctcdf	nctinv	nctpdf	nctrnd
Noncentral Chi-squared	ncx2cdf	ncx2inv	ncx2pdf	ncx2rnd
Normal	normcdf	norminv	normpdf	normrnd
Poisson	poisscdf	poissinv	poisspdf	poissrnd
Rayleigh	raylcdf	raylinv	raylpdf	raylrnd
Standard Normal	stdnormal_cdf	stdnormal_inv	stdnormal_pdf	stdnormal_rnd
Student's t	tcdf	tinv	tpdf	trnd
Triangular	tricdf	triinv	tripdf	trirnd
Discrete Uniform	unidcdf	unidinv	unidpdf	unidrnd
Continuous Uniform	unifcdf	unifinv	unifpdf	unifrnd
von Mises	vmcdf		vmpdf	vmrnd
Weibull	wblcdf	wblinv	wblpdf	wblrnd
Wiener process				wienrnd
Wishart			wishpdf	wishrnd
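
The four columns of the table follow a common naming pattern. The sketch below illustrates it with the normal distribution. The parameter values are arbitrary, and other distributions take different parameters.

```octave
pkg load statistics

p = normcdf (1.5, 0, 1);      % cumulative probability at 1.5
x = norminv (p, 0, 1);        % quantile function inverts the CDF
y = normpdf (0, 0, 1);        % density at 0, equal to 1/sqrt(2*pi)
r = normrnd (0, 1, [3 2]);    % 3x2 matrix of random draws
```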

Distribution Fitting

Functions available for estimating parameters and the negative log-likelihood for certain distributions.

Distribution Name	Parameter Estimation	Negative Log-likelihood
Extreme Value	evfit	evlike
Exponential	expfit	explike
Gamma	gamfit	gamlike
Generalized Extreme Value	gevfit_lmom gevfit	gevlike
Generalized Pareto	gpfit	gplike
Normal		normlike
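
As a sketch of how a fitting function pairs with its likelihood function, the example below uses the exponential distribution on a tiny made-up sample. It assumes that explike takes the mean parameter first and the data second.

```octave
pkg load statistics

x = [1 2 3];

mu  = expfit (x);        % maximum likelihood estimate of the mean
nll = explike (mu, x);   % negative log-likelihood at that estimate

% For an exponential with mean mu: nll = n*log(mu) + sum(x)/mu.
expected = numel (x) * log (mu) + sum (x) / mu;
```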

Distribution Statistics

Functions available for computing mean and variance from distribution parameters.

betastat
binostat
chi2stat
evstat
expstat
fstat
gamstat
geostat
gevstat
gpstat
hygestat
lognstat
nbinstat
ncfstat
nctstat
ncx2stat
normstat
poisstat
raylstat
fitgmdist
tstat
unidstat
unifstat
wblstat
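
Each of these functions returns the mean and variance implied by the given parameters. A brief sketch with arbitrary parameter values:

```octave
pkg load statistics

[m1, v1] = expstat (2);         % exponential with mean 2
[m2, v2] = binostat (10, 0.5);  % binomial with n = 10, p = 0.5
[m3, v3] = unifstat (0, 1);     % continuous uniform on [0, 1]
```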

Experimental Design

Available functions

Functions available for computing design matrices.

Function	Description
fullfact	Full factorial design.
ff2n	Two-level full factorial design.
sigma_pts	Calculates 2*N+1 sigma points in N dimensions.
x2fx	Convert predictors to design matrix.
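
A small sketch of the factorial design functions. The numbers of levels and factors are chosen only for illustration.

```octave
pkg load statistics

% Full factorial design for two factors with 2 and 3 levels:
% one row per combination of levels.
A = fullfact ([2 3]);

% Two-level full factorial design for 3 factors.
B = ff2n (3);
```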

Machine Learning

Available functions

The following table lists the available functions.

Function	Description
hmmestimate	Estimation of a hidden Markov model for a given sequence.
hmmgenerate	Output sequence and hidden states of a hidden Markov model.
hmmviterbi	Viterbi path of a hidden Markov model.
svmpredict	Predict labels for new data using an SVM model trained with svmtrain.
svmtrain	Train a support vector machine (SVM) model from labelled data.
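
The hidden Markov model functions can be combined as in the following sketch. The transition and output probability matrices are made up for the example, and the generated sequence is random, so results differ between runs.

```octave
pkg load statistics

% Hypothetical 2-state model with 2 output symbols (rows sum to 1).
transprob = [0.9 0.1; 0.2 0.8];
outprob   = [0.8 0.2; 0.1 0.9];

% Generate 20 observations together with the hidden states behind them.
[sequence, states] = hmmgenerate (20, transprob, outprob);

% Most likely state path given only the observations.
vpath = hmmviterbi (sequence, transprob, outprob);
```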

TODO list

Update svmpredict and svmtrain to libsvm 3.0.

Missing functions:

hmmdecode
hmmtrain

Model Fitting

Available functions

Functions available for fitting or evaluating statistical models.

Function	Description
crossval	Perform cross validation on given data.
fitgmdist	Fit a Gaussian mixture model with K components to DATA.
fitlm	Regress the continuous outcome (i.e. dependent variable) Y on continuous or categorical predictors (i.e. independent variables) X by minimizing the sum-of-squared residuals.

Cross Validation

Class of set partitions for cross-validation, used in crossval.

@cvpartition/cvpartition
@cvpartition/display
@cvpartition/get
@cvpartition/repartition
@cvpartition/set
@cvpartition/test
@cvpartition/training
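
A minimal sketch of the cvpartition class, assuming the "KFold" partition type. The number of observations (10) and folds (5) are arbitrary.

```octave
pkg load statistics

% Randomly partition 10 observations into 5 folds.
c = cvpartition (10, "KFold", 5);

tr = training (c, 1);   % logical mask of training observations, fold 1
te = test (c, 1);       % logical mask of test observations, fold 1
```

The assignment of observations to folds is random, so the masks differ between runs.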

TODO list

Missing functions:

anova
manova

Hypothesis Testing

Available functions

Functions available for hypothesis testing.

Function	Description
adtest	Anderson-Darling goodness-of-fit hypothesis test.
anova1	Perform a one-way analysis of variance (ANOVA).
anova2	Performs two-way factorial (crossed) or a nested analysis of variance (ANOVA) for balanced designs.
anovan	Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e. independent variables) on a continuous outcome (i.e. dependent variable).
bartlett_test	Perform a Bartlett test for the homogeneity of variances.
barttest	Bartlett's test of sphericity for correlation.
binotest	Test for probability P of a binomial sample.
chi2gof	Chi-square goodness-of-fit test.
chi2test	Perform a chi-squared test (for independence or homogeneity).
correlation_test	Perform a correlation coefficient test whether two samples x and y come from uncorrelated populations.
fishertest	Fisher’s exact test.
friedman	Performs the nonparametric Friedman's test to compare column effects in a two-way layout.
hotelling_t2test	Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples).
hotelling_t2test2	Compute Hotelling's T^2 ("T-squared") test for two independent samples.
kruskalwallis	Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA).
kstest	Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.
kstest2	Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
levene_test	Perform a Levene's test for the homogeneity of variances.
manova1	One-way multivariate analysis of variance (MANOVA).
multcompare	Perform posthoc multiple comparison tests or p-value adjustments to control the family-wise error rate (FWER) or false discovery rate (FDR).
ranksum	Wilcoxon rank sum test for equal medians. This test is equivalent to a Mann-Whitney U-test.
regression_ftest	F-test for General Linear Regression Analysis.
regression_ttest	Perform a linear regression t-test.
runstest	Runs test for detecting serial correlation in the vector X.
sampsizepwr	Sample size and power calculation for hypothesis test.
signtest	Test for median.
ttest	Test for mean of a normal sample with unknown variance or a paired-sample t-test.
ttest2	Perform a two independent samples t-test.
vartest	One-sample test of variance.
vartest2	Two-sample F test for equal variances.
vartestn	Test for equal variances across multiple groups.
ztest	One-sample Z-test.
ztest2	Two proportions Z-test.
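
Most of these tests return a decision h (1 if the null hypothesis is rejected at the default significance level) and a p-value. A sketch with ttest on a small made-up sample:

```octave
pkg load statistics

x = [5.1 4.9 5.2 5.0 4.8 5.3];

% Null hypothesis: the population mean equals 5.
[h_a, p_a] = ttest (x, 5);

% Null hypothesis: the population mean equals 0.
[h_b, p_b] = ttest (x, 0);
```

For this sample the first hypothesis is retained and the second is rejected.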

TODO list

Missing functions:

meanEffectSize

Plotting

Available functions

The following table lists the available functions for plotting data.

Function	Description
boxplot	Produce a box plot.
cdfplot	Display an empirical cumulative distribution function.
confusionchart	Display a chart of a confusion matrix.
dendrogram	Plot a dendrogram of a hierarchical binary cluster tree.
ecdf	Empirical (Kaplan-Meier) cumulative distribution function.
gscatter	Draw a scatter plot with grouped data.
histfit	Plot histogram with superimposed fitted normal density.
hist3	Produce bivariate (2D) histogram counts or plots.
manovacluster	Cluster group means using manova1 output.
normplot	Produce normal probability plot of the data.
ppplot	Perform a PP-plot (probability plot).
qqplot	Perform a QQ-plot (quantile plot).
silhouette	Compute the silhouette values of clustered data and show them on a plot.
violin	Produce a Violin plot of the data.
wblplot	Plot a column vector DATA on a Weibull probability plot using rank regression.
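
Some of these functions also return the computed values. The sketch below assumes that ecdf, when called with output arguments, returns the empirical CDF values without drawing a figure. The sample is invented for the example.

```octave
pkg load statistics

y = [3 1 2 2 5];

% f holds the empirical CDF evaluated at the sorted points in x.
[f, x] = ecdf (y);
```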

TODO list

Missing functions:

andrewsplot
bar3
bar3h
glyphplot
gplotmatrix
parallelcoords

Regression

Available functions

The following table lists the available functions for regression analysis.

Function	Description
canoncorr	Canonical correlation analysis.
cholcov	Cholesky-like decomposition for covariance matrix.
dcov	Distance correlation, covariance and correlation statistics.
logistic_regression	Perform ordinal logistic regression.
monotone_smooth	Produce a smooth monotone increasing approximation to a sampled functional dependence.
pca	Performs a principal component analysis on a data matrix.
pcacov	Perform principal component analysis on the NxN covariance matrix X.
pcares	Calculate residuals from principal component analysis.
plsregress	Calculate partial least squares regression using SIMPLS algorithm.
princomp	Performs a principal component analysis on a NxP data matrix.
regress	Multiple Linear Regression using Least Squares Fit.
regress_gp	Linear scalar regression using Gaussian processes.
stepwisefit	Linear regression with stepwise variable selection.
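
A minimal sketch of a least-squares fit with regress. The data are synthetic, generated from y = 1 + 2*x plus a small fixed perturbation, and the design matrix includes an explicit column of ones for the intercept.

```octave
pkg load statistics

x = (1:5)';
y = 1 + 2 * x + [0.1; -0.1; 0.05; -0.05; 0];

X = [ones(5, 1), x];   % design matrix: intercept and slope columns
b = regress (y, X);    % b(1) is the intercept, b(2) the slope
```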

TODO list

Missing functions:

glmfit
glmval
mnrfit
mnrval

Wrappers

Available functions

Functions available for wrapping other functions or group of functions.

Function	Description
cdf	This is a wrapper for the NAMEcdf and NAME_cdf functions available in the statistics package.
icdf	This is a wrapper for the NAMEinv and NAME_inv functions available in the statistics package.
pdf	This is a wrapper for the NAMEpdf and NAME_pdf functions available in the statistics package.
random	Generates pseudo-random numbers from a given one-, two-, or three-parameter distribution.
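
The wrappers take the distribution name as their first argument. The following sketch assumes that "norm" is accepted as the name of the normal distribution and compares the wrappers with the functions they call.

```octave
pkg load statistics

% Wrapper calls with the distribution name as a string ...
p1 = cdf ("norm", 1, 0, 1);
d1 = pdf ("norm", 1, 0, 1);

% ... and the equivalent direct calls.
p2 = normcdf (1, 0, 1);
d2 = normpdf (1, 0, 1);
```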