Statistics package

The [https://github.com/gnu-octave/statistics/ statistics package] is part of the [https://gnu-octave.github.io/packages/ Octave Packages]. Since version [https://github.com/gnu-octave/statistics/releases/tag/release-1.5.0 1.5.0], the statistics package requires Octave version 6.1 or higher. With Octave 7.2 or later, you can install the latest statistics package (currently 1.5.3) with the following command:

<code>pkg install -forge statistics</code>
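
Once installed, the package has to be loaded in each session before its functions can be used. The following is a minimal sketch, assuming the installation above succeeded; the loop that looks up the installed version is only illustrative.

<syntaxhighlight lang="octave">
pkg load statistics                  % make the package functions available

% Look up the installed version among all installed packages
installed = pkg ("list");
for i = 1:numel (installed)
  if strcmp (installed{i}.name, "statistics")
    printf ("statistics %s\n", installed{i}.version);
  endif
endfor
</syntaxhighlight>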

The following sections provide an overview of the functions available in the statistics package, sorted alphabetically and arranged in groups similar to those in the package's INDEX file. The '''TODO''' subsections only describe the current development plans for forthcoming releases; they are not intended for reporting bugs, missing features, or incompatibilities. Please report these in the [https://github.com/gnu-octave/statistics statistics repository] on GitHub.

== Clustering ==

=== Available functions ===

The following table lists the available functions for clustering data.

{| class="wikitable"
! Function
! Description
|-
| cluster
| Define clusters from an agglomerative hierarchical cluster tree.
|-
| cmdscale
| Classical multidimensional scaling of a matrix.
|-
| confusionmat
| Compute a confusion matrix for classification problems.
|-
| cophenet
| Compute the cophenetic correlation coefficient.
|-
| evalclusters
| Create a clustering evaluation object to find the optimal number of clusters.
|-
| inconsistent
| Compute the inconsistency coefficient for each link of a hierarchical cluster tree.
|-
| kmeans
| Perform a K-means clustering of an NxD matrix.
|-
| linkage
| Produce a hierarchical clustering dendrogram.
|-
| mahal
| Mahalanobis' D-square distance.
|-
| mhsample
| Draws NSAMPLES samples from a target stationary distribution PDF using Metropolis-Hastings algorithm.
|-
| optimalleaforder
| Compute the optimal leaf ordering of a hierarchical binary cluster tree.
|-
| pdist
| Return the distance between any two rows in X.
|-
| pdist2
| Compute pairwise distance between two sets of vectors.
|-
| slicesample
| Draws NSAMPLES samples from a target stationary distribution PDF using slice sampling of Radford M. Neal.
|-
| squareform
| Interchange between distance matrix and distance vector formats.
|}
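
Several of the functions above are typically used together. The following sketch, built on made-up data, chains <code>pdist</code>, <code>linkage</code>, and <code>cluster</code> to split four points into two groups; the <code>"average"</code> linkage method and the <code>"MaxClust"</code> option are choices made for this illustration only.

<syntaxhighlight lang="octave">
pkg load statistics

% Two well-separated groups of 2-D points (made-up data)
X = [0 0; 0 1; 10 10; 10 11];

D = pdist (X);                    % pairwise Euclidean distances (vector form)
Z = linkage (D, "average");       % agglomerative hierarchical cluster tree
T = cluster (Z, "MaxClust", 2);   % cut the tree into two clusters

S = squareform (D);               % same distances as a 4x4 symmetric matrix
</syntaxhighlight>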

=== TODO list ===

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>procrustes</code>
</div>

== Data Manipulation ==

=== Available functions ===

The following table lists the available functions for data manipulation.

{| class="wikitable"
! Function
! Description
|-
| combnk
| Return all combinations of K elements in DATA.
|-
| crosstab
| Create a cross-tabulation (contingency table) T from data vectors.
|-
| datasample
| Randomly sample data.
|-
| grp2idx
| Get index for group variables.
|-
| tabulate
| Compute a frequency table.
|}
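
As a small illustration, the sketch below converts text labels into integer group indices with <code>grp2idx</code> and prints a frequency table with <code>tabulate</code>. The data are made up for the example.

<syntaxhighlight lang="octave">
pkg load statistics

labels = {"low"; "high"; "low"; "mid"; "high"};
[idx, names] = grp2idx (labels);   % integer group index and the group names

% names(idx) reproduces the original labels
disp (isequal (names(idx), labels));

tabulate ([1 2 2 3 3 3])           % prints value, count, and percentage
</syntaxhighlight>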

== Descriptive Statistics ==

=== Available functions ===

The following table lists the available functions for descriptive statistics. 

{| class="wikitable"
! Function
! Description
|-
| geomean
| Compute the geometric mean.
|-
| grpstats
| Compute summary statistics by group. Fully MATLAB compatible.
|-
| harmmean
| Compute the harmonic mean.
|-
| jackknife
| Compute jackknife estimates of a parameter taking one or more given samples as parameters.
|-
| mean
| Compute the mean. Fully MATLAB compatible.
|-
| median
| Compute the median. Fully MATLAB compatible.
|-
| nanmax
| Find the maximal element while ignoring NaN values.
|-
| nanmin
| Find the minimal element while ignoring NaN values.
|-
| nansum
| Compute the sum while ignoring NaN values.
|-
| std
| Compute the standard deviation. Fully MATLAB compatible.
|-
| trimmean
| Compute the trimmed mean.
|-
| var
| Compute the variance. Fully MATLAB compatible.
|}

=== In external packages ===

<code>bootci</code>, <code>bootstrp</code> are implemented in the [https://gnu-octave.github.io/packages/statistics-bootstrap statistics-bootstrap] package.

=== Shadowing Octave core functions ===

The following functions will shadow the respective core functions until Octave 9.

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>mean</code>
* <code>median</code>
* <code>std</code>
* <code>var</code>
</div>
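
To check which implementation is active in a session, <code>which</code> can be used after loading the package. The sketch below also calls <code>mean</code> with the <code>"omitnan"</code> option, on the assumption that the loaded implementation supports it, as the shadowing functions are intended to.

<syntaxhighlight lang="octave">
pkg load statistics

which mean                 % prints the file that currently implements mean

x = [1 NaN 3];
m = mean (x, "omitnan");   % average of the non-NaN elements
</syntaxhighlight>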

=== TODO list ===

Update <code>trimmean</code> function to be fully MATLAB compatible.

Re-introduce the <code>nan*</code> functions implemented in C++ with the <code>"all"</code> and <code>"vecdim"</code> options.

Re-implement the following functions from core Octave as shadowing functions with updated functionality for the <code>"all"</code>, <code>"omitnan"</code>, and <code>"vecdim"</code> options, with the intent of including them in Octave 9.

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>cov</code>
* <code>mad</code>
* <code>meansq</code>
* <code>mode</code>
* <code>moment</code>
</div>

== Distributions ==

=== Available functions ===

The following table lists the '''cdf''', '''icdf''', '''pdf''', and '''random''' functions available in the statistics package. Since version [https://github.com/gnu-octave/statistics/releases/tag/release-1.5.3 1.5.3], all CDFs support the "upper" option for evaluating the complement of the respective CDF.

Note! The '''icdf''' wrapper for the quantile functions is not implemented yet.
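
As an illustration of the "upper" option, the sketch below evaluates the normal CDF and its complement at the same point. The evaluation point and parameters are arbitrary.

<syntaxhighlight lang="octave">
pkg load statistics

x = 2;
p_lower = normcdf (x, 0, 1);            % P(X <= 2) for a standard normal
p_upper = normcdf (x, 0, 1, "upper");   % P(X > 2), the complement

printf ("lower + upper = %g\n", p_lower + p_upper);
</syntaxhighlight>

Requesting the upper tail directly is generally preferable to computing <code>1 - p_lower</code> far out in the tail, where the subtraction loses precision.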

{| class="wikitable"
! Distribution Name
! Cumulative Distribution Function
! Quantile Function
! Probability Density Function
! Random Generator
|-
| [https://en.wikipedia.org/wiki/Birnbaum%E2%80%93Saunders_distribution Birnbaum–Saunders]
| bbscdf
| bbsinv
| bbspdf
| bbsrnd
|-
| [https://en.wikipedia.org/wiki/Beta_distribution Beta]
| betacdf
| betainv
| betapdf
| betarnd
|-
| [https://en.wikipedia.org/wiki/Binomial_distribution Binomial]
| binocdf
| binoinv
| binopdf
| binornd 
|-
| [https://en.wikipedia.org/wiki/Joint_probability_distribution Bivariate Normal]
| bvncdf
|
|
|
|-
| [https://www.mathworks.com/help/stats/burr-type-xii-distribution.html Burr Type XII]
| burrcdf
| burrinv
| burrpdf
| burrrnd
|-
| [https://en.wikipedia.org/wiki/Cauchy_distribution Cauchy]
| cauchy_cdf
| cauchy_inv
| cauchy_pdf
| cauchy_rnd
|-
| [https://en.wikipedia.org/wiki/Chi-squared_distribution Chi-squared]
| chi2cdf
| chi2inv
| chi2pdf
| chi2rnd
|-
| [https://en.wikipedia.org/wiki/Copula_(probability_theory) Copula Family]
| copulacdf
| copulainv
| copulapdf
| copularnd
|-
| [https://en.wikipedia.org/wiki/Gumbel_distribution Extreme Value]
| evcdf
| evinv
| evpdf
| evrnd
|-
| [https://en.wikipedia.org/wiki/Exponential_distribution Exponential]
| expcdf
| expinv
| exppdf
| exprnd
|-
| [https://en.wikipedia.org/wiki/F-distribution F]
| fcdf
| finv
| fpdf
| frnd
|-
| [https://en.wikipedia.org/wiki/Gamma_distribution Gamma]
| gamcdf
| gaminv
| gampdf
| gamrnd
|-
| [https://en.wikipedia.org/wiki/Geometric_distribution Geometric]
| geocdf
| geoinv
| geopdf
| geornd
|-
| [https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution Generalized Extreme Value]
| gevcdf
| gevinv
| gevpdf
| gevrnd
|-
| [https://en.wikipedia.org/wiki/Generalized_Pareto_distribution Generalized Pareto]
| gpcdf
| gpinv
| gppdf
| gprnd
|-
| [https://en.wikipedia.org/wiki/Hypergeometric_distribution Hypergeometric]
| hygecdf
| hygeinv
| hygepdf
| hygernd
|-
| [https://en.wikipedia.org/wiki/Inverse-Wishart_distribution Inverse-Wishart]
|
|
| iwishpdf
| iwishrnd
|-
| [https://en.wikipedia.org/wiki/Johnson%27s_SU-distribution Johnson's SU]
| jsucdf
|
| jsupdf
|
|-
| [https://en.wikipedia.org/wiki/Laplace_distribution Laplace]
| laplace_cdf
| laplace_inv
| laplace_pdf
| laplace_rnd
|-
| [https://en.wikipedia.org/wiki/Logistic_distribution Logistic]
| logistic_cdf
| logistic_inv
| logistic_pdf
| logistic_rnd
|-
| [https://en.wikipedia.org/wiki/Log-normal_distribution Log-normal]
| logncdf
| logninv
| lognpdf
| lognrnd
|-
| [https://en.wikipedia.org/wiki/Multinomial_distribution Multinomial]
|
|
| mnpdf
| mnrnd
|-
| [https://en.wikipedia.org/wiki/Multivariate_normal_distribution Multivariate Normal]
| mvncdf
| mvninv
| mvnpdf
| mvnrnd
|-
| [https://en.wikipedia.org/wiki/Multivariate_t-distribution Multivariate Student's T]
| mvtcdf mvtcdfqmc
| mvtinv
| mvtpdf
| mvtrnd
|-
| [https://en.wikipedia.org/wiki/Nakagami_distribution Nakagami]
| nakacdf
| nakainv
| nakapdf
| nakarnd
|-
| [https://en.wikipedia.org/wiki/Negative_binomial_distribution Negative Binomial]
| nbincdf
| nbininv
| nbinpdf
| nbinrnd
|-
| [https://en.wikipedia.org/wiki/Noncentral_F-distribution Noncentral F]
| ncfcdf
| ncfinv
| ncfpdf
| ncfrnd
|-
| [https://en.wikipedia.org/wiki/Noncentral_t-distribution Noncentral Student's T]
| nctcdf
| nctinv
| nctpdf
| nctrnd
|-
| [https://en.wikipedia.org/wiki/Noncentral_chi-squared_distribution Noncentral Chi-squared]
| ncx2cdf
| ncx2inv
| ncx2pdf
| ncx2rnd
|-
| [https://en.wikipedia.org/wiki/Normal_distribution Normal]
| normcdf
| norminv
| normpdf
| normrnd
|-
| [https://en.wikipedia.org/wiki/Poisson_distribution Poisson]
| poisscdf
| poissinv
| poisspdf
| poissrnd
|-
| [https://en.wikipedia.org/wiki/Rayleigh_distribution Rayleigh]
| raylcdf
| raylinv
| raylpdf
| raylrnd
|-
| [https://en.wikipedia.org/wiki/Normal_distribution#Standard_normal_distribution Standard Normal]
| stdnormal_cdf
| stdnormal_inv
| stdnormal_pdf
| stdnormal_rnd
|-
| [https://en.wikipedia.org/wiki/Student%27s_t-distribution Student's T]
| tcdf
| tinv
| tpdf
| trnd
|-
| [https://en.wikipedia.org/wiki/Triangular_distribution Triangular]
| tricdf
| triinv
| tripdf
| trirnd
|-
| [https://en.wikipedia.org/wiki/Discrete_uniform_distribution Discrete Uniform]
| unidcdf
| unidinv
| unidpdf
| unidrnd
|-
| [https://en.wikipedia.org/wiki/Continuous_uniform_distribution Continuous Uniform]
| unifcdf
| unifinv
| unifpdf
| unifrnd
|-
| [https://en.wikipedia.org/wiki/Von_Mises_distribution von Mises]
| vmcdf
|
| vmpdf
| vmrnd
|-
| [https://en.wikipedia.org/wiki/Weibull_distribution Weibull]
| wblcdf
| wblinv
| wblpdf
| wblrnd
|-
| [https://en.wikipedia.org/wiki/Wiener_process Wiener process]
|
|
|
| wienrnd
|-
| [https://en.wikipedia.org/wiki/Wishart_distribution Wishart]
|
|
| wishpdf
| wishrnd
|}
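
The four function families share the same naming scheme and parameter order for a given distribution. A brief sketch for the normal distribution, with arbitrary parameter values:

<syntaxhighlight lang="octave">
pkg load statistics

mu = 0;
sigma = 1;

p = normcdf (1.5, mu, sigma);    % cumulative probability at x = 1.5
x = norminv (p, mu, sigma);      % quantile function inverts the CDF
y = normpdf (0, mu, sigma);      % density at the mean
r = normrnd (mu, sigma, 3, 2);   % 3x2 matrix of random draws
</syntaxhighlight>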

=== Distribution Fitting ===

Functions available for estimating parameters and the negative log-likelihood for certain distributions.

{| class="wikitable"
! Distribution Name
! Parameter Estimation
! Negative Log-likelihood
|-
| Extreme Value
| evfit
| evlike
|-
| Exponential
| expfit
| explike
|-
| Gamma
| gamfit
| gamlike
|-
| Generalized Extreme Value
| gevfit_lmom gevfit
| gevlike
|-
| Generalized Pareto
| gpfit
| gplike
|-
| Normal
|
| normlike
|}
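
As a sketch of how a fitting function and its likelihood counterpart are used together, the example below fits an exponential distribution to made-up data and evaluates the negative log-likelihood at the estimate. It assumes the <code>explike (param, data)</code> argument order.

<syntaxhighlight lang="octave">
pkg load statistics

x = [0.5 1.2 0.3 2.1 0.9];     % made-up sample

muhat = expfit (x);            % maximum likelihood estimate of the mean
nlogL = explike (muhat, x);    % negative log-likelihood at the estimate

printf ("muhat = %g, nlogL = %g\n", muhat, nlogL);
</syntaxhighlight>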

=== Distribution Statistics ===

Functions available for computing ''mean'' and ''variance'' from distribution parameters.

<div style="column-count:4;-moz-column-count:4;-webkit-column-count:4">
* <code>betastat</code>
* <code>binostat</code>
* <code>chi2stat</code>
* <code>evstat</code>
* <code>expstat</code>
* <code>fstat</code>
* <code>gamstat</code>
* <code>geostat</code>
* <code>gevstat</code>
* <code>gpstat</code>
* <code>hygestat</code>
* <code>lognstat</code>
* <code>nbinstat</code>
* <code>ncfstat</code>
* <code>nctstat</code>
* <code>ncx2stat</code>
* <code>normstat</code>
* <code>poisstat</code>
* <code>raylstat</code>
* <code>tstat</code>
* <code>unidstat</code>
* <code>unifstat</code>
* <code>wblstat</code>
</div>
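
Each of these functions takes the distribution parameters and returns the mean and variance. A short sketch with arbitrary parameter values:

<syntaxhighlight lang="octave">
pkg load statistics

[m1, v1] = normstat (2, 3);      % normal with mu = 2, sigma = 3
[m2, v2] = expstat (4);          % exponential with mean 4
[m3, v3] = binostat (10, 0.5);   % binomial with n = 10, p = 0.5
</syntaxhighlight>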

== Experimental Design ==

Functions available for computing design matrices.

{| class="wikitable"
! Function
! Description
|-
| fullfact
| Full factorial design.
|-
| ff2n
| Two-level full factorial design.
|-
| sigma_pts
| Calculates 2*N+1 sigma points in N dimensions.
|-
| x2fx
| Convert predictors to design matrix.
|}
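
For illustration, the sketch below builds a full factorial design for two factors with 2 and 3 levels, and a two-level design for three factors. The factor counts are arbitrary.

<syntaxhighlight lang="octave">
pkg load statistics

A = fullfact ([2 3]);   % one row per combination of factor levels
B = ff2n (3);           % two-level design for three factors

disp (size (A));        % number of runs and factors
disp (size (B));
</syntaxhighlight>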

== Model Fitting ==

Functions available for fitting or evaluating statistical models. 

{| class="wikitable"
! Function
! Description
|-
| crossval
| Perform cross validation on given data.
|-
| fitgmdist
| Fit a Gaussian mixture model with K components to DATA.
|-
| fitlm
| Regress the continuous outcome (i.e.  dependent variable) Y on continuous or categorical predictors (i.e.  independent variables) X by minimizing the sum-of-squared residuals.
|}

=== Cross Validation ===

Class of set partitions for cross-validation, used in <code>crossval</code>.

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* @cvpartition/cvpartition
* @cvpartition/display
* @cvpartition/get
* @cvpartition/repartition
* @cvpartition/set
* @cvpartition/test
* @cvpartition/training
</div>
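
A minimal sketch of how the class might be used, assuming the constructor accepts the number of observations followed by the <code>"KFold"</code> partition type and the number of folds:

<syntaxhighlight lang="octave">
pkg load statistics

c = cvpartition (10, "KFold", 5);   % 10 observations split into 5 folds

tr = training (c, 1);               % logical mask of training rows, fold 1
te = test (c, 1);                   % logical mask of test rows, fold 1

printf ("fold 1: %d training, %d test\n", sum (tr), sum (te));
</syntaxhighlight>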

=== TODO list ===

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>anova</code>
* <code>manova</code>
</div>

== Hypothesis Testing ==

Functions available for hypothesis testing.

{| class="wikitable"
! Function
! Description
|-
| adtest
| Anderson-Darling goodness-of-fit hypothesis test.
|-
| anova1
| Perform a one-way analysis of variance (ANOVA).
|-
| anova2
| Perform a two-way factorial (crossed) or nested analysis of variance (ANOVA) for balanced designs.
|-
| anovan
| Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e.  independent variables) on a continuous outcome (i.e.  dependent variable).
|-
| bartlett_test
| Perform a Bartlett test for the homogeneity of variances.
|-
| barttest
| Bartlett's test of sphericity for correlation.
|-
| binotest
| Test for probability P of a binomial sample.
|-
| chi2gof
| Chi-square goodness-of-fit test.
|-
| chi2test
| Perform a chi-squared test (for independence or homogeneity).
|-
| friedman
| Performs the nonparametric Friedman's test to compare column effects in a two-way layout.
|-
| hotelling_t2test
| Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples).
|-
| hotelling_t2test2
| Compute Hotelling's T^2 ("T-squared") test for two independent samples.
|-
| kruskalwallis
| Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA).
|-
| kstest
| Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.
|-
| kstest2
| Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
|-
| levene_test
| Perform a Levene's test for the homogeneity of variances.
|-
| manova1
| One-way multivariate analysis of variance (MANOVA).
|-
| multcompare
| Perform posthoc multiple comparison tests or p-value adjustments to control the family-wise error rate (FWER) or false discovery rate (FDR).
|-
| ranksum
| Wilcoxon rank sum test for equal medians.  This test is equivalent to a Mann-Whitney U-test.
|-
| regression_ftest
| F-test for General Linear Regression Analysis.
|-
| regression_ttest
| Perform a linear regression t-test for the null hypothesis ''RR * B = R'' in a classical normal regression model ''Y = X * B + E''.
|-
| runstest
| Runs test for detecting serial correlation in the vector X.
|-
| sampsizepwr
| Sample size and power calculation for hypothesis test.
|-
| signtest
| Test for median.
|-
| ttest
| Test for mean of a normal sample with unknown variance or a paired-sample t-test.
|-
| ttest2
| Perform a two independent samples t-test.
|-
| vartest
| One-sample test of variance.
|-
| vartest2
| Two-sample F test for equal variances.
|-
| vartestn
| Test for equal variances across multiple groups.
|-
| ztest
| One-sample Z-test.
|}
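
Most of these functions return a test decision and a p-value. As a sketch, the example below runs a one-sample t-test on made-up data against a hypothesized mean of 5.

<syntaxhighlight lang="octave">
pkg load statistics

x = [5.1 4.9 5.3 5.0 5.2 4.8 5.1 5.0];   % made-up sample

% h is 1 if the null hypothesis (mean equals 5) is rejected at the
% default significance level, 0 otherwise; p is the p-value.
[h, p] = ttest (x, 5);

printf ("h = %d, p = %g\n", h, p);
</syntaxhighlight>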

=== TODO list ===

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>fishertest</code>
* <code>meanEffectSize</code>
</div>

== Machine Learning ==

=== Available functions ===

The following table lists the available functions.

{| class="wikitable"
! Function
! Description
|-
| hmmestimate
| Estimation of a hidden Markov model for a given sequence.
|-
| hmmgenerate
| Output sequence and hidden states of a hidden Markov model.
|-
| hmmviterbi
| Viterbi path of a hidden Markov model.
|-
| svmpredict
| Predict labels for test data using an SVM model created with svmtrain.
|-
| svmtrain
| Train an SVM model from known labels and their corresponding data.
|}

=== TODO list ===

Update <code>svmpredict</code> and <code>svmtrain</code> to libsvm 3.0.

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>hmmdecode</code>
* <code>hmmtrain</code>
</div>

== Plotting ==

=== Available functions ===

The following table lists the available functions for plotting data.

{| class="wikitable"
! Function
! Description
|-
| boxplot
| Produce a box plot.
|-
| cdfplot
| Display an empirical cumulative distribution function.
|-
| confusionchart
| Display a chart of a confusion matrix.
|-
| dendrogram
| Plot a dendrogram of a hierarchical binary cluster tree.
|-
| ecdf
| Empirical (Kaplan-Meier) cumulative distribution function.
|-
| gscatter
| Draw a scatter plot with grouped data.
|-
| histfit
| Plot histogram with superimposed fitted normal density.
|-
| hist3
| Produce bivariate (2D) histogram counts or plots.
|-
| manovacluster
| Cluster group means using manova1 output.
|-
| normplot
| Produce normal probability plot of the data.
|-
| ppplot
| Produce a probability plot.
|-
| qqplot
| Produce an empirical quantile-quantile plot.
|-
| silhouette
| Compute the silhouette values of clustered data and show them on a plot.
|-
| violin
| Produce a Violin plot of the data.
|-
| wblplot
| Plot a column vector DATA on a Weibull probability plot using rank regression.
|}

=== TODO list ===

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>andrewsplot</code>
* <code>bar3</code>
* <code>bar3h</code>
* <code>glyphplot</code>
* <code>gplotmatrix</code>
* <code>parallelcoords</code>
</div>

== Regression ==

=== Available functions ===

The following table lists the available functions for regression analysis.

{| class="wikitable"
! Function
! Description
|-
| canoncorr
| Canonical correlation analysis.
|-
| cholcov
| Cholesky-like decomposition for covariance matrix.
|-
| dcov
| Distance correlation, covariance and correlation statistics.
|-
| logistic_regression
| Perform ordinal logistic regression.
|-
| monotone_smooth
| Produce a smooth monotone increasing approximation to a sampled functional dependence.
|-
| pca
| Performs a principal component analysis on a data matrix.
|-
| pcacov
| Perform principal component analysis on the NxN covariance matrix X.
|-
| pcares
| Calculate residuals from principal component analysis.
|-
| plsregress
| Calculate partial least squares regression using SIMPLS algorithm.
|-
| princomp
| Performs a principal component analysis on a NxP data matrix.
|-
| regress
| Multiple Linear Regression using Least Squares Fit.
|-
| regress_gp
| Linear scalar regression using Gaussian processes.
|-
| stepwisefit
| Linear regression with stepwise variable selection.
|}
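
As a brief illustration, the sketch below fits a straight line to made-up data with <code>regress</code>. The column of ones in the design matrix provides the intercept term.

<syntaxhighlight lang="octave">
pkg load statistics

x = (1:5)';
y = [3.1 4.9 7.2 8.8 11.0]';   % made-up responses, roughly 1 + 2*x

X = [ones(5, 1), x];           % design matrix: intercept and slope columns
b = regress (y, X);            % least squares coefficients [intercept; slope]

printf ("intercept = %g, slope = %g\n", b(1), b(2));
</syntaxhighlight>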

=== TODO list ===

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>glmfit</code>
* <code>glmval</code>
* <code>mnrfit</code>
* <code>mnrval</code>
</div>

== Wrappers ==

Functions available for wrapping other functions or groups of functions.

{| class="wikitable"
! Function
! Description
|-
| cdf
| This is a wrapper around various NAMEcdf and NAME_cdf functions.
|-
| clusterdata
| Wrapper function for 'linkage' and 'cluster'.
|-
| pdf
| This is a wrapper around various NAMEpdf and NAME_pdf functions.
|-
| random
| Generates pseudo-random numbers from a given one-, two-, or three-parameter distribution.
|}
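
The wrappers take the distribution name as their first argument and forward the remaining arguments to the matching distribution function. A minimal sketch, assuming <code>"norm"</code> is accepted as the name of the normal distribution:

<syntaxhighlight lang="octave">
pkg load statistics

p1 = cdf ("norm", 1, 0, 1);   % wrapper call
p2 = normcdf (1, 0, 1);       % direct call to the underlying function

disp (p1 == p2);
</syntaxhighlight>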

=== TODO list ===

Update <code>cdf</code>, <code>pdf</code>, and <code>random</code> to include the latest changes in distribution functions available in statistics-1.5.3.

Missing functions:

<div style="column-count:1;-moz-column-count:1;-webkit-column-count:1">
* <code>icdf</code>
</div>

[[Category:Packages]]
[[Category:Missing functions]]