Statistics package

From Octave

Revision as of 21:13, 5 February 2023

The statistics package is part of the Octave Packages. Since version 1.5.0, the statistics package requires Octave 6.1 or higher. With Octave 7.2 or later, you can install the latest statistics package (currently 1.5.3) with the following command:

pkg install -forge statistics

Once installed, the package must be loaded in each Octave session before its functions can be used:

pkg load statistics

The following sections provide an overview of the functions available in the statistics package. The functions are arranged in groups similar to those in the package's INDEX file and sorted alphabetically within each group.

Clustering

Available functions

The following table lists the functions available for clustering.

Function Description
cluster Define clusters from an agglomerative hierarchical cluster tree.
cmdscale Classical multidimensional scaling of a matrix.
confusionmat Compute a confusion matrix for classification problems.
cophenet Compute the cophenetic correlation coefficient.
evalclusters Create a clustering evaluation object to find the optimal number of clusters.
inconsistent Compute the inconsistency coefficient for each link of a hierarchical cluster tree.
kmeans Perform a K-means clustering of an NxD matrix.
linkage Produce a hierarchical clustering dendrogram.
mahal Mahalanobis' D-square distance.
optimalleaforder Compute the optimal leaf ordering of a hierarchical binary cluster tree.
pdist Return the distance between any two rows in X.
pdist2 Compute pairwise distance between two sets of vectors.
squareform Interchange between distance matrix and distance vector formats.
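A minimal sketch of how pdist and squareform fit together, assuming the package has been loaded and the default Euclidean metric is used. The three points are hypothetical data chosen so that the distances come out as whole numbers.

```octave
pkg load statistics

% Three points in the plane (hypothetical example data)
X = [0 0; 3 4; 6 8];

% pdist returns the condensed distance vector for the row pairs
% (1,2), (1,3), (2,3), using the Euclidean metric by default
D = pdist (X);

% squareform expands the condensed vector into the full symmetric matrix
M = squareform (D);
disp (M)
```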

TODO list

Missing functions:

  • procrustes

Data Manipulation

Available functions

The following table lists the functions available for data manipulation.

Function Description
combnk Return all combinations of K elements in DATA.
crosstab Create a cross-tabulation (contingency table) T from data vectors.
datasample Randomly sample data.
grp2idx Get index for group variables.
tabulate Compute a frequency table.
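As a small illustration of combnk, the sketch below lists all 2-element combinations of 1 to 4. It does not assume a particular row order; instead it sorts the rows and compares them with core Octave's nchoosek.

```octave
pkg load statistics

% All 2-element combinations of the values 1 to 4
C = combnk (1:4, 2);

% combnk may list the rows in a different order than nchoosek,
% so sort the rows before comparing the two results
same_rows = isequal (sortrows (C), nchoosek (1:4, 2))
```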

Descriptive Statistics

Available functions

The following table lists the functions available for descriptive statistics.

Function Description
geomean Compute the geometric mean.
grpstats Compute summary statistics by group. Fully MATLAB compatible.
harmmean Compute the harmonic mean.
jackknife Compute jackknife estimates of a parameter taking one or more given samples as parameters.
mean Compute the mean. Fully MATLAB compatible.
median Compute the median. Fully MATLAB compatible.
nanmax Find the maximal element while ignoring NaN values.
nanmin Find the minimal element while ignoring NaN values.
nansum Compute the sum while ignoring NaN values.
std Compute the standard deviation. Fully MATLAB compatible.
trimmean Compute the trimmed mean.
var Compute the variance. Fully MATLAB compatible.
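The basic vector forms of the three means that are not yet fully MATLAB compatible can be sketched as follows. The trimmean call reuses a symmetric data set with infinite end points. Plain mean returns NaN for such data, while trimming removes the extreme values before averaging.

```octave
pkg load statistics

x = [1 2 4];
g = geomean (x)    % (1*2*4)^(1/3), i.e. 2 up to rounding
h = harmmean (x)   % 3 / (1 + 1/2 + 1/4), i.e. 12/7

% -Inf + Inf is NaN, so the ordinary mean of y is NaN
y = [-Inf, 1:9, Inf];

% Trimming 10 percent of the data discards the two infinite end points,
% leaving the mean of 1:9
t = trimmean (y, 10)
```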

In external packages

bootci and bootstrp are implemented in the statistics-bootstrap package.

Shadowing Octave core functions

The following functions will shadow the respective core functions until Octave 9.

  • mean
  • median
  • std
  • var
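The "all", "omitnan", and vecdim options of these shadowing functions can be sketched as follows. This assumes a package version in which mean and median shadow the core functions, or an Octave version whose core functions already accept these options. The matrix is hypothetical data with two NaN entries.

```octave
pkg load statistics

x = [1, NaN, 3; 4, 5, NaN];

m_all  = mean (x, "all")              % NaN propagates, so the result is NaN
m_omit = mean (x, "all", "omitnan")   % mean of 1, 3, 4, 5
m_vec  = mean (x, [1 2], "omitnan")   % vecdim [1 2] covers every element of a matrix
m_cols = mean (x, 1, "omitnan")       % column-wise mean, ignoring NaN
md     = median (x, "all", "omitnan") % median of 1, 3, 4, 5
```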

TODO list

Update geomean, harmmean, and trimmean functions to be fully MATLAB compatible.

Re-introduce the nan* functions implemented in C++ with the "all" and "vecdim" options.

Re-implement the following core Octave functions as shadowing functions with updated support for the "all", "omitnan", and "vecdim" options, with the intent of including them in Octave 9.

  • cov
  • mad
  • meansq
  • mode
  • moment

Distributions

Available functions

The following table lists the cdf, icdf, pdf, and random functions available in the statistics package. Since version 1.5.3, all CDFs support the "upper" option for evaluating the complement of the respective CDF.

Note: the icdf wrapper for the quantile functions is not implemented yet.

Distribution Name | Cumulative Distribution Function | Quantile Function | Probability Density Function | Random Generator
Birnbaum–Saunders | bbscdf | bbsinv | bbspdf | bbsrnd
Beta | betacdf | betainv | betapdf | betarnd
Binomial | binocdf | binoinv | binopdf | binornd
Bivariate Normal | bvncdf | - | - | -
Burr Type XII | burrcdf | burrinv | burrpdf | burrrnd
Cauchy | cauchy_cdf | cauchy_inv | cauchy_pdf | cauchy_rnd
Chi-squared | chi2cdf | chi2inv | chi2pdf | chi2rnd
Copula Family | copulacdf | copulainv | copulapdf | copularnd
Extreme Value | evcdf | evinv | evpdf | evrnd
Exponential | expcdf | expinv | exppdf | exprnd
F | fcdf | finv | fpdf | frnd
Gamma | gamcdf | gaminv | gampdf | gamrnd
Geometric | geocdf | geoinv | geopdf | geornd
Generalized Extreme Value | gevcdf | gevinv | gevpdf | gevrnd
Generalized Pareto | gpcdf | gpinv | gppdf | gprnd
Hypergeometric | hygecdf | hygeinv | hygepdf | hygernd
Inverse-Wishart | - | - | iwishpdf | iwishrnd
Johnson's SU | jsucdf | - | jsupdf | -
Laplace | laplace_cdf | laplace_inv | laplace_pdf | laplace_rnd
Logistic | logistic_cdf | logistic_inv | logistic_pdf | logistic_rnd
Log-normal | logncdf | logninv | lognpdf | lognrnd
Multinomial | - | - | mnpdf | mnrnd
Multivariate Normal | mvncdf | mvninv | mvnpdf | mvnrnd
Multivariate Student's T | mvtcdf, mvtcdfqmc | mvtinv | mvtpdf | mvtrnd
Nakagami | nakacdf | nakainv | nakapdf | nakarnd
Negative Binomial | nbincdf | nbininv | nbinpdf | nbinrnd
Noncentral F | ncfcdf | ncfinv | ncfpdf | ncfrnd
Noncentral Student's T | nctcdf | nctinv | nctpdf | nctrnd
Noncentral Chi-squared | ncx2cdf | ncx2inv | ncx2pdf | ncx2rnd
Normal | normcdf | norminv | normpdf | normrnd
Poisson | poisscdf | poissinv | poisspdf | poissrnd
Rayleigh | raylcdf | raylinv | raylpdf | raylrnd
Standard Normal | stdnormal_cdf | stdnormal_inv | stdnormal_pdf | stdnormal_rnd
Student's T | tcdf | tinv | tpdf | trnd
Triangular | tricdf | triinv | tripdf | trirnd
Discrete Uniform | unidcdf | unidinv | unidpdf | unidrnd
Continuous Uniform | unifcdf | unifinv | unifpdf | unifrnd
von Mises | vmcdf | - | vmpdf | vmrnd
Weibull | wblcdf | wblinv | wblpdf | wblrnd
Wiener process | - | - | - | wienrnd
Wishart | - | - | wishpdf | wishrnd
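The "upper" option of the CDFs (available since version 1.5.3) can be sketched with normcdf. Near the center, the upper tail is simply the complement of the lower tail. Far in the tail, computing 1 - normcdf evaluates to exactly zero because normcdf rounds to 1 in double precision. The "upper" option is meant to avoid that cancellation. This sketch assumes normcdf evaluates the upper tail directly rather than as 1 minus the lower tail.

```octave
pkg load statistics

% Lower and upper tail probabilities of the standard normal at 1.96
p_lower = normcdf (1.96, 0, 1);
p_upper = normcdf (1.96, 0, 1, "upper");   % P(X > 1.96), roughly 0.025

% Far in the tail the complement computed by subtraction is exactly zero,
% while the "upper" option still returns a small positive probability
tail_naive = 1 - normcdf (10, 0, 1);
tail_upper = normcdf (10, 0, 1, "upper");
```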


Distribution Fitting

Functions available for estimating parameters and the negative log-likelihood for certain distributions.

Distribution Name | Parameter Estimation | Negative Log-likelihood
Extreme Value | evfit | evlike
Exponential | expfit | explike
Gamma | gamfit | gamlike
Generalized Extreme Value | gevfit_lmom, gevfit | gevlike
Generalized Pareto | gpfit | gplike
Normal | - | normlike
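A short sketch pairing a fitting function with its negative log-likelihood, using the exponential distribution parameterized by its mean mu. For a sample x of size n, the maximum likelihood estimate is the sample mean. The negative log-likelihood is n*log(mu) + sum(x)/mu. The three-point sample is hypothetical.

```octave
pkg load statistics

x = [1 2 3];   % hypothetical sample

% Maximum likelihood estimate of the exponential mean (the sample mean)
muhat = expfit (x)

% Negative log-likelihood of x at mu = 2, i.e. 3*log(2) + 6/2
nlogL = explike (2, x)
```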

Distribution Statistics

Functions available for computing mean and variance from distribution parameters.

  • betastat
  • binostat
  • chi2stat
  • evstat
  • expstat
  • fstat
  • gamstat
  • geostat
  • gevstat
  • gpstat
  • hygestat
  • lognstat
  • nbinstat
  • ncfstat
  • nctstat
  • ncx2stat
  • normstat
  • poisstat
  • raylstat
  • tstat
  • unidstat
  • unifstat
  • wblstat
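Each of these functions returns the mean and the variance implied by the distribution parameters. A minimal sketch with a few distributions whose moments are easy to check by hand follows. These are the standard normal, an exponential with mean 2 (variance mu^2), a binomial with n = 10 and p = 0.5 (variance n*p*(1-p)), and the uniform distribution on [0, 1] (variance 1/12).

```octave
pkg load statistics

[m_norm, v_norm] = normstat (0, 1);     % standard normal
[m_exp,  v_exp]  = expstat (2);         % exponential with mean 2
[m_bino, v_bino] = binostat (10, 0.5);  % binomial with n = 10, p = 0.5
[m_unif, v_unif] = unifstat (0, 1);     % continuous uniform on [0, 1]
```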

Experimental Design

Functions available for computing design matrices.

  • fullfact
  • ff2n
  • x2fx
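A sketch of the two design generators follows. fullfact enumerates every combination of factor levels, numbered from 1. ff2n builds a two-level design coded as 0 and 1. The checks do not assume a particular row order.

```octave
pkg load statistics

% Full factorial design: factor 1 with 2 levels, factor 2 with 3 levels
A = fullfact ([2 3]);

% Two-level full factorial design for 3 factors, coded as 0 and 1
F = ff2n (3);
```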

Model Fitting

Functions available for model fitting and cross-validation.

  • crossval
  • fitgmdist
  • fitlm

Cross Validation

Class of set partitions for cross-validation, used in crossval.

  • @cvpartition/cvpartition
  • @cvpartition/display
  • @cvpartition/get
  • @cvpartition/repartition
  • @cvpartition/set
  • @cvpartition/test
  • @cvpartition/training
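A minimal sketch of the cvpartition class, assuming that a scalar first argument is taken as the number of observations and that test returns a logical mask for the given fold. In k-fold cross-validation, each observation is held out in exactly one fold. The loop counts how often each one appears in a test set.

```octave
pkg load statistics

% Partition 10 observations into 5 folds (assignment is random)
C = cvpartition (10, "KFold", 5);

% Count how often each observation is held out across the folds
counts = zeros (10, 1);
for i = 1:5
  counts += test (C, i)(:);
endfor
```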

Hypothesis Testing

Functions available for hypothesis testing.

Function Description
adtest Anderson-Darling goodness-of-fit hypothesis test.
anova1 Perform a one-way analysis of variance (ANOVA).
anova2 Perform a two-way factorial (crossed) or a nested analysis of variance (ANOVA) for balanced designs.
anovan Perform a multi (N)-way analysis of (co)variance (ANOVA or ANCOVA) to evaluate the effect of one or more categorical or continuous predictors (i.e. independent variables) on a continuous outcome (i.e. dependent variable).
bartlett_test Perform a Bartlett test for the homogeneity of variances.
barttest Bartlett's test of sphericity for correlation.
binotest Test for probability P of a binomial sample.
chi2gof Chi-square goodness-of-fit test.
chi2test Perform a chi-squared test (for independence or homogeneity).
friedman Perform the nonparametric Friedman test to compare column effects in a two-way layout.
hotelling_t2test Compute Hotelling's T^2 ("T-squared") test for a single sample or two dependent samples (paired-samples).
hotelling_t2test2 Compute Hotelling's T^2 ("T-squared") test for two independent samples.
kruskalwallis Perform a Kruskal-Wallis test, the non-parametric alternative of a one-way analysis of variance (ANOVA).
kstest Single sample Kolmogorov-Smirnov (K-S) goodness-of-fit hypothesis test.
kstest2 Two-sample Kolmogorov-Smirnov goodness-of-fit hypothesis test.
levene_test Perform a Levene's test for the homogeneity of variances.
manova1 One-way multivariate analysis of variance (MANOVA).
ranksum Wilcoxon rank sum test for equal medians. This test is equivalent to a Mann-Whitney U-test.
regression_ftest F-test for general linear regression analysis.
regression_ttest Perform a linear regression t-test for the null hypothesis RR * B = R in a classical normal regression model Y = X * B + E.
runstest Runs test for detecting serial correlation in the vector X.
sampsizepwr Sample size and power calculation for hypothesis test.
signtest Test for median.
ttest Test for mean of a normal sample with unknown variance or a paired-sample t-test.
ttest2 Perform a two independent samples t-test.
vartest One-sample test of variance.
vartest2 Two-sample F test for equal variances.
vartestn Test for equal variances across multiple groups.
ztest One-sample Z-test.
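As an illustration of the one-sample form of ttest, the sketch below tests a small hypothetical sample against two null means. By default, the null mean is 0, the test is two-sided, and alpha is 0.05. Here h is 1 when the null hypothesis is rejected.

```octave
pkg load statistics

x = [5.1 4.9 5.2 5.0 4.8];   % hypothetical measurements

% Test whether the mean of x differs from 0 (the default null value)
[h0, p0] = ttest (x);

% Test whether the mean of x differs from 5
[h5, p5] = ttest (x, 5);
```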


Wrappers

Functions available for wrapping other functions or group of functions.

Function Description
cdf This is a wrapper around various NAMEcdf and NAME_cdf functions.
clusterdata Wrapper function for 'linkage' and 'cluster'.
pdf This is a wrapper around various NAMEpdf and NAME_pdf functions.
random Generate pseudo-random numbers from a given one-, two-, or three-parameter distribution.
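The cdf and pdf wrappers dispatch on a distribution name to the matching NAMEcdf or NAMEpdf function. The sketch below assumes "norm" is an accepted name that maps to normcdf and normpdf. It checks the wrapper against the direct call and against the standard normal density at 0, which is 1/sqrt(2*pi).

```octave
pkg load statistics

% Wrapper call and direct call for the same normal CDF value
p_wrap   = cdf ("norm", 1.5, 0, 1);
p_direct = normcdf (1.5, 0, 1);

% Standard normal density at 0 through the pdf wrapper
d_wrap = pdf ("norm", 0, 0, 1);
```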