This is the STATISTICS Reference Manual, version 1.0.0, generated automatically by Declt version 4.0b2.
Copyright © 2019-2022 Steve Nunez
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled “Copying” is included exactly as in the original.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be translated as well.
This program is distributed under the terms of the Microsoft Public License.
The main system appears first, followed by any subsystem dependency.
A consolidated system of statistical functions
Consolidated Common Lisp statistical functions
Steve Nunez <steve@symbolics.tech>
(GIT https://github.com/Lisp-Stat/statistics.git)
MS-PL
This system is a consolidation of three Common Lisp statistics libraries:
- Tamas Papp’s library, focusing on central moments
- Larry Hunter’s general statistical library
- Gary Warren King’s (GWK) general statistical library, cl-mathstats
As of Q3 2022, CL-MATHSTATS is usable with Lisp-Stat, but not incorporated into it. This is because it is deeply embedded in its own ecosystem of utility libraries (metatilities-base, cl-containers and the lift test framework), some of which have been superseded by alexandria, anaphora or numerical-utilities. In short, we recommend using CL-MATHSTATS when you need to, recognising that you’ll be hauling in a parallel system of math, statistics and utilities. Long term, we’re working to port CL-MATHSTATS on a case-by-case basis.
1.0.0
Files are sorted by type and then listed depth-first from the system's component trees.
statistics (system).
Packages are listed by definition order.
The formulas and methods used are largely taken from Bernard Rosner, *Fundamentals of Biostatistics*, 5th Edition. A reference of the form ’Rosner x’ refers to page x of that book. Some numeric functions were taken from CLASP, a 1994 Common Lisp package that implemented some of the statistical functions from *Numerical Recipes in C*. For CLASP functions, see the copyright notice below.
The following abbreviations are used in function and variable names:
ci = confidence interval
cdf = cumulative distribution function
ge = greater than or equal to
le = less than or equal to
pdf = probability density function
sd = standard deviation
rxc = rows by columns
sse = sample size estimate
common-lisp.
Definitions are sorted by export status, category, package, and then by lexicographic order.
Default degree for (weighted) central sample moments.
Make N equal-width bins and count the number of elements of SEQUENCE that fall into each.
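A minimal sketch of equal-width binning, assuming the bins span [min, max] of the data and that the maximum value is counted in the last bin. `bin-counts*` is a hypothetical name, not the library's function.

```lisp
;; Hypothetical helper, not the STATISTICS API: equal-width bin counts.
(defun bin-counts* (sequence n)
  "Split [min, max] of SEQUENCE into N equal-width bins and count the
elements that fall into each. The maximum value goes into the last bin."
  (let* ((lo (reduce #'min sequence))
         (hi (reduce #'max sequence))
         (width (/ (- hi lo) n))
         (counts (make-array n :initial-element 0)))
    (map nil
         (lambda (x)
           ;; Index of the bin holding X; clamp so X = max lands in bin N-1.
           (let ((i (if (zerop width)
                        0
                        (min (1- n) (floor (- x lo) width)))))
             (incf (aref counts i))))
         sequence)
    counts))
```

For example, `(bin-counts* '(1 2 2 3 4 5) 2)` uses bins of width 2 and returns `#(3 3)`.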
Return P(X<k) for X a binomial random variable with parameters n & p: the binomial expectation of fewer than k events in N trials, each having probability p. This is a cumulative (lower-tail) probability, not the probability mass function (PMF): it is the sum of the PMF, the probability of getting exactly i successes in n independent Bernoulli trials, over i = 0, ..., k-1.
The probability of k or more occurrences in N trials, each with probability p.
Return P(X=k) for X a binomial random variable with parameters n & p. Binomial expectations for seeing k events in N trials, each having probability p. Use the Poisson approximation if N>100 and P<0.01.
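The three binomial entries above relate as PMF, lower tail and upper tail. A minimal sketch using exact rational arithmetic follows; it assumes nothing about the library's internals and omits the Poisson approximation. All names ending in `*` are hypothetical.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun binomial-coefficient* (n k)
  "Number of ways to choose K of N items when order does not matter."
  (if (or (< k 0) (> k n))
      0
      (let ((result 1))
        ;; Multiplicative formula: after step I, RESULT = C(n-k+i, i), an integer.
        (loop for i from 1 to k
              do (setf result (/ (* result (+ (- n k) i)) i)))
        result)))

(defun binomial-pmf* (k n p)
  "P(X = K) for X ~ Binomial(N, P)."
  (* (binomial-coefficient* n k) (expt p k) (expt (- 1 p) (- n k))))

(defun binomial-lower-tail* (k n p)
  "P(X < K): the PMF summed over 0 .. K-1."
  (loop for i from 0 below k sum (binomial-pmf* i n p)))

(defun binomial-upper-tail* (k n p)
  "P(X >= K): the complement of the lower tail."
  (- 1 (binomial-lower-tail* k n p)))
```

With a rational p the results are exact; for instance `(binomial-lower-tail* 2 4 1/2)` is `5/16`.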
Confidence intervals on a binomial probability. If a binomial probability of p has been observed in N trials, what is the 1-alpha confidence interval around p? Uses the normal theory approximation when npq >= 10, unless told otherwise.
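The normal theory interval mentioned above is p ± z·sqrt(p(1-p)/N). A minimal sketch, assuming the caller supplies the standard normal quantile z (1.96 for a 95% interval); `binomial-normal-ci*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API.
(defun binomial-normal-ci* (p n &optional (z 1.96d0))
  "Normal-theory interval p +/- z * sqrt(p (1 - p) / n).
Z is the standard normal quantile for the chosen level (1.96 for 95%)."
  (when (< (* n p (- 1 p)) 10)
    (warn "npq < 10: the normal approximation may be poor."))
  (let ((half-width (* z (sqrt (/ (* p (- 1 p)) n)))))
    (values (- p half-width) (+ p half-width))))
```

For p = 0.5 and N = 100 this returns approximately 0.402 and 0.598.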
The significance of a one sample test for the equality of an observed probability p-hat to an expected probability p under a binomial distribution with N observations. Use the normal theory approximation if n*p*(1-p) > 10 (unless the exact flag is true).
Returns the number of subjects needed to test whether an observed probability is significantly different from a particular binomial null hypothesis with a significance alpha and a power 1-beta.
Sample size estimate for the McNemar (discordant pairs) test. Pd is the projected proportion of discordant pairs among all pairs, and Pa is the projected proportion of type A pairs among discordant pairs. alpha, 1-beta and tails are as above. Returns the number of individuals necessary, that is, twice the number of matched pairs necessary.
Are the observed probabilities of an event (p-hat1 and p-hat2) in N1 and N2 trials different? This function implements the normal theory method; the exact test is Fisher’s contingency table method, described below.
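A minimal sketch of a normal theory statistic for comparing two proportions, assuming the pooled-variance form without a continuity correction, so its values may differ from the library's. `two-proportion-z*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API (no continuity correction).
(defun two-proportion-z* (p1 n1 p2 n2)
  "Pooled two-sample z statistic for H0: p1 = p2."
  (let* ((pooled (/ (+ (* p1 n1) (* p2 n2)) (+ n1 n2)))
         (se (sqrt (* pooled (- 1 pooled) (+ (/ 1 n1) (/ 1 n2))))))
    (/ (- p1 p2) se)))
```

With p-hat1 = 0.6, p-hat2 = 0.4 and 100 trials each, the pooled proportion is 0.5 and z is about 2.83.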
The number of subjects needed to test if two binomial probabilities are different at a given significance alpha and power 1-beta. The sample sizes can be unequal; the p2 sample is sample-sse-ratio * the size of the p1 sample. It can be a one tailed or two tailed test.
Return the degree of CENTRAL-SAMPLE-MOMENTS.
Returns the point which is the indicated percentile in the Chi Square distribution with dof degrees of freedom.
Chi-square-cdf computes the left-hand tail area under the chi square distribution with dof degrees of freedom, up to X.
This test works on a 2xk table and assesses whether there is an increasing or decreasing trend. The arguments are equal-length lists of counts. Optionally, provide a list of scores, which represent some numeric attribute of each group. If not provided, the scores are assumed to be 1 to k.
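The general chi square CDF requires the regularized incomplete gamma function. A minimal sketch that sidesteps it by handling only even dof, where the left tail has the closed form 1 - exp(-x/2) Σ_{i=0}^{dof/2-1} (x/2)^i / i!. `chi-square-cdf-even*` is hypothetical and is not how the library computes the general case.

```lisp
;; Hypothetical helper, not the STATISTICS API; valid only for even DOF.
(defun chi-square-cdf-even* (x dof)
  "Left-tail area P(X <= x) of a chi square with even DOF."
  (assert (and (integerp dof) (plusp dof) (evenp dof)))
  (if (<= x 0)
      0d0
      (let ((half (/ (float x 1d0) 2))
            (term 1d0)   ; (x/2)^i / i!, starting at i = 0
            (sum 0d0))
        (dotimes (i (/ dof 2))
          (incf sum term)
          (setf term (* term (/ half (1+ i)))))
        (- 1d0 (* (exp (- half)) sum)))))
```

For dof = 2 this reduces to 1 - exp(-x/2), so X = 2 gives about 0.632.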
The significance of a one sample Chi square test for the variance of a normal distribution. Variance is the observed variance, N is the number of observations, and sigma-squared is the test variance.
Takes contingency-table, an RxC array, and returns the significance of the relationship between the row variable and the column variable. Any difference in proportion will cause this test to be significant – consider using the test for trend instead if you are looking for a consistent change.
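The RxC test rests on the Pearson statistic Σ (O - E)² / E, where E = row total × column total / grand total, with (R-1)(C-1) degrees of freedom. A minimal sketch that returns the statistic and degrees of freedom but not the p value; `chi-square-rxc-statistic*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API.
(defun chi-square-rxc-statistic* (table)
  "Pearson chi square statistic and degrees of freedom for a 2-D array of counts."
  (let* ((rows (array-dimension table 0))
         (cols (array-dimension table 1))
         (row-totals (make-array rows :initial-element 0))
         (col-totals (make-array cols :initial-element 0))
         (total 0))
    ;; Marginal totals.
    (dotimes (i rows)
      (dotimes (j cols)
        (let ((x (aref table i j)))
          (incf (aref row-totals i) x)
          (incf (aref col-totals j) x)
          (incf total x))))
    (values
     (loop for i below rows
           sum (loop for j below cols
                     sum (let ((expected (/ (* (aref row-totals i) (aref col-totals j))
                                            total)))
                           (/ (expt (- (aref table i j) expected) 2) expected))))
     (* (1- rows) (1- cols)))))
```

For `#2A((10 20) (20 10))` every expected count is 15, giving a statistic of 20/3 on 1 degree of freedom.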
How many ways to choose k things from n, when order doesn’t matter.
Return coefficient of variation
Convert X from a Normal distribution with mean mu and standard deviation sigma to the standard normal.
Pearson correlation
Returns the size of a sample necessary to find a correlation of expected value rho with significance alpha and power 1-beta.
Test whether two correlation coefficients are different. Uses Fisher’s Z test.
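Fisher's transform z = atanh(r) is approximately normal with variance 1/(n-3), so the difference of two transformed coefficients can be standardized. A minimal sketch of that statistic (no p value); both names are hypothetical.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun fisher-z* (r)
  "Fisher's transform z = atanh(r) = 1/2 ln((1 + r) / (1 - r))."
  (atanh r))

(defun correlation-difference-z* (r1 n1 r2 n2)
  "Approximately standard normal statistic for H0: rho1 = rho2."
  (/ (- (fisher-z* r1) (fisher-z* r2))
     (sqrt (+ (/ 1 (- n1 3)) (/ 1 (- n2 3))))))
```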
Cross-tabulate two sequences (using a SPARSE-COUNTER with the given TEST). TEST is used to compare conses.
Return the empirical quantile of a vector of real numbers, sorted in ascending order (not checked). Uses a 0.5 correction.
Probabilities that correspond to the empirical quantiles of a vector of length N. That is to say,
    (== (quantiles sample (empirical-quantile-probabilities (length sample)))
        sample)
for any vector SAMPLE.
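One reading of the "0.5 correction" that makes the identity above hold is that the i-th sorted element (0-based) sits at probability (i + 0.5)/N, with linear interpolation in between and clamping at the ends. A minimal sketch under that assumption; both names are hypothetical and may differ from the library in edge handling.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun empirical-quantile-probabilities* (n)
  "Probabilities (i + 1/2) / n, i = 0 .. n-1, at which the sorted sample sits."
  (loop for i from 0 below n collect (/ (+ i 1/2) n)))

(defun empirical-quantile* (sorted q)
  "Quantile Q of the ascending vector SORTED, interpolating linearly between
neighbouring elements and clamping at both ends."
  (let* ((n (length sorted))
         (pos (- (* n q) 1/2)))           ; fractional 0-based index
    (cond ((<= pos 0) (aref sorted 0))
          ((>= pos (1- n)) (aref sorted (1- n)))
          (t (multiple-value-bind (i frac) (floor pos)
               (if (zerop frac)
                   (aref sorted i)
                   (+ (* (- 1 frac) (aref sorted i))
                      (* frac (aref sorted (1+ i))))))))))
```

Evaluating the quantiles of `#(1 2 3 4)` at its own probabilities (1/8 3/8 5/8 7/8) gives back 1, 2, 3, 4, and the median is 5/2.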
Return the elements of OBJECT as a vector (of reals) sorted in ascending order.
Adapted from CLASP, but changed to handle F < 1 correctly in the one-tailed case. The ‘f-statistic’ must be a positive number. The degrees of freedom arguments must be positive integers. The ‘one-tailed-p’ argument is treated as a boolean.
F test for the equality of two variances
A multiple testing correction that is less conservative than Bonferroni. Takes a list of p-values and a false discovery rate, and returns the number of p-values that are likely to be good enough to reject the null at that rate. Returns a second value which is the p-value cutoff.
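The description above matches the Benjamini-Hochberg step-up procedure: sort the m p-values and find the largest k with p_(k) <= (k/m)·rate. A minimal sketch assuming that procedure and assuming the "cutoff" is p_(k); `fdr-count*` is hypothetical and may not match the library's exact return values.

```lisp
;; Hypothetical helper, not the STATISTICS API (Benjamini-Hochberg assumed).
(defun fdr-count* (p-values rate)
  "Return the number of rejectable p-values and the largest rejected p-value."
  (let* ((sorted (sort (copy-list p-values) #'<))
         (m (length sorted))
         (count 0)
         (cutoff nil))
    ;; Step-up: keep the LAST rank k whose p-value is under its threshold.
    (loop for p in sorted
          for k from 1
          when (<= p (* rate (/ k m)))
            do (setf count k
                     cutoff p))
    (values count cutoff)))
```

For p-values (0.001 0.2 0.012 0.9) at rate 0.05, the thresholds are 0.0125, 0.025, 0.0375 and 0.05, so two p-values pass and the cutoff is 0.012.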
Fisher’s exact test. Gives a p value for a particular 2x2 contingency table
Transforms the correlation coefficient to an approximately normal distribution.
By default, returns the five number summary (min, 1st quartile, median, 3rd quartile, max) of the elements X. If the keyword :tukey is set to a non-nil value, Tukey’s fivenum summary is computed instead.
Returns the geometric mean of SEQUENCE
The geometric mean is a mean or average, which indicates the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum)
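Multiplying many values can overflow or underflow, so a common way to compute the geometric mean is exp(mean(log x)). A minimal sketch for positive inputs; `geometric-mean*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API; inputs must be positive.
(defun geometric-mean* (sequence)
  "Geometric mean computed as exp(mean(log x)) to avoid overflowing the product."
  (exp (/ (reduce #'+ (map 'list (lambda (x) (log (float x 1d0))) sequence))
          (length sequence))))
```

For example, the geometric mean of 2 and 8 is 4.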
Returns the interquartile range of the elements of X.
Computes the regression equation for a least squares fit of a line to a sequence of points (each a list of two numbers, e.g. ’((1.0 0.1) (2.0 0.2))) and reports the intercept, slope, correlation coefficient r, R^2, and the significance of the difference of the slope from 0.
Create a sparse counter. Elements are compared with TEST, which must be a test accepted by MAKE-HASH-TABLE.
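The core of the fit is slope = Sxy/Sxx, intercept = ȳ - slope·x̄ and Pearson r = Sxy/sqrt(Sxx·Syy). A minimal sketch using the same point format; `least-squares-line*` is hypothetical and omits R^2 (which is r²) and the slope significance test.

```lisp
;; Hypothetical helper, not the STATISTICS API.
(defun least-squares-line* (points)
  "Fit y = a + b x to POINTS, a list of (x y) lists.
Returns intercept a, slope b and the Pearson correlation r."
  (let* ((n (length points))
         (mx (/ (reduce #'+ points :key #'first) n))
         (my (/ (reduce #'+ points :key #'second) n))
         (sxx 0) (syy 0) (sxy 0))
    ;; Sums of squared and cross deviations from the means.
    (dolist (pt points)
      (let ((dx (- (first pt) mx))
            (dy (- (second pt) my)))
        (incf sxx (* dx dx))
        (incf syy (* dy dy))
        (incf sxy (* dx dy))))
    (let ((slope (/ sxy sxx)))
      (values (- my (* slope mx))
              slope
              (/ sxy (sqrt (* sxx syy)))))))
```

For the points ((1 2) (2 4) (3 6)) this returns intercept 0, slope 2 and r = 1.0.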
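A sparse counter, tabulation and cross-tabulation can all be sketched with a standard hash table; cross-tabulation counts (x . y) conses, which is why its TEST must compare conses (EQUAL here). These are hypothetical stand-ins, not the library's SPARSE-COUNTER.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun tabulate* (sequence &key (test #'eql))
  "Count occurrences of each element of SEQUENCE in a hash table."
  (let ((counts (make-hash-table :test test)))
    (map nil (lambda (x) (incf (gethash x counts 0))) sequence)
    counts))

(defun cross-tabulate* (sequence1 sequence2 &key (test #'equal))
  "Count (x . y) pairs; TEST must compare conses, hence EQUAL by default."
  (tabulate* (map 'list #'cons sequence1 sequence2) :test test))

(defun counts-as-alist* (counts)
  "Return (OBJECT . COUNT) pairs as an alist, in no particular order."
  (let ((alist '()))
    (maphash (lambda (key count) (push (cons key count) alist)) counts)
    alist))
```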
McNemar’s test for correlated proportions, used for longitudinal studies. Look only at the number of discordant pairs (one treatment is effective and the other is not). If the two treatments are A and B, a-discordant-count is the number where A worked and B did not, and b-discordant-count is the number where B worked and A did not.
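Because only discordant pairs matter, the McNemar statistic depends on just the two counts. A minimal sketch assuming the continuity-corrected form (|A - B| - 1)² / (A + B), compared against a chi square with 1 degree of freedom; `mcnemar-statistic*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API (continuity correction assumed).
(defun mcnemar-statistic* (a-discordant-count b-discordant-count)
  "Continuity-corrected McNemar chi square (1 df) from the discordant counts."
  (/ (expt (- (abs (- a-discordant-count b-discordant-count)) 1) 2)
     (+ a-discordant-count b-discordant-count)))
```

With 10 pairs where only A worked and 5 where only B worked, the statistic is 16/15.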
Returns the mean of SEQUENCE
Return the mean of OBJECT. OBJECT must be either a sequence of numbers, a sequence of BOOLEAN or a DISTRIBUTION object.
A sequence of BOOLEAN is converted to a BIT-VECTOR and its mean returned. This gives the proportion of TRUE values in the sequence (most often interpreted as a probability).
For samples (numeric vectors), normalized by the total weight minus 1 (and thus unbiased if certain assumptions hold, e.g. weights that count frequencies).
A combined calculation that is often useful. Takes a sequence and returns three values: mean, standard deviation and N.
Returns the median of SEQUENCE
Returns two values: a list of the modes and the number of times they occur
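A minimal sketch of the two-value mode calculation using a frequency table; `modes*` is hypothetical and returns the modes in no particular order.

```lisp
;; Hypothetical helper, not the STATISTICS API.
(defun modes* (sequence)
  "Return a list of the most frequent elements and their shared count."
  (let ((counts (make-hash-table :test #'equal))
        (best 0)
        (modes '()))
    (map nil (lambda (x) (incf (gethash x counts 0))) sequence)
    ;; Keep every element whose count ties the highest count seen so far.
    (maphash (lambda (x c)
               (cond ((> c best) (setf best c
                                       modes (list x)))
                     ((= c best) (push x modes))))
             counts)
    (values modes best)))
```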
Confidence interval for the mean of a normal distribution.
The 1-alpha percent confidence interval on the mean of a normal distribution with parameters mean, sd & n.
The 1-alpha confidence interval on the mean of a sequence of numbers drawn from a Normal distribution.
The probability density function (PDF) for a normal distribution with mean mu and variance sigma at point x.
The 1-alpha confidence interval on the standard deviation of a sequence of numbers drawn from a Normal distribution.
The 1-alpha confidence interval on the variance of a sequence of numbers drawn from a Normal distribution.
Return an element from SEQUENCE at percentile PERCENT. This function is also known as quantile.
How many ways to take k things from n, when order matters.
The CDF of the standard normal distribution.
Probability of seeing fewer than K events over a time period when the expected number of events over that time is mu.
Probability of X or more events when the expected number of events is mu.
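Ordered selections (this entry) and unordered selections (the earlier "order doesn’t matter" entry) differ by a factor of k!, since each set of k items can be arranged in k! orders. A minimal sketch with hypothetical names.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun permutations-count* (n k)
  "Ordered selections of K items from N: n (n-1) ... (n-k+1)."
  (if (or (< k 0) (> k n))
      0
      (let ((result 1))
        (dotimes (i k result)
          (setf result (* result (- n i)))))))

(defun combinations-count* (n k)
  "Unordered selections: each set of K items appears in k! orders."
  (if (or (< k 0) (> k n))
      0
      (/ (permutations-count* n k) (permutations-count* k k))))
```

For n = 5 and k = 2 there are 20 ordered and 10 unordered selections.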
Confidence interval for the Poisson parameter mu
Given x observations in a unit of time, what is the 1-alpha confidence interval on the Poisson parameter mu (= lambda*T)?
Since find-critical-value assumes that the function is monotonically increasing, adjust the value we are looking for, taking advantage of the distribution’s symmetry.
Probability of seeing k events over a time period when the expected number of events over that time is mu.
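The Poisson entries (exactly k, fewer than k, and k or more events) can be sketched from the PMF exp(-mu)·mu^k/k!. A minimal sketch with hypothetical names; it builds mu^k/k! incrementally rather than computing factorials.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun poisson-pmf* (k mu)
  "P(X = K) for X ~ Poisson(MU): exp(-mu) mu^k / k!."
  (let ((term (exp (- (float mu 1d0)))))
    ;; Multiply in mu/i for i = 1..k, building mu^k / k! without large factorials.
    (loop for i from 1 to k
          do (setf term (* term (/ mu i))))
    term))

(defun poisson-lower-tail* (k mu)
  "P(X < K)."
  (loop for i from 0 below k sum (poisson-pmf* i mu)))

(defun poisson-upper-tail* (k mu)
  "P(X >= K)."
  (- 1 (poisson-lower-tail* k mu)))
```

With mu = 2, fewer than 2 events has probability 3·exp(-2), about 0.406.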
The significance of a one sample test for the equality of an observed number of events (observed) and an expected number mu under the poisson distribution. Normal theory approximation is not that great, so don’t use it unless told.
Pool ACCUMULATORS.
Returns a random number with the specified mean and standard deviation.
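The entry does not say how the deviate is generated. One standard technique is the Box-Muller transform, shown here as a swapped-in method, not necessarily the library's; `random-normal*` is hypothetical.

```lisp
;; Hypothetical helper using Box-Muller; not necessarily the library's method.
(defun random-normal* (&key (mean 0d0) (sd 1d0))
  "One normal deviate with the given MEAN and standard deviation SD."
  (let ((u1 (- 1d0 (random 1d0)))   ; in (0, 1], so (log u1) is finite
        (u2 (random 1d0)))
    (+ mean (* sd
               (sqrt (* -2d0 (log u1)))
               (cos (* 2d0 pi u2))))))
```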
Random selection from sequence
Return a random sample of size N from sequence, without replacement. If N is equal to or greater than the length of the sequence, return the entire sequence.
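Sampling without replacement can be done with a partial Fisher-Yates shuffle on a copy: after step i, positions 0..i hold the sample. A minimal sketch assuming that technique; `random-sample*` is hypothetical and always returns a list.

```lisp
;; Hypothetical helper, not the STATISTICS API (partial Fisher-Yates).
(defun random-sample* (sequence n)
  "Return min(N, length) elements of SEQUENCE chosen without replacement."
  (let* ((pool (copy-seq (coerce sequence 'vector)))
         (len (length pool))
         (n (min n len)))
    ;; Swap a uniformly chosen remaining element into position I.
    (dotimes (i n)
      (rotatef (aref pool i) (aref pool (+ i (random (- len i))))))
    (coerce (subseq pool 0 n) 'list)))
```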
Rounds a floating point number to the specified number of digits of precision.
Return the difference between the largest and smallest values in SEQUENCE
The sign test, which is really just a special case of the binomial one sample test with p = 1/2. The normal theory version has a correction factor to make it a better approximation.
Same as SIGN-TEST, but takes two sequences and tests whether the entries in one are different from (greater or less than) the corresponding entries in the other.
Return the count for OBJECT.
Spearman rank correlation computes the relationship between a pair of variables when one or both are either ordinal or have a distribution that is far from normal. It takes a list of points (same format as linear-regression) and returns the spearman rank correlation coefficient and its significance.
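Spearman's coefficient can be computed as the Pearson correlation of the ranks, with tied values given the average of their 1-based ranks (the "add 1" adjustment described for the average rank helper). A minimal sketch without the significance test; all names are hypothetical.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun average-ranks* (values)
  "1-based ranks of VALUES (a list); tied values share the mean of their positions."
  (let* ((v (coerce values 'vector))
         (n (length v))
         (order (sort (let ((idx (make-array n)))
                        (dotimes (i n idx) (setf (aref idx i) i)))
                      #'< :key (lambda (i) (aref v i))))
         (ranks (make-array n)))
    (loop with pos = 0
          while (< pos n)
          do (let* ((value (aref v (aref order pos)))
                    (end (or (position-if-not (lambda (i) (= (aref v i) value))
                                              order :start pos)
                             n)))
               ;; Positions pos..end-1 tie; ranks pos+1..end average to (pos+1+end)/2.
               (loop for j from pos below end
                     do (setf (aref ranks (aref order j)) (/ (+ pos 1 end) 2)))
               (setf pos end)))
    (coerce ranks 'list)))

(defun pearson* (xs ys)
  (let* ((n (length xs))
         (mx (/ (reduce #'+ xs) n))
         (my (/ (reduce #'+ ys) n))
         (sxy (reduce #'+ (mapcar (lambda (x y) (* (- x mx) (- y my))) xs ys)))
         (sxx (reduce #'+ (mapcar (lambda (x) (expt (- x mx) 2)) xs)))
         (syy (reduce #'+ (mapcar (lambda (y) (expt (- y my) 2)) ys))))
    (/ sxy (sqrt (* sxx syy)))))

(defun spearman* (points)
  "POINTS is a list of (x y) lists, as for the least squares fit."
  (pearson* (average-ranks* (mapcar #'first points))
            (average-ranks* (mapcar #'second points))))
```

For example, `(average-ranks* '(10 20 20 30))` gives `(1 5/2 5/2 4)`, and any strictly increasing relationship yields a coefficient of 1.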
Return the standard deviation of SEQUENCE
Return the estimated standard deviation obtained from a set of sample means from repeated samples
Returns the point which is the indicated percentile in the T distribution with dof degrees of freedom
Adapted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
The significance of a one sample T test for the mean of a normal distribution with unknown variance. X-bar is the observed mean, sd is the observed standard deviation, N is the number of observations and mu is the test mean.
The significance of a one sample T test for the mean of a normal sequence of numbers with unknown variance. X-bar is the observed mean, sd is the observed standard deviation, N is the number of observations and mu is the test mean.
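Both one sample T test entries reduce to the statistic t = (x-bar - mu)/(sd/sqrt(N)) on N - 1 degrees of freedom, which is then referred to the T distribution. A minimal sketch of the statistic only; `one-sample-t-statistic*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API; computes the statistic, not the p value.
(defun one-sample-t-statistic* (x-bar sd n mu)
  "t = (x-bar - mu) / (sd / sqrt n); returns t and its N - 1 degrees of freedom."
  (values (/ (- x-bar mu) (/ sd (sqrt (float n 1d0))))
          (1- n)))
```

With x-bar = 5, sd = 2, N = 16 and mu = 4, t is 2 on 15 degrees of freedom.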
Returns the number of subjects needed to test whether the mean mu of a normally distributed sample with variance VARIANCE is different from a null hypothesis mean mu-null, with alpha, 1-beta and tails as specified.
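The usual normal theory formula is n = variance·(z_alpha + z_beta)² / (mu - mu-null)², rounded up. A minimal sketch that takes the normal quantiles as arguments rather than computing them: z-alpha is z_{1-alpha/2} for a two-tailed test (z_{1-alpha} one-tailed) and z-beta is z_{1-beta}. The function name is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API; normal quantiles are supplied by the caller.
(defun one-sample-mean-sample-size* (mu mu-null variance z-alpha z-beta)
  "n = variance * (z-alpha + z-beta)^2 / (mu - mu-null)^2, rounded up."
  (ceiling (/ (* variance (expt (+ z-alpha z-beta) 2))
              (expt (- mu mu-null) 2))))
```

For a difference of 0.5 with unit variance, two-tailed alpha = 0.05 (z = 1.96) and power 0.8 (z = 0.8416), about 31.4 rounds up to 32 subjects.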
The significance of a paired t test for the means of two normal distributions in a longitudinal study. D-bar is the mean difference, sd is the standard deviation of the differences, N is the number of pairs.
The significance of a paired t test for means of two normal distributions in a longitudinal study. Before is a sequence of before values, after is the sequence of paired after values (which must be the same length as the before sequence).
Returns the number of subjects needed to test whether paired differences with mean difference-mu and variance difference-variance differ from zero, with alpha, 1-beta and tails as specified.
The significance of the difference of two means (x-bar1 and x-bar2) with standard deviations sd1 and sd2, and sample sizes n1 and n2 respectively. The form of the two sample t test depends on whether the sample variances are equal or not. If the variable variances-equal? is :test, then we use an F test and the variance-significance-cutoff to determine if they are equal. If the variances are equal, then we use the two sample t test for equal variances. If they are not equal, we use the Satterthwaite method, which has good type I error properties (at the loss of some power).
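For the unequal-variance branch, the Satterthwaite approximation gives degrees of freedom (v1 + v2)² / (v1²/(n1-1) + v2²/(n2-1)) with v = sd²/n. A minimal sketch of the statistic and degrees of freedom (some texts round the degrees of freedom down); `welch-t*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API; no p value is computed.
(defun welch-t* (x-bar1 sd1 n1 x-bar2 sd2 n2)
  "Unequal-variance t statistic and Satterthwaite's approximate degrees of freedom."
  (let* ((v1 (/ (* sd1 sd1) n1))
         (v2 (/ (* sd2 sd2) n2))
         (df (/ (expt (+ v1 v2) 2)
                (+ (/ (* v1 v1) (1- n1))
                   (/ (* v2 v2) (1- n2))))))
    (values (/ (- x-bar1 x-bar2) (sqrt (+ v1 v2)))
            df)))
```

With equal standard deviations and n1 = n2 = 10, the degrees of freedom come out as 18, matching the pooled test.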
The significance of the difference of two means of SEQUENCE1 and SEQUENCE2 with standard deviations sd1 and sd2, and sample sizes n1 and n2 respectively. The form of the two sample t test depends on whether the sample variances are equal or not. If the variable variances-equal? is :test, then we use an F test and the variance-significance-cutoff to determine if they are equal. If the variances are equal, then we use the two sample t test for equal variances. If they are not equal, we use the Satterthwaite method.
Returns the number of subjects needed to test whether the mean mu1 of a normally distributed sample (with variance variance1) is different from a second sample with mean mu2 and variance variance2, with alpha, 1-beta and tails as specified. It is also possible to set a sample size ratio of sample 1 to sample 2.
Tabulate a sequence (using a SPARSE-COUNTER with the given TEST).
Return the variance of SEQUENCE.
Variance of OBJECT. For samples, normalized by the weight-1 (and thus unbiased if certain assumptions hold, e.g. weights that count frequencies).
Note that alexandria’s default for variance returns the biased variance. We change that here for consistency. If you want the biased variance, use alexandria:variance directly.
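The biased and unbiased estimators differ only in the divisor of the sum of squared deviations: N versus N - 1. A minimal sketch; `sample-variance*` is hypothetical.

```lisp
;; Hypothetical helper, not the STATISTICS API.
(defun sample-variance* (sequence &key (biased nil))
  "Sum of squared deviations divided by n - 1 (unbiased) or n (biased)."
  (let* ((n (length sequence))
         (mean (/ (reduce #'+ sequence) n))
         (ss (reduce #'+ (map 'list (lambda (x) (expt (- x mean) 2)) sequence))))
    (/ ss (if biased n (1- n)))))
```

For (1 2 3 4) the sum of squares is 5, so the unbiased variance is 5/3 and the biased one is 5/4.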
Calculate quantiles QS of weighted observations. Uses a 0.5 correction.
A test on the ranking of positive and negative differences (are the positive differences significantly larger/smaller than the negative ones). Assumes a continuous and symmetric distribution of differences, although not a normal one. This is the normal theory approximation, which is only valid when N > 15. Note that it is the related Wilcoxon rank-sum test, for two independent samples, that is equivalent to the Mann-Whitney test.
The inverse normal function, P(X<Zu) = u where X is distributed as the standard normal. Uses binary search.
The significance of a one sample Z test for the mean of a normal distribution with known variance. mu is the null hypothesis mean, x-bar is the observed mean, sigma is the standard deviation and N is the number of observations. If tails is :both, the significance of a difference between x-bar and mu; if tails is :positive, the significance of x-bar being greater than mu; and if tails is :negative, the significance of x-bar being less than mu. Returns a p value.
Add OBJECT to ACCUMULATOR. Return OBJECT. NILs are ignored by the accumulator, unless a specialized method decides otherwise. Keywords may be used to specify additional information (e.g. weight).
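The standard normal CDF, its inverse by binary search, and the Z test p value fit together as follows. This sketch assumes the Abramowitz and Stegun 7.1.26 approximation to erf (absolute error below 1.5e-7), which is not necessarily what the library uses; all names are hypothetical.

```lisp
;; Hypothetical helpers, not the STATISTICS API.
(defun erf-approx* (x)
  "Abramowitz & Stegun 7.1.26 approximation to erf (absolute error below 1.5e-7)."
  (let* ((sign (if (minusp x) -1d0 1d0))
         (x (abs (float x 1d0)))
         (tt (/ 1d0 (+ 1d0 (* 0.3275911d0 x))))
         ;; Horner form of a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5.
         (poly (* tt (+ 0.254829592d0
                        (* tt (+ -0.284496736d0
                                 (* tt (+ 1.421413741d0
                                          (* tt (+ -1.453152027d0
                                                   (* tt 1.061405429d0)))))))))))
    (* sign (- 1d0 (* poly (exp (- (* x x))))))))

(defun standard-normal-cdf* (z)
  "Phi(z) = (1 + erf(z / sqrt 2)) / 2."
  (* 0.5d0 (+ 1d0 (erf-approx* (/ z (sqrt 2d0))))))

(defun inverse-standard-normal* (u &key (tolerance 1d-10))
  "Find z with Phi(z) = U by bisection, relying on Phi being increasing."
  (let ((lo -10d0) (hi 10d0))
    (loop while (> (- hi lo) tolerance)
          do (let ((mid (/ (+ lo hi) 2)))
               (if (< (standard-normal-cdf* mid) u)
                   (setf lo mid)
                   (setf hi mid))))
    (/ (+ lo hi) 2)))

(defun z-test-p-value* (x-bar mu sigma n &key (tails :both))
  "P value of a one sample Z test with known standard deviation SIGMA."
  (let* ((z (/ (- x-bar mu) (/ sigma (sqrt (float n 1d0)))))
         (lower (standard-normal-cdf* z)))
    (ecase tails
      (:positive (- 1d0 lower))          ; evidence that x-bar > mu
      (:negative lower)                  ; evidence that x-bar < mu
      (:both (* 2d0 (min lower (- 1d0 lower)))))))
```

As a check, Phi(1.96) is about 0.975, the inverse of 0.975 is about 1.96, and a two-tailed test with z = 1.96 gives p of about 0.05.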
Increments the count of OBJECT in SPARSE-COUNTER, optionally with a weight
Second central moment. For samples, normalized by the total weight (and thus not the unbiased estimator, see VARIANCE).
Third central moment.
Fourth central moment.
Return a CENTRAL-SAMPLE-MOMENTS object that allows the calculation of the central sample moments of OBJECT up to the given DEGREE.
When WEIGHTS are given, they need to be a sequence of matching length.
Return the contents of OBJECT as a SORTED-REALS.
Kurtosis FIXME talk about bias, maybe implement unbiased?
The mean of elements in OBJECT.
Median of OBJECT.
Return an element at quantile Q. May be an interpolation or an approximation, depending on OBJECT and Q. NOTE: Extensions should define methods for QUANTILES, not QUANTILE.
Multiple quantiles (see QUANTILE). NOTE: Extensions should define methods for QUANTILES, not QUANTILE.
Standard deviation. For samples, the square root of the unbiased estimator (see VARIANCE).
Skewness FIXME talk about bias, maybe implement unbiased?
The total weight of elements in ACCUMULATOR.
Return the total ’weight’ of the accumulator
Variance of OBJECT. For samples, normalized by the weight-1 (and thus unbiased if certain assumptions hold, e.g. weights that count frequencies).
Return (OBJECT . COUNT) pairs as an alist.
num-utils.utilities.
num-utils.num=.
error.
Central sample moments calculated on-line/single-pass.
M weighted mean
S2 weighted sum of squared deviations from the mean, not calculated when NIL
S3 weighted sum of cubed deviations from the mean, not calculated when NIL
S4 weighted sum of 4th power deviations from the mean, not calculated when NIL
Allows on-line, numerically stable calculation of moments. See [bennett2009numerically] and [pebay2008formulas] for a description of the algorithm. M_2, ..., M_4 in the paper are s2, ..., s4 in the code.
Slot types and initial values:
M: real, initially 0.0d0
S2: (or (real 0) null), initially 0.0d0
S3: (or real null), initially 0.0d0
S4: (or (real 0) null), initially 0.0d0
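The single-pass moment update described above can be illustrated with one common unweighted form of the per-observation update for M_2..M_4 (Terriberry's form of the Pébay formulas). This is a hypothetical sketch, not the library's weighted CENTRAL-SAMPLE-MOMENTS code; `moment-accumulator` and `add-observation` are illustrative names.

```lisp
;; Hypothetical sketch, not the library's CENTRAL-SAMPLE-MOMENTS code (unweighted).
(defstruct moment-accumulator
  (n 0)        ; number of observations
  (mean 0d0)   ; running mean, the M slot
  (m2 0d0)     ; sum of squared deviations, the S2 slot
  (m3 0d0)     ; sum of cubed deviations, the S3 slot
  (m4 0d0))    ; sum of 4th-power deviations, the S4 slot

(defun add-observation (acc x)
  "Fold X into ACC in a single pass and return ACC."
  (let* ((n1 (moment-accumulator-n acc))
         (n (1+ n1))
         (delta (- x (moment-accumulator-mean acc)))
         (delta-n (/ delta n))
         (delta-n2 (* delta-n delta-n))
         (term1 (* delta delta-n n1))
         ;; Read the previous M2 and M3 before any slot is updated.
         (m2 (moment-accumulator-m2 acc))
         (m3 (moment-accumulator-m3 acc)))
    (setf (moment-accumulator-n acc) n)
    (incf (moment-accumulator-mean acc) delta-n)
    (incf (moment-accumulator-m4 acc)
          (+ (* term1 delta-n2 (+ (* n n) (* -3 n) 3))
             (* 6 delta-n2 m2)
             (* -4 delta-n m3)))
    (incf (moment-accumulator-m3 acc)
          (- (* term1 delta-n (- n 2))
             (* 3 delta-n m2)))
    (incf (moment-accumulator-m2 acc) term1)
    acc))
```

Feeding 1, 2, 3, 4 gives mean 2.5, M_2 = 5, M_3 = 0 and M_4 = 10.25; dividing by the count gives the second to fourth central moments.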
Accumulator which sorts elements. ELEMENTS returns the sorted elements.
structure-object.
hash-table
This slot is read-only.
LET+ form for slots of the structure SORTED-REALS.
LET+ form for slots of the structure SORTED-REALS. Read-only.
FIXME documentation, factor out general part
Protects against floating point underflow errors and sets the value to 0.0 instead.
Average rank calculation for non-parametric tests. Ranks are 1-based but Lisp indices are 0-based, so 1 is added to each index.
Adapted from CLASP 1.4.3, http://eksl-www.cs.umass.edu/clasp.html
m.
s2.
s3.
s4.
Eliminates floating point underflow for the exponential function. Instead, it just returns 0.0d0
Return a SORTED-REALS structure.
w.
Return the empirical quantile of a vector of real numbers, sorted in ascending order (not checked). Uses a 0.5 correction.
Return table of probability brackets for weighted quantile calculations, built from the weights (which should be positive reals, not checked). Uses a 0.5 correction.
Pool two accumulators. When they are of a different type, the resulting accumulator will be downgraded to the level afforded by the information available in the accumulators.