JMSL™ Numerical Library 4.0
java.lang.Object com.imsl.stat.DiscriminantAnalysis
Performs a linear or quadratic discriminant function analysis among several known groups, using the reclassification, split-sample, or leaving-out-one method to evaluate the rule.
Class DiscriminantAnalysis performs discriminant function analysis using either linear or quadratic discrimination. The output from DiscriminantAnalysis includes a measure of distance between the groups, a table summarizing the classification results, a matrix containing the posterior probabilities of group membership for each observation, and the within-sample means and covariance matrices. The linear discriminant function coefficients are also computed.
All observations are input during one call to DiscriminantAnalysis, a method of operation that has the advantage of simplicity.
The first step in the algorithm is the initialization step. The variables means, classification table, and covariances are initialized to zero, and other program parameters are set. The next step begins by adding all observations in x to the means and the factorizations of the covariance matrices. It continues by computing some statistics of interest if requested: the linear discriminant functions, the prior probabilities, the log of the determinant of each of the covariance matrices, a test statistic for testing that all of the within-group covariance matrices are equal, and a matrix of Mahalanobis distances between the groups. The matrix of Mahalanobis distances is computed via the pooled covariance matrix when linear discrimination is specified; the row covariance matrix is used when the discrimination is quadratic.
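Covariance matrices are defined as follows. Let $N_i$ denote the sum of the frequencies of the observations in group i, and let $M_i$ denote the number of observations in group i. Then, if $S_i$ denotes the within-group i covariance matrix,
$$S_i = \frac{1}{N_i - 1} \sum_{j=1}^{M_i} w_j f_j (x_j - \bar{x}_i)(x_j - \bar{x}_i)^T$$
where $w_j$ is the weight of the j-th observation in group i, $f_j$ is its frequency, $x_j$ is the j-th observation column vector in group i, and $\bar{x}_i$ is the mean vector of the observations in group i.
Let S denote either the pooled covariance matrix or one of the within-group covariance matrices $S_i$. (S will be the pooled covariance matrix in linear discrimination, and $S_i$ otherwise.) The Mahalanobis distance between group i and group j is computed as:
$$D_{ij}^2 = (\bar{x}_i - \bar{x}_j)^T S^{-1} (\bar{x}_i - \bar{x}_j)$$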
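The distance between two group means can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class name MahalanobisSketch and its helper methods are hypothetical, all weights and frequencies are assumed to be 1, and the pooled covariance is assumed to be the usual estimate (summed within-group scatter divided by the total number of observations minus the number of groups). The linear system is solved by plain Gaussian elimination rather than by the factorizations described above.

```java
/** Sketch only: pooled covariance and squared Mahalanobis distance between group means. */
final class MahalanobisSketch {

    /** Mean vector of one group's rows (unit weights and frequencies assumed). */
    static double[] mean(double[][] rows) {
        double[] m = new double[rows[0].length];
        for (double[] r : rows)
            for (int a = 0; a < m.length; a++) m[a] += r[a] / rows.length;
        return m;
    }

    /** Pooled covariance: summed within-group scatter divided by (N - number of groups). */
    static double[][] pooledCovariance(double[][][] groups) {
        int p = groups[0][0].length;
        int n = 0;
        double[][] s = new double[p][p];
        for (double[][] rows : groups) {
            double[] m = mean(rows);
            n += rows.length;
            for (double[] r : rows)
                for (int a = 0; a < p; a++)
                    for (int b = 0; b < p; b++)
                        s[a][b] += (r[a] - m[a]) * (r[b] - m[b]);
        }
        for (double[] row : s)
            for (int b = 0; b < p; b++) row[b] /= (n - groups.length);
        return s;
    }

    /** Solves S y = d by Gaussian elimination with partial pivoting. */
    static double[] solve(double[][] s, double[] d) {
        int p = d.length;
        double[][] a = new double[p][p + 1];           // augmented matrix [S | d]
        for (int r = 0; r < p; r++) {
            System.arraycopy(s[r], 0, a[r], 0, p);
            a[r][p] = d[r];
        }
        for (int c = 0; c < p; c++) {
            int piv = c;
            for (int r = c + 1; r < p; r++)
                if (Math.abs(a[r][c]) > Math.abs(a[piv][c])) piv = r;
            double[] t = a[c]; a[c] = a[piv]; a[piv] = t;
            if (a[c][c] == 0.0) throw new ArithmeticException("singular covariance matrix");
            for (int r = c + 1; r < p; r++) {
                double f = a[r][c] / a[c][c];
                for (int k = c; k <= p; k++) a[r][k] -= f * a[c][k];
            }
        }
        double[] y = new double[p];
        for (int r = p - 1; r >= 0; r--) {             // back substitution
            double sum = a[r][p];
            for (int k = r + 1; k < p; k++) sum -= a[r][k] * y[k];
            y[r] = sum / a[r][r];
        }
        return y;
    }

    /** Squared Mahalanobis distance (mi - mj)^T S^{-1} (mi - mj). */
    static double distanceSquared(double[] mi, double[] mj, double[][] s) {
        double[] d = new double[mi.length];
        for (int a = 0; a < d.length; a++) d[a] = mi[a] - mj[a];
        double[] y = solve(s, d);
        double sum = 0.0;
        for (int a = 0; a < d.length; a++) sum += d[a] * y[a];
        return sum;
    }

    public static void main(String[] args) {
        double[][][] groups = {
            {{1, 2}, {2, 3}, {3, 5}},   // made-up observations for group 1
            {{4, 1}, {5, 3}, {6, 2}}    // made-up observations for group 2
        };
        double[][] sp = pooledCovariance(groups);
        double d2 = distanceSquared(mean(groups[0]), mean(groups[1]), sp);
        System.out.println("Squared Mahalanobis distance: " + d2);
    }
}
```

For quadratic discrimination the same distanceSquared call would be given the within-group covariance matrix of group i instead of the pooled matrix, so the resulting distance matrix is generally not symmetric.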
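Finally, the asymptotic chi-squared test for the equality of covariance matrices is computed as follows (Morrison 1976, page 252):
$$\gamma = C^{-1} \sum_{i=1}^{k} n_i \left\{ \ln|S_p| - \ln|S_i| \right\}$$
where $n_i$ is the number of degrees of freedom in the i-th sample covariance matrix $S_i$, $S_p$ is the pooled covariance matrix, k is the number of groups, and
$$C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(k-1)} \left( \sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{\sum_{j=1}^{k} n_j} \right)$$
where p is the number of variables.
The estimated posterior probability of each observation x belonging to group i is computed using the prior probabilities and the sample mean vectors and estimated covariance matrices under a multivariate normal assumption. Under quadratic discrimination, the within-group covariance matrices are used to compute the estimated posterior probabilities. The estimated posterior probability of an observation x belonging to group i is
$$\hat{q}_i(x) = \frac{e^{-\frac{1}{2} D_i^2(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2} D_j^2(x)}}$$
where
$$D_i^2(x) = \begin{cases} (x - \bar{x}_i)^T S_i^{-1} (x - \bar{x}_i) + \ln|S_i| - 2\ln(p_i) & \text{quadratic discrimination} \\ (x - \bar{x}_i)^T S_p^{-1} (x - \bar{x}_i) - 2\ln(p_i) & \text{linear discrimination} \end{cases}$$
Here $\bar{x}_i$ is the sample mean vector of group i and $p_i$ is the prior probability of group i.
For the leaving-out-one method of classification, the sample mean vector and sample covariance matrices in the formula for $D_i^2(x)$ are adjusted so as to remove the observation x from their computation. For linear discrimination, the linear discriminant function coefficients are actually used to compute the same posterior probabilities.
Using the posterior probabilities, each observation in X is classified into a group; the result is tabulated in the matrix returned by getClassTable and saved in the vector returned by getClassMembership. The classification table is not altered at this stage if X[i][groupIndex] contains a group number that is out of range.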
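The classification step can be sketched as follows, assuming the generalized squared distances of an observation to each group have already been computed. The posterior probability of group i is taken as exp(-D_i^2/2) normalized over all groups, and the observation is assigned to the group with the largest posterior. The class PosteriorSketch and its methods are hypothetical names used for illustration only; subtracting the smallest distance before exponentiating is a numerical safeguard added here, not something the documentation describes.

```java
/** Sketch only: posterior probabilities and group assignment from squared distances. */
final class PosteriorSketch {

    /** Posterior probabilities from the generalized squared distances D_i^2(x). */
    static double[] posterior(double[] d2) {
        double min = Double.POSITIVE_INFINITY;
        for (double v : d2) min = Math.min(min, v);
        double[] q = new double[d2.length];
        double sum = 0.0;
        for (int i = 0; i < d2.length; i++) {
            // Shifting by the minimum avoids underflow and leaves the ratios unchanged.
            q[i] = Math.exp(-0.5 * (d2[i] - min));
            sum += q[i];
        }
        for (int i = 0; i < q.length; i++) q[i] /= sum;
        return q;
    }

    /** Group number (1, 2, ..., nGroups) with the largest posterior probability. */
    static int classify(double[] q) {
        int best = 0;
        for (int i = 1; i < q.length; i++)
            if (q[i] > q[best]) best = i;
        return best + 1;
    }

    public static void main(String[] args) {
        double[] d2 = {1.0, 3.0, 7.0};   // made-up distances to three groups
        double[] q = posterior(d2);
        System.out.println("Posterior probabilities: " + java.util.Arrays.toString(q));
        System.out.println("Assigned group: " + classify(q));
    }
}
```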
If the reclassification method is specified, then all observations with no missing values in the nVariables classification variables are classified. When the leaving-out-one method is used, observations with invalid group numbers, weights, frequencies, or classification variables are not classified. Regardless of the frequency, a 1 is added to (or subtracted from) the classification table for each row of X that is classified and contains a valid group number.
When the leaving-out-one method is used, an adjustment is made to the posterior probabilities to remove the effect of the observation in the classification rule. In this adjustment, each observation is presumed to have a weight of weights[i] and a frequency of 1.0. See Lachenbruch (1975, page 36) for the required adjustment.
Finally, upon completion, the covariance matrices are computed from their LU factorizations.
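How the classification table is filled can be shown with a short sketch. It assumes the known group numbers and the assigned group numbers are already available as int arrays, with 0 marking an observation that was not classified. The class ClassTableSketch and its methods are hypothetical and only mirror the rules stated above: one count per classified row, rows correspond to the known group, columns to the assigned group, and rows with an out-of-range group number are skipped.

```java
/** Sketch only: tabulating known versus assigned group numbers. */
final class ClassTableSketch {

    /** Builds an nGroups by nGroups table; rows are known groups, columns assigned groups. */
    static double[][] tabulate(int[] known, int[] assigned, int nGroups) {
        double[][] table = new double[nGroups][nGroups];
        for (int i = 0; i < known.length; i++) {
            boolean knownOk = known[i] >= 1 && known[i] <= nGroups;
            boolean assignedOk = assigned[i] >= 1 && assigned[i] <= nGroups;
            // Each classified row with a valid group number adds exactly 1.
            if (knownOk && assignedOk) table[known[i] - 1][assigned[i] - 1] += 1.0;
        }
        return table;
    }

    /** Fraction of tabulated observations that fall on the diagonal. */
    static double fractionCorrect(double[][] table) {
        double diagonal = 0.0, total = 0.0;
        for (int r = 0; r < table.length; r++)
            for (int c = 0; c < table.length; c++) {
                total += table[r][c];
                if (r == c) diagonal += table[r][c];
            }
        return diagonal / total;
    }

    public static void main(String[] args) {
        int[] known = {1, 1, 2, 2, 3, 0};      // 3 and 0 are out of range for two groups
        int[] assigned = {1, 2, 2, 2, 1, 1};
        double[][] table = tabulate(known, assigned, 2);
        System.out.println(java.util.Arrays.deepToString(table));
        System.out.println("Fraction correct: " + fractionCorrect(table));
    }
}
```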
Nested Class Summary
static class | DiscriminantAnalysis.CovarianceSingularException | The variance-covariance matrix is singular.
static class | DiscriminantAnalysis.EmptyGroupException | There are no observations in a group.
static class | DiscriminantAnalysis.SumOfWeightsNegException | The sum of the weights has become negative.
Field Summary
static int | LEAVE_OUT_ONE | Indicates leave-out-one as the classification method.
static int | LINEAR | Indicates a linear discrimination method.
static int | POOLED | Indicates that only the pooled covariance matrix is computed.
static int | POOLED_GROUP | Indicates that the pooled and the within-group covariance matrices are computed.
static int | PRIOR_EQUAL | Indicates that the prior probabilities are equal across groups.
static int | PRIOR_PROPORTIONAL | Indicates that the prior probabilities are proportional to the group sample sizes.
static int | QUADRATIC | Indicates a quadratic discrimination method.
static int | RECLASSIFICATION | Indicates reclassification as the classification method.
Constructor Summary
DiscriminantAnalysis(int nVariables, int nGroups) | Constructor for DiscriminantAnalysis.
Method Summary
int[] | getClassMembership() | Returns the group number to which each observation was classified.
double[][] | getClassTable() | Returns the classification table.
double[][] | getCoefficients() | Returns the linear discriminant function coefficients.
double[][][] | getCovariance() | Returns the array of covariances.
int[] | getGroupCounts() | Returns the group counts.
double[][] | getMahalanobis() | Returns the Mahalanobis distances between the group means.
double[][] | getMeans() | Returns the variable means.
int | getNRowsMissing() | Returns the number of rows of data encountered containing missing values (NaN).
double[] | getPrior() | Returns the prior probabilities.
double[][] | getProbability() | Returns the posterior probabilities for each observation.
double[] | getStatistics() | Returns statistics.
void | setClassificationMethod(int method) | Sets the classification method.
void | setCovarianceComputation(int type) | Sets the type of covariance matrices to be computed.
void | setDiscriminationMethod(int method) | Sets the discrimination method.
void | setPrior(double[] prior) | Sets the prior probabilities.
void | setPrior(int type) | Sets the type of prior probabilities to be computed.
void | update(double[][] x) | Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
void | update(double[][] x, double[] frequencies, double[] weights) | Processes a set of observations and associated frequencies and weights, then performs a linear or quadratic discriminant function analysis among the several known groups.
void | update(double[][] x, int groupIndex) | Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
void | update(double[][] x, int[] varIndex) | Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
void | update(double[][] x, int[] varIndex, double[] frequencies, double[] weights) | Processes a set of observations and associated frequencies and weights, then performs a linear or quadratic discriminant function analysis among the several known groups.
void | update(double[][] x, int groupIndex, double[] frequencies, double[] weights) | Processes a set of observations and associated frequencies and weights, then performs a linear or quadratic discriminant function analysis among the several known groups.
void | update(double[][] x, int groupIndex, int[] varIndex) | Processes a set of observations and performs a linear or quadratic discriminant function analysis among the several known groups.
void | update(double[][] x, int groupIndex, int[] varIndex, double[] frequencies, double[] weights) | Processes a set of observations and associated frequencies and weights, then performs a linear or quadratic discriminant function analysis among the several known groups.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final int LEAVE_OUT_ONE
public static final int LINEAR
public static final int POOLED
public static final int POOLED_GROUP
public static final int PRIOR_EQUAL
public static final int PRIOR_PROPORTIONAL
public static final int QUADRATIC
public static final int RECLASSIFICATION
Constructor Detail
public DiscriminantAnalysis(int nVariables, int nGroups)
Constructor for DiscriminantAnalysis.
nVariables - An int representing the number of variables to be used in the discrimination.
nGroups - An int representing the number of groups in the data.
Method Detail
public int[] getClassMembership()
Returns an int array containing the group to which each observation was classified. If an observation has an invalid group number, frequency, or weight when the leaving-out-one method has been specified, then the observation is not classified and the corresponding element of the array is set to zero.
public double[][] getClassTable()
Returns a double array containing the classification table. Each observation that is classified and has a group number equal to 1.0, 2.0, ..., nGroups is entered into the table. The rows of the table correspond to the known group membership. The columns refer to the group to which the observation was classified.
public double[][] getCoefficients()
Returns a double array containing the linear discriminant function coefficients. The first column of the array contains the constant term, and the remaining columns contain the variable coefficients. The i-th row of the returned array corresponds to group i. The coefficients are always computed as linear discriminant function coefficients, even when quadratic discrimination is specified.
public double[][][] getCovariance()
Returns a double array containing the covariances, stored as g covariance matrices. Here, g = nGroups + 1 unless only the pooled covariance matrix is computed, in which case g = 1. When only the pooled covariance matrix is computed, the within-group covariance matrices are not computed. The pooled covariance matrix is always computed and is returned as the g-th covariance matrix.
public int[] getGroupCounts()
Returns an int array of length nGroups containing the number of observations in each group.
public double[][] getMahalanobis()
Returns a double array containing the Mahalanobis distances between the group means. For linear discrimination, the distance is computed using the pooled covariance matrix. Otherwise, the Mahalanobis distance between group means i and j is computed using the within covariance matrix for group i in place of the pooled covariance matrix.
public double[][] getMeans()
Returns a double array containing the variable means. The i-th row of the returned array contains the group i variable means.
public int getNRowsMissing()
Returns an int representing the number of rows of data encountered containing missing values (NaN) for the classification, group, weight, and/or frequency variables. If a row of data contains a missing value (NaN) for any of these variables, that row is excluded from the computations.
public double[] getPrior()
Returns a double vector of length nGroups containing the prior probabilities for each group.
public double[][] getProbability()
Returns a double array containing the posterior probabilities for each observation.
public double[] getStatistics()
Returns a double array (stat) containing output statistics.
I | STAT[I]
0 | Sum of the degrees of freedom for the within-covariance matrices.
1 | Chi-squared statistic.
2 | The degrees of freedom in the chi-squared statistic.
3 | Probability of a greater chi-squared in a test of the homogeneity of the within-covariance matrices. (Not computed when only the pooled covariance matrix is computed.)
4 thru 4+nGroups | Log of the determinant of each group's covariance matrix (not computed when only the pooled covariance matrix is computed), followed by the log of the determinant of the pooled covariance matrix.
Last nGroups + 1 elements | Sum of the weights within each group (the first nGroups of these elements).
Last element | Sum of the weights in all groups.
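The chi-squared statistic in this table is built from the log determinants and degrees of freedom listed alongside it. A minimal sketch of that arithmetic follows, assuming the standard form of the test referenced in the class description (Morrison 1976): gamma = C^{-1} * sum of n_i * (ln|S_p| - ln|S_i|). The class BoxTestSketch and its methods are hypothetical, and the degrees-of-freedom formula (k - 1) p (p + 1) / 2 is the textbook value for this test rather than something stated on this page.

```java
/** Sketch only: chi-squared test for equality of within-group covariance matrices. */
final class BoxTestSketch {

    /**
     * @param logDetWithin log determinant of each within-group covariance matrix
     * @param logDetPooled log determinant of the pooled covariance matrix
     * @param df           degrees of freedom n_i of each within-group covariance matrix
     * @param p            number of variables
     */
    static double chiSquared(double[] logDetWithin, double logDetPooled, double[] df, int p) {
        int k = df.length;                      // number of groups
        double m = 0.0, sumInverse = 0.0, sumDf = 0.0;
        for (int i = 0; i < k; i++) {
            m += df[i] * (logDetPooled - logDetWithin[i]);
            sumInverse += 1.0 / df[i];
            sumDf += df[i];
        }
        double cInverse = 1.0
                - (2.0 * p * p + 3.0 * p - 1.0) / (6.0 * (p + 1) * (k - 1))
                * (sumInverse - 1.0 / sumDf);
        return cInverse * m;
    }

    /** Textbook degrees of freedom of the statistic: (k - 1) p (p + 1) / 2. */
    static int degreesOfFreedom(int k, int p) {
        return (k - 1) * p * (p + 1) / 2;
    }

    public static void main(String[] args) {
        double[] logDetWithin = {0.0, 0.0};     // made-up values for two groups
        double[] df = {9.0, 9.0};               // e.g. 10 observations per group
        double gamma = chiSquared(logDetWithin, Math.log(2.0), df, 2);
        System.out.println("Chi-squared statistic: " + gamma);
        System.out.println("Degrees of freedom: " + degreesOfFreedom(2, 2));
    }
}
```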
public void setClassificationMethod(int method)
method - An int scalar indicating the method of classification. Use class member RECLASSIFICATION or LEAVE_OUT_ONE. If this member function is not called, the RECLASSIFICATION method is used.
public void setCovarianceComputation(int type)
type - An int scalar indicating the type of covariance matrices to be computed. Use class member POOLED or POOLED_GROUP. If this member function is not called, the POOLED_GROUP type is used.
public void setDiscriminationMethod(int method)
method - An int scalar indicating the method of discrimination. Use class member LINEAR or QUADRATIC. If this member function is not called, the LINEAR method is used.
public void setPrior(double[] prior)
prior - A double vector of length nGroups containing the prior probabilities for each group. The elements of prior should sum to 1.0. If this member function is not called, the elements of prior are set to be equal if PRIOR_EQUAL is set, or proportional to the sample size in each group if PRIOR_PROPORTIONAL is set.
public void setPrior(int type)
type - An int scalar indicating the type of prior probabilities to be computed. Use class member PRIOR_EQUAL or PRIOR_PROPORTIONAL. If this member function is not called, the PRIOR_EQUAL type is used.
public void update(double[][] x) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1, 2, ..., nGroups.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
public void update(double[][] x, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1, 2, ..., nGroups.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
public void update(double[][] x, int groupIndex) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, excluding the groupIndex column.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1, 2, ..., nGroups.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
public void update(double[][] x, int[] varIndex) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1, 2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
public void update(double[][] x, int[] varIndex, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and the last column (column nVariables) contains the group numbers. The groups must be numbered 1, 2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
public void update(double[][] x, int groupIndex, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The first nVariables columns correspond to the variables, excluding the groupIndex column.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1, 2, ..., nGroups.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
public void update(double[][] x, int groupIndex, int[] varIndex) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and the groupIndex column contains the group numbers.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1, 2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
public void update(double[][] x, int groupIndex, int[] varIndex, double[] frequencies, double[] weights) throws DiscriminantAnalysis.SumOfWeightsNegException, DiscriminantAnalysis.EmptyGroupException, DiscriminantAnalysis.CovarianceSingularException
x - A double matrix containing the observations. The columns indicated in varIndex correspond to the variables, and the groupIndex column contains the group numbers.
groupIndex - An int containing the column index of x in which the group numbers are stored. The groups must be numbered 1, 2, ..., nGroups.
varIndex - An int array containing the column indices in x that correspond to the variables to be used in the analysis.
frequencies - A double array containing the associated frequencies.
weights - A double array containing the associated weights.
DiscriminantAnalysis.SumOfWeightsNegException
DiscriminantAnalysis.EmptyGroupException
DiscriminantAnalysis.CovarianceSingularException
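The column conventions shared by the update overloads can be illustrated without calling the library. The sketch below walks a data matrix the way the parameter descriptions suggest: groupIndex selects the column holding the group number, varIndex selects the variable columns, and a row with NaN in any of those columns is counted as missing. The class UpdateLayoutSketch and its countRows method are hypothetical; in particular, silently ignoring rows whose group number is outside 1, 2, ..., nGroups is an assumption of this sketch, not documented behavior.

```java
/** Sketch only: interpreting x, groupIndex and varIndex as described for update. */
final class UpdateLayoutSketch {

    /**
     * Returns an array of length nGroups + 1: the first nGroups entries are the
     * per-group row counts, the last entry is the number of rows skipped for NaN.
     */
    static int[] countRows(double[][] x, int groupIndex, int[] varIndex, int nGroups) {
        int[] counts = new int[nGroups + 1];
        for (double[] row : x) {
            boolean missing = Double.isNaN(row[groupIndex]);
            for (int v : varIndex) missing |= Double.isNaN(row[v]);
            if (missing) {
                counts[nGroups]++;              // row excluded from the computations
                continue;
            }
            int g = (int) row[groupIndex];
            // Count only whole group numbers in the range 1, 2, ..., nGroups.
            if (g >= 1 && g <= nGroups && g == row[groupIndex]) counts[g - 1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] x = {
            {1.0, 2.0, 1},          // group 1
            {Double.NaN, 1.0, 1},   // missing variable value
            {0.5, 0.1, 2},          // group 2
            {3.0, 4.0, 5}           // group number out of range for two groups
        };
        int[] counts = countRows(x, 2, new int[]{0, 1}, 2);
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

With the overloads that omit groupIndex, the same walk would use the last column (column nVariables) for the group number, and with those that omit varIndex it would use the first nVariables columns other than the group column.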