Correlate summation analysis

Correlate summation analysis is a data mining method designed to find the variables that are most covariant with all of the other variables being studied, relative to clustering. The aggregate correlate summation for a given variable is the product of two quantities: the sum of the negative logarithms of the p-values of all of its correlations, and its normalized standard deviation (the standard deviation-to-mean quotient). The discrete correlate summation is likewise a product: the sum of the absolute values of the logarithms of the ratios between two groups' correlation p-values for a given variable, multiplied by the absolute value of the logarithm of the ratio of the two group means.

1 Correlate summation template
2 Discrete correlate summation
3 Aggregate correlate summation
4 Non-linear modeling
5 References

Correlate summation template

This zipped Excel template performs a correlate summation analysis for up to 100 variables for 4 groups of 15 subjects:

[1]

The paper ^[1] describing the method is embedded in the spreadsheet.

Discrete correlate summation

Given two groups, an m-by-m correlation matrix was constructed for the m variables of each group. Each column holds all of the correlations (r) between a given variable and each of the other variables. Whether the variables had equal or unequal numbers of data points (n), the n for each individual correlation was calculated by assigning each data point a value of one and taking the sum of the products for each pair in that correlation, so that a pair contributes to n only when both of its values are present.
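The counting step above can be sketched as follows. This is a minimal illustration, not the template's own implementation: the function name `pairwise_n`, the use of `None` for a missing value, and the sample data are all assumptions made for the example.

```python
def pairwise_n(data):
    """Count, for every pair of variables, the subjects with both values present.

    `data` is a list of columns (one per variable); None marks a missing value
    (an assumption of this sketch).
    """
    # Indicator columns: 1 where a data point exists, 0 where it is missing.
    present = [[0 if v is None else 1 for v in col] for col in data]
    m = len(present)
    # n[i][j] is the sum of products of the two indicator columns.
    return [[sum(a * b for a, b in zip(present[i], present[j]))
             for j in range(m)]
            for i in range(m)]


# Hypothetical data: three variables measured on four subjects.
columns = [
    [1.2, 3.4, None, 5.6],
    [0.5, None, 2.2, 1.1],
    [9.0, 8.0, 7.0, 6.0],
]
n = pairwise_n(columns)
```

Here variables 0 and 1 share only two complete pairs, so their correlation would be tested with n = 2 rather than 3 or 4.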

The correlations were tested for linearity using Student's t-distribution to evaluate:

$t=\frac{|r|}{\sqrt{\frac{1-r^2}{n-2}}}$

with (n − 2) degrees of freedom, returning a two-tailed p-value ^[2].
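As a rough sketch of this test, the code below computes t from r and n and approximates the two-tailed p-value by numerically integrating the Student's t density with Simpson's rule. The function names and the integration approach are assumptions of this example; a statistics library would normally supply the t-distribution directly, and this approximation loses precision for very small p-values.

```python
import math


def t_statistic(r, n):
    """t = |r| / sqrt((1 - r^2) / (n - 2))."""
    if n <= 2:
        raise ValueError("need n > 2 for n - 2 degrees of freedom")
    if abs(r) >= 1.0:
        return math.inf  # perfect correlation
    return abs(r) / math.sqrt((1.0 - r * r) / (n - 2))


def t_density(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| > t) using Simpson's rule on [0, t] (steps is even)."""
    if math.isinf(t):
        return 0.0
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    central = 2.0 * total * h / 3.0  # P(-t < T < t), by symmetry
    return max(0.0, 1.0 - central)


# Worked example: r = 0.6 with n = 18 gives t = 0.6 / sqrt(0.64 / 16) = 3.
t_value = t_statistic(0.6, 18)
p_value = two_tailed_p(t_value, 18 - 2)
```

Applying `two_tailed_p` to every entry of a correlation matrix, with the pairwise n for that entry, gives the matrix of p-values used in the following steps.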

The correlation matrices were thus transformed into linear probability matrices. For the two groups, the absolute value of the logarithm of the ratio of each comparison’s p-values gives a log correlation ratio (log_cr) that grows larger as the ratio approaches zero or infinity. Each column was totaled to form the discrete correlate summation array. In the same way, the log mean ratio (log_mr) was obtained for each variable as the absolute value of the logarithm of the ratio of the two groups’ means. The correlate summation was then multiplied by the log mean ratio to yield the discrete mean-correlate summation (DCΣ_x) ^[1].
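A minimal sketch of the discrete calculation follows, starting from two p-value matrices that are assumed to have been computed already. Several details are assumptions of this example rather than statements from the source: the base-10 logarithm, the exclusion of the diagonal (a variable's correlation with itself), the requirement that all means be positive, and the function and argument names.

```python
import math


def discrete_correlate_summation(p_a, p_b, means_a, means_b):
    """Return the DCΣ_x value for each variable.

    p_a, p_b: m-by-m p-value matrices for groups A and B.
    means_a, means_b: per-variable group means (assumed positive).
    """
    m = len(p_a)
    result = []
    for j in range(m):
        # Column total of log_cr = |log10(p_a / p_b)| over the other variables.
        log_cr_sum = sum(abs(math.log10(p_a[i][j] / p_b[i][j]))
                         for i in range(m) if i != j)
        # log_mr = |log10(mean_a / mean_b)| for this variable.
        log_mr = abs(math.log10(means_a[j] / means_b[j]))
        result.append(log_cr_sum * log_mr)
    return result


# Hypothetical inputs; the diagonal entries are placeholders and are not used.
p_a = [[1.0, 0.01, 0.1], [0.01, 1.0, 0.5], [0.1, 0.5, 1.0]]
p_b = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.05], [0.1, 0.05, 1.0]]
dcs = discrete_correlate_summation(p_a, p_b, [10.0, 1.0, 5.0], [1.0, 1.0, 50.0])
```

In this made-up example the second variable has the largest column total but identical means in both groups, so its product is zero: a variable scores highly only when both its correlation pattern and its mean differ between the groups.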

Aggregate correlate summation

As in the discrete correlate summation, a linear probability matrix was calculated, this time for all of the data (no grouping). The negative logarithm was taken for all of the p-values, and the columns were totaled to give the aggregate correlate summation (ACΣ) array. The standard deviation for each variable was divided by its mean to normalize the variances between variables. Data with a bimodal distribution will have a larger normalized standard deviation (nSD) than will data with a normal distribution. The nSD array multiplied by the ACΣ array yielded the aggregate mean-correlate summation (ACΣ_x) ^[1].
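The aggregate calculation can be sketched in the same way, again starting from a p-value matrix that is assumed to exist already. The base-10 logarithm, the use of the sample standard deviation, the exclusion of the diagonal, and the names used here are assumptions of this example.

```python
import math
import statistics


def aggregate_correlate_summation(p, data):
    """Return the ACΣ_x value for each variable.

    p: m-by-m p-value matrix for the ungrouped data.
    data: list of m columns of observations (means assumed non-zero).
    """
    m = len(p)
    result = []
    for j in range(m):
        # ACΣ: column total of -log10(p) over the other variables.
        ac_sum = sum(-math.log10(p[i][j]) for i in range(m) if i != j)
        # nSD: standard deviation divided by the mean.
        nsd = statistics.stdev(data[j]) / statistics.mean(data[j])
        result.append(ac_sum * nsd)
    return result


# Hypothetical inputs; the diagonal entries are placeholders and are not used.
p = [[1.0, 0.01, 0.1], [0.01, 1.0, 0.001], [0.1, 0.001, 1.0]]
data = [[1, 2, 3], [2, 2, 2], [2, 4, 6]]
acs = aggregate_correlate_summation(p, data)
```

The constant second variable has zero nSD, so it scores zero regardless of its p-values.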

Non-linear modeling

A linear correlation between variables for a given sample set is typically the initial step in the investigation of relationships, which may lead to an underlying mechanism. The variation in a given population (either inherent or in response to a challenge) gives rise to correlations between variables in which only a portion of the sigmoidal (control) relationship may be evident. In general, when data defy linear regression, the data patterns indicate a power relationship of the general type:

$y = m x^{a}$

Type 1: a < 0 is a hyperbolic function

Type 2: a = 0 is a horizontal line

Type 3: 0 < a < 1 is a root function

Type 4: a = 1 is actually a linear function

Type 5: a > 1 is a power function

(In all five cases a log-log plot yields a straight line, since log y = log m + a log x.) ^[3]
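Because the relationship is linear on log-log axes, the exponent a can be estimated by ordinary least squares on the logarithms and then matched to one of the five types. The sketch below assumes strictly positive x and y values; the function names and the tolerance used to decide whether a equals 0 or 1 are illustrative choices, not part of the source.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log m + a log x; returns (m, a)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    a = sxy / sxx                        # slope on log-log axes
    m = math.exp(mean_y - a * mean_x)    # intercept, transformed back
    return m, a


def power_type(a, tol=1e-6):
    """Map an exponent to the five types listed above."""
    if a < -tol:
        return "Type 1 (hyperbolic)"
    if abs(a) <= tol:
        return "Type 2 (horizontal line)"
    if abs(a - 1.0) <= tol:
        return "Type 4 (linear)"
    if a < 1.0:
        return "Type 3 (root)"
    return "Type 5 (power)"


# Hypothetical data generated from y = 3 * x^0.5.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 0.5 for x in xs]
m_hat, a_hat = fit_power_law(xs, ys)
kind = power_type(a_hat)
```

With noisy data the fitted exponent will rarely equal 0 or 1 exactly, so in practice the tolerance would need to be chosen with the uncertainty of the fit in mind.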

On a positive sigmoidal/logistic curve, the initial, intermediate and late portions resemble power, linear and root functions, respectively. Also, the late portion of a negative control function is reminiscent of a hyperbolic curve.

In an analysis of variable correlation, the sigmoidal relationship over the entire data range (parts of which may be unsampled) should be considered. This type of analysis is accomplished by regression with a logistic curve, or by simple linear regression followed by further investigation of the Type 1, 3 and 5 power relationships ^[1].

References

1. Westwood, B.; Chappell, M. (2006). Application of correlate summation to data clustering in the estrogen- and salt-sensitive female mRen2.Lewis rat. TMBIO '06 (ACM). pp. 21–26. doi:10.1145/1183535.1183542.
2. Swinscow, T. (1997). Statistics at Square One. BMJ Publishing Group.
3. Mandel, J. (1984). The Statistical Analysis of Experimental Data. Dover Publications, Mineola, NY.

Wikimedia Foundation. 2010.
