Partial Correlation Network Analysis for Mixed Data

Leong, Shirley Hui Yee
The partial correlation is well defined for continuous data and widely used in network analysis. Its strength lies in its interpretation as the relationship between two variables after the effects of the other variables have been removed. We follow up on a recently proposed analogue for categorical data whose properties have not been well studied. The new partial correlation is defined as the first canonical correlation of the Pearson residuals from logistic regressions. This is analogous to the continuous case, where the partial correlation is obtained by correlating the residuals from linear regressions. A simulation study is presented to examine the properties of the new partial correlation and to compare it with other measures, such as the partial phi coefficient. In the limiting case, the new partial correlation and the partial phi coefficient converge in both estimate and inference. However, the partial phi coefficient cannot be applied to multi-categorical data, and it is not an efficient measure when controlling for more than one variable. The new partial correlation, by contrast, is well defined for the multi-categorical case and can readily control for more than one variable. Because it is derived as a canonical correlation, the new partial correlation can also measure the relationship between a continuous and a categorical variable: it is the multiple correlation between the Pearson residuals from the logistic regression (for the categorical response) and the usual residuals from the linear regression (for the continuous response). Now that partial correlation networks can be obtained for any data type, whether continuous, categorical, or mixed, our next goal is to compare the network structure between different groups and to examine the impact of continuous as well as categorical covariates on the pathway connections.
This is accomplished by extending the two-level regression approach for continuous data, originally developed by our research group (Pradhan, 2009), to categorical and mixed data network analysis. By linearly regressing the first canonical variates and replacing the slope coefficient with an expression of the covariates, we can test for the effect of covariates (both categorical and continuous) on the partial correlation and on the network structure. This new covariate partial correlation network analysis approach is illustrated through two studies on the links between human genotypes (single-nucleotide polymorphisms) and disease phenotypes.
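The continuous case mentioned in the abstract, where the partial correlation comes from correlating linear-regression residuals, can be illustrated with a minimal sketch. It covers only that standard continuous construction, not the categorical or mixed-data measure studied in the thesis. The function names, the simulated data, and the use of NumPy are assumptions made for illustration and do not come from the thesis.

```python
import numpy as np

def partial_corr_residuals(x, y, z):
    """Partial correlation of x and y given z, via OLS residuals.

    Hypothetical helper: regress x on z and y on z (with intercept),
    then correlate the two residual vectors.
    """
    design = np.column_stack([np.ones(len(x)), z])  # intercept + controls
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

def partial_corr_precision(data, i, j):
    """Same quantity from the inverse sample covariance (precision) matrix."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    return -precision[i, j] / np.sqrt(precision[i, i] * precision[j, j])

# Simulated example (an assumption): x and y both depend on z but are
# conditionally independent given z, so the partial correlation should be
# near zero even though the marginal correlation is not.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 2))
x = z @ np.array([1.0, -0.5]) + rng.normal(size=n)
y = z @ np.array([0.8, 0.3]) + rng.normal(size=n)

r_marginal = np.corrcoef(x, y)[0, 1]
r_resid = partial_corr_residuals(x, y, z)
r_prec = partial_corr_precision(np.column_stack([x, y, z]), 0, 1)

print(f"marginal correlation:          {r_marginal:.3f}")
print(f"partial (residual method):     {r_resid:.3f}")
print(f"partial (precision matrix):    {r_prec:.3f}")
```

The residual and precision-matrix routes agree up to floating-point error, and both control for several variables at once. The abstract describes the categorical analogue as replacing the linear regressions with logistic regressions, the ordinary residuals with Pearson residuals, and the simple correlation with the first canonical correlation.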
141 pg.
The Graduate School, Stony Brook University: Stony Brook, NY.