Chapter 10: Correlation and linear regression

BOOK TITLE: Research Methodology Simplified: Every Clinician a Researcher

Author
1. Parikh Mahendra N
2. Mukherjee Joydev
3. Hazra Avijit
4. Gogtay Nithya
ISBN
9789350250037
DOI
10.5005/jp/books/11435_10
Edition
1/e
Publishing Year
2010
Pages
6
Author Affiliations
1. Seth Gordhandas Sunderdas (GS) Medical College and Nowrosjee Wadia Maternity Hospital, Mumbai, Maharashtra, India; Shushrusha Citizens’ Cooperative Hospital, Mumbai, Maharashtra, India; Fertility Sterility, India; The Journal of Obstetrics and Gynaecology of India, Mumbai, Maharashtra, India
2. North Bengal Medical College, West Bengal, India; RG Kar Medical College, Kolkata, West Bengal, India
3. Institute of Postgraduate Medical Education and Research (IPGMER), Kolkata, West Bengal, India
4. Seth Gordhandas Sunderdas (GS) Medical College and King Edward Memorial (KEM) Hospital, Mumbai, Maharashtra, India; Journal of Postgraduate Medicine; Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Mumbai, India
Abstract

Correlation and linear regression are commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between a pair of variables and expresses it as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson’s correlation coefficient (r). The value r² denotes the proportion of the variability of y that can be attributed to its linear relation with x, and is called the coefficient of determination. If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman’s rho (ρ), may be calculated. A hypothesis test of correlation assesses whether the linear relationship observed in the sample also holds in the underlying population; the null hypothesis is that the population correlation coefficient is zero, and p < 0.05 is conventionally taken as evidence against it. A 95% confidence interval for the population correlation coefficient can also be calculated. Linear regression is a technique that attempts to link the two variables x and y in the form of a mathematical equation (y = a + bx, where a is the intercept and b the slope), such that given the value of one variable, the other may be predicted. Generally, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analyses are based on certain assumptions, and misleading conclusions may be drawn if these are not met. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Clustering (subgroups) or outliers within data sets can distort the value of the correlation coefficient. Finally, it is vital to remember that although a strong correlation can be a pointer towards causation, the two are not synonymous.
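
As an illustration of the quantities named in the abstract, the following is a minimal Python sketch, not taken from the chapter, that computes Pearson's r, the coefficient of determination, Spearman's rho (as Pearson's r applied to average ranks) and the least-squares line y = a + bx. It uses only the standard library. The function names and the small data set are hypothetical and chosen purely for demonstration. The sketch does not perform the hypothesis test or compute the confidence interval, and it does not replace inspecting a scatter plot first.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical illustrative data, not from the chapter.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x_demo, y_demo)
a, b = least_squares_line(x_demo, y_demo)
print("Pearson r:", r)
print("Coefficient of determination r^2:", r ** 2)
print("Spearman rho:", spearman_rho(x_demo, y_demo))
print("Regression line: y = a + b*x with a =", a, "and b =", b)
```

In practice a statistics package would normally be used, since it also reports the p value and confidence interval; the hand-written version is shown only to make the definitions concrete.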

© 2019 Jaypee Brothers Medical Publishers (P) LTD.   |   All Rights Reserved