*Statistic* or *datum* means a measured or counted fact or piece of information stated as a figure, such as the height of a person or the birth weight of a baby.

*Statistics* or *data* is the plural of the same, stated in more than one figure, such as the heights of 2 persons or the birth weights of 5 babies. They are collected from experiments, records and surveys, in all walks of life, such as economics, politics, education, industry, business, administration, etc. Medicine, including Preventive Medicine and Public Health, is one such field.

*Statistics*, though apparently plural, when used in a singular sense is a *science of figures*. It is a field of study concerned with techniques or methods of collection of data, classification, compilation, summarizing, interpretation, drawing inferences, testing of hypotheses, making recommendations, etc. when only a part of data is used.

In any book on statistical methods the word *statistics* is used both as the plural of statistic and as a science of figures.

BIOSTATISTICS

*Biostatistics* is the term used when tools of statistics are applied to data derived from biological sciences, such as medicine.

Any science demands precision for its development, and so does medical science. For precision, facts, observations or measurements have to be expressed in figures.

*“When you can measure what you are speaking about and express it in numbers, you know something about it but when you cannot measure, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.”*

… Lord Kelvin

Everything in medicine, be it research, diagnosis or treatment, depends on counting or measurement. High or low blood pressure has no meaning, unless it is expressed in figures. Incidence of tuberculosis or death rate in typhoid is stated in figures. Enlargement of spleen is measured in fingers’ breadth. Thus medical statistics or biostatistics can be called quantitative medicine.

In nature, blood pressure, pulse rate, action of a drug or any other measurement or counting varies not only from person to person but also from group to group. The extent of this variability in an attribute or a character, whether it is by chance, i.e. biological or normal, is learnt by studying statistics as a *science*.

Comparison of a variable in two or more groups is of great importance in the applied scientific practice of medicine. For example, the infant mortality rate in developing countries like India was around 73 per thousand live births in 1994, while in developed countries like the USA, UK and Japan the rates have gone down to about 5 per thousand live births per year, due to external factors like socioeconomic advancement, better application of scientific knowledge in medicine or improved health services. A rise in pulse rate noted after an injection of a drug may be by chance or due to the effect of the drug.

Variation more than natural limits may be pathological, i.e. abnormal due to the play of certain external factors. Hence, biostatistics may also be called a science of variation. The data after collection, lying in a haphazard mass, are of no use unless they are properly sorted, presented, compared, analyzed and interpreted. They mean something more than figures, give a dimension to the problem and even suggest the solution. For such a study of figures, one has to apply certain mathematical techniques called statistical methods, such as calculation of standard deviation, standard error and preparation of a life table. Though these methods are quite simple and general in application, medicos follow them only when they are presented in a familiar way giving day-to-day medical examples. Moreover, medical statistics merit special attention as they deal with human beings and not with material objects or lower animals. A medical observer has to give his or her opinion or make an impression after applying these methods.

“General impressions are never to be trusted. Unfortunately when they are of long standing nature, they become fixed rules of life and assume a prescriptive right not to be questioned. Consequently those who are not accustomed to original enquiry, entertain a hatred and horror of statistics. They cannot endure the idea of submitting their sacred impressions to cold blooded verification. But it is the triumph of scientific men to rise superior to such superstitions, to desire tests, by which the value of their beliefs may be ascertained and to feel sufficiently, masters of themselves to discard contemptuously whatever may be found untrue.”

… Francis Galton

A medical student should not depend on a statistician for the statistical analysis. For professional interpretation of his or her results, he or she should learn the application of methods himself or herself which do not require knowledge of mathematics higher than what he or she had acquired at school. However, he or she should take the guidance of a qualified statistician right from the beginning of any scientific study till drawing the conclusions.

Medical Statistics go under different names when applied in different fields such as:

- *Health statistics* in public health or community health.
- *Medical statistics* in medicine related to the study of defect, injury, disease, efficacy of drug, serum and line of treatment, etc.
- *Vital statistics* in demography pertaining to vital events of births, marriages and deaths.

These terms are overlapping and not exclusive of each other.

Application and Uses of Biostatistics as a Science

Biostatistics is an integral part of research in all ramifications of biosciences and in general has the following uses:

- Measurement of population characteristics describing structure of a population, e.g. age, gender.
- Measurement of biological parameters of a population, e.g. mean height, weight, blood pressure.
- Measurement of disease in a community in terms of morbidity, disability and mortality, e.g. prevalence of a disease, disability rate, infant mortality rate.
- Association between causative factors and effects by comparison with statistical tests of significance, e.g. proportion of smokers in those with lung cancer and those without lung cancer, prevalence of cardiovascular diseases in obese and non-obese individuals.
- Evaluation of preventive and therapeutic strategies, e.g. efficacy of vaccines, fall in blood pressure by antihypertensive drug.
- Utilization of services, e.g. vaccine coverage, rate of admission in hospital.
- Prediction of events by modeling, e.g. life table.

Some examples of its use specific to various disciplines of medical science are outlined below.

In Physiology and Anatomy

- To define what is normal or healthy in a population and to find *limits of normality* in variables, such as weight and pulse rate. The mean pulse rate is 72 per minute, but the limits on either side of the mean up to which a value may be considered normal have to be established with appropriate techniques.
- To find the *difference between means* and *proportions* between two groups of population or at two different places or in different periods. The mean height of boys in Gujarat is less than the mean height in Punjab. Whether this difference is due to chance (natural variation) or to some other contributory factor, such as better nutrition, has to be decided.
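A minimal sketch of how limits of normality might be computed is given below. The pulse rates are hypothetical, and the use of mean ± 1.96 standard deviations (covering about 95% of a roughly normal distribution) is one common convention assumed here for illustration, not a rule stated in the text.

```python
import statistics

# Hypothetical resting pulse rates (beats per minute) of 10 healthy adults.
pulse_rates = [66, 70, 72, 74, 68, 76, 72, 78, 70, 74]

mean = statistics.mean(pulse_rates)
sd = statistics.stdev(pulse_rates)  # sample standard deviation (divisor n - 1)

# Assumed convention: mean +/- 1.96 SD covers about 95% of a normal distribution.
lower = mean - 1.96 * sd
upper = mean + 1.96 * sd

print(f"mean = {mean:.1f} per minute, SD = {sd:.2f}")
print(f"limits of normality: {lower:.1f} to {upper:.1f} per minute")
```

With so few observations the limits are only rough; in practice they would be derived from a much larger sample.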

In Pharmacology

- To find the action of drug—a drug is given to animals or humans to see whether the changes produced are due to the drug or by chance.
- To compare the action of two different drugs or two successive dosages of the same drug.
- To find the relative potency of a new drug with respect to a standard drug.

In Medicine

- To compare the efficacy of a particular drug, operation or line of treatment—for this, the percentage cured, relieved or died in the intervention and control groups, is compared and difference due to chance or otherwise is found by applying statistical techniques.
- To find an association between two attributes, such as cancer and smoking, or filariasis and social class—an appropriate test is applied for this purpose.
- To identify signs and symptoms of a disease or syndrome, e.g. cough in typhoid is found by chance, while fever is found in almost every case. The proportional incidence of one symptom or another indicates whether it is a characteristic feature of the disease or not.
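The comparison of percentages cured in intervention and control groups can be sketched with a chi-square test on a 2 × 2 table. The counts below are hypothetical, and the choice of the chi-square test (rather than another test of proportions) is an assumption made for illustration.

```python
# Hypothetical trial results: each row is a group, columns are (cured, not cured).
observed = [
    [40, 10],  # intervention group
    [30, 20],  # control group
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count if cure were unrelated to the group.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabulated chi-square value for 1 degree of freedom at the 5% level.
CRITICAL_5_PERCENT_DF1 = 3.841

print(f"chi-square = {chi_square:.2f} with {df} degree of freedom")
print("significant at 5% level:", chi_square > CRITICAL_5_PERCENT_DF1)
```

Here 80% were cured in the intervention group against 60% in the control group; the test indicates whether a difference of this size is likely to have arisen by chance.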

In Community Medicine and Public Health

- To test usefulness of sera and vaccines in the field—percentage of attacks or deaths among the vaccinated subjects is compared with that among the unvaccinated ones to find whether the difference observed is statistically significant.
- In public health, the measures adopted are evaluated, e.g. lowering of the morbidity rate in typhoid after pasteurization of milk may be attributed to a clean supply of milk, if it is statistically proved. A fall in birth rate may be the result of family planning methods adopted under the National Family Welfare Program or due to a rise in living standards, increasing awareness and higher age of marriage.
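The vaccine comparison described above can be sketched as a test of the difference between two proportions. The counts are hypothetical, and the pooled standard error with a cut-off of 1.96 for the 5% level is a standard large-sample approach assumed here for illustration.

```python
import math

# Hypothetical field trial counts.
vaccinated_n, vaccinated_attacks = 500, 10
unvaccinated_n, unvaccinated_attacks = 500, 40

p1 = vaccinated_attacks / vaccinated_n      # attack rate in the vaccinated
p2 = unvaccinated_attacks / unvaccinated_n  # attack rate in the unvaccinated

# Pooled proportion, assuming the vaccine makes no difference.
p = (vaccinated_attacks + unvaccinated_attacks) / (vaccinated_n + unvaccinated_n)
q = 1 - p

# Standard error of the difference between the two proportions.
se_diff = math.sqrt(p * q * (1 / vaccinated_n + 1 / unvaccinated_n))
z = (p2 - p1) / se_diff

print(f"attack rates: {p1:.1%} vaccinated, {p2:.1%} unvaccinated")
print(f"Z = {z:.2f}; significant at 5% level: {abs(z) > 1.96}")
```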

Thus, by learning the methods in biostatistics, a student learns to evaluate articles published in medical journals or papers read in medical conferences. He or she understands the basic methods of observation in his or her clinical practice or research.

*“He who accepts statistics indiscriminately will often be duped unnecessarily. But he who distrusts statistics indiscriminately will often be ignorant unnecessarily.” (WA Wallis and HV Roberts, in Nature of Statistics. The Free Press, New York, 1965).*

Application and Uses of Biostatistics as Figures

Health and vital statistics are essential tools in demography, public health, medical practice and community services. Recording of vital events in birth and death registers and of diseases in hospitals is like the bookkeeping of the community, describing the incidence or prevalence of diseases, defects or deaths in a defined population. Such events, properly recorded, form the eyes and ears of a public health or medical administrator; without them, it would be like sailing in a ship without a compass. Thus, biostatistics as a science of figures will tell:

- What are the leading causes of death?
- What are the important causes of sickness?
- Whether a particular disease is rising or falling in severity and prevalence?
- Which age group, sex, social class of people, profession or place is affected the most?
- The levels or standards of health reached.
- Age and sex composition of population in a community.
- Whether a particular population is rising, falling, aging or ailing?

Scope

In this book, an attempt is made to highlight the basic principles of statistical methods or techniques for the use of medical students. The approach is to equip medicos and other users to the extent that they may be able to appreciate the utility and usefulness of statistics in medical and other biosciences. Certain essential methods in biostatistics must be learnt to understand their application in diagnosis, prognosis, prescription and management of diseases in individuals and community and apply in research conducted during student life and beyond. The subject forms an integral part of all disciplines of medicine as explained already.

Numerous examples have been worked out in this text and the various steps of calculations have been elaborated. Use of various statistical parameters like mean, standard deviation, standard error, correlation coefficient, etc. and their application in various statistical tests like *Z*, *χ*^{2}, ‘*t*’, etc. have been explained. Situations in which various tests should be applied are also given in detail. Still, simplicity has been the watchword and intricacies have been avoided, to minimize antipathy or averseness by the medical students. All previous editions of this book have been found to be popular among the students of other biosciences as well, such as zoology, botany, humanities, agriculture, anthropology, etc.

Common Statistical Terms

Before learning the methods in biostatistics one should understand and remember some commonly used terms, their symbols and notations and refer to these as and when needed.

- Variable: A variable, also called an attribute, is a characteristic that can vary, i.e. it may take different values in different persons, places or things. In other words it is a quantity that varies within limits, such as height, weight, blood pressure, age, etc. It is denoted as *X*, and the notation for an orderly series is *X*_{1}, *X*_{2}, *X*_{3}, …, *X*_{n}. The suffix *n* is the symbol for the number in the series. *Σ* (sigma) stands for summation of results or observations.
- Observation: An event and its measurements, e.g. blood pressure (event) and 120 mm Hg (measurement).
- Observational unit: The source that gives observations, such as object, person, etc. In medical statistics the term individuals or subjects is used more often.
- Data: A set of values recorded on one or more observational units. Data are obtained by measurement or counting and are considered to be the raw materials of statistics.
- Qualitative data: This is the result of an attribute that cannot be expressed in numbers. Hence it cannot be measured but can be expressed by classifying into different groups and indicating the number of observations in that group, which is also known as frequency, e.g. blood group A, B, O, AB or gender male, female, third gender.
- Quantitative data: It can be counted and expressed in numbers. Quantitative data can be discrete that is expressed in whole numbers, e.g. number of family members, or continuous that can take infinite values in a specific range and the value can be presented as fractions or decimals, e.g. weight, height.
- Population: It is an entire group of people or study elements (persons, things or measurements) for which we have an interest at a particular time. Populations are determined by our sphere of interest. It may be *infinite* or *finite*. If a population consists of a fixed number of values, it is said to be finite. If a population consists of an endless succession of values, the population is an *infinite* one. It has to be fully defined, such as all human beings, all families joint or nuclear, all women of 15–45 years of age or only married women, all patients, all doctors in service or in practice and so on. Such a population cannot be counted and hence invariably gives *qualitative data*. If it is finite or limited in number it can easily be counted. A statistical population may also be birth weights, hemoglobin levels, readings of a thermometer, number of RBCs in the human body, etc. Such a population mostly gives *quantitative data*. It is finite or small in number or infinite or unlimited in number that cannot be easily counted.
- Sampling unit: Each member of a population.
- Sample: It may be defined as a part or a subset of a population. It is a group of sampling units that form part of a population, generally selected so as to be representative of the population whose variables are under study. There are many kinds of samples that can be selected from a population. Various methods employed are described later in the book.
- Parameter: It is a summary value or constant of a variable that describes the population, such as its mean, standard deviation, standard error, correlation coefficient, proportion, etc. This value is calculated from the sample and is often applied to population but it may or may not be a valid estimate of population. Though not desirable, parameter and statistic are often used as synonyms. Basic difference between the two is that a parameter is the constant that describes the characteristics of a population, while statistic is a function that describes the characteristics of a sample.
- Parametric test: It is one in which population constants as described above are used such as mean, variances, etc. and data tend to follow one assumed or established distribution, such as normal, binomial, Poisson, etc.
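The relation between population, sample, parameter and statistic can be illustrated with a small simulation. Everything below is hypothetical: the birth weights are simulated, and simple random sampling without replacement is assumed as the sampling method purely for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: simulated birth weights (kg) of 1,000 babies.
population = [round(random.gauss(3.0, 0.4), 2) for _ in range(1000)]

# Parameter: a summary value computed from every member of the population.
population_mean = statistics.mean(population)

# Sample: 50 sampling units drawn at random, without replacement.
sample = random.sample(population, 50)

# Statistic: the same summary value computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.3f} kg")
print(f"sample mean (statistic):     {sample_mean:.3f} kg")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; a different sample would give a slightly different statistic, while the parameter stays fixed.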

Notations for a Population and Sample Values

Values of a sample are known as statistics and values of a population are known as parameters, as already explained. Roman letters are used for statistics of samples and Greek for parameters of population.

*Common notations are:*

Summary measures | Sample statistics | Population parameters |
---|---|---|
Number of subjects | n | N |
Mean | x̄ | µ |
Standard deviation | s | σ |
Variance | s^{2} | σ^{2} |
Proportion | p | P or π |
Complement of proportion | q | Q or (1 – π) |
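As a rough illustration of the notation above, the sketch below computes the sample values n, x̄, s, s^{2}, p and q from five hypothetical hemoglobin readings, and then σ and σ^{2} as they would be if the same five readings formed the whole population. The cut-off used for the proportion is arbitrary and chosen only for the example.

```python
import statistics

# Hypothetical hemoglobin values (g/dL).
values = [11.0, 12.0, 13.0, 14.0, 15.0]

# Treated as a sample: Roman letters.
n = len(values)
x_bar = statistics.mean(values)   # sample mean
s2 = statistics.variance(values)  # sample variance, divisor n - 1
s = statistics.stdev(values)      # sample standard deviation

# Treated as an entire population: Greek letters.
sigma2 = statistics.pvariance(values)  # population variance, divisor N
sigma = statistics.pstdev(values)      # population standard deviation

# Proportion below an arbitrary illustrative cut-off, and its complement.
CUT_OFF = 12.5
p = sum(1 for v in values if v < CUT_OFF) / n
q = 1 - p

print(f"n = {n}, mean = {x_bar}, s^2 = {s2}, s = {s:.3f}")
print(f"sigma^2 = {sigma2}, sigma = {sigma:.3f}")
print(f"p = {p}, q = {q:.1f}")
```

The sample variance is larger than the population variance computed from the same values because it divides by n − 1 rather than N.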

*Other symbols commonly used are*:

Symbol | Meaning |
---|---|
= | Equal to |
> | Greater than |
≥ | Greater than or equal to |
< | Less than |
≤ | Less than or equal to |
Z | The number of standard deviations from the mean or standard normal deviate/variate |
% | Percent |
r | Pearson's correlation coefficient |
ρ | Spearman's correlation coefficient |
O | Observed number |
E | Expected number |
d.f. or df | Degrees of freedom |
k | Number of groups or classes |
P(A) | Probability of event A |
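Three of these symbols, Z, r and ρ, can be illustrated with a short sketch. The heights and weights are hypothetical, the population mean and standard deviation used for Z are assumed values, and the helper functions are written here for illustration only; the ranking helper assumes there are no tied values.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's correlation coefficient r between two equal-length series."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired observations on five subjects.
heights = [150, 160, 165, 170, 180]  # cm
weights = [50, 56, 63, 62, 80]       # kg

# Z: number of standard deviations a value lies from the mean,
# using an assumed population mean of 165 cm and SD of 10 cm.
z = (180 - 165) / 10

r = pearson_r(heights, weights)
rho = spearman_rho(heights, weights)

print(f"Z = {z}, r = {r:.3f}, rho = {rho:.1f}")
```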