Fundamentals of Biostatistics B Sarmukaddam Sanjeev
1 Introduction

 
ABOUT BIOSTATISTICS
Biostatistics, an applied branch of statistics, is used very heavily in medical research methodology. Initially, statistics developed as a branch of mathematics. Although all statistical methods have a very sound mathematical foundation, we are going to discuss only the applied part of the subject. Some of you might have had elementary statistics in the syllabus (of 11th and 12th standard) as a part of mathematics. Some of you may be familiar with a few concepts otherwise. However, we are not going to assume any background in statistics or mathematics. Since we are looking at only the applied side of the subject, such a background is not essential either. In fact, all statistical methods have a sound logic behind them, which makes the subject easily understandable. Most of the things are just above common sense. It is a different matter that, unfortunately, common sense is not so common and needs to be learnt systematically (ref.: The initial examination of data, J Roy Stat Soc A, 148, 1985).
Serious consequences follow when erroneous conclusions are drawn from invalid data or through faulty or incomplete analysis. Sometimes, over-analysis is done. One of the disadvantages of computer technology is that it can present poor data in a way that makes them look good. The most unreliable and invalid information, collected in the most unsystematic and/or unscientific way, can easily be analysed with the help of a computer and made to look thoroughly impressive. In this respect, the important adage to remember with regard to data processing is “garbage in, garbage out”. It is important to know from where and how the data were collected (in addition to knowing for what purpose). One needs to be able to make sound, independent judgements regarding the adequacy, validity and reliability of the information given and of the conclusions and recommendations offered.
Weight is considered a sensitive indicator of the growth and nutritional status of a child younger than 5 years. A growth chart is used to monitor the progress of a child by plotting weight at different ages (in months) and comparing the trend with a standard. Obtaining such reference lines is one of the many applications of biostatistics. In the ICU, a patient's condition and progress are monitored with respect to temperature, BP, heart rate, etc. by plotting the values on some sort of control chart. Such charts with reference lines could be made for many more variables to be observed in other disorders. In fact, all the concepts of quality control can very much be adapted and used in medical care. The term “quality control”, which is well known in the production industry, is gaining importance in medical care, more so since medical care has come within the purview of the Consumer Protection Act.
Although statistics has innumerable applications in other branches of science, arts or commerce (including management), it is also very useful in medical science, because one of the characteristics of people is the marked variation that occurs between individuals. For example, the average height of an Indian man is 5′ 3′′, but one fairly frequently sees men from 4′ 10′′ up to 6′ 0′′. People on average consult their general practitioner about 3 times a year, but there are a few people who hardly ever see a doctor and a few who are in and out every week. We find such marked variation in most of the characteristics of the individuals with whom we have to deal in medical science. Despite such large variation, statistics helps us to find some pattern in it (i.e. unity in diversity). The most common source of uncertainty in medicine (and health) is the natural biological variability between and within individuals. Variation between samples, laboratories, instruments, observers, etc. further adds to this uncertainty. There are other sources, such as incomplete information, imperfect instruments, lack of sufficient medical knowledge, and poor compliance with the regimen. The role of biostatistics is to handle these factors so as to minimize their effect. It is a science of the management of uncertainties: to measure them and to minimize their impact on decisions.
A general practitioner, in his clinical work, is constantly comparing the symptoms presented by patients with his own perception of the acceptable range of human physiology and behaviour. Many times, the common meanings of the word ‘normal’ are socially defined. These definitions often depend on the health beliefs and expectations of individuals, which may vary widely. A 75-year-old lady with treatable congestive cardiac failure may regard her breathlessness as ‘normal’ because it is what she has become used to over many months. Likewise, an 80-year-old man may regard deafness as ‘normal’ and not bother to ask his doctor whether the situation could be improved by removing wax or supplying a hearing aid. Therefore, it is very important to know where physiology ends and pathology starts. Researchers themselves make errors in analysis, but even when they do not, their conclusions are often misinterpreted or misused by people who have access to them. In fact, what finds its way to the general public is often distorted and garbled to the extent that it can cause more harm than good. Many people find themselves increasingly frustrated with the confusing and contradictory health advice in the media.
One of the important purposes of acquiring knowledge of this subject, apart from planning and analysing one's own study, is to develop the ability to critically read, interpret and use the findings of others' studies or knowledge from other sources. Keeping information up to date has become a necessity, as medical science is progressing day by day. The prevalence of diseases like AIDS is increasing, new drugs and new therapies are being developed, and new diagnostic tools are in use. Unless one completely understands research done elsewhere, one cannot fully utilize the findings for patients in day-to-day practice. The science of planning good studies, experiments and surveys, and of interpreting the conclusions of others, is well developed these days. Sufficient knowledge of “research methodology” is essential for this purpose. A wrong study design yields wrong conclusions. A poorly designed study yields poor conclusions (with much less power).
It might be useful to point out two different meanings attached to the term “statistics”. The term has been used to indicate facts and figures of any kind: health statistics, hospital statistics, vital statistics, business statistics, etc. For this purpose, the plural sense of the term is used. The term is also used to refer to a branch of science developed for handling data in general. The essential features of statistics in this second sense are evident from its various definitions:
  1. Principles and methods for the collection, presentation, analysis and interpretation of numerical data of different kinds:
    1. Observational data (measurement, survey data)
    2. Data that have been obtained by repetitive operations (experimental data)
    3. Data affected to a marked degree by a multiplicity of causes
  2. The science and art of dealing with variation in such a way as to obtain reliable results.
  3. Controlled, objective methods whereby group trends are abstracted from observations on many separate individuals.
  4. The science of experimentation which may be regarded as mathematics applied to observational data.
Generally, the singular sense of the term is used for this purpose. In fact, the term is very old. You will find similar words in older languages, such as
Status – Latin
Statista – Italian
Statistik – German
The meaning of all these words is “political state”. In ancient times, the government used to collect information regarding the population and the property/wealth of the country, the former giving it an idea of the manpower of the country and the latter providing a basis for introducing new taxes and levies.
There are many branches of applied statistics, depending on the area of application. A few special methods have also been developed to deal with specific or peculiar situations arising in those fields. One such major branch is biostatistics. When the data being analysed are derived from the biological sciences or medicine, the term biostatistics is used to distinguish this particular application of statistical tools and concepts.
Biometry is generally used as a synonym of biostatistics, but biometry is more than biostatistics, as a biometrician is supposed to be in a position to take the measurements (or collect the relevant data) himself. Therefore, biometry can be defined, in short, as quantitative biology. Information presented in numerical form constitutes the data. This information may be generated through observation or experimentation, or compiled from other sources such as old records or documents, medical records, census, etc. Data can be of different types and measured at different levels. Identifying the type and level of measurement is important because the further treatment needed to draw valid and useful conclusions depends on it.
 
Types of Data and Levels of Measurement
The data mainly are of two types:
  1. Qualitative data
  2. Quantitative data
A characteristic that may take on different values, i.e. that may vary between persons, places or things, is called a variable. Data are therefore the realized values of a variable. A variable can be of qualitative or quantitative type. Qualitative data are sometimes also called categorical or enumeration data, and quantitative data are called measurement or metric data. It is essential to know the type of data, because any further statistical treatment depends on what type of data you are handling.
Quantitative data can be either continuous (the value can be fractional, e.g. weight 39.6 kg) or discrete (only whole-number values, e.g. family size).
Qualitative variables are measured either on a nominal or an ordinal scale and quantitative variables are measured on an interval or a ratio scale.
Nominal: Observations are placed into broad categories which may be denoted by symbols or labels or names. Ex. Blood Groups, Diagnostic categories, Gender, Marital status, Cause of death.
Ordinal: Categories or observations are ranked or ordered. Each category has a unique position in relation to the other categories, but the distances between the categories are not known. Ex. Severity of illness, Socioeconomic status, Ranking of students according to marks.
Interval: In addition to the properties of the ordinal level of measurement, the distance between any two adjacent scale points is fixed and equal. The origin is arbitrary, i.e. the zero point of the scale is arbitrary (as it is, for example, in the Fahrenheit and Celsius scales of temperature measurement). Examples of interval scale measurements include year of birth and other variables measured in calendar time. These particular variables may be readily transformed to ratio scale measurements by, for example, converting year of birth to age at some fixed point. Other examples: Intelligence quotient (IQ), Scores on most psychiatric scales.
Ratio: In addition to the properties of the interval level of measurement, it has a true zero point as its origin, i.e. zero indicates absence of the variable. That is, a ratio scale has an absolute or natural zero which has empirical meaning. The ratio of any two scale points is meaningful. Ex. Height, Weight, and generally most physiological or biochemical variables.
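The difference between the interval and the ratio scale can be checked with a little arithmetic. The following is a minimal sketch in Python; the temperature and weight values are illustrative only and are not taken from any data set. It suggests that changing units on an interval scale (Celsius to Fahrenheit) keeps the ratio of two intervals but not the ratio of two values, whereas changing units on a ratio scale (kilograms to pounds) keeps the ratio of the values themselves.

```python
import math

def celsius_to_fahrenheit(c):
    """Change of units on an interval scale: F = 1.8 * C + 32."""
    return 1.8 * c + 32.0

def kg_to_lb(kg):
    """Change of units on a ratio scale: multiply by a positive constant."""
    return kg * 2.20462  # approximate conversion factor

# Interval scale: three illustrative temperatures in degrees Celsius.
t1_c, t2_c, t3_c = 10.0, 20.0, 40.0
t1_f, t2_f, t3_f = (celsius_to_fahrenheit(t) for t in (t1_c, t2_c, t3_c))

# The ratio of two values depends on the arbitrary zero point.
ratio_of_values_c = t2_c / t1_c
ratio_of_values_f = t2_f / t1_f

# The ratio of two intervals does not depend on the units used.
ratio_of_intervals_c = (t3_c - t2_c) / (t2_c - t1_c)
ratio_of_intervals_f = (t3_f - t2_f) / (t2_f - t1_f)

# Ratio scale: two illustrative weights in kilograms.
w1_kg, w2_kg = 40.0, 80.0
ratio_kg = w2_kg / w1_kg
ratio_lb = kg_to_lb(w2_kg) / kg_to_lb(w1_kg)

print("Ratio of temperatures (C, F):", ratio_of_values_c, ratio_of_values_f)
print("Ratio of intervals (C, F):", ratio_of_intervals_c, ratio_of_intervals_f)
print("Ratio of weights (kg, lb):", ratio_kg, ratio_lb)
print("Weight ratio unchanged by units:", math.isclose(ratio_kg, ratio_lb))
```

In words: 20 °C is not "twice as hot" as 10 °C, but a person weighing 80 kg is twice as heavy as a person weighing 40 kg, whatever the unit of weight.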
All scales have certain formal properties. For the nominal scale, the only relation involved is that of equivalence. The ordinal scale incorporates not only the relation of equivalence but also the relation ‘greater than’. All members of the upper class, say in socioeconomic status, are higher (with respect to prestige or social acceptability) than all members of the middle class. The middle class members, in turn, are higher than the members of the lower class. The equivalence (=) relation holds among members of the same class, and the > relation holds between any pair of classes. An order-preserving transformation does not change the information contained in an ordinal scale.
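The last point can be illustrated with a small sketch in Python. The severity codes below are hypothetical (1 = mild up to 4 = critical), and the helper `ranks` is written only for this illustration. Squaring the codes or taking their logarithm preserves the order, so the ranks (the only information an ordinal scale carries) stay the same; changing the sign reverses the order and therefore changes the ranks.

```python
import math

def ranks(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Hypothetical ordinal codes for four patients: 1 = mild ... 4 = critical.
severity_codes = [3, 1, 4, 2]

# Two order-preserving transformations of the codes.
squared = [c ** 2 for c in severity_codes]
logged = [math.log(c) for c in severity_codes]

# A transformation that does not preserve order.
negated = [-c for c in severity_codes]

print("Original ranks:", ranks(severity_codes))
print("Ranks after squaring:", ranks(squared))
print("Ranks after log:", ranks(logged))
print("Ranks after negation:", ranks(negated))
```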
Any change in the numbers associated with the positions of the objects measured on an interval scale must preserve not only the ordering of the objects but also the relative differences between them. That is, the interval scale is unique up to a linear transformation. Thus, the information yielded by the scale is not affected if each number is multiplied by a positive constant and a constant is then added to this product. The operations of arithmetic are permissible on the intervals between the numbers assigned to the objects. Ratio scales are achieved only when all four relations (namely 1. equivalence, 2. greater than, 3. known ratio of any two intervals and 4. known ratio of any two scale values) are operationally possible to attain. For the ratio scale, all arithmetic operations are permissible on the numerical values assigned to the objects themselves as well as on the intervals. A ratio scale variable can also be measured as interval, ordinal or nominal. That is, data can be converted from a higher scale to a lower one (ratio to interval to ordinal to nominal), but not from a lower scale to a higher one.
Many uses of statistics/biostatistics could be listed; however, the main uses of statistics in general are
  1. To collect data in the best possible way. This includes methods of
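The one-way nature of such conversions can be sketched in Python as follows. The birth weights and the cut-off points (2500 g and 4000 g) are used here purely for illustration, and the function names are hypothetical. A ratio scale measurement is first reduced to an ordered category and then to an unordered label; two different weights can fall in the same category, so the original values cannot be recovered from the categories.

```python
def to_ordinal(weight_g):
    """Reduce a birth weight in grams to an ordered category.

    The cut-offs are illustrative assumptions, not a recommendation.
    """
    if weight_g < 2500:
        return "low"
    if weight_g < 4000:
        return "normal"
    return "high"

def to_nominal(category):
    """Reduce the ordered category to a two-group label (order is dropped)."""
    return "low" if category == "low" else "not low"

# Illustrative birth weights in grams (ratio scale).
weights_g = [2100, 3200, 2450, 4100, 3650]

ordinal = [to_ordinal(w) for w in weights_g]
nominal = [to_nominal(c) for c in ordinal]

print("Ratio scale:", weights_g)
print("Ordinal scale:", ordinal)
print("Nominal scale:", nominal)

# 2100 g and 2450 g are both "low": the exact weights are lost.
print("Information lost:", to_ordinal(2100) == to_ordinal(2450))
```

Because information is lost at each step, it is usually safer to record data on the highest scale available and to group them later, at the time of analysis.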
    1. Designing forms for data collection
    2. Organising the collection procedure
    3. Designing and executing research
    4. Conducting surveys in a population
  2. To describe the characteristics of a group or a situation. This is accomplished mainly by
    1. Data collection
    2. Data summary
    3. Data presentation
  3. To analyse data and to draw conclusions from such analyses. This involves the use of various analytical techniques and the use of probability in drawing conclusions.
 
About (How to Use) This Book
Although there are 18 chapters in this book, they are not all equal in length and not all are given the same amount of emphasis. I have divided the chapters into three types as follows:
I ⇒ Minimum elaboration/amplification
II ⇒ Moderate/Medium elaboration/amplification
III ⇒ More/Maximum elaboration/amplification.
There are six chapters in each division as below.
Division/type of chapter: chapter numbers
I: 1, 2, 7, 8, 11, 15.
II: 3, 4, 9, 10, 13, 14.
III: 5, 6, 12, 16, 17, 18.
A few reasons gave rise to this division: keeping the overall length of the book limited was essential; good references are easily available for type I and II chapters; topics in type I and II are generally covered in most basic books (therefore there was no point in repeating the elaboration); a few topics (like ‘design of experiments’ and ‘multivariate analysis’) need separate books, which are available (therefore more elaboration was not warranted); and sometimes further elaboration was out of scope for this book. A type III chapter is not always longer; nevertheless, its elaboration is maximum, to the extent necessary.
Many readers may read this book in the order written (completely, or using the above division). Some may choose from the table of contents what is of particular interest to them. However, there are some choices and sequences that may help the reader select features of specific relevance.
General: chapters 1 and 18.
Descriptive statistics: chapters 1,2,3, and 10.
Design of sample survey or experiment: chapters 4,5,13, and 17.
Clinical interpretations: chapters 16 and 18.
Estimation: chapters 3,6,10, and 12.
Inference: chapters 7,8,9,14, and 15.
Association: chapters 11 and 12.
Though specific references are given at the end of each chapter, a few books I like and often refer to are
  1. ‘A Short Textbook of Medical Statistics’
    Sir Austin Bradford Hill
    10th Edition, ELBS, London, 1977.
  2. ‘Statistical Methods in Medical Research’
    P Armitage and G Berry
    3rd edition, Blackwell Scientific Publications, London, 1994.
  3. ‘Medical Biostatistics’
    A Indrayan and Sanjeev Sarmukaddam
    Marcel Dekker, Inc, New York, 2001.
  4. ‘Statistical Methods for Comparative Studies: Techniques for Bias Reduction’
    Sharon Anderson, et al.
    John Wiley and Sons, New York, 1980.
  5. ‘Statistical Methods for Rates and Proportions’
    Joseph L. Fleiss
    John Wiley and Sons, New York, 1973.
  6. ‘Epidemiology in Medicine’
    CH Hennekens and JE Buring
    Little, Brown and Company, Boston, 1987.
  7. ‘Clinical Epidemiology: A Basic Science for Clinical Medicine’
    David L. Sackett, et al.
    2nd edition, Little, Brown and Company, Boston, 1991.
  8. ‘Nonparametric Statistics for Behavioral Sciences’
    Sidney Siegel
    McGraw-Hill Kogakusha, Tokyo, 1956.
  9. ‘Biostatistics: A Foundation for Analysis in the Health Sciences’
    Wayne W. Daniel
    5th edition, John Wiley and Sons, New York, 1987.
  10. ‘Primer of Biostatistics’
    Stanton A. Glantz
    2nd edition, McGraw Hill Information Services Company, New York, 1989.
  11. ‘An Introduction to Biostatistics’
    T Glover and K Mitchell
    McGraw Hill, Boston, 2002.
  12. ‘Essentials of Medical Statistics’
    BR Kirkwood
    Blackwell Science Ltd, Oxford, 1988.
In addition to these general ones, there are a few specific ones (devoted to a specific topic), such as
  1. Applied Logistic Regression
    DW Hosmer and S Lemeshow
    John Wiley and Sons, New York, 1989.
  2. Beyond Normality
    RS Galen and SR Gambino
    John Wiley and Sons, New York, 1975.
  3. Statistical Principles in Experimental Design
    BJ Winer
    McGraw Hill, New York, 1971.
These books are very good, and I have learned so much from them that they are not quoted every time, as it is almost impossible to distinguish the source. This is a list of books for further reading. As mentioned in the preface, I have tried to touch upon as many topics as possible in this book; however, it is impossible to cover all relevant topics in one book (it is not an encyclopaedia on the subject). A few topics (for example, time series analysis, modelling and stochastic processes), though important, are not covered at all. Nor is it claimed that the topics covered are treated completely (in the sense that nothing lies beyond). In fact, all that is covered is only fundamental (hence the name) to this ever-growing, important subject. Some amount of repetition (especially in the design chapter) was unavoidable. It may be more in bulleted paragraphs.