Introduction
The use of observational epidemiology in the identification of environmental risk factors for chronic disease originated in, or at least gained critical impetus from, the publication in 1950 of several papers suggesting a relationship between smoking and lung cancer. Discussing the capabilities of the method used, the authors of one of these works (Doll and Hill, 1950) considered it likely that for many chronic diseases there would exist a range of factors capable of affecting their aetiology and incidence. A quarter of a century later, Doll, now working with R. Peto, returned to this point and suggested that the influence of a risk factor on rates of disease in a population could only be understood fully when the presence and activity of all the other risk factors to which the population was liable could be taken into account (Doll and Peto, 1976).
In the course of the half century which has followed the initial remarks of Doll and Hill, much has been learned about the multifactorial nature of chronic disease. Observational epidemiology has pursued with considerable success the identification and characterisation of risk factors and has confirmed that a large number may influence rates of a given disease. The enumeration by Hopkins and Williams (1981) of 246 factors which may affect the risk of developing coronary heart disease is one example. Epidemiologists have acknowledged the presence and importance of this multiplicity of risk factors by taking increasing care in the selection and characterisation of the populations which they study and by developing analytical statistical techniques to isolate those of interest and to set the remainder aside.
Over this period, it has also become apparent that, in logical terms, single risk factors which are necessary or sufficient for the development of chronic disease are rare if not unknown. Remaining with the example of smoking and lung cancer, it is clear that, although many workers believe smoking to be the most important and prevalent risk factor for the disease, cases do occur in non-smokers. Smoking is thus not a necessary factor. Likewise, the majority of smokers do not develop lung cancer, so that smoking is not a sufficient factor for its development. Generally, the rarity of exclusive relationships between exposure and outcome requires us to recognise that several factors are involved in rates of any chronic disease in a population and, perhaps, in the risk in individuals, and thus that chronic disease is the result of multifactorial interactions.
While the objective of much mainstream epidemiology has been the identification of risk factors, the treatment of such multifactoriality as an impediment to understanding which must be recognised and dealt with has been both understandable and justifiable. I will argue in this essay, however, that it is also important to understand the joint importance of all the factors which are operative in the aetiology of disease and that it is now appropriate to consider not only analytical techniques used to break the nexus of factors into its components but also synthetic approaches aimed at a more complete understanding of the significance of the nexus as a whole.
While this is a philosophical question of considerable interest and complexity, its significance does not remain there. One of the principal values to society of epidemiology is the estimation of how the effects of risk factors, as determined from samples of a population, can be extrapolated to the greater population from which the sample was drawn and beyond. The incidence of disease in a population must be a function of all of the risk factors applying within it. Statistical correction for confounding factors, which is the usual way of dealing with multifactorial influences, may be sufficient for the characterisation of a specific risk factor but does not allow the joint influence of all risk factors, known or unknown, to be taken into account. It is sometimes argued that when the magnitude of association for one risk factor is large in proportion to all the others which are known, as is the case, for example, with smoking and lung cancer, the effects of these others will be insignificant and can be disregarded. As this essay progresses, I will show that in a multifactorial world this is not necessarily the case.
I will suggest that for epidemiology to take adequate account of the multifactorial nature of disease requires more than simply counting the number of risk factors which apply. Risk factors do not simply co-exist but interact and I will begin by suggesting that this occurs at three levels of observation.
Before doing so, I should explain that this is an essay intended to provoke thought and not an annotated review. None of what I discuss is, in itself, new or unfamiliar and much is obvious. It is thus unnecessary to burden the text with citations which would be either so general as to be useless or so specific as to be of limited explanatory value.
Level 1: interactions in prevalence
Risk factors are not always distributed randomly or independently between the members of a population. The behavioural or social factors which determine the likelihood that an individual is exposed to one risk factor may influence the likelihood that they are exposed to another. This is illustrated in figure 1 using as an example the spectrum of risk factors associated with smoking.
This diagram, while broadly correct, is not intended to provide a comprehensive picture of the situation but simply to illustrate the range of interactions which may occur. While the diagram shows, as is well known, that in comparison with non-smokers, smokers are more likely to be drinkers, to eat differently, to take less exercise and to differ in socioeconomic status, its purpose is to permit some more general observations.
Firstly, it is apparent that the prevalence of these risk factors is the result of complex interactions, not simple relationships between pairs of risk factors. Thus while there is a relationship between smoking behaviour and diet, the latter may also be influenced by drinking, exercise and socioeconomic status. Conversely, all of these are related, in prevalence, to smoking, so that no one of these factors may be, in isolation, an adequate predictor of another.
Secondly, the interactions between pairs of risk factors may be reciprocal or in one direction only. In some cases, the directionality is clear. While genetic constitution may affect an individual's tendency to smoke, drink, overeat or worry, it is most unlikely that, over the number of generations accessible to epidemiologists, the presence or absence of any of these risk factors will have a detectable influence on their heritability. In other cases, such as smoking and drinking, it is likely that the influence of one on the prevalence of the other will be reciprocal, although the influences will not necessarily be symmetrical in intensity.
Thirdly, interactions in prevalence can be quantitative as well as qualitative. Heavier smokers tend to be heavier drinkers, and vice versa, while, in developed countries, an inverse relationship between expressions of socioeconomic status and the likelihood of smoking can be detected.
Finally, and perhaps most importantly, few risk factors are simple or single entities. Rather, they are conventional labels for complexes which vary in their capacity for resolution into the actual and proximal risk factors which they comprise. For example, the risks associated with diet are poorly represented by the selection of a few groups of food thought to be important. There is ample evidence that attempts to refine our knowledge of dietary influences by concentrating on one or a few chemicals present in particular items are almost invariably unsuccessful in making the intended transition from epidemiology to biochemistry or toxicology. Diet comprises quantitative, qualitative and behavioural elements and cannot be regarded simply as a vehicle for harmful and beneficial chemicals. The most interesting example of a composite risk factor is, however, socioeconomic status because, while its influence on the risk of chronic disease appears to be almost indisputable, it does not, at the first level of examination, appear to present a single, obvious mode of action. Suggestions that it is a proxy for smoking, for example, are too simple minded, while to state that lower levels of education render people incapable of digesting the wisdom of public health advice tells us more about the intelligence of the authorities concerned than about that of the population. It is more likely that, as a risk factor, socioeconomic status reflects the different behavioural characteristics of different strata of society, the influence of economic factors upon the quality of an individual's environment and the effects of the psychological stress imposed by absolute and relative material deprivation.
In summary, risk factors are not dispersed at random within a population. An individual exposed to one risk factor is more likely to be exposed to certain others. Mutual interactions in prevalence will mean that certain clusters of risk factors are more likely to occur, and are hence more frequent, than others. While it is possible that the presence of one particular risk factor may have an over-riding influence on which others affect an individual, there is insufficient evidence to determine how frequently this may occur or to identify such dominating risk factors. Moreover, clustering of risk factors will not be deterministic but subject to the inevitable influence of chance. Thus, the frequency of different clusters will be a function of a number of interlocking and interacting frequency distributions and hence refractory to analytical solution. It is worth noting that in the hypothetical situation where risk factors are distributed by chance and with equal frequency, when the number of factors present approaches nine or ten, the number of possible combinations would approach or exceed the number of individuals in the population. The extent to which assortative clustering reduces the actual number of combinations present cannot be determined with the knowledge available at present.
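The arithmetic behind this last observation can be sketched as follows. The text does not say how many levels each risk factor is taken to have, so the two-level (present or absent) and five-level cases below are illustrative assumptions, as is the function name `count_profiles`.

```python
def count_profiles(n_factors: int, levels: int = 2) -> int:
    """Number of distinct exposure profiles when each of n_factors
    can independently take any one of `levels` values.

    Both the independence and the choice of `levels` are assumptions
    made for illustration; they are not taken from the essay.
    """
    return levels ** n_factors


# Compare a present/absent coding with a hypothetical five-level grading.
for n in (5, 9, 10):
    print(n, count_profiles(n), count_profiles(n, levels=5))
```

With a simple present-or-absent coding, ten factors give 1,024 profiles. If each factor is graded on five levels, the same ten factors give almost ten million, so the point at which profiles outnumber people depends heavily on how finely exposure is graded.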
Level 2: Multifactorial activity
A risk factor, by definition, effects some change in a person exposed to it which disturbs the normal biochemical and physiological processes of the body in such a way that, if the disruption is sufficient in extent, some alteration predisposing to disease is induced. Highly specific risk factors, such as particular chemical entities or infectious agents, may have specific and discrete biological effects. Many of the risk factors with which we are concerned here can, however, be multifactorial in their influence. That is, they may deliver several forms of biological insult. In addition, while the immediate biological effect of a risk factor may be specific, the consequences of this may lead to a more general form of disturbance which is shared with other risk factors.
Add to this the observation that many risk factors are composite entities capable of exerting different forms of effect and we observe a further level of multifactoriality. An analysis of how each may act would require reference to a vast body of literature and is, in any case, not the point of this paper. Rather, I suggest that modes of action can be considered at a more general level, as illustrated in figure 2. This diagram is not intended to be comprehensive and the terms used and their relationships to risk factors are open to discussion and argument. To do so would, however, miss the point of the diagram which is not to assert particular or specific modes of action for risk factors but to illustrate how risk factors may interact in their biological consequences.
A number of points can be made.
Firstly, some, but not all, risk factors may expose the body to potentially toxic chemicals which may participate in the processes of disease by inducing mutations which alter the properties of cells or distort regulatory mechanisms in tissues. Some of these chemicals, especially in the case of diet, may be specific to the risk factor. On the other hand, some are common to several risk factors. For example, exposure to polyaromatic hydrocarbons can derive from smoking, air pollution and diet.
Secondly, risk factors such as smoking or polluted air present particulate matter to the respiratory tract. While some of this may contain, and act as a vehicle for, active chemicals, even chemically inert particulates, such as inorganic carbon, can have deleterious effects by impairing clearance mechanisms. Moreover, the ingestion of particulates by scavenging cells in the lung can lead to the release from cells of oxidative or inflammatory agents.
Thirdly, oxidative stress is widely regarded as being deleterious. It can be toxic or genotoxic directly, by virtue of the chemical activity of oxidative species and it can be disruptive to physiological regulation. All of the risk factors in figure 2 can induce oxidative stress. This may be direct in the case of smoking, air pollution and diet, or indirect in the case of socioeconomic status, through its association with these risk factors and as a consequence of the psychological stress which it may carry with it. Exercise is paradoxical in this context in that theoretically it is an oxidative stressor, through increased levels of oxidative metabolism, but is generally regarded as beneficial at moderate levels. Diet may contribute both oxidants and anti-oxidants to the body.
Finally, the metabolic syndrome is becoming associated with an increasing number of chronic diseases including various cancers and cardiovascular disorders. Related to a high glycaemic load and insulin resistance, as is diabetes, it also resembles the latter in its association with regulatory disturbance, oxidative stress and an inflammatory response. While its nature suggests a relationship to diet and obesity, it is also related to other risk factors. It has been associated with smoking, for example, although the nature and direction of the association remain obscure.
It is thus apparent that a number of risk factors acting through various routes can influence several important pathophysiological processes: genotoxicity, regulatory disruption and inflammation. The significance of this to the present argument is that different risk factors can contribute to the development of disease through common modes of action, be these the presentation of similar toxic chemicals or the induction of physiological imbalance. Perfectly valid arguments can be raised concerning, for example, the toxicological equivalence of benzo[a]pyrene presented to the respiratory tract by tobacco smoke or polluted air and perhaps that ingested via the diet. Likewise, inflammation is a very broad phenomenon and while the extent to which different manifestations are comparable and have similar effects could be debated at length, at a basal level, increased cell proliferation is likely to be a consequence of many of them. Thus, it is possible, or even likely, that any specific effects notwithstanding, different risk factors may contribute to disease through common routes.
Level 3: a multiplicity of mechanisms
Ultimately, a risk factor is a risk factor because, directly or indirectly, it is responsible for some biological change which lies on a mechanistic pathway to disease. Again, therefore, we cannot regard risk factors in isolation but as agents which make some incomplete contribution to the development of the disease process. Their contributions may be complementary, if they affect different processes, or they may be additive if they contribute increments to the same process. This should be qualified if we distinguish between mechanistic steps which are quantal and exist only in a permissive or non-permissive state and those whose efficacy depends upon some quantitative attribute.
There are many theories of the development of chronic disease which have in common the suggestion that several, or many steps, are required and that some of these may have to occur before others. Using the development of cancer as an example, the process may require a genetic change in a cell, or, more strictly, the existence of a cell with a particular genetic constitution, which presents the potential for neoplastic growth. The probabilities of this change happening and of sufficient altered cells accumulating to have an influence are both affected by factors affecting cellular proliferation. Further changes to allow the normal controls against an expansion of a population of cells which is inappropriate at a particular time and place to be evaded are necessary. In the later stages of carcinogenesis, factors such as angiogenesis and those related to metastases are required. In other words, many things must happen. The burgeoning, if highly reductionist, field of molecular pathology has provided much data on such mechanisms but has contributed little to an overall understanding. What it does suggest, however, is that there exists considerable redundancy and duplication in the processes involved in both normal and pathological physiology. To return to logical terminology, there appear to be few, if any, processes at the molecular level which can be said to be either necessary or sufficient for the development of a chronic disease.
A general and conceptual way of regarding this state of affairs is presented in figure 3. Blue circles represent the basal state of a cell and orange shading those which can contribute to disease. Progression from left to right indicates further development towards the final condition while states which are equivalent in this respect are distributed vertically.
The development of disease can be seen as a path seeking process in which the successive attainment of necessary states in a population of cells is represented by dark circles joined by lines. The attainment of an effective state may or may not depend upon the prior existence of a permissive earlier state. If this is not the case, then isolated effective states or incomplete chains may exist. Aspects of this model could be described in terms of field carcinogenesis for those who favour this idea.
The main significance of this model is that the development of a chronic disease is not the obligatory satisfaction of a unique series of cellular requirements but the completion of one of a number of possible mechanistic series. It is likely that there will be a stochastic element in the chain of events which leads to a particular case although, to the extent that risk factors, or more correctly, their biological effectors, may have preferential effects, those applying to the case may also affect the route of the pathway. That is, the prior existence within a tissue of some relevant biological change may increase the probability that one particular pathway will be completed relative to the likelihood that other possibilities will be fulfilled. Under these circumstances the risk factor in question might be seen, in epidemiological terms, to have a synergistic interaction with others.
A secondary consequence is that the existence in a cell or tissue of a state thought to be important in the disease process does not mean necessarily that it was, or would be, significant in the case in point. Thus, the detection of an effect related to a particular risk factor may increase the likelihood that the presence of this risk factor contributed to the disease but is not an unequivocal demonstration that it did so.
What, for the sake of simplicity, the model does not incorporate is the possibility that some changes may be a necessary or possible consequence of the prior existence of other states. If this were the case, some pathways could be completed rapidly as a particular event is followed automatically by its successor. On the other hand, it could also mean that certain changes observed in a particular disease are merely consequences of other changes and do not lie on the mechanistic path.
While this is a very general model it is supported by equally general observations. In the case of lung cancer, for example, much research has been directed towards the molecular events leading to the final neoplastic state. Particular attention has been directed towards alterations in the structure and function of the protein which is encoded in the gene p53. Some workers attach considerable importance to observations that alterations may be found in around half the cases of lung cancer. This means, however, that around half the cases of lung cancer develop without a change in p53. While more precise classification of the disease could reduce this proportion, the observation suggests that the various roles attributed to the protein p53 in carcinogenesis can be satisfied by distinct alternatives.
All together now
The discussion in the last three sections can be brought together and summarised diagrammatically as in figure 4.
This hypothetical example proposes that six risk factors may act together in the development of a disease. Their prevalences in the population are inter-related such that they fall into two clusters, risk factors (1 – 3) and (4 – 6). These two clusters are not quite independent of each other in prevalence, however, as a relationship between the frequencies of risk factors 3 and 4 may link the frequencies of the clusters. The six risk factors together exert four forms of biological effect. Three of these are influenced by two risk factors, while risk factor 4 is the sole effector of effect 3. Two pairs of risk factors, (1,2) and (3,4), each contribute a single effect, while one, risk factor 5, has two forms of action. These effects act within a cellular milieu, shaped by the genetic constitution of the individual and stirred stochastically, until a complete pathway leading to disease has assembled itself.
I have chosen to show an example with more risk factors than effects. I could also have chosen a model with more effects than risk factors. Either would make the simple point that there is not necessarily a one to one correspondence between the two levels in the pathway. It might be argued that figure 4 contains a redundancy of risk factors as removing numbers one and six would seem to leave the situation unchanged. Observational epidemiology could present such an appearance for at least two different reasons. Firstly, two risk factors could be so linked in their distributions and prevalences in the population that both are present in a large number of cases. Secondly, it could be that while two risk factors act by presenting equivalent effects neither is capable of producing a sufficient quantity of effect in isolation and both are required before any risk can be detected.
Multifactorial epidemiology
While the multifactoriality of chronic disease is widely acknowledged in the literature its explicit discussion is less common. It is not the purpose of this paper to review that which exists. In any case, a large proportion of it is concerned with the relationship of epidemiology to causation as may be exemplified by a recent exchange between Greenland and Morgan (Ha-Duong, Casman and Morgan, 2004; Greenland, 2004; Casman, Ha-Duong and Morgan, 2004). This, while directed at multifactorial disease and causation, did not address the matters discussed in this essay. Causation is, in any case, a nuisance. It is a word which carries different meanings for epidemiologists, philosophers, lawyers, doctors and each of us as individuals. Its discussion can lead to the interminable and insoluble wrangles which have exercised philosophers for centuries. In the present context, I will use the term simply to describe an event, or series of events, which precedes an effect, the latter being, in our case, the clinical recognition of disease. To those of us concerned with the biological sciences, a cause must always be a provisional assignation and a convenient label rather than something which can be determined definitively. To do so amidst the vagaries of biological information is no more possible than to establish absolute truth or absolute falsehood.
What is required is a means of dealing with the multifactorial situation which I have described in the context of epidemiological research.
The model for multifactorial disease which I will use here is that of K.J. Rothman (Rothman, 1986). It is neither the only nor the most recent model but it seems reasonable and serves as a vehicle for this discussion. I use it as a basis for the ideas presented here but these are not dependent upon it.
Rothman was concerned with the observation that chronic diseases can seldom, if ever, be attributed to a single cause which is both necessary and sufficient.
He proposed that this situation is best understood by thinking of a number of component causes acting together as a sufficient cause, where A to F in the first of the two diagrams on the left are the component causes. He described this as sufficient but not necessary because he saw that different combinations of component causes could exist as a result of replacing one or more of the component causes by an equivalent, as in the second of the diagrams.
It is obvious that different sufficient causes need not have the same number of components. Any number from two upwards may suffice. It should also be clear that for any given case of a disease only one sufficient cause will apply, although, as will be seen below, one or more of its apparent components may be irrelevant and it could even seem that a second sufficient cause appeared to apply. On the other hand, the number of sufficient causes which pertains in a population will be limited only by the number which can be assembled from the available pool of component causes, while their frequency will be a function of the prevalence of the components. In principle, if enough interchangeable component causes existed, the number of sufficient causes could approach the number of cases.
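The way in which interchangeable components multiply the number of possible sufficient causes can be sketched as follows. This is only an illustration under stated assumptions: the component labels, the grouping into "slots" and the rule that exactly one alternative fills each slot are all hypothetical and are not part of Rothman's description.

```python
from itertools import product
from math import prod

# Hypothetical sufficient cause with four "slots". Each slot lists the
# component causes assumed to be interchangeable for that role.
slots = [
    ["A", "A'"],
    ["B"],
    ["C", "C'", "C''"],
    ["D", "D'"],
]


def enumerate_sufficient_causes(slots):
    """Every combination formed by choosing exactly one component per slot."""
    return [frozenset(choice) for choice in product(*slots)]


sufficient_causes = enumerate_sufficient_causes(slots)

# The count is the product of the number of alternatives in each slot.
n_expected = prod(len(alternatives) for alternatives in slots)
print(len(sufficient_causes), n_expected)
```

Because the count is a product, each additional interchangeable alternative multiplies rather than adds to the number of sufficient causes which could, in principle, be assembled.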
It should be apparent that the component causes in this generalised discussion are broadly equivalent to risk factors identified in observational epidemiology. I will, however, continue to refer to them as component causes in this section, both for the sake of consistency and to allow contrasts with risk factors as described conventionally when it is appropriate to do so.
Rothman suggested that, for those wishing to attribute responsibility to the various factors involved in the development of disease, each of the component causes must be considered equivalent and that each should be allotted a responsibility of unity. He did so because without any one of the components, the aggregate would cease to be a sufficient cause. It can be argued, however, that if the concept of complementary or equivalent components is accepted and if one component can be replaced by another then its responsibility would thus depend upon the likelihood of this taking place and this would, in turn, depend upon the number of alternatives which existed and their relative frequency. In this context, risk factors as shown in figure 4 would represent such interchangeable entities, while 'effects' in that diagram are closer to Rothman's depiction of component causes.
Furthermore, in the diagram the sectors representing each component cause are of equal size. As suggested above, it is likely that some component causes will not provide a unique effect but will contribute incrementally to a broader underlying effect. Under these circumstances the size of the sectors would differ in proportion to their contribution. Put another way, some component causes, taking the term to be, in this case, synonymous with risk factors, may be divisible according to the biological effects which they contribute to the whole. The interchangeability between component causes and the biological effects which they may exert is illustrated in figure 5.
The significance, in terms of their contribution to the sum of disease, of different component causes in a population will depend upon the number of sufficient causes in which they participate and the number of cases which these contribute to the total. As Rothman expressed it, the aetiological fraction for a component cause will be equivalent to the fraction of the disease attributable to all the sufficient causes of which the component is a part. This will depend upon the prevalence of the component cause and the extent to which it is replaceable by others. Rothman suggested, using more conventional terminology, that the strength of the association between a risk factor and a disease is not an invariable and consistent property of a risk factor but depends upon the relative prevalence of other factors.
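A small numerical sketch may make this dependence concrete. It rests on several assumptions which are not in the text: only two sufficient causes exist, {A, B} and {C}; the components are distributed independently, which the earlier discussion of prevalence suggests is unrealistic; and the aetiological fraction is taken as the proportion of cases which would not arise if the component were absent, which matches the definition quoted above only when no individual completes more than one sufficient cause. All names and prevalences are hypothetical.

```python
from itertools import product


def disease_probability(sufficient_causes, prevalence, given=None, removed=()):
    """Probability that at least one sufficient cause is complete.

    prevalence: component name -> prevalence (components assumed independent)
    given:      optional dict fixing components as present (True) or absent (False)
    removed:    components treated as eliminated from the population
    """
    given = given or {}
    names = sorted(prevalence)
    total = 0.0
    for states in product((False, True), repeat=len(names)):
        profile = dict(zip(names, states))
        if any(profile[name] != value for name, value in given.items()):
            continue  # profile inconsistent with the conditioning
        weight = 1.0
        for name in names:
            if name not in given:
                p = prevalence[name]
                weight *= p if profile[name] else 1.0 - p
        present = {n for n in names if profile[n]} - set(removed)
        if any(cause <= present for cause in sufficient_causes):
            total += weight
    return total


def risk_ratio(component, sufficient_causes, prevalence):
    """Risk in those with the component divided by risk in those without."""
    exposed = disease_probability(sufficient_causes, prevalence, given={component: True})
    unexposed = disease_probability(sufficient_causes, prevalence, given={component: False})
    return exposed / unexposed


def aetiological_fraction(component, sufficient_causes, prevalence):
    """Proportion of cases which would not arise without the component."""
    baseline = disease_probability(sufficient_causes, prevalence)
    without = disease_probability(sufficient_causes, prevalence, removed=(component,))
    return 1.0 - without / baseline


# Hypothetical model: A acts only together with B; C is sufficient alone.
causes = [frozenset({"A", "B"}), frozenset({"C"})]
for prevalence_b in (0.1, 0.9):
    prevalence = {"A": 0.3, "B": prevalence_b, "C": 0.05}
    print(prevalence_b,
          round(risk_ratio("A", causes, prevalence), 2),
          round(aetiological_fraction("A", causes, prevalence), 3))
```

In this sketch nothing about component A itself changes between the two runs. Raising the prevalence of its partner B from 0.1 to 0.9 nevertheless raises the risk ratio for A from about 2.9 to about 18.1, and its aetiological fraction from roughly 0.36 to roughly 0.84.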
Although Rothman did not discuss the matter explicitly, these considerations are also applicable to the independence or otherwise of risk factors. Epidemiologists consider an influence to be an independent risk factor for a disease if it can be shown that the apparent risk associated with it remains significant after due correction for other factors of interest. In terms of the present model this represents the detection of sufficient causes in which the factor of interest was present but other specified component causes were not. The magnitude of the independent effect may represent the prevalence of this contributory cause not in the population of people but in the population of sufficient causes. The corollary of this argument should also be considered. If an independent association between a risk factor and a disease cannot be demonstrated it does not necessarily mean that the factor in question has been identified as a consequence of mutual association with a true component cause. It is possible that the risk factor acts by making some contribution, interchangeable with that of others, to a general influence involved in the aetiology of a disease.
Rothman’s model thus provides a rational framework which allows us to consider how different factors may interact in the aetiology of chronic disease. It has continuing value in preventing us from giving undue attention to single risk factors and in giving a broader perspective on the interactions of risk factors in both the individual and the population of which he or she is a member. In principle, it has further utility in allowing epidemiology to take more explicit account of such interactions in the design and analysis of studies. At present, however, there is insufficient information on the distribution, beyond their simple prevalence, and biological contribution of risk factors to permit this to be achieved.
Interim summary
My arguments thus far have been directed at the following points.
The risk factors associated with chronic diseases are seldom, if ever, necessary and sufficient causes of disease. Rather, they appear to make an incomplete contribution to disease processes and are effective in changing rates or probabilities of disease only when coexistent with other complementary risk factors such that the aggregate does become a sufficient cause. It seems likely that for any given disease several of these sufficient causes, differing in the component causes which comprise them, will exist.
Risk factors as recognised by epidemiological investigations may not be independent entities. There is considerable evidence that some of them vary in respect of their presence, extent or direction according to the presence, extent or direction of other risk factors. Risk factors are thus not independently distributed. The likelihood of an individual being exposed to one risk factor will depend upon his or her probability of being exposed to others. The distribution of a risk factor in a population will depend upon the distribution of other risk factors. The complexity which this introduces into the study of risk factors is not eased by the observation that many of them are composite entities, recognised by a label corresponding to a frequent correlate, but having diverse functional components.
Risk factors have in common not only participation in clusters of sufficient causes of disease and mutually interacting distributions in a population but the presentation to the body of certain common toxic or pathogenic influences. This is not incompatible with any specific effects which they may possess. Risk factors may interact both synergistically, by effecting processes complementary to those induced by other factors, and additively by adding, to varying extents, contributions to the same process.
Neither is the development of disease simply the satisfaction of a sufficient number of necessary biological steps. While any generalised model must be conjectural, it is at least reasonable to propose that the biology of disease is also multifactorial and that the pathway towards the ultimate catastrophe can follow a number of routes. Moreover, the detection of a potentially pathogenic state does not necessarily mean that it lies on that route pertaining in an individual.
What do we conclude from all of this? The observation that the aetiology of chronic disease is complex is hardly new, although perhaps not everyone appreciates the degree of complexity. What, in slightly more practical terms, does it tell the epidemiologist?
Firstly, it tells the epidemiologist that to consider risk factors to be independent, isolated and unique entities which can be studied without regard to others is not representative of the real world, however that may be conceived. Most epidemiologists would, if pressed, agree with this. The response that serious research would be impossible if all the strictures which I have discussed were actively acknowledged is perfectly valid. It is necessary, however, to be aware of the limitations of interpretation of findings derived from the study of risk factors in isolation. It is also evident that the discussion here is based upon information obtained from the study of single risk factors and that their identification is a prelude to the integration of information which I seek. The important point is, however, that this is not an end in itself but a contribution to an epidemiological understanding of the origins of disease in individuals and populations.
The second, and more important, consequence is also related to the use of epidemiological findings in public health. The complex interactions which I have described place severe restrictions on efforts to calculate the morbidity or mortality attributable to a risk factor, whether this relates to probabilities in an individual or rates in a population. While, as I will suggest below, this is not an intractable difficulty, it does seem that simple calculations of aetiological fractions or attributable mortality may be not only meaningless but also misleading.
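The difficulty with attributable fractions can be illustrated with a minimal sketch in Python. Everything in it is hypothetical: the component causes labelled A, B and C, the two sufficient causes built from them, and the population counts are invented for illustration and are not drawn from any study. The sketch also assumes that removing one factor leaves all other exposures unchanged, an assumption which the later discussion of confounding calls into question.

```python
# Hypothetical sufficient causes, each a set of component causes that must
# all be present for disease to occur. The labels A, B and C are illustrative.
SUFFICIENT_CAUSES = [frozenset({"A", "B"}), frozenset({"C"})]

# Hypothetical population: exposure pattern -> number of people (100 in total).
POPULATION = {
    frozenset(): 40,
    frozenset({"A"}): 20,
    frozenset({"B"}): 20,
    frozenset({"A", "B"}): 15,
    frozenset({"C"}): 5,
}


def diseased(exposures):
    """Return True if the exposures complete at least one sufficient cause."""
    return any(cause <= exposures for cause in SUFFICIENT_CAUSES)


def attributable_fraction(population, factor):
    """Share of cases that would vanish if `factor` were removed from everyone,
    assuming (questionably) that all other exposures stay unchanged."""
    cases = sum(n for exposures, n in population.items() if diseased(exposures))
    remaining = sum(
        n for exposures, n in population.items() if diseased(exposures - {factor})
    )
    return (cases - remaining) / cases


fractions = {f: attributable_fraction(POPULATION, f) for f in ("A", "B", "C")}
print(fractions)
print(sum(fractions.values()))
```

In this invented population the fractions attributable to A, B and C are 0.75, 0.75 and 0.25, which sum to 1.75. Because A and B act only in combination, every case they produce is wholly attributable to each of them, so the fractions cannot be added or partitioned in any simple way.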
Observational epidemiology and multifactorial disease
If we accept the possibility that chronic disease is the result not of single risk factors but of their clustering into sufficient causes, we need to consider how these will be perceived by the well-established observational methods.
While case-control and cohort studies can be regarded as different methods, each with its strengths and weaknesses, of studying the incidence of disease in populations, they differ fundamentally when seen from one perspective. Case-control studies sample the population of cases and seek to determine distinguishing features of the people who bear them. Cohort studies sample populations of people of known characteristics and monitor the kinds of diseases which they contract. Using the terminology of the preceding section, case-control studies will thus tend to select the commonest types of sufficient cause while cohort studies will be biased towards those sufficient causes containing the component causes most frequently present and measured in the population under examination. In principle, therefore, a comparison of findings derived from the two types of study could provide some information on the range of component causes operative for a particular disease. In practice, however, it seems unlikely that the magnitude of difference would exceed the inherent variability in results.
The greatest limitation which such studies based on samples of a population pose for a study of sufficient causes is in recognising and examining all the component causes which operate in a population. Even if a satisfactory proportion of the total was known, it will seldom be practical to measure them all in a sampling study. How many studies on lung cancer, for example, include domestic radon levels amongst the personal variables measured? Excluding studies aimed explicitly at this variable, the answer is very few. There is also the possibility that hitherto unrecognised risk factors participate in sufficient causes. It is unlikely that unknown risk factors which would show a large magnitude of association if studied in isolation remain to be discovered for diseases which have been studied extensively. On the other hand, it is quite possible that risk factors with small independent effects on the individual remain undiscovered or neglected. If a number of these acted interchangeably as component causes in an appreciable number of sufficient causes they would be important in determining the total number of sufficient causes which were operative. An interesting example of this is air pollution which I will discuss in a future version of this site. While such pollution can be shown to be a risk factor for disease, its independent effect, as defined conventionally, is frequently small. This may be due in part to its widespread distribution and consequent difficulties in isolating its effect. There is evidence, however, that air pollution is a risk factor for lung cancer in smokers but not in non-smokers. One interpretation of this is that its effect is seldom sufficient to participate in the aetiology of lung cancer on its own but that it shares with smoking some biological effect which becomes significant in the combination of the two.
One means of taking all risk factors, or component causes, into account is the ecological study in which rates of disease are compared across several populations in relation to some measure, also made at the population level, of a factor of interest. The disease rate in an entire population is an automatic reflection of all the factors which operate upon it, be they known or not. As it is derived from the whole population and not from a sample of it, representativeness is not an issue and the rate is the true rate and not an estimate. Ecological studies are like any other in being subject to errors and inaccuracies. Standards of diagnosis and certification of deaths may vary between the units of population compared. Likewise, estimates of the risk factor of interest may vary and these may be less satisfactory in nature than those obtainable from individuals. Taking cigarette smoking as an example, measures of consumption per capita will not take differences in smoking prevalence into account and vice versa.
Such studies have been criticised on the basis of the 'ecological fallacy' which states that the risk to individuals cannot be inferred from that in the population of which they are constituents. As my purpose at present is to compare rates or risks in populations, this stricture is not relevant. In any case, it can be argued that the logical pitfalls involved in deducing individual rates from a population are no greater than those involved in the induction of population rates from individuals which many epidemiologists feel able to do with the results of sampling studies.
The application of the ecological approach to the well established relationship between mortality from lung cancer and cigarette smoking thirty years earlier is shown in figure 6. The imposition of a lag period between exposure and apparent effect follows convention. The uncertainties and inaccuracies in such an approach will be discussed in the next edition of this website.

It can be seen that a positive relationship exists with the correlation within the data suggesting that variation in cigarette consumption accounts for around 60% of the variation in mortality from lung cancer. It is also evident, however, that cigarette consumption is not a good predictor of mortality from lung cancer. For example, it appears that while cigarette consumption in the United States of America was between three and four times greater than that in Italy, mortality rates from lung cancer thirty years later were closely similar. Conversely, although cigarette consumption was similar in New Zealand and the United Kingdom, lung cancer rates were 1.7 times higher in the latter than the former.
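The link between a correlation coefficient and the proportion of variation "accounted for" can be made concrete with a short sketch in Python. The four pairs of values below are arbitrary and are not the data plotted in figure 6; the function name `pearson_r` and the variable names are likewise chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Arbitrary illustrative values, NOT the data plotted in figure 6.
consumption = [1.0, 2.0, 3.0, 4.0]
mortality = [1.0, 3.0, 2.0, 4.0]

r = pearson_r(consumption, mortality)
r_squared = r * r  # proportion of variation in mortality "accounted for"
print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
```

With these invented values the correlation coefficient is 0.8 and its square is 0.64, so roughly 60% of the variation is accounted for. Even so, the second and third populations appear in the reverse of the order which consumption alone would predict, which is the sense in which a respectable correlation can coexist with poor prediction for particular populations.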
One explanation for such observations is that the relationship between smoking and lung cancer depends not upon smoking alone but is modified by other factors which may differ from country to country. It is of interest that some of the more obvious possibilities can be excluded. Thus, although New Zealand and the United Kingdom are geographically distant, the ethnic nature of the populations is similar and the type of cigarette smoked was probably, in the period in question, nearly identical. It can also be seen that rates of lung cancer and cigarette consumption in Canada and the United States of America are similar despite the inhabitants of these countries smoking cigarettes containing quite different tobacco blends.
We must not forget that the measure of exposure to cigarette smoking used is broad and imprecise. There are, nevertheless, further examples in the literature of international comparisons which suggest that factors other than cigarette smoking can influence rates of lung cancer and ischaemic heart disease (Dean, 1961; Eastcott, 1956; Marmot et al., 1975). These will be discussed further in the next edition of this website.
Errors and inaccuracies in the data were blamed by Brown et al. (1994) for their failure to detect a relationship between measures of smoking and the prevalence of chronic obstructive pulmonary disease in a number of European countries. They did not discuss the possibility that the underlying hypothesis which led them to expect a relationship might be inadequate.
A particularly interesting situation can be seen, in figure 7, in a comparison of international rates of smoking and coronary heart disease. In this case, smoking prevalence is used as an index of smoking behaviour.

The first two graphs in figure 7 suggest that mortality from coronary heart disease in men may be related inversely to smoking prevalence but that in women a positive association may pertain. While this is interesting in itself, a more intriguing aspect of the data is seen in the second two graphs in which mortality from coronary heart disease in one gender is plotted against smoking prevalence in the other. Rather than showing a weakening of the relationships as would be expected if smoking in one gender was an imperfect reflection of smoking in the other, or no relationship if smoking prevalence in men was independent of that in women, the results suggest that smoking prevalence in one gender bears a similar relationship to mortality in both genders but that this relationship differs when male or female smoking prevalence is examined. While a number of explanations for this observation could be advanced, the simplest is that smoking in men reflects a different set of social and behavioural characteristics from that in women and that it is this set of characteristics rather than smoking itself which bears a relationship to the development of heart disease.
A not dissimilar observation lies in the data of Pettersson et al. (2004) who reported, from a comparison of data from mainland Scandinavia, a positive relationship between smoking in women and the incidence of testicular cancer. While these authors suggest that this represents a biological effect of smoking during pregnancy, it is possible that this provides another example of smoking representing a particular set of environmental conditions in one gender.
I will summarise this section with the conclusion that ecological studies provide a particular perspective on the relationship between risk factors and disease by giving an indication of the influence of a particular risk factor in combination with all others which apply in the population. They do not tell us anything about the nature or number of these other factors but they do provide a measure of their overall effect which can be used in comparison with the results of sampling studies to stimulate further investigation into the origins of diseases in populations. They might even tell us, as the published observations on chronic obstructive pulmonary disease suggest, that a risk factor found to be important in sampling studies may have little significance at the level of the population or, as in the examples shown in figure 7, that the risk factor of interest may actually be merely a label for those factors which have real relevance.
Confounding
The immediate, and correct, response of most epidemiologists to the preceding section will be that proper control of confounding variables can isolate the effect of a risk factor of interest from those which surround it in the population. In the present context of a multifactorial world, however, we must ask whether confounding retains the meaning which it has in observational epidemiology.
Simply defined, a confounding factor is an influence in a population which is associated in prevalence with the risk factor of interest and which is involved in the aetiology of the disease under investigation. Both smoking and alcohol consumption are risk factors for cancers of the head and neck and the prevalence of smoking tends to increase or decrease with the prevalence of drinking. Thus smoking and drinking are mutual confounders in the epidemiology of cancers of the head and neck. If the population under investigation contains sufficient numbers of teetotal smokers and non-smoking drinkers then the risk associated with each can be determined separately and the values obtained used to apply an arithmetic correction to the values of risk obtained in those who both smoked and drank. In practice, the methods used in correction may extend beyond simple arithmetic into mathematically complex regression procedures depending upon assumptions about the distribution of data which may or may not be satisfied in any given case.
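The arithmetic involved can be sketched in Python. The counts below are hypothetical and chosen only so that the numbers come out cleanly. The pooling method shown is the Mantel-Haenszel risk ratio, used here as one common form of stratified correction; it is an assumption of this sketch and not necessarily the procedure applied in any particular study of cancers of the head and neck.

```python
# Hypothetical strata by drinking status. Each tuple holds:
# (cases among smokers, smokers, cases among non-smokers, non-smokers)
STRATA = {
    "non-drinkers": (10, 100, 20, 400),
    "drinkers": (80, 400, 10, 100),
}


def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def crude_risk_ratio(strata):
    """Risk ratio for smoking when drinking status is ignored."""
    totals = [sum(column) for column in zip(*strata.values())]
    return risk_ratio(*totals)


def mantel_haenszel_risk_ratio(strata):
    """Risk ratio for smoking pooled across strata of drinking status."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return numerator / denominator


for name, counts in STRATA.items():
    print(f"{name}: risk ratio {risk_ratio(*counts):.2f}")
print(f"crude: {crude_risk_ratio(STRATA):.2f}")
print(f"Mantel-Haenszel: {mantel_haenszel_risk_ratio(STRATA):.2f}")
```

In these invented data the risk ratio for smoking is 2 within each stratum, but 3 when drinking is ignored, because smokers are concentrated among the drinkers. The pooled estimate recovers the value of 2. The correction works cleanly here only because the strata are well populated and the effect of smoking is the same in each.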
This is fine as far as it goes, but can we really apply findings from the simple world of individual risk factors to the multifactorial dimension in which so many factors are related to so many others? Consider the situation where an epidemiologist chose to deal with the presence of confounding factors, in what is arguably the ideal manner, by censoring his or her data to exclude all cases where subjects were exposed to known risk factors other than that of interest. This would deplete his or her carefully garnered population considerably. The extent to which this can occur is seen in figure 8.
This is based upon the NHANES 1999-2000 population, as used in the exercise on cluster analysis, and shows the relative diminution of the number of smokers in the population as those exposed to other risk factors are successively removed. It is apparent that simply removing those smokers in the top quintile of alcohol consumption reduces the number of smokers by half and that when those exposed to a further five risk factors are excluded less than 10% of the original population remains.
This diminution suggests that correction for confounding by exclusion might well lead to problems of loss of statistical power and precision, but this is not the real difficulty. What this example demonstrates is that as correction for confounding proceeds, the results are applicable to a progressively smaller proportion of the population. Thus, if some estimate of a risk associated with smoking was calculated after adjustment for all the factors included in figure 8, it might indeed be an accurate estimate of the independent effect of smoking, but it would be applicable only to around 5% of smokers in the actual population. Like it or not, smokers are exposed to other risk factors and the risk they experience will be a function of all of them. Corrected estimates of risk are thus an abstraction with but limited relevance to the real world.
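The mechanics of successive exclusion can be sketched in Python. The sketch does not use the NHANES data: the smokers are simulated, the factor labels other than alcohol are placeholders, and the assumption that each factor is present in 40% of smokers independently of the others is made purely for simplicity, contrary to the argument of this essay that risk factors are not independently distributed.

```python
import random

# Placeholder labels. Only alcohol consumption is taken from the text;
# the other five names are hypothetical.
FACTORS = ["high_alcohol", "factor_2", "factor_3", "factor_4", "factor_5", "factor_6"]


def make_smokers(n, prevalence, seed=0):
    """Simulate n smokers, each exposed to every factor with the same
    probability, independently (a simplifying assumption)."""
    rng = random.Random(seed)
    return [{f: rng.random() < prevalence for f in FACTORS} for _ in range(n)]


def remaining_fractions(smokers, factors):
    """Fraction of the original smokers left after each successive exclusion."""
    total = len(smokers)
    kept = smokers
    fractions = [1.0]
    for factor in factors:
        kept = [person for person in kept if not person[factor]]
        fractions.append(len(kept) / total)
    return fractions


smokers = make_smokers(n=10_000, prevalence=0.4)
steps = remaining_fractions(smokers, FACTORS)
for label, fraction in zip(["nothing"] + FACTORS, steps):
    print(f"after excluding {label:>12}: {fraction:.1%} of smokers remain")
```

Whatever the exact figures, the fraction remaining can only fall at each step, and any estimate computed on the final group describes only that shrinking residue of the smokers originally sampled.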
While the foregoing was a simple, practical, example of the divergence between the real world and that of conventional epidemiological practice, the problem is also evident at the theoretical level. Correction for confounding is in essence the application of what philosophers call a counterfactual argument. A counterfactual argument, as the name suggests, proceeds along the lines of 'what would occur in a world where the observation facing us did not arise?' In other words, what would happen if smokers did not drink or were not exposed to other risk factors? A counterfactual argument applies, however, if, and only if, everything else remains unchanged. Given that risk factors are not distributed independently in the population and that they may share biological effectors and mechanistic effects, then removing one or more of them from a set is unlikely to leave the remainder unchanged. While this may seem simply an assertion on my part, in the section on cluster analysis the issue can be addressed in part by comparing the distribution of risk factors other than smoking in smokers, former smokers and never smokers. It is apparent that these three categories differ in more than smoking status. This represents some evidence, albeit incomplete, that setting smoking aside does not leave everything else unchanged and hence that the use of a counterfactual argument is invalid.
In conclusion, and at the risk of repetition, correction for confounding provides abstract estimates of risk with limited relevance to the real, multifactorial, world. Such estimates may have value or interest in themselves, but they are of limited value in understanding the influence of risk factors for chronic disease in the world around us.
Discussion
I have now outlined my reasons for suggesting that the identification and characterisation of risk factors is insufficient to provide a full understanding of the origins of chronic disease in populations. That is not to say, however, that this application of epidemiology, which has kept many workers busy for half a century, is either wrong or inappropriate. Were it not for such activity, I would not have the information which suggests to me that a new approach is desirable. Rather, it is a matter of perspective and objectives. The pursuit of individual risk factors was, and remains, a necessary endeavour and I am simply seeking to use the information obtained to achieve a different level of understanding.
I have proposed that risk factors for disease cannot, and should not, be regarded as independent entities. This is despite the efforts of those who have described them to establish that they are so. Risk factors, as observed traditionally, are a heterogeneous collection of labels for diverse human activities and exposures to substances with the potential for deleterious biological effects. They vary too in the extent to which their modes of apparent action are understood although, paradoxically, efforts to refine understanding in this respect may obscure their mode of action.
Smoking is, statistically, a risk factor for cervical cancer. Is this because smoking is correlated in behavioural terms with forms of sexual behaviour which increase exposure to the strains of the human papilloma virus which seem to be a factor in many cases of the disease? Alternatively, does smoking, through some more direct effect, increase the likelihood of infection by the virus when exposure occurs? Or does an altered immune status in smokers alter the likelihood that an infection will prosper to the extent that the integrity of the cervical epithelium is compromised? Perhaps a slightly higher inflammatory status in smokers promotes the progression of preneoplastic lesions towards invasive carcinoma while some angiogenic effect of smoking may have a similar effect. It is likely, moreover, that there are further possibilities which we have not thought of. But wait. What other behavioural correlates exist? Are smokers more or less likely to use contraceptives? Are they more or less likely to use physical methods which might reduce the risk of infection or to use hormonal contraceptives which might promote or inhibit carcinogenesis? And what about alcohol? Smokers are more likely to be drinkers than non-smokers. Are inebriated partners more or less likely to indulge in sexual intercourse or to use contraceptives when doing so? Are there relevant physiological, as opposed to behavioural, effects of alcohol?
I could continue for pages in this vein, but I hope that I have made my point. If we want to know what determines the risk of cervical cancer in a population we need to look at the behavioural, toxicological and pathophysiological factors involved as a whole and not partition risk between risk factors which are delineated not because they are discrete entities in some functional respect but because they are relatively easy to recognise and measure in populations. This is obvious. Less obvious is how it should be done.
Perhaps the question can be approached by looking at risk factors in a different way and at different levels.
I have already referred ad nauseam to correlations in prevalence between risk factors. I pursue this elsewhere with a cluster analysis of one population which demonstrates how risk factors cluster or dissociate and with a simple numerical model intended to show that such clustering can lead to a transfer of apparent risk between different factors. We need to understand, however, why and how risk factors, as they are provisionally designated by observational epidemiology, assort themselves. Behavioural and social epidemiology are required to develop a better knowledge of how descriptive risk factors co-exist in populations. Given this, we can begin to assemble the existing knowledge in a more informative way.
Beyond this, however, we need to find a new basis for describing risk factors. Otherwise we will simply be shuffling the data and putting it into new piles. In considering risk factors the emphasis must shift from factor to risk, or more properly, hazard. I have discussed the likelihood that many risk factors present common biological effectors, such as oxidative stress or pro-inflammatory properties, and suggested in figure 5 that the components of a sufficient cause can, in principle, be rearranged to represent components of biological effect. Can this be done now, or do we need to start from the beginning? The extensive literature of toxicology and molecular biology is already sufficient to allow a provisional approach to the problem and to begin to consider the biological challenges facing individuals and populations in such terms as genotoxic, inflammatory or psychologically stressful lifestyles. Information already exists too to introduce this to observational epidemiology by the more widespread use of biological markers for such effects. Epidemiologists have sufficient experience to deal with the practical problems of somewhat more invasive interrogation of their subjects and the fortitude to deal with phials of blood as well as files of questionnaires and pails of urine as well as piles of data.
We must not forget of course that risk factors can act both to increase and reduce risk. Thus consumption of fruit and vegetables may protect against disease just as consumption of dietary fat or meat may promote it. We also need to acknowledge that smoking, regarded by many as the nastiest risk factor of them all, appears to be protective against Parkinson's disease, endometrial cancer, ulcerative colitis and pre-eclampsia in pregnancy. This must be informative at some level in our examination of biological effectors. Other observations need to be taken into account. As a generalisation, smokers have lower levels of blood pressure than non-smokers yet both smoking and high blood pressure are risk factors for cardiovascular disease. Would the apparent risk associated with smoking be even greater in the absence of this minor anti-hypertensive effect or is there a more subtle explanation? At a time when obesity is regarded as an increasing risk to populations, observations of lower rates of obesity in smokers need to be considered. Moreover, there is some evidence that obesity increases the risk of a particular type of cancer, adenocarcinoma, at various sites. The next edition of this website will discuss this in relation to changes in the histological types of cancer at various sites associated with smoking.
Such provisional relationships between the biological effects of risk factors will be raised to a higher heuristic status by the impressive progress currently being made in molecular pathology. We might, provisionally, choose genotoxicity as a biological effector of importance in the development of cancer and other chronic diseases. Will it be necessary, however, to divide this broad phenomenon into its components? It is likely that we will need to distinguish between mutagenicity, the forms of chromosome damage which lead to loss of heterozygosity, hypermethylation of gene promoters, acetylation of histones and other ways in which the orderly translation of genetic information into physiological organisation can be disrupted. Likewise, are increased levels of C-reactive protein, tumour necrosis factor-α or interleukin-1β equally informative of an inflammatory change or does each, together with the many potential markers which I have not mentioned, tell us something different?
The type of approach which I am advocating is clearly not going to be easy. It is, however, necessary for an adequate appreciation of the risks which we all face and hence for the development of more precise means of ameliorating them.