Observational Nutrition Research is Valid

Shaun Ward

Founder & Author at MyNutritionScience

Study design plays an important role in the quality, reliability, and interpretation of scientific research. On evidence hierarchies, experimental studies tend to be considered the highest quality, most reliable, and best able to answer causal questions. However, in nutrition science, there are numerous reasons why experimental studies are often unfeasible: a lack of financial support, ethical issues, or low participant adherence, to name just a few. For this reason, observational studies (otherwise called epidemiology) are an important branch of scientific research that shapes opinions about nutrition and undoubtedly influences public health policy. In contrast to experimental studies, where researchers introduce a dietary intervention to a specific group of people and study the effects, observational research simply analyses the distribution (frequencies and patterns) and possible determinants (causes and risk factors) of health-related outcomes in more natural contexts, without intervention. Its main purpose is to identify and quantify the relationship between a dietary exposure (such as a nutrient, food, or diet) and an outcome of interest (such as all-cause mortality).

There are 3 primary types of observational studies that are utilised in nutrition research:

  • Cross-Sectional Study – In a cross-sectional study, a population sample is enrolled, and at least one exposure and outcome are measured simultaneously. The purpose of this study design is to assess what dietary factors may have contributed to the prevalence of a health outcome at a single point in time.
  • Case-Control Study – Case-control studies are similar to cross-sectional studies but they make use of a control group. Investigators enrol two groups, one with the outcome of interest (cardiovascular disease, for example) and one without the outcome of interest (no cardiovascular disease, for example). The purpose of the study is to identify differences in dietary exposures between groups, which are then thought to possibly contribute to the outcome difference between groups.
  • Prospective Cohort Study – A prospective cohort study is the gold standard of observational research. Unlike the other two observational study designs, prospective studies look ahead in time rather than backwards, measuring real-time changes in exposures and outcomes. Participants are usually categorised into different levels of a dietary exposure (i.e. levels of meat intake) to see how each level of exposure associates with the incidence of an outcome compared to another level. Unlike experimental studies, though, the investigators observe rather than determine the participants’ dietary exposures.

 

Criticism of Observational Nutrition Research

What’s abundantly clear when listening to some discussions about nutrition science, however, is that observational research is heavily criticised. I don’t just mean criticised by your average Joe, either; many health professionals join the epidemiological hate train and see it as inherently uninformative. A popular quote in the British Medical Journal stated that “definitive solutions [in nutrition] won’t come from another million observational papers”. Consequently, this type of thinking presents a huge barrier—maybe the largest—for communicating nutrition science and identifying what dietary factors are harmful or helpful, and why. No matter what observational study one might use to support a position on a given question, it’s only a matter of time before someone says that “it’s only observational research” or that “correlation is not causation”, followed by demands to present real evidence in the form of experimental research. But are these criticisms valid, or is there more to the story? This article directly addresses the 4 major criticisms of observational research and argues they are largely overblown:

  • “Observational studies do not distinguish between causal and spurious associations”
  • “Confounding factors cannot be entirely accounted for, even by the most sophisticated methods”
  • “The effect sizes in observational research are too small”
  • “Dietary assessment methods are inaccurate”

“Observational studies do not distinguish between causal and spurious associations” and “Confounding factors cannot be entirely accounted for, even by the most sophisticated methods”

Let’s address the first two criticisms. I’m sure you’ve heard the phrase “correlation is not causation” before. As it sounds, it simply means that if two things are correlated or associated, as might be reported in an observational study, this doesn’t necessarily mean that one causes the other. Association without causation can have multiple explanations. The primary reason, though, is that observational studies don’t always consider or appropriately account for external factors (covariates) that may have influenced the outcome in question. If some external factors causally relate to the outcome, and these factors are unequally distributed between study groups, then the reported correlation or association between a dietary exposure and an outcome is said to be confounded. The external factors are called confounders, which mix and distort the exposure-outcome relationship we’re interested in by at least partially accounting for the measured association.

Experimental trials called randomised controlled trials (RCTs) are thought to bypass the issue of “correlation is not causation” by including randomisation into the research design. Randomisation is when participants are randomly assigned to an intervention or control group by chance, with the purpose of equally distributing covariates between groups. This allows researchers to more easily attribute group differences in an outcome to the difference in dietary exposure between groups. The difference in an outcome between groups is still an association, by definition, but the association is more able to be interpreted as a causal effect because of the presumed absence of confounding or alternate explanation. 
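This balancing effect of randomisation is easy to demonstrate with a toy simulation (hypothetical data and variable names, not drawn from any real trial): when group assignment is random, a covariate such as baseline activity ends up nearly identically distributed across groups, so it cannot account for a between-group difference in the outcome.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# A covariate (e.g. baseline physical activity) that could confound a dietary trial.
activity = rng.normal(size=n)

# Random assignment to intervention (1) or control (0), independent of everything else.
group = rng.integers(0, 2, size=n)

# With randomisation, the covariate's mean is nearly identical across groups.
diff = abs(activity[group == 1].mean() - activity[group == 0].mean())
print(round(diff, 2))  # close to 0
```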

But due to the absence of randomisation in observational research, inferring causal associations from this research design is often doubted. Some might say impossible. Without appropriate statistical methods, any exposure-outcome association can arguably be explained by an unequal distribution of confounders between groups. Healthy user bias is a clear example of this in nutrition science. Since the quantities and types of foods eaten are closely related to other health-related behaviours (dietary or lifestyle), some people might be quick to claim that we don’t know whether some other factor, known or unknown, was responsible for the outcome. For example, if red meat is found to be associated with higher cardiovascular disease risk in observational studies, one could argue that groups eating more red meat might also be eating more food in general, or smoke more, or be less active, or drink more alcohol, than groups eating less red meat. So how do we know it was the meat and not something else?

This might sometimes be a good question to ask, but well-conducted observational studies are not blind to it, and often answer it when analysing their data. Although there are certainly issues with how a lot of observational research is conducted, good researchers use numerous methods to limit (and ideally eliminate) the influence of confounders on exposure-outcome associations. And if they do, refusing to accept a study’s results because of other explanatory factors is not a legitimate criticism of the research design.

One standard practice in observational research that aims to eliminate the interference of external factors is statistical adjustment during the analysis phase. I have an article dedicated to explaining the technicalities of statistical adjustment if you’re interested, but in brief, statistical adjustment is a process whereby researchers control for the effects of known confounders. Controlling for a confounder can be a hard concept to grasp, but it essentially means that these variables form part of a statistical model such that confounding can be estimated and statistically removed from the relationship we’re interested in. Thus, the goal of statistical adjustment is to yield an estimate of the exposure-outcome association closer to the true value than when the confounder(s) is ignored. I strongly recommend reading my own article to know how this is done mathematically. Or, for a deeper dive, check out a lengthy yet brilliant 4-part series of lectures by Jonas Peters, a Professor of Statistics at the University of Copenhagen. Both of these sources should provide you with more confidence that statistical adjustment is indeed a fair and validated process, at least when done correctly.
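To make the mechanics concrete, here is a minimal sketch of statistical adjustment using simulated data (the variable names and effect sizes are invented for illustration, not taken from any real study). A regression that ignores a confounder reports a spurious exposure-outcome association; adding the confounder to the model removes it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical confounder: smoking drives up both red-meat intake and disease risk.
smoking = rng.normal(size=n)
meat = 0.8 * smoking + rng.normal(size=n)                  # exposure, pushed up by the confounder
disease = 0.5 * smoking + 0.0 * meat + rng.normal(size=n)  # meat has NO true effect here

# Unadjusted model: regress the outcome on the exposure alone -> spurious association.
X1 = np.column_stack([np.ones(n), meat])
b_unadj = np.linalg.lstsq(X1, disease, rcond=None)[0][1]

# Adjusted model: include the confounder as a covariate -> association vanishes.
X2 = np.column_stack([np.ones(n), meat, smoking])
b_adj = np.linalg.lstsq(X2, disease, rcond=None)[0][1]

print(round(b_unadj, 2))  # noticeably positive, despite no causal effect
print(round(b_adj, 2))    # near zero once smoking is controlled for
```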

One part of my work that is worth raising here, though, is that inferring causality from observational research is dependent on knowing what the confounders of a dietary relationship are. What confounds the effect of red meat on cardiovascular disease, for example? Some people might say it’s a guessing game and that researchers will adjust for as little or as much as possible until they calculate the result they want, but I think this is a clear exaggeration of the truth. While it’s true that researchers must have adequate background information on a particular topic to pinpoint confounders, and confounders are not verifiable by the data itself, this isn’t a nonscientific ask and it’s not a process that should be frowned upon. Researchers are more than capable of reviewing the relevant literature to identify causal relationships that must be adjusted for, and they’re more than capable of defending their statistical models when questioned. Some degree of uncertainty will always remain, whether that’s due to an unmeasured known or unknown confounder, but that’s okay. Science is ever-evolving and we can only go by current knowledge, not what is yet to be discovered. We must begin to embrace uncertainty rather than demonise it. Less certainty compared to RCTs is not a reason to invalidate observational nutrition research; rather, it’s a reason to consider the findings within the wider context of the literature. 

Also, we should appreciate that confounders in nutrition are almost identical to confounders related to other lifestyle behaviours that people have no issues inferring causality for, such as smoking, alcohol, and physical activity. This often leads to very obvious contradictions in people’s lines of reasoning. The only way to escape such contradictions is to present an inherent complexity of observational research in nutrition that doesn’t apply to other lifestyle behaviours; otherwise, there is no sufficient difference between nutrition and other behaviours to reject the validity of observational research for one and not the other. Inherent nutrition issues may involve the unique complexities of dietary composition—different types and forms of nutrients being available in a plethora of foods—and the interaction of dietary constituents—nutrients acting differently in the presence or absence of other nutrients. Another inherent issue may also lie with the difficulties of dietary data collection (measurement error; to be discussed later). But again, greater complexity doesn’t equal invalidation; otherwise, you could argue for the invalidation of almost any lifestyle behaviour relative to the least complex. There has to be a more logically consistent reason why nutrition is treated differently. Moreover, even if someone held the position that nutrition is too complex to draw inferences from observational data, nutrition-specific complexities largely vanish when whole dietary patterns are analysed rather than isolated dietary components. If someone isn’t comfortable teasing out causal associations for individual nutrients, for example, given the multitude of independent nutrient interactions and synergies, then tell them to embrace these interrelationships and consider the overall dietary pattern as the exposure. It’s simply unjustified to write off the entire field of observational nutrition research wholesale.

To solidify why I think it’s unjustified, I’d point to the great evidence highlighting the convergence between observational and RCT results across nutrition science. Moorthy et al. were the first to highlight this, analysing the concordance between meta-analyses of RCTs and observational studies for diet-disease outcomes and finding that 23 of 34 (68%) comparisons between study designs had effect estimates in the same direction, with no statistically significant disagreement between research designs (z score not statistically significant). A similar yet larger review by Schwingshackl et al. recently built on this analysis, including a larger sample of diet-disease outcomes (83) and utilising more rigorous statistical examination. They found that of the 83 diet-disease outcome pairs included in the meta-analysis, 57 (69%) of the results between study designs were similar but not identical, and 26 (31%) were broadly similar. RCTs had slightly different effect estimates compared to cohort studies (a 9% difference, on average) when all diet-disease outcomes were assessed, and slightly less heterogeneity, but the differences were explained not by an inherent flaw in either study design, but by differences in study populations, exposures, comparators, and defined outcomes. Therefore, even if somebody has their doubts about observational research, they must at least acknowledge that its results agree with more robust research designs far more often than not. As stated by the authors of the latter study, “These findings could help researchers further understand the integration of such evidence into prospective nutrition evidence syntheses and improve evidence based dietary guidelines.”

“The Effect Sizes in Observational Research Are Too Small”

Let’s move on to the next criticism of observational research: that many of the reported effect sizes–a measure of the strength of an effect–are too small to be considered meaningful. For example, even if a group with a high intake of a particular food is found to have a 5% higher all-cause mortality risk, one might say that result is too small to be taken seriously. This criticism may sound odd at first, as a true effect can just as easily be small as large, but the core of the argument really boils down to statistics. What people mean by “too small an effect size” is that the effect size is not sufficient to rule out the possibility of a spurious association due to residual uncertainty and confounding. In other words, small effects can be more easily explained away by unknown or unmeasured factors that have been overlooked during the analysis phase. It’s hard to explain away a very large risk increase for smoking and lung cancer, for example, whereas it might be easier to explain away a very small risk increase for potatoes and diabetes.

Now, there are partial truths here, but the issue with this thinking is twofold. First, although greater effect sizes can increase certainty that an association is of a causal nature within the context of the same disease, the magnitude of an effect is heavily dependent on the baseline prevalence of the disease. Any exposure can show relatively larger effects for diseases with a low baseline prevalence (i.e. lung cancer vs CVD) because a given absolute change in risk translates into a larger relative difference. Considering this, as pointed out by Cohen et al., terms such as ‘small,’ ‘medium,’ and ‘large’ effect sizes are always relative, not only to each other, but also to the specific topic. The interpretation of an effect’s magnitude and importance depends on the research question and always requires expert judgement, not to be dismissed in favour of one’s presumptions about what constitutes a small effect.
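The dependence on baseline prevalence is just arithmetic, which a toy calculation makes clear (the risks below are made-up illustrative numbers): the same absolute increase in risk reads as a “large” relative effect for a rare disease and a “small” one for a common disease.

```python
# Same absolute increase in risk looks "large" or "small" depending on baseline prevalence.
baseline_rare = 0.005      # rarer disease: 0.5% baseline risk
baseline_common = 0.10     # common disease: 10% baseline risk
absolute_increase = 0.005  # exposure adds 0.5 percentage points of risk in both cases

rr_rare = (baseline_rare + absolute_increase) / baseline_rare
rr_common = (baseline_common + absolute_increase) / baseline_common

print(round(rr_rare, 2))    # 2.0  -> "doubles the risk"
print(round(rr_common, 2))  # 1.05 -> "a 5% relative increase"
```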

Second, effect sizes should always be considered alongside the statistical confidence in the effect. As explained in our article on statistical inferences, the confidence interval is a fundamental tool that measures the degree to which an effect estimate is subject to variability. It’s composed of a lower and an upper bound and the range of plausible values in-between; intervals constructed this way contain the true effect 95% of the time (given a confidence level of 95%). Therefore, it’s actually possible to have greater confidence that a small effect estimate is of a causal nature than a larger effect estimate, provided the confidence intervals are narrower for the small effect estimate than the larger one. Essentially, context is everything, and blanketly ruling out observational findings due to small effect sizes is not a legitimate reason in and of itself.
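To show how a small effect can carry more statistical confidence than a large one, here is a sketch using the standard log-transform confidence interval for a relative risk (the case counts are invented purely for illustration):

```python
import math

def rr_ci(exposed_cases, exposed_n, unexposed_cases, unexposed_n, z=1.96):
    """Relative risk with its standard 95% CI via the log transform."""
    rr = (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)
    se = math.sqrt(1 / exposed_cases - 1 / exposed_n
                   + 1 / unexposed_cases - 1 / unexposed_n)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Large study, small effect: RR of 1.10 with a CI that excludes 1 -> real confidence.
print(rr_ci(1100, 100_000, 1000, 100_000))

# Small study, large effect: RR of 2.0 but the CI spans 1 -> little confidence.
print(rr_ci(8, 100, 4, 100))
```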

“Observational dietary assessment methods are inaccurate”

Let’s now touch on the final main criticism of observational nutrition research: that dietary intake is too difficult to measure accurately for the results to be taken seriously. I’ve discussed this at length in my article about dietary collection and error, but I’ll cover the main points here too. The first point is that while I disagree that it’s “too difficult” to measure dietary intake, measurement error is expected for numerous reasons:

  • Diet is a time-varying exposure. Most people have access to hundreds, if not thousands, of different foods, each of which is consumed in varying proportions, quantities, and combinations over time.
  • Self-reported dietary data is subject to systematic error, and is only as accurate as the questions used to gather information, and the memory and honesty of answering participants.
  • Social approval or desirability can lead to purposely misreporting dietary intake.

Observational dietary assessment methods clearly have a component of error for these reasons. My issue is that the magnitude of error is typically overblown by critics of observational research. Even if we take the most criticised (yet popular) dietary assessment, food frequency questionnaires (FFQs), for example, reviews have shown them to have acceptable validity when compared to weighed food records. More recent studies using better-designed FFQs have even shown similar precision levels to 7-day dietary records. FFQs also have good reproducibility, meaning there’s consistency in reported dietary intake for the same subject at different time points. Cui et al. conducted a meta-analysis to systematically assess the reproducibility of FFQs and found that correlation coefficients exceeded 0.5—at least a moderate association—for energy and most nutrients in generally healthy populations. It’s now also common for researchers to couple dietary biomarkers (for energy and nutrient intake) with FFQs to provide a more objective dietary intake measure with less bias. Using both FFQ and biomarker data allows for triangulation methods to obtain improved dietary intake estimates, and further validates FFQs as a reliable tool for measuring dietary intake.

And again, the above only discusses the validity of arguably the worst dietary assessment tool in observational research. Another common dietary assessment method that is gaining popularity is the 24-hour dietary recall. This involves a subject reporting all foods consumed in the previous 24 hours (or calendar day), possibly multiple times over the course of a study, to a trained interviewer. There is still measurement error due to the reliance on participant memory, but skilled interviewers can produce highly detailed and useful nutritional data comparable to the gold standard of food records. The USDA even introduced a five-step multiple-pass method in 2002 for 24-hour recalls, which helps to minimise the omission of foods and helps participants report portion sizes by using visual aids. This particular method has been found (in validation studies) to agree reasonably well with actual intake assessed by direct observation (r=0.57).

But let’s set the above aside and assume that there is always significant measurement error, regardless of the dietary assessment tool used. Does this invalidate observational nutrition research? No. This is because causal associations rely only on establishing contrasts in dietary intake among a population, not on precise estimates of absolute intakes. So even if one argues that FFQs or dietary recalls are not very accurate, so what? Measurement error doesn’t necessarily interfere with establishing causal associations. Analysing exposure-outcome relationships is still more than possible in the presence of measurement error. As stated by Beaton et al., “There will always be error in dietary assessments. The challenge is to understand, estimate, and make use of the error structure during analysis”. The science supports this assertion, and opposing statements do little to promote constructive dialogue and advancement in the nutritional field.
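The point that error doesn’t erase contrasts can be illustrated with a simulation of classical random measurement error (hypothetical numbers; real dietary error structures are messier): random error attenuates the estimated association toward zero, the well-known regression dilution, but it doesn’t manufacture or reverse an association.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

true_intake = rng.normal(size=n)                  # true (unobservable) dietary exposure
outcome = 0.3 * true_intake + rng.normal(size=n)  # genuine causal effect of 0.3

# Classical random measurement error, as from an imperfect FFQ.
reported_intake = true_intake + rng.normal(scale=1.0, size=n)

slope_true = np.polyfit(true_intake, outcome, 1)[0]
slope_noisy = np.polyfit(reported_intake, outcome, 1)[0]

# The noisy estimate is attenuated toward zero ("regression dilution"),
# but the direction and existence of the association survive.
print(round(slope_true, 2))   # ~0.3
print(round(slope_noisy, 2))  # ~0.15: attenuated, still clearly positive
```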

Please refer to my article on dietary collection if you’re interested in my response to other criticisms, such as the unequal distribution of measurement error between groups, and the adjustment for energy intake (isocaloric statistical modelling) to help reduce the effect of misreporting. Overall, though, we can say that although measurement error exists, it doesn’t invalidate observational findings. These are all considerations that researchers are well aware of, processes are in place to minimise such error, and measurement error doesn’t necessarily have a meaningful influence when analysing diet-disease relationships.

Final Thoughts

Despite the criticism, observational research is very much part of the framework for establishing causal inferences in nutrition and guiding public health policy. I hope to have defended this position further in this article by responding to the main criticisms, citing relevant research when necessary. Although observational research is by no means a perfect science, it isn’t meant to be, and it doesn’t have to be. Its utility holds. A great quote from Professor Miguel Hernán is that “epidemiology is useless, it can only give you answers”. Future observational studies should certainly attempt to improve on methods in design, analysis, and presentation, with better consideration of errors in measurement and potential biases. But until then, don’t lose sight of the forest and become confused by the trees.

If you’ve enjoyed this article and want to support My Nutrition Science and future content, please consider donating to us. We massively appreciate all donations, as it keeps the website running and the content flowing. Please also sign up to our email list below to be notified of future content.

Related Deep Dives

Research Skills

Nutrition Misinformation: How to Spot a Quack

Research Skills

Statistical Inferences in Nutrition: P-Values, Point Estimates, and Confidence Intervals

Disease Prevention

Merging Reductionism and Holism in Nutrition