Dietary Collection, FFQs, and Error

Joe Mclean, Writer at My Nutrition Science
Shaun Ward, Founder and Writer at My Nutrition Science

For nutrition science to evolve, it’s essential to address the challenges involved in conducting nutrition research. Addressing these challenges speeds the advancement of scientific knowledge, clarifies contentious topics, and ultimately improves our understanding of the relationship between health and food. One common criticism of nutrition research, however, is that it relies too heavily on observational data, riddled with supposedly unreliable dietary assessment methods. Many people argue that the degree of measurement error when collecting information about dietary intake is so large that it renders a lot of nutrition studies meaningless and untrustworthy. You may have already seen me briefly address this issue in our “observational research is valid” article, but this article addresses claims about dietary collection more extensively.

Measurement Error is Unavoidable

If you’re unaware, measurement error is the difference between the observed value and the true value. It’s unavoidable, an inherent limitation in every field of research, and present with every measurement instrument, from self-reported tools to the most advanced clinical machines. Even highly accurate and well-validated blood pressure monitors, for example, can suffer from measurement error due to human error, faulty equipment, or the commonly observed ‘white coat effect’, where readings are higher than usual because of patient stress and anxiety. Of course, nobody would use these as reasons to ignore the relationship between blood pressure and disease, but they’re factors to consider. Ultimately, though, the observed relationship is probably still robust enough to overcome any degree of doubt about the magnitude of measurement error. I use this example because, although it seems obvious, I can’t help but feel critics of nutrition research often overlook this simple point. Rather than simply claiming “measurement error invalidates the results” as if it’s an argument in and of itself, critics should go further and outline at what point the degree of measurement error invalidates the results of any particular study, and why. Measurement error doesn’t justify rejecting scientific results unless there’s a succinct and logically consistent reason to do so. Otherwise, we’re left arbitrarily claiming that some studies are valid and others are not, leaving our personal biases to roam freely and interpret as they see fit.

Types of Measurement Error

There are different types of measurement error, each of which leads to different considerations when interpreting research and assessing the quality of the data. The two common types of measurement error are categorised as either ‘random’ or ‘systematic’.

Let’s clarify random error first. Random error occurs by chance and cannot be predicted (it’s often termed “chance fluctuation”). Related to diet, random error describes the random variation in dietary intake over time, given that most people do not eat the same foods in the same quantities every day. Random error sounds more complicated than it is, though, and it certainly is acknowledged when planning a nutrition study (at least it should be). This is possible because random errors average out closer to the true value as repeat measures are taken, as per the ‘law of large numbers’. We can actually illustrate random error with a normal distribution curve: a single isolated measure may fall anywhere within a wide range of observed values due to random error, but when a measurement is collected repeatedly, the extremities of the distribution narrow and the average observed value becomes increasingly representative of the true value. Take measuring your body weight on a scale as a real-world example. Although your body weight fluctuates somewhat randomly around its mean value from day to day, obtaining your true body weight requires averaging different weigh-ins across multiple days. The same logic applies to dietary collection. Many authors, such as Bennett et al., have explained how accounting for random error can provide a low-bias estimate of true dietary intake. Further, note that some dietary collection methods inherently reduce random error by asking for average intakes over long periods (e.g. the previous 6 months), which lowers the influence of random error compared to collecting data from, say, the previous 24 hours. Again, using real life as an example, if you wanted to know your friend’s average daily calorie intake, you would probably ask what a typical day of eating has looked like over the past few months, not just what they ate yesterday; asking only about yesterday increases the risk of an off-finding that doesn’t represent the norm.
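To make the averaging idea concrete, here is a minimal simulation in Python. All numbers are invented for illustration; the point is simply that the mean of more repeated measurements drifts closer to the true value, as the law of large numbers predicts.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_intake = 2200    # hypothetical "true" average daily energy intake (kcal)
daily_noise_sd = 400  # hypothetical day-to-day random fluctuation (kcal)

for n_days in (1, 7, 30, 180):
    # Simulate n_days of intake, each fluctuating randomly around the true value
    reported = rng.normal(loc=true_intake, scale=daily_noise_sd, size=n_days)
    print(f"{n_days:>3} days measured: mean intake = {reported.mean():.0f} kcal")

# With more days averaged, the mean converges towards 2200 kcal:
# the random errors cancel each other out.
```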

Next up, let’s talk about the second type of measurement error: systematic error. This differs from random error in that the observed value is consistently different from the true value in a predictable, non-random fashion. For example, many studies have found that elderly people and patients with obesity tend to misreport their dietary intake compared with younger and non-obese populations. Systematic errors tend to be more serious than random errors for this reason, because they do not average out closer to the true value over time, even when repeat measures are taken. However, this type of error can still be considered and adjusted for during the analysis phase of a study. This is because measurement instruments that produce a moderate to large amount of systematic error are often validated against instruments with low systematic error (the “gold standard” instrument) in what are known as calibration studies. By knowing how much one dietary collection method consistently differs from the “gold standard”, at least for the population under study, the measurement differences can be adjusted for during data analysis. Data adjustment therefore partly overcomes known systematic error and brings the observed value closer to the true value. For example, if we’re collecting dietary information from elderly participants and can predict that much of this population doesn’t report the nutritional content of dietary supplements, the predicted shortfall can be added to each participant’s self-reported intake. Or, if populations with obesity tend to underreport their food intake relative to non-obese populations by x%, reported intakes in populations with obesity can be adjusted upwards by x%. To be clear, though, I’m not arguing that accounting for systematic error leads to 100% accuracy; rather, that accounting for systematic error, when done correctly, is sufficient to overcome worries about measurement error invalidating nutrition findings. As stated by Beaton et al., “there will always be an error in dietary assessments. The challenge is to understand, estimate, and make use of the error structure during analysis”.
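As a toy illustration of how a calibration study can feed into this kind of adjustment, here is a minimal sketch assuming a simple linear calibration between self-reports and a low-error reference method. The numbers are entirely hypothetical and real calibration models are more sophisticated, but the logic is the same.

```python
import numpy as np

# Hypothetical calibration sub-study: the same participants are measured with
# both the FFQ and a low-error reference method ("gold standard").
ffq_reported = np.array([1600.0, 1800.0, 2000.0, 2200.0, 2400.0])  # kcal/day
reference    = np.array([1950.0, 2150.0, 2380.0, 2600.0, 2840.0])  # kcal/day

# Fit a simple linear calibration equation: reference ≈ intercept + slope * reported
slope, intercept = np.polyfit(ffq_reported, reference, deg=1)

# In the main study, only FFQ reports are available. The calibration equation
# shifts them towards the scale of the reference method, partly correcting
# the known systematic underreporting.
main_study_reports = np.array([1700.0, 2100.0, 2300.0])
calibrated = intercept + slope * main_study_reports
print(np.round(calibrated))
```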

Validity of The Almighty FFQ

I’m sure you can forgive me for skipping over dietary collection methods in clinical studies where the researchers literally observe what and how much the participants are eating. Although some measurement error is still likely, I’ve never heard anyone question the validity of this dietary assessment method, and rightly so (if you do, then neither I nor anyone else can help you!). The more important conversation concerns nutrition epidemiology, which collects dietary information from free-living participants who are not being directly observed. Here, dietary information is typically collected via self-reported measures such as food frequency questionnaires (FFQs), food diaries, 24-hour recalls, and diet history records. These dietary collection methods, particularly FFQs, have undeniably been central to epidemiological research and our understanding of most diet-disease relationships, hence why critics of nutrition science pick one of their many battles here.

Weighed food diaries are widely considered the “gold standard” of dietary collection because they rely little on participant memory, but due to the burden of this method (it is time-consuming and requires highly motivated individuals), other collection methods such as FFQs are used far more often. First created by Walter Willett and later improved upon by other experts, the FFQ is a semi-quantitative questionnaire designed to measure dietary patterns. FFQs are relatively easy for participants to understand, cheap to produce and distribute, and require less administrative time than other methods.

But FFQs are criticised tremendously for a couple of reasons. People may argue that FFQs are flawed because they do not produce results similar to gold standard methods (i.e. poor accuracy and systematic error), and/or because participants largely guess or misreport their normal dietary intake (i.e. poor reproducibility and random error). Luckily, though, we have numerous validation studies to test these claims.

First, reproducibility. To assess this, validation studies administer FFQs at two points in time to the same group of people. Correlation coefficients (or some other measure of association) are then used to assess the agreement between the two responses. Do people report wildly different intakes, or are repeated reports largely consistent with one another? The largest study of its kind is by Cui et al., a meta-analysis of 123 studies analysing the within-person reproducibility of FFQ data, comprising 20,542 participants. Based on the data analysed, the pooled correlation coefficients ranged from 0.50-0.80 for macronutrients and 0.50-0.72 for micronutrients. For reference, a positive correlation coefficient falls between 0 and 1, and values above 0.5 are generally taken to indicate a moderate association and a reliable tool for measuring dietary intake. Thus, the best meta-analysis in the area clearly indicates that dietary intake estimates from FFQs are not “completely random”, as some might claim, and people do not “just guess” based on what they can remember at the time of FFQ completion. There is moderate to large reproducibility in dietary intake for most nutrients and foods.
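For readers who want to see what this kind of reproducibility check looks like in practice, here is a minimal sketch. The protein intakes are made up; the calculation is simply a correlation between two administrations of the same questionnaire to the same people.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical protein intake (g/day) reported by the same 10 people on
# two FFQ administrations roughly a year apart.
ffq_time1 = np.array([55, 62, 70, 80, 95, 101, 110, 120, 133, 140])
ffq_time2 = np.array([60, 58, 75, 78, 90, 110, 105, 125, 128, 150])

r, _ = pearsonr(ffq_time1, ffq_time2)     # linear agreement
rho, _ = spearmanr(ffq_time1, ffq_time2)  # agreement in ranking

print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
# Coefficients in the 0.5-0.8 range, as pooled by Cui et al., indicate
# moderate-to-high reproducibility rather than random guessing.
```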

Next, accuracy. The most notable mention here is a comprehensive study by Yuan et al. within the Women’s Lifestyle Validation Study. This measured the performance of a semi-quantitative FFQ, 24-hour dietary recalls, and 7-day dietary records against repeated dietary biomarker measurements (with the latter measuring ‘true intake’). 627 women took part in the study over a 15-month period, and the different dietary methods were administered at least several weeks apart and in random order to avoid artificially high correlations. The results were insightful. Compared to repeated dietary biomarker measurements, FFQs had greater overall validity than 24-hour dietary recalls at the end of the data-collection year, alongside validity close to that of the 7-day dietary records, which unsurprisingly showed the highest validity. When FFQs were adjusted for total energy intake, they also provided reasonably valid measurements for most of the individual nutrients assessed in the study, consistent with earlier conclusions derived using 7-day dietary records as the comparison method. The authors concluded that “multiple days of weighed diet records provide an optimal assessment of the dietary factors evaluated in this study, but for most nutrients, the validity of the SFFQ was only modestly less than that of the diet record”.

Again, though, I’m not trying to suggest FFQs are a perfect tool. Their accuracy compared with weighed food records often depends heavily on the nutrient in question. For example, Cantin et al. and Steinemann et al. tested FFQs against multiple-day food records. In the former study, 2 FFQs and a 12-day dietary record were collected in a crossover fashion from 53 adult participants to measure dietary intake of individual nutrients over 1 month. The results showed good reproducibility for energy and most nutrients, since only 3 of the 33 nutrients assessed (vitamin B1, vitamin D, and folate) yielded correlations lower than 0.5. The mean correlation coefficient for energy and all nutrients was 0.63, indicating a moderate to large association. In the latter study, however, a greater degree of variance between nutrients was found than in Cantin et al. Participants completed both an FFQ and a 4-day food record within a period of 4 weeks, and the FFQ significantly overestimated the absolute intake of some nutrients and foods while significantly underestimating others. However, it’s worth noting that, even here, the nutrients and foods reported most accurately tended to be those of greatest interest in most epidemiological studies: protein, fruits, eggs, meat, sausage, nuts, salty snacks, sugar-sweetened beverages, and alcohol.

Finding Meaningful Results in the Presence of Measurement Error

Possibly the most crucial point of this entire article, however, is that even if a fair amount of measurement error is present with FFQs, reliably analysing exposure-outcome relationships is still possible. This point can be hard to grasp. As an easy example, let’s hypothetically say that everyone in a sample population misreported their dietary intake by ~20% from the true value. Now ask yourself why this matters. Does it mean the results are invalid? No. The reality is that the measurement error will probably have little effect on the estimated association between exposure and outcome, because the degree of inaccuracy is likely distributed equally among the population studied. Error estimates do not necessarily interfere with establishing causal associations in this regard. They might, but they probably don’t.
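A quick simulation makes the point. In this sketch (hypothetical intakes, a uniform 20% underreport), everyone’s reported intake is wrong in absolute terms, yet the ranking of participants, and hence who ends up in the “high intake” versus “low intake” groups, is completely unchanged.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(seed=1)

# Hypothetical true daily intakes of some food (g/day) for 1,000 people.
true_intake = rng.lognormal(mean=3.5, sigma=0.4, size=1000)

# Everyone underreports by the same ~20%.
reported_intake = true_intake * 0.8

# Ranking (and therefore quintile assignment) is preserved,
# so contrasts of higher vs lower intake are unaffected.
rho, _ = spearmanr(true_intake, reported_intake)
print(f"Rank correlation, true vs reported: {rho:.3f}")  # 1.000

cuts_true = np.quantile(true_intake, [0.2, 0.4, 0.6, 0.8])
cuts_rep = np.quantile(reported_intake, [0.2, 0.4, 0.6, 0.8])
same_quintile = (np.digitize(true_intake, cuts_true)
                 == np.digitize(reported_intake, cuts_rep)).mean()
print(f"Placed in the same intake quintile: {same_quintile:.0%}")  # 100%
```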

This is relevant because prominent critics of FFQs such as John Ioannidis make strong arguments that studies such as the National Health and Nutrition Examination Survey produce energy intake estimates incompatible with life for two-thirds of the participants. But this is far less important than it sounds. First, the argument pertains only to energy intake and not to the wide range of foods and nutrients assessed (we just covered FFQ validity for most nutrients in the prior section). Second, the purpose of FFQs is rarely to precisely measure absolute values; rather, it is almost always to establish contrasts of food and nutrient intake within a population, grouping higher levels of intake against lower levels of intake. It is these contrasts of intake that are required to tease out the important associations between foods, nutrients, and outcomes, not the absolute values.

Plus, the strange part is that flawed self-reported energy intake estimates actually help to statistically adjust for the measurement error of other self-reported dietary constituents. This is because the error in energy reporting is correlated with the error in the reported dietary constituents; there are examples of BMI-dependent misreporting being cancelled out by energy adjustment. In fact, researchers are now recommended to use self-reported energy intake for energy adjustment of other self-reported dietary constituents to improve risk estimation in studies of diet-health associations. It’s a neat statistical trick that uses known measurement error to correct for unknown measurement errors.
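One common way this is done is the nutrient residual method, where a nutrient is regressed on total reported energy and the residuals are used as the energy-adjusted exposure. The sketch below uses simulated data with made-up parameters, so it is only meant to show the mechanics, not any particular study’s model.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical self-reported data: total energy (kcal/day) and fat intake (g/day).
energy = rng.normal(2200, 450, size=500)
fat = 0.035 * energy + rng.normal(0, 8, size=500)  # fat intake tracks energy plus noise

# Residual method: regress the nutrient on reported energy and keep the part
# of fat intake not explained by total energy, re-centred on the mean intake.
slope, intercept = np.polyfit(energy, fat, deg=1)
energy_adjusted_fat = fat - (intercept + slope * energy) + fat.mean()

# Because misreporting of energy and misreporting of fat are correlated,
# much of the shared reporting error is removed along with the energy component.
print(energy_adjusted_fat[:5].round(1))
```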

A better criticism of FFQs would be to mention situations where measurement error might be unequally distributed across the population. For example, we know that the degree of underreporting of dietary intake can positively correlate with body mass index (BMI), as higher-BMI individuals tend to underreport food intake. But even this shouldn’t invalidate important findings that stem from FFQ data. Why? Because greater underreporting of food intake in unhealthy populations doesn’t bias results towards statistically significant associations. It actually does the opposite: it biases towards the null associations that would otherwise have been statistically significant had there been no measurement error. If an unhealthy population reports eating 1 chocolate bar a day rather than 3 (the truth), for example, this will only reduce the contrast in intake between healthy and unhealthy populations and make it seem as if chocolate bars are not part of the problem. So even if one were to argue that obese or unhealthy populations underreport food intake, I say cool, but this only means that strong statistical signals (P < 0.05) are more likely to reflect causal effects, not the opposite. The criticism works against itself.

We can see examples of this in the literature. For instance, Prentice et al. suggested that BMI-related underreporting of fat intake was responsible for the observed nonsignificant association between fat intake and breast cancer, and that a statistically significant result would have been found had dietary reporting errors been equally distributed among the different BMI groups. A more comprehensive example of relative risk attenuation comes from the Observing Protein and Energy Nutrition (OPEN) study [13]. In this study, participants completed an FFQ, a 24-hour recall, a doubly labelled water measurement, and 24-hour urinary potassium and nitrogen biomarker measurements. The attenuation of the relative risk when using an FFQ was quantified for different nutrient exposures by calculating the attenuation factor that operates on the true regression coefficient in the disease model; the smaller the attenuation factor, the greater the attenuation of the relative risk estimate. Using protein and potassium as examples, the estimated attenuation factor for protein was 0.16 for men and 0.14 for women, causing a ‘true’ relative risk of 2.0 to be estimated as only 1.10-1.12. For potassium, the attenuation factor was 0.29 for men and 0.23 for women, causing a true relative risk of 2.0 to be estimated as only 1.17-1.22. So again, although many critics will say that you cannot trust dietary associations from epidemiological data, I’d argue that meaningful associations are probably underestimated. The likelihood and severity of the underestimation will depend on the relationship between the exposure, confounders, and outcome, but this attenuation effect is thought to be common in nutrition epidemiology.
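To see how those attenuation factors translate into the observed relative risks quoted above, here is a quick back-of-the-envelope calculation, assuming (as the numbers suggest) that the attenuation factor simply multiplies the true log relative risk in the disease model.

```python
import numpy as np

def observed_rr(true_rr, attenuation_factor):
    """Relative risk observed when the true log relative risk is
    multiplied by the attenuation factor."""
    return np.exp(attenuation_factor * np.log(true_rr))

true_rr = 2.0
for exposure, lam in [("protein, men", 0.16), ("protein, women", 0.14),
                      ("potassium, men", 0.29), ("potassium, women", 0.23)]:
    print(f"{exposure}: true RR {true_rr} observed as ~{observed_rr(true_rr, lam):.2f}")

# protein: ~1.12 (men) and ~1.10 (women); potassium: ~1.22 (men) and ~1.17 (women),
# matching the attenuated estimates reported from the OPEN study.
```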

Final Thoughts

Labelling self-reported dietary collection methods as systematically flawed is unfair and not backed up by the science, and such a stance does little to promote constructive dialogue. Yes, FFQs are by no means perfect, but they don’t need to be for their intended purpose, which is to establish contrasts of intake among the population studied: to compare higher intakes against lower intakes. A considerable amount of work has been done to confirm the reliability of FFQs in validation studies, and to elucidate the degree of error in various dietary collection methods so that appropriate adjustment models can be developed. Hopefully, future work will continue to improve the precision and reliability of dietary assessment techniques, for example through more culturally inclusive and ethnic-specific FFQs, in-app dietary tracking, and the use of dietary biomarkers, but regardless, they are already valid!

If you’ve enjoyed this article and would be kind enough to support My Nutrition Science, please consider donating to us. This helps keep the site running and the content flowing. Please also sign up to our email list below to be notified of future content and updates.

