The Tromsø Study is a population-based study conducted in the municipality of Tromsø, Norway. Seven surveys were conducted between 1974 and 2016 (Tromsø1-Tromsø7), to which birth cohorts and random population samples were invited [19].

### The study sample

The current study used data from the seventh survey (Tromsø7), conducted in 2015-2016. All residents of Tromsø municipality aged 40 years or older were invited, and 21,083 women and men attended (65% attendance). All participants received a food frequency questionnaire (FFQ) to complete on paper and return by regular mail in a prepaid envelope. The study sample included those who answered the FFQ (n = 15,146). We excluded those who completed less than 90% of the FFQ (n = 3,489), those with missing values for educational level or physical activity level (PAL) (n = 358), and those with extreme values for total energy intake or total water intake (n = 400); see details in Supplementary File 1. The final study sample thus included 10,899 participants (Fig. 1).

### Measurements

Self-reported educational level was dichotomized into low (primary/partly secondary education with up to 9 years of schooling, and upper secondary education with 10-12 years of schooling) or high (short tertiary education with <4 years of college/university, and long tertiary education with ≥4 years of college/university). Self-reported leisure-time PAL was dichotomized into low or high according to the validated Saltin-Grimby questionnaire [20]. Specifically, the low PAL category includes both sedentary and light activity, ranging from those who are almost completely inactive to those who engage in light physical activity at least 4 hours per week (levels 1 and 2 of the original Saltin-Grimby scale). The high PAL category ranges from regular, moderate training at least 4 hours per week to vigorous, strenuous training for competitions (levels 3 and 4 of the original Saltin-Grimby scale).

The FFQ, validated by Carlsen et al. [21], included questions about the frequency and amount of intake of 244 food items, dishes, and beverages. Total energy intake (TEI), total water intake (TWI), and food and nutrient intakes, in kilojoules (kJ) and grams (g) per day, respectively, were calculated using the food composition database and nutrient calculation system (KBS) at the University of Oslo, with database AE14 (based on the 2014 and 2015 Norwegian food composition tables) and software version 7.3 [22].

We pooled the intakes of the 244 food items into 33 new variables, plus one variable not used in this study. Pooling was performed in a supervised manner, with each variable representing the total intake for a group of solid foods or beverages (see Table S1, Supplementary File 2). This type of grouping is in line with the literature, where FFQ items are typically summarized by 30-60 variables; see, for example, Karageorgou et al. [17].

### Preprocessing: scaled intake values and stratification of the study sample

To identify similarities in dietary intake and the composition of food preferences, each food variable was scaled according to individual energy intake. The scaled intake values were calculated by

$$\begin{aligned}\mathrm{Food}^*_{ji}=\mathrm{Food}_{ji}\,\frac{\overline{\mathrm{TEI}}}{\mathrm{TEI}_i}\end{aligned}$$

where \(\mathrm{Food}^*_{ji}\) and \(\mathrm{Food}_{ji}\) represent the scaled and unscaled intake of food variable *j* for individual *i*, respectively. The scaling factor divides the mean total energy intake over all individuals (\(\overline{\mathrm{TEI}}\)) by the total energy intake of individual *i* (\(\mathrm{TEI}_i\)).
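As an illustration, the energy scaling can be sketched in a few lines of NumPy. The function name `scale_by_energy` and the toy numbers are hypothetical and are not taken from the study data.

```python
import numpy as np

def scale_by_energy(food, tei):
    """Scale intakes by mean TEI over individual TEI.

    food: array (n_individuals, n_food_variables) of unscaled intakes.
    tei:  array (n_individuals,) of total energy intake per individual.
    """
    food = np.asarray(food, dtype=float)
    tei = np.asarray(tei, dtype=float)
    factor = tei.mean() / tei        # mean TEI divided by individual TEI
    return food * factor[:, None]    # same factor for all foods of one person

# Hypothetical toy data: 3 individuals, 2 food variables (g/day), TEI in kJ/day.
food = np.array([[100.0, 50.0], [200.0, 100.0], [300.0, 150.0]])
tei = np.array([6000.0, 9000.0, 12000.0])
scaled = scale_by_energy(food, tei)
```

In this toy example the mean TEI is 9,000 kJ/day, so the scaling factors are 1.5, 1.0, and 0.75. Individuals with a low energy intake have their food intakes scaled up, and those with a high energy intake have them scaled down.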

Although we excluded participants with extreme intake values, some participants still had considerably high intake values for some food variables. Due to the inherent uncertainty of FFQ data at the individual level, it is recommended to use FFQ data at the group level [23]. Before clustering the food variables, we thus chose to divide the study sample into groups of approximately 10 participants, resulting in a total of 1,091 groups. The groups were formed by randomly sampling participants, under the condition that each group was homogeneous with respect to the background variables. This means that the participants assigned to a group had the same sex, the same age, the same educational level (low/high), and the same PAL (low/high). The average intake values for the different food variables were then calculated within each group, reducing the influence of possible outliers and erroneous values.
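One possible way to form such homogeneous groups is sketched below. This is not necessarily the procedure used in the study: the function name `stratified_group_means`, the integer coding of the background variables, and the handling of strata whose size is not a multiple of 10 (split into nearly equal chunks) are all assumptions made for illustration.

```python
import numpy as np

def stratified_group_means(intakes, strata, size=10, seed=0):
    """Average intakes within random groups of ~`size` matched participants.

    intakes: array (n, p) of intake values.
    strata:  integer array (n, k) of background variables
             (e.g. sex, age, education, PAL), assumed coding.
    """
    rng = np.random.default_rng(seed)
    intakes = np.asarray(intakes, dtype=float)
    # Give each participant the index of their unique background combination.
    _, stratum_id = np.unique(np.asarray(strata), axis=0, return_inverse=True)
    stratum_id = stratum_id.ravel()
    means = []
    for s in np.unique(stratum_id):
        # Random order within the stratum, then split into nearly equal chunks.
        members = rng.permutation(np.flatnonzero(stratum_id == s))
        n_groups = max(1, round(len(members) / size))
        for chunk in np.array_split(members, n_groups):
            means.append(intakes[chunk].mean(axis=0))
    return np.vstack(means)

# Hypothetical toy data: 40 participants in 2 strata, 3 food variables.
rng = np.random.default_rng(1)
n = 40
strata = np.column_stack([
    np.repeat([0, 1], 20),    # sex
    np.full(n, 55),           # age
    np.zeros(n, dtype=int),   # education (0 = low)
    np.ones(n, dtype=int),    # PAL (1 = high)
])
intakes = rng.gamma(shape=2.0, scale=50.0, size=(n, 3))
group_avg = stratified_group_means(intakes, strata)
```

Each row of `group_avg` is the mean intake profile of one group, so a single extreme individual contributes only about one tenth of the value used in later steps.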

### Statistical analysis

A hierarchical clustering method was applied to group similar food variables into non-overlapping clusters. Pairwise distances between the 33 pooled food variables were based on Spearman’s rank correlation. The food variables were then merged into clusters according to the complete linkage method. The cluster analysis was repeated for 100 different random samples drawn from the total of 1,091 groups. The final diet groups were based on these 100 repetitions, with each food variable assigned to the diet group in which it occurred most frequently. This was done to assess the variability of the clustering results and to increase the robustness of the resulting diet groups [24]. Random sampling of the groups was also used to determine the number of diet groups that gave the most stable clustering result.
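The clustering step can be sketched with SciPy as follows. Several details are assumptions, not taken from the study: the distance is taken as 1 minus the Spearman correlation, each repetition uses a random 80% of the groups without replacement, and stability is summarized by how often two variables fall in the same cluster. That summary is one label-free way to inspect stability and is not necessarily how the repetitions were aggregated in the study. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata

def cluster_food_variables(group_means, n_clusters):
    """Complete-linkage clustering of the columns (food variables)."""
    ranks = rankdata(group_means, axis=0)
    rho = np.corrcoef(ranks, rowvar=False)   # Spearman correlation matrix
    dist = 1.0 - rho                         # assumed distance definition
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="complete")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

def co_clustering_frequency(group_means, n_clusters, n_repeats=100,
                            frac=0.8, seed=0):
    """Fraction of repetitions in which each pair of variables co-clusters."""
    rng = np.random.default_rng(seed)
    n, p = group_means.shape
    together = np.zeros((p, p))
    for _ in range(n_repeats):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = cluster_food_variables(group_means[idx], n_clusters)
        together += labels[:, None] == labels[None, :]
    return together / n_repeats

# Hypothetical toy data: 200 groups, 6 variables driven by 2 latent factors.
rng = np.random.default_rng(1)
factor_a = rng.normal(size=(200, 1))
factor_b = rng.normal(size=(200, 1))
noise = 0.3 * rng.normal(size=(200, 6))
group_means = np.hstack([factor_a.repeat(3, axis=1),
                         factor_b.repeat(3, axis=1)]) + noise
co_freq = co_clustering_frequency(group_means, n_clusters=2)
```

Pairs with a co-clustering frequency near 1 are grouped together in almost every repetition, while values near 0.5 indicate variables whose assignment depends on the sample drawn.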

After identifying dietary patterns by cluster analysis, we calculated individual diet intake scores. The scores were found by first standardizing the scaled individual intake values (\(\mathrm{Food}^{*}_{ji}\)) for each food variable *j*. This standardization was done to weight the intakes of all food variables equally. For each individual, we then averaged the standardized food variables within each diet group. This made the final intake scores comparable across diets. Note that these individual diet scores reflect diet preferences, i.e. larger values mean greater intake of the specific diet. Also, the sum over all individual scores in each diet is 0.
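A minimal sketch of this score calculation is given below. The function name `diet_scores`, the cluster labels, and the toy data are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def diet_scores(scaled_food, labels):
    """Average standardized intakes within each diet group.

    scaled_food: array (n_individuals, n_food_variables), energy-scaled intakes.
    labels:      array (n_food_variables,), diet group of each food variable.
    """
    scaled_food = np.asarray(scaled_food, dtype=float)
    labels = np.asarray(labels)
    # Standardize each food variable to mean 0 and standard deviation 1.
    z = (scaled_food - scaled_food.mean(axis=0)) / scaled_food.std(axis=0, ddof=1)
    # One column per diet group: mean of its standardized food variables.
    return np.column_stack([z[:, labels == d].mean(axis=1)
                            for d in np.unique(labels)])

# Hypothetical toy data: 50 individuals, 5 food variables in 2 diet groups.
rng = np.random.default_rng(0)
scaled_food = rng.gamma(shape=2.0, scale=40.0, size=(50, 5))
labels = np.array([1, 1, 2, 2, 2])
scores = diet_scores(scaled_food, labels)
```

Because every standardized variable has mean 0, each column of `scores` also sums to 0 over individuals, which matches the property stated above.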

The scores for each diet were then modeled as the dependent variable in regression models on age, including educational level (low/high) and PAL (low/high) as categorical covariates. The association between diet scores and age was modeled using a linear or a nonlinear trend, where the nonlinear trend was modeled using natural cubic splines. The different models were evaluated using the adjusted coefficient of determination (\(R^2_{\text{adj}}\)). Post hoc analyses included analysis of covariance and post hoc pairwise testing of diet scores between groups with different combinations of the two categorical variables. Specifically, this included comparing groups with low education and low PAL (\(\mathrm{L_{edu}L_{PAL}}\)), low education and high PAL (\(\mathrm{L_{edu}H_{PAL}}\)), high education and low PAL (\(\mathrm{H_{edu}L_{PAL}}\)), and high education and high PAL (\(\mathrm{H_{edu}H_{PAL}}\)). The analysis was performed separately for men and women, and p-values were adjusted using Tukey’s method, ensuring a global significance level of 0.05.
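The model comparison step can be illustrated with plain NumPy. The sketch fits a linear age trend and a cubic spline trend that is constrained to be linear beyond the boundary knots (a natural cubic spline, built here from the truncated power basis), then compares the two by the adjusted coefficient of determination. The knot positions, the simulated data, and all function names are assumptions. The post hoc tests are not covered.

```python
import numpy as np

def natural_spline_basis(x, knots):
    """Natural cubic spline basis (truncated power form), without intercept."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)

    def d(k):
        return (np.maximum(x - knots[k], 0.0) ** 3
                - np.maximum(x - knots[-1], 0.0) ** 3) / (knots[-1] - knots[k])

    last = len(knots) - 2                       # index of second-to-last knot
    cols = [x] + [d(k) - d(last) for k in range(last)]
    return np.column_stack(cols)

def adjusted_r2(y, X):
    """Adjusted R^2 of an ordinary least squares fit with intercept."""
    n, p = X.shape                              # p predictors, excluding intercept
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical simulated data with a curved age trend.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(40, 90, n)
edu = rng.integers(0, 2, n)                     # 0 = low, 1 = high
pal = rng.integers(0, 2, n)                     # 0 = low, 1 = high
score = (0.002 * (age - 65) ** 2 + 0.3 * edu + 0.2 * pal
         + rng.normal(0, 0.2, n))

knots = np.quantile(age, [0.05, 0.35, 0.65, 0.95])   # assumed knot placement
X_linear = np.column_stack([age, edu, pal])
X_spline = np.column_stack([natural_spline_basis(age, knots), edu, pal])
adj_linear = adjusted_r2(score, X_linear)
adj_spline = adjusted_r2(score, X_spline)
```

Because the adjusted coefficient penalizes the extra spline terms, the nonlinear model is preferred only when the added flexibility explains enough additional variance, as it does for the curved trend simulated here.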
