Main

Systems vaccinology uses high-throughput omics measurements together with systems-based analysis approaches to better understand immune responses to vaccination1,2. The recent growth of this field, which began with initial studies of yellow fever3,4 and seasonal influenza5,6 vaccines, has rapidly expanded to include studies profiling responses to a range of vaccines and vaccine platforms, including those targeting diverse pathogens and age groups7,8,9,10,11,12,13,14,15,16,17,18,19,20. These studies have led to important discoveries such as the role for the nutrient sensor general control nonderepressible 2 (GCN2) in enhancing antigen presentation during responses to yellow fever vaccination21, as well as the impact of the gut microbiota in promoting antibody responses to inactivated influenza vaccination22,23. However, apart from a few studies8,9,10, thus far the vast majority have examined immune responses to a single vaccine, hindering the ability to contextualize the findings and understand how differences in vaccine formulations can impact immunogenicity.

Another important outcome of such studies has been the identification of early transcriptional signatures predictive of immune response quality such as subsequent antibody3,5,18 or antigen-specific T cell3,7,14 responses. These findings may enable more rapid and personalized evaluation of vaccine efficacy and development of improved next-generation vaccines. Yet again, a current limitation is that the identified predictive signatures thus far have been described in the context of responses to a single vaccine, and the extent to which predictors of immune response quality are conserved across vaccines is unclear24,25. We previously sought to address the question of whether there was a ‘universal signature’ that could be used to predict antibody responses to any vaccine by analyzing the transcriptional response to five different human vaccines. Our analysis revealed distinct transcriptional signatures of antibody responses to different classes of vaccines, and provided key insights into primary viral, protein recall and anti-polysaccharide responses8, yet failed to identify a universal signature of vaccination.

Here we leverage the Immune Signatures Data Resource26, a curated database of publicly available datasets containing transcriptional and immune response profiling of peripheral blood following vaccination in humans, to perform a comparative analysis of transcriptional responses from 820 healthy young adults across 13 different vaccines. We find that while a common transcriptional program is shared across many vaccines, there is considerable heterogeneity especially in the kinetics of immune responses. In particular, the live-attenuated yellow fever vaccine induces a unique transcriptional response, with a surprisingly early upregulation of B and T cell modules within a day of vaccination, and a delayed induction of innate responses, including antiviral and interferon signaling, peaking at 10–14 d following vaccination. Furthermore, in an analysis of predictive signatures of antibody responses across vaccines, adjusting for time of peak expression enabled a gene module associated with plasma cells and immunoglobulins to consistently predict antibody responses across vaccines, demonstrating the importance of accounting for immune response kinetics in the development of universal predictors of response quality. Together, these findings highlight the spectrum of immune responses across vaccines and serve as a basis for future studies to understand the mechanisms underlying variation in immune responses across vaccines and inform future vaccine development.

Results

An integrated database of transcriptional responses to vaccination

As part of an effort to enable comparative studies and benchmarking of human vaccine responses, we curated a database of transcriptomic responses of 820 healthy adults (18–50 years old) across 13 different vaccines from previously published datasets. These datasets were compiled into ImmPort, a National Institutes of Health (NIH)-funded repository for immunological data27, and uploaded to ImmuneSpace (https://www.immunespace.org/) for centralized quality control (QC) and processing (Fig. 1a). This combined database, named the Immune Signatures Data Resource26, includes responses to a broad range of vaccines, including live viruses (for example, yellow fever, smallpox and influenza vaccines), recombinant viral vectors (for example, Ebola and HIV vaccines), inactivated viruses (for example, seasonal influenza vaccine), glycoconjugate vaccines (for example, pneumococcal and meningococcal vaccines; Table 1 and Supplementary Table 1). It also contains samples spanning multiple response time points, ranging from a few hours to more than 3 weeks after vaccination (Fig. 1b). Included participants in our initial analysis were restricted to 18–50 years old, and there were similar age and sex distributions across vaccines (Fig. 1c). For analysis, all post-vaccination samples were normalized by pairwise fold-change calculation with their matched pre-vaccination samples. Principal variance component analysis28 revealed that demographic features such as age and sex had relatively small contributions to variation in responses. In contrast, the post-vaccination time point of the sample explained 15% of the variance in the data, suggesting that there are shared kinetics of immune responses across vaccines (Fig. 1d).

Fig. 1: An integrated database of transcriptional responses to vaccination.
figure 1

a, Workflow for collection, curation and standardization of datasets in the Immune Signatures Data Resource. b, Histogram of the number of samples included per vaccine at each time point. Day 0 represents day of vaccination. c, Age distribution of participants in the Immune Signatures Data Resource by vaccine. The shape of points denotes the participant’s inferred sex based on Y-chromosome-specific gene expression. For participants with missing age data, ages were estimated using RAPToR43. For the boxplot, the center line represents the median, box limits represent upper and lower quartiles, and whiskers indicate 1.5 times the interquartile range. Ebola (RVV), n = 13; hepatitis A/B (IN/RP), n = 26; HIV (RVV), n = 10; influenza (IN), n = 496; influenza (LA), n = 28; malaria (RP), n = 44; meningococcus (CJ), n = 19; meningococcus (PS), n = 14; pneumococcus (PS), n = 12; smallpox (LA), n = 8; tuberculosis (RVV), n = 12; varicella zoster (LA), n = 31; yellow fever (LA), n = 107. d, Bar plot representing the proportion of variance in post-vaccination transcriptional responses that can be attributed to clinical (age, sex, ethnicity) and experimental (time after vaccination, vaccine) variables via principal variance component analysis. The residual represents the proportion of the variance that could not be explained by any of the included variables.

Table 1 Table of included vaccines

Common and unique transcriptional responses across vaccines

To examine the overlap in responses across vaccines, we identified differentially expressed genes after vaccination relative to the pre-vaccination baseline as well as differential expression of blood transcriptional modules (BTMs), a set of gene modules developed through large-scale network integration of publicly available human blood transcriptomes10. There was much less overlap at a gene level (Extended Data Fig. 1a) than at a module level (Extended Data Fig. 1b), where a majority of differentially expressed modules were shared across four or more vaccines. Based on temporal expression patterns, the most commonly induced modules clustered into four groups (Fig. 2a). Cluster 1, upregulated at days 1 and 3 after vaccination, represented BTMs related to innate responses and included modules associated with Toll-like receptor (TLR) and inflammatory signaling, antigen presentation and monocyte signatures. Cluster 2 contained multiple natural killer cell modules and was significantly downregulated on day 1 (Extended Data Fig. 1c). Finally, clusters 3 and 4 generally peaked in activity on day 7 and reflected plasma cell and cell cycle signatures, respectively, corresponding with expansion of antibody-producing plasmablasts. The ‘innate’ cluster 1 was most prominently induced in vaccines containing a live viral vector (Ebola, HIV), or an adjuvant (malaria; Fig. 2b). Meanwhile, the plasma cell signature in cluster 3 was strongly increased in the polysaccharide pneumococcal vaccine and the conjugate meningococcal vaccine (Fig. 2c).

Fig. 2: Common and unique transcriptional responses across different vaccines.
figure 2

a, Heat map of common differentially expressed modules (regulated in 7 or more vaccines) over time (asterisks denote false discovery rate (FDR) < 0.05). Color represents the QuSAGE activity score. Clustering on columns was performed separately for days 1, 3, 7, 14 and 21 after vaccination. Clusters 1, 2, 3 and 4 are indicated by the blue, yellow, pink and green vertical bars to the left of the heat map. TBA, to be annotated. b, Kinetics of the mean FC of cluster 1 modules across vaccines. c, Kinetics of the mean FC of cluster 3 modules across vaccines. DCs, dendritic cells; FC, fold change; MHC, major histocompatibility complex; NK, natural killer.

We next analyzed how differentially expressed modules were shared across vaccine responses (Fig. 3). In agreement with the previous analysis of common responses (Fig. 2), the response to most vaccines on days 1 (Fig. 3a) and 7 (Fig. 3c) reflected innate and plasma cell/cell cycle responses, respectively, while the day 3 response (Fig. 3b) appeared as an intermediate between these states, with both innate and cell cycle signatures present. However, such responses were not universally shared across all of the vaccines. In particular, the early innate and antiviral responses common to most vaccines on day 1 were not observed in the varicella zoster and yellow fever vaccine responses. While these signatures appeared at later time points (days 3 and 7) in yellow fever vaccine responses, they were not observed at all following varicella zoster viral vaccine. Additionally, the day 7 cell cycle signature was not observed following smallpox, varicella zoster and polysaccharide meningococcal vaccines. Notably, this signature was observed in the case of the meningococcal conjugate vaccine, where the bacterial polysaccharides have been conjugated to a diphtheria toxoid protein to induce memory and helper T cell responses29. Since diphtheria toxoid protein is used in other vaccines such as the Haemophilus influenzae type B vaccine30, the cell cycle signature observed at day 7 likely reflects the plasmablast response of the recall response to diphtheria toxoid, consistent with our previous study10.

Fig. 3: Overlap in transcriptional responses across vaccines.
figure 3

ac, Circos plots of the overlap in differentially expressed modules (FDR < 0.05) across vaccines on days 1 (a), 3 (b) and 7 (c). Each segment of the circle represents one vaccine, and each point in a segment represents a single module. Heat maps in the outermost ring represent the correlation with day 28 antibody responses, and bars in the middle ring represent the activity score of differentially expressed modules. Lines connect modules with a significant positive score shared between vaccines. Boxes and line colors in the innermost ring represent the functional groups of the modules. ECM, extracellular matrix.

Early adaptive and delayed innate signatures of yellow fever vaccine

At the gene level, responses were highly correlated across most vaccines on day 1 after vaccination (Fig. 4a) but became more divergent at later time points (Extended Data Fig. 2a,b). On day 1, Ebola, inactivated influenza, HIV and malaria vaccines exhibited the strongest similarity (Fig. 4a and Extended Data Fig. 2c). However, the yellow fever vaccine YF-17D induced a very distinct response that had little or even negative correlation with responses to all other vaccines, including other live viral vaccines such as varicella zoster virus, HIV and Ebola (Fig. 4a and Extended Data Fig. 2d). The innate pathways that were upregulated in other vaccine responses were, in fact, downregulated in response to yellow fever vaccine on day 1 (Fig. 4b). Instead, YF-17D induced early expression of multiple B and T cell modules. Analysis using the xCell deconvolution algorithm31 detected an increase in estimated frequencies of total B cells (Extended Data Fig. 2e,f), suggesting that this signature may reflect an induction of adaptive cells into the periphery.

Fig. 4: Early adaptive and delayed innate transcriptional signatures of yellow fever vaccine.
figure 4

a, Correlation matrix of pairwise Spearman correlations of day 1 gene-level FC values between vaccines. b, Heat map of day 1 activity scores of modules differentially expressed in response to yellow fever vaccination (QuSAGE FDR < 0.2). c, Heat map of day 7 activity scores of modules differentially expressed in response to yellow fever vaccination (QuSAGE FDR < 0.05, activity score > 0.2). d, Kinetics of the mean FC of module M75 across vaccines. e, Heat map of the post-vaccination FC of genes in module M75. LPS, lipopolysaccharides.

Another surprising feature of the yellow fever vaccine response was the relatively late expression of antiviral and interferon pathways, whose expression started to be observed on day 3 and peaks on day 7 (Fig. 4c,d). While these modules were also upregulated at this time point in Ebola vaccine responses, their expression waned rapidly following a robust early induction at day 1. Some of the genes in these pathways that were strongly upregulated on day 1 in response to most vaccines, such as CXCL10 and OAS1, were upregulated as late as 21 d after vaccination with YF-17D (Fig. 4e). Importantly, both the early adaptive and delayed innate responses were consistent across multiple studies from diverse geographical locations (Extended Data Fig. 2g,h). Together, these results highlight the unique kinetics of transcriptional responses to yellow fever vaccine relative to other vaccines.

Impact of preexisting immunity on transcriptional responses to vaccination

One possible contributor to the unique kinetics of responses to the yellow fever vaccine is that of the studies included in the data resource; all participants were naïve to yellow fever and the vaccine was inducing a primary response. This is in contrast to other vaccines such as influenza and varicella zoster that elicit a recall response in the adult populations studied. However, it is difficult to evaluate the impact of preexisting immunity and compare primary and recall responses across vaccines because of intrinsic differences in the nature of these vaccines as well as other factors such as the vaccine platform and type of pathogen that also affect the nature and kinetics of immune response and confound such an analysis.

To address this, we therefore performed a vaccine-specific analysis using all inactivated influenza studies. The inactivated influenza vaccine was ideal for this analysis as there are a large number of such studies in the data resource, and there is substantial heterogeneity in the amount of preexisting immunity and previous exposure to influenza across the population. We used pre-vaccination antibody levels as a marker of preexisting immunity and defined ‘high’ and ‘low’ baseline antibody participants as the top and bottom 30% of participants in each study based on baseline geometric mean antibody titer. Increased levels of preexisting antibodies resulted in diminished induction of interferon (Extended Data Fig. 3a,c) and plasma cell (Extended Data Fig. 3b,d) signatures on days 1 and 7, respectively, but did not appear to impact the kinetics of these responses (Extended Data Fig. 3e,f). One possible explanation for this effect is that increased amounts of preexisting antibodies may bind antigen from the vaccine, thereby reducing the amount available to be processed and presented by innate cells and lowering the vaccine ‘take’.

Time-adjusted transcriptional predictors of antibody responses

A key goal of systems vaccinology is to identify early signatures predictive of subsequent protection from infection. Antibody titers have been established as a reliable correlate of protection against many pathogens32 and previous studies have identified transcriptional signatures predictive of antibody responses to several vaccines, including inactivated influenza6,11,18,33,34, yellow fever3 and hepatitis B12. However, these signatures have thus far been developed for single vaccines, and it remains to be seen whether a ‘universal signature’ exists that can predict antibody responses across vaccines. Our curated data resource is uniquely suited to address this question, as ten of the encompassed vaccines had at least one dataset with antibody titer measurements before and ~1 month after vaccination (Extended Data Fig. 4a). As there was substantial variability in antibody responses across vaccines, we defined ‘high’ and ‘low’ responders on a per dataset basis as the top and bottom 30% of participants according to antibody titer fold changes. We then used an elastic-net machine learning algorithm to develop classifiers capable of distinguishing between high and low responders based on early transcriptional signatures (Methods).

As an initial approach, we wanted to examine whether a model trained using responses to a single vaccine could reliably predict responses to other vaccines. We therefore first built models using all 15 inactivated influenza vaccine datasets (the vaccine for which there was the largest number of samples) in a leave-one-study-out training/testing configuration. As validation that our model could predict responses within the same vaccine, classifiers trained using fold-change expression data at day 7 were able to predict high-versus-low antibody response in the left-out influenza dataset, with areas under the receiver operating characteristic (ROC) curve (AUCs), a measure of classification performance taking into account both true positive and false positive rates, ranging between 0.55 and 0.9 (Fig. 5a). The modules in these classifiers were primarily associated with cell cycle and plasma cell modules (Extended Data Fig. 4b). These results are consistent with previous work showing that classifiers built using similar pathways are predictive of antibody responses to influenza vaccination across multiple seasons18. However, when we examined their performance in other vaccines, they were not reliably predictive (Extended Data Fig. 4c). Moreover, the expression of modules associated with antibody response to inactivated influenza vaccination at day 7 was not generally correlated with antibody responses across vaccines (Fig. 5b).

Fig. 5: Time-adjusted transcriptional predictors of antibody responses.
figure 5

a, AUC bar plot of antibody response prediction performance per dataset for the elastic-net classifier trained on inactivated influenza datasets only. Data are presented as mean values ± 95% confidence intervals (CIs). n = 2000 bootstrap replicates. b, Heat map of high-versus-low antibody responder difference across vaccines of modules differentially expressed (FDR < 0.05) between high and low antibody responders to inactivated influenza vaccination. c, Kinetics of the predictive power of M156.1 across vaccines. For each vaccine/time point combination, the AUC was computed based on difference in the geometric mean of the fold changes of the genes in the M156.1 between high and low responders (Methods). d, Weighted ROC curves for a logistic regression classifier using M156.1 expression either at day 7 in all vaccines (day 7) or at the vaccine-specific peak expression time point. FPR, false positive rate; TPR, true positive rate. e, Per-vaccine AUC bar plot for a logistic regression classifier using M156.1 expression either at day 7 in all vaccines (yellow) or at the vaccine-specific peak expression time point (green indicates peak at day 7, and pink indicates peak at other time points). Data are presented as mean values ± 95% CIs. n = 2,000 bootstrap replicates.

We then asked whether training across multiple vaccines would improve the universality of the identified signatures. Neither a leave-one-vaccine-out approach, nor a tenfold cross-validation approach combining all datasets, was able to identify signatures on day 3 or day 7 after vaccination that could accurately discriminate high versus low responders across all vaccines (Extended Data Fig. 4d,e). However, analysis of the predictive power of specific modules over time, such as M156.1, one of the plasma cell modules associated with response in influenza vaccination on day 7, revealed that this module was predictive of response across many vaccines but at different time points (Fig. 5c). While many vaccines saw a strong association between M156.1 on day 7 and subsequent antibody response, in certain vaccines such as yellow fever and smallpox, expression of the module was not associated with response until much later, at days 10–14 and 21, respectively, consistent with the delayed kinetics of this BTM with these vaccines (Fig. 2).

These results suggest that differential kinetics of immune responses across vaccines pose a confounding variable in the identification of universal predictive signatures of response at a single time point, but that using vaccine-specific time points dictated by the particular kinetics of immune responses for identification of predictive biomarkers of vaccine responses may improve the universality of such signatures. To test this hypothesis in the context of the plasma cell signature, we identified the time point at which expression of the plasma cell module M156.1 peaked in response to each vaccine (Fig. 2c). We then trained a logistic regression classifier with M156.1 expression as an input in a tenfold cross-validation approach using fold-change data at the peak M156.1 expression time point for each vaccine. Indeed, using M156.1 peak expression time points improved the overall performance of the classifier compared with using a single time point (day 7) for all of the vaccines (Fig. 5d). This improvement was driven by increases in response prediction among vaccines in which the plasma cell signature peaked at time points other than day 7, such as the yellow fever and smallpox vaccines (Fig. 5e). Thus, expression of the plasma cell module M156.1 acts as a time-variable universal signature of antibody responses to vaccination.

Impact of aging on transcriptional responses to vaccination

The impairment of vaccine efficacy with age is a major challenge for vaccine development and public health. Although declining vaccine efficacy can broadly be attributed to effects of immunosenescence such as loss and dysfunction of naïve T cells35, diminished class-switch capability of B cells36 and decreased TLR function among innate cells37,38, the molecular mechanisms responsible for impaired vaccine responses among older adults are not yet fully understood. While most of the curated datasets in the Human Immunology Project Consortium (HIPC) resource contained only young adult participants, some studies, including those of inactivated influenza18, varicella zoster13 and hepatitis B12 vaccines, profiled responses of both young (≤50 years) and older (≥60 years) vaccinees. As expected, post-vaccination antibody responses were diminished in older compared to younger participants across all three vaccines (Fig. 6a).

Fig. 6: Impact of aging on transcriptional responses to vaccination.
figure 6

a, Box plots of day 30 antibody responses to vaccination in young (≤50 years) and older (≥60 years) participants across vaccines. The center line represents the median, box limits represent upper and lower quartiles, and whiskers indicate 1.5 times the interquartile range. Hepatitis A/B (IN/RP)—young, n = 25; hepatitis A/B (IN/RP)—older, n = 135; influenza (IN)—young, n = 123; influenza (IN)—older, n = 175; varicella zoster (LA)—young, n = 16; varicella zoster (LA)—older, n = 19. b, Modules differentially expressed between young and older participants in response to inactivated influenza vaccination (QuSAGE FDR < 0.05). c, Network plot of module M75 on day 1 following inactivated influenza vaccination in young and older participants. Each edge represents a coexpression relationship, as described in Li et al.10; colors represent the day 1 log2 FC. d, Bar plot of the day 7 AUC of module M156.1 across vaccines in young and older participants. Data are mean values ± 95% CIs. n = 2,000 bootstrap replicates.

We sought to examine for the effect of aging on immune responses across vaccines by comparing BTM activity scores of the most commonly induced BTMs (Fig. 2a) between young and older participants across all three vaccines at each time point. Broadly, transcriptional responses to the three vaccines were similar between the two age groups (Extended Data Fig. 5a). However, there were significant age-associated differences in several pathways in response to inactivated influenza vaccination, including decreased expression of interferon and other innate immune modules in older compared to young participants early after vaccination (Fig. 6b,c), consistent with previous findings18. Despite these differences, the power of the plasma cell signature to predict the antibody response was similar in both young and older individuals (Fig. 6d). These results suggest conservation in the pathways responsible for successful antibody production after vaccination, consistent with earlier findings for influenza vaccination20.

Discussion

The high degree of homology in the vaccine-induced signatures demonstrates that diverse vaccines that differ widely in target pathogens and composition stimulate conserved immunological networks. Despite this homology, there was still substantial heterogeneity in both the magnitude and kinetics of the induced responses across vaccines. The most distinct in this regard were responses to the yellow fever vaccine YF-17D, which displayed several unique features: (1) a delayed innate and antiviral response that did not peak until days 3–7 after vaccination (Fig. 4d), (2) an early upregulation of B and T cell signatures at day 1 (Fig. 4b and Extended Data Fig. 2e) not observed in other vaccines until much later, and (3) a delay in cell cycle and plasma cell signatures typically associated with the expansion of antigen-specific antibody-secreting cells (Fig. 2a,b).

The observed temporal differences in the plasmablast signature likely reflects differences in the kinetics of the plasmablast response. This in turn will depend on a number of factors, such as immune memory caused by previous exposure to the vaccine or pathogen, the persistence and distribution of the vaccine in the body, as well as the nature of the innate signals triggered by the vaccine. Vaccines such as YF-17D that induce a primary immune response in individuals who have not previously been exposed to the vaccine or virus, will mount a delayed plasmablast response. Furthermore, because YF-17D consists of a live virus that causes an acute viral infection during which viral loads in the serum can be detected for a week or longer, the systemic and sustained presence of the virus may result in a more prolonged and robust plasmablast response. Finally, the triggering of multiple TLRs and innate receptors by YF-17D3,39 may result in potent activation of the innate immune system that results in a prolonged antibody response. Indeed, consistent with this notion, our previous work has shown that synthetic nanoparticles containing antigens plus ligands that signal through TLR4 and TLR7 induces synergistic increases in antigen-specific, neutralizing antibodies compared to immunization with nanoparticles containing antigens plus a single TLR ligand40.

Of note, yellow fever and other flaviviruses have a specific capability to inhibit interferon signaling via multiple mechanisms, including suppression of JAK–STAT signaling41, which could potentially cause the observed delay in interferon responses following YF-17D vaccination. Interestingly, the vaccinia virus also has several mechanisms for inhibition of interferon responses, including preventing activation of IRF-3 and NF-κB and dephosphorylation of STAT1/2 transcription factors42. Although early response data was not available, the smallpox vaccine containing vaccinia also induced some degree of delayed interferon response following vaccination (Fig. 4d).

While YF-17D demonstrated delayed induction of interferon signatures, induction of B and T cell signatures at day 1 was much earlier than typically observed with other vaccines. This timing is most likely too early to represent an antigen-specific response but could reflect nonspecific activation or recruitment of naïve cells into the circulation. Alternatively, these signatures could be a result of increased relative proportions of adaptive cells in the blood due to extravasation of innate cells into tissues at the site of injection. Further investigation at a cellular level is required to address these hypotheses and elucidate the mechanisms by which YF-17D exerts such unique early effects on the adaptive immune system.

Finally, our analysis of predictive signatures of antibody responses (Fig. 5) indicates that vaccine response kinetics play an important role in determining such signatures. Here we have illustrated this principle for a single plasma cell transcriptional module; however, future analyses may enable detection of additional and more accurate signatures. We have previously proposed the concept of a ‘vaccine chip’ that could measure defined biomarkers and be used to predict protective immune responses across vaccines25. We proposed that this chip would be designed to measure expression of a select set of genes or modules, subsets of which would predict a particular type of functional or protective immune response (for example, neutralizing antibody titers, effector CD8+ T cell responses, frequency of polyfunctional T cells, response bias in TH1 versus TH2 subsets of helper T cells, and so on)25. The results of the present study pave the way for the development of a simple PCR assay (a ‘vaccine chip’) that can be used to monitor plasmablast signatures in vaccinees. Indeed, PCR-based assays are practical and widely deployable globally, as witnessed during the coronavirus disease 2019 pandemic. Thus, they are likely to be of greater practical value than a fluorescence-activated cell sorting-based assay, involving multicolor fluorescence-activated cell sorting analysis to define plasmablast signatures, as such an assay may be too complex to set up in the field. The concept of time-adjusted signatures based on vaccine-specific response kinetics will be useful in the development of future signatures. In practice, small phase I/II trials could be used to define response kinetics and enable the successful application of a ‘vaccine chip’ to predict immune responses in subsequent trials.

Due to the considerable costs needed to perform a clinical trial of sufficient size, such vaccine studies are rarely performed with more than one vaccine. Here, we have demonstrated that meta-analysis of vaccine trials can provide valuable insights into the common and unique aspects of immune responses across vaccines. These findings complement those of a companion manuscript by S.F., L.T., M.M., D.C., B.G., et al. describing a set of pre-vaccination transcriptional immune states within the same set of participants that influence responsiveness to vaccination. Combined with the Immune Signatures Data Resource26, these computational approaches and repositories will enable future research into the mechanisms of vaccine-induced immunity to inform development of improved adjuvants and vaccines.

Methods

Gene expression preprocessing

An extensive description of the preprocessing of microarray and RNA-sequencing (RNA-seq) datasets included in the Immune Signatures Data Resource can be found in the associated paper26. The dataset includes 2,949 samples from published studies and 228 samples not included in previously published studies. All these samples were assembled into a single resource. Briefly, raw probe intensity data for Affymetrix studies were background corrected and summarized using the RMA algorithm44. For studies using the Illumina array platform, background-corrected raw probe intensities were used. For RNA-seq studies, count data were transformed using voom45 to mimic the distribution of microarray expression intensities. Expression data within each study were quantile normalized and log transformed separately for each study.

Batch correction

An extensive description of the across-studies normalization used to correct for batch effects can be found in the Immune Signatures Data Resource paper26. Briefly, a linear model was fit using the pre-vaccination normalized gene expression as a dependent variable and platform, study and blood sample type (that is, whole blood or peripheral blood mononuclear cell) as independent variables. The estimated effect of the platform, study and sample type was then subtracted from the entire gene expression (before and after vaccination) to obtain batch-corrected gene expression.

Identification of differentially expressed genes

To determine differentially expressed genes, P values were first computed within each study using two-sided paired student’s t-tests. Next, Stouffer’s method was used to combine P values across studies via the sumz function in the metap R package46, with weighting according to the square root of the study sample size. Finally, combined P values were then adjusted for multiple testing using the Benjamini–Hochberg procedure. Similarly, average gene fold changes for each vaccine at each time point were computed by averaging across studies while using weighting equal to the study sample size.

Gene-set enrichment analysis

The enrichment analysis of BTMs was performed in two steps. First, for every study and time point, enrichment was calculated using QuSAGE47, providing as contrast ‘day X–day 0’, where X is the current time point, and also a ‘pairVector’ containing the subject identifiers so that a paired analysis would be performed. Second, to integrate the results from multiple studies of the same vaccine, we performed a meta-analysis for every vaccine plus time point combination, using the ‘combinePDFs’ function of QuSAGE.

Gene and module sharing analysis

The sharing number of a gene/module is computed as the maximum number of vaccines it is significantly differentially expressed (FDR < 0.05) in, irrespective of time point. For modules, the P values were calculated using QuSAGE47 (‘Gene-set enrichment analysis’). A null distribution for sharing was generated by performing 10,000 permutations of gene/module labels within each vaccine plus time point group.

Antibody titer measurements and identification of high and low responders

Depending on the study, antibody titers were measured by neutralization assays, hemagglutination inhibition assay or IgG levels measured by ELISA26. Because some vaccines include multiple strains of viral antigens, the fold change in the antibody response metric was defined as the maximum fold change (MFC) of any strain in the vaccine at day 28 (±7 d) compared to before vaccination. To minimize the difference in antibody response between studies (for example, due to different vaccines or different techniques used for antibody concentration assessment), the high and low responders were identified for each study separately by selecting the participants with an MFC equal or above the 70th percentile as high responders and participants with an MFC equal or below the 30th percentile as low responders.

Identification of predictive signatures of antibody responses

Four training/testing setups were used for identification of predictive signatures of antibody responses: (1) inactivated influenza datasets only, leave-one-study-out; (2) training on all inactivated influenza datasets, testing on other vaccines; (3) leave-one-vaccine-out (all datasets combined); and (4) tenfold cross-validation (all datasets combined). All models were trained using elastic-net logistic regression using the ‘caret’ and ‘glmnet’ R packages. BTM enrichment scores were calculated for each sample using the single-sample gene-set enrichment analysis (ssGSEA) function and used as input features to the models, filtering for modules with a standard deviation > 75% quantile of the standard deviation. Models were fit using either the day 3 fold-change value or the day 7 fold-change value of the ssGSEA score separately. Tuning parameters and performance metrics were estimated using tenfold cross-validation. CIs were estimated using the ‘ci.auc’ function from the pROC R package.

When developing predictive models for the time point adjustment approach, logistic regression models were trained using ssGSEA score fold change for module M156.1, either at day 7 or at the time point of peak expression in a given vaccine. AUC CIs were estimated using linear mixed-effects models fitted with 100 Monte Carlo resamples. When computing AUCs across multiple vaccines, a weighted AUC was computed using sample size as the weights. For the analysis of temporal change in the predictive capability of M156.1 (Fig. 4c), a weighted mean AUC (based on number of samples in each study) was computed using the calculateROC function of MetaIntegrator R package based on the geometric mean of gene fold changes in the M156.1 module.

Statistics and reproducibility

ArrayQualityMetrics R package was used for QC of all microarray data. Outlier detection was based on the following statistics: (1) mean absolute difference of M values (log-ratios) of each pair of arrays; (2) the Kolmogorov–Smirnov statistic Ka between each array’s signal intensity distribution and the distribution of the pooled data; and (3) the Hoeffding’s statistic Da on the joint distribution of A (average) and M values for each array. Samples that failed all three QC statistics were removed from further analysis. As this study only involved reanalysis of public datasets, no statistical method was used to predetermine sample size, experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.