8. Magnetic resonance imaging (MRI) data acquisition
9. Processing and statistical analysis of the imaging data
10. Supplementary Figures
All sequenced samples are Han Chinese from Sichuan province of China, consisting of 97 schizophrenia patients and 137 normal controls. All patients with schizophrenia were recruited at the Mental Health Centre of the West China Hospital, Sichuan University, People’s Republic of China and the diagnoses of schizophrenia were confirmed by an attending psychiatrist and a trained psychiatrist. Case inclusion criteria: ①Fulfilling the diagnostic criteria for schizophrenia as specified in the structured clinical interview for DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, fourth edition)-PatientVersion (SCID-P)1; ② First-episode patients. Case exclusion criteria: Other SCID-P diagnosis or a neurological or medical condition; ②Severe physical illness; ③History of consciousness disturbance after traumatic brain injury. All patients have been followed up for at least 6 months in order to confirm the diagnosis of schizophrenia. 140 healthy volunteers were recruited from the community. Control inclusion criteria:①Screen for a lifetime absence of psychiatric illnesses using the DSM-IV-NonPatient Edition (SCID-NP).②Excluded significant physical illnesses, pregnancies, or any psychiatric disorders. Ethics Committee of the West China Hospital of Sichuan University approved all procedures. All next of kin, carertakers or guardians consented on the behalf of participants to provided written informed consent for their participation. We used the following criteria to evaluate whether the participants had the capacity to consent: Firstly, patients have the ability to understand; Secondly, patients have the ability to know reason; Thirdly, patients have the ability to make rational decisions. If participants failed to fill out the consent form more than twice, their guardians were asked to fill out the consent on the behalf of patients.
Exome sequencing and variants calling:
Peripheral blood was collected from all participants, and genomic DNA was extracted according to a standard phenol–chloroform procedure. Exome sequencing was performed by Macrogen (http://www.macrogen.com/), a commercial service. DNAsample was prepared according to the Illumina Protocol. Targeted enrichment was performed with TruSeqExome Enrichment Kit optimized for Illumina sequencing. Briefly, genomic DNA was sheared to 350-400 base pair using Illumina adapters. The fragment was end repaired and polyA was ligated to the 3'and 5' end and Illumina adapters. The size selected product was PCR amplified, and the final product was validated using the Agilent Bioanalyzer. Two steps of hybridization and wash were needed for construction. PCR was used to amplify the enriched DNA library for sequencing which produced 101bp end reads, approximately 6-10 billion base calls were generated for each sample. PCR was performed with the same PCR primer cocktail used in TruSeq DNA Sample Preparation. The Enrichment Kit is designed to cover 62 Mbexomic sequences. Pipeline of raw data processing and calling was present in Supplementary Figure 1.Quality control of raw data was conducted by FastQCsoftware (http://www.bioinformatics.babraham.ac.uk/projects /fastqc/).Reads were mapped to a custom hg19 build using Burrows-Wheeler Alignment tool (BWA)2. The duplicate reads were flagged using Picard-tools (http://picard.sourceforge.net/). GATK3 IndelRealigner module was used to realign reads around insertion/deletion (Indel) sites. Individual sequence data (in BAM format) was preprocessed as whole NGS community suggests which mainly include local Indel realignment, PCR duplicates removal and base quality recalibration. Read qualities were recalibrated using GATK Table Recalibration. GATK unified Genotyper module was then used to call variants (both SNVs and Indels) from multiple samples simultaneously, which create a single Variant Call Format (VCF) file. Raw read data were visualized using the Integrative Genome Viewer (IGV)4.
Individual level QC
Individual level quality control was conducted on raw and clean variant to make sure avoiding false positive variants. A suite of per-individual metrics, which included the total number of alternate alleles, dbSNP coverage (build137), and Transition/Transversion (Ti/Tv) ratio, and variant quality recalibration (VQSR) were calculated. From available exome data, we extracted common variants and estimated per-individual heterozygosity (~inbreeding), pairwise relatedness, and sex-check using PLINK5. We also use EIGENSOFT6 doing population stratification analysis.
After calculating all of metrics above, we remove 6 outliers (Ncase=3 and Ncontrol=3), all of results show in Supplementary Figure2. Using verifyBamID (http://genome.sph.umich.edu/wiki/VerifyBamID), we estimated that samples of 6 outliers had above 20% contamination. High levels of contamination can reduce the accuracy of variant calls and lead to false positives calls, so we discard these samples for following analysis.
Variants quality control was conducted by in house software KGGSeq (http://statgenpro.psychiatry.hku.hk/limx/kggseq/doc/UserManual.html)7,which were carefully designed to filter and prioritize gene variants in exome sequencing of rare Mendelian and common complex disorder. Variants were kept if ①The minimum overall sequencing quality scores≧50 (--seq-qual 50) andthe minimum overall mapping quality score≧20 (--seq-mq 20);②The minimal genotyping quality per genotype≧30 (--gty-qual 30) and the minimal read depth per genotype≧30 (--gty-qual 30); ③The fraction of the reads carrying alternative allele≦5% at a reference-allele homozygous genotype (--gty-af-ref 0.05), the fraction of the reads carrying alternative allele≧25% at a heterozygous genotype (--gty-af-het 0.25), and the fraction of the reads carrying alternative allele≧75% at a alternative-allele homozygous genotype(--gty-af-alt 0.75);④Minimal observed number of non-missing genotypes in all samples as 50 (--min-obs 50); ⑤Variants in controls with the Hardy-Weinberg test p value≧0.00001 (--hwe-control 1.0E-5);⑥Variants with "FILTER"matching the VQSRlabels (--vcf-filter-inPASS, VQSRTrancheSNP90.00to93.00,VQSRTrancheSNP93.00to95.00,VQSRTrancheSNP95.00to97.00,VQSRTrancheSNP97.00to99.00).
Single-site and gene-based association analysis:
Methods implemented in PLINK/Seq (http://atgu.mgh.harvard.edu/plinkseq/) were employed for single site and gene based association analysis. Using a permutation scheme (--perm -1) to control for potential confounding effects, allele counts between cases and controls did not show very significant difference after adjustment. We evaluated evidence for association at the gene level using the basic burden test and the CALPHA test 8. Both of these test indicated that gene NRK (Nik Related Kinase, NM_198465, chrX: 105132399...105199499) were detected for a very small P-values (p ≦1.0E-06).
Alternative strategy for filtering rare variants
Base on previous knowledge of highly genetic heterogeneity9 and the contribution of rare variants in schizophrenia10-12, we focus on the difference of very rare variants distribution between cases and controls. So we design a personality analysis pipeline. In current methods, we split the variants for case-unique and control-unique (Supplementary Figure 1) respectively. Variants were kept if ①Nonsynonymous; All mutations were annotation by KGGSeq, we ignored synonymous variants because nucleotide substitution of these kind variants does not change amino acid and we suppose theses mutations contributed little to the cause of schizophrenia.②Predict damaging; All of nonsynonymous variants that met any of the following criteriawere considered potentially damaging: frameshift, nonsense, stoploss, stopgain, splicing and missense mutation with Polyphen score≧0.90 and/or SIFT p≦0.05 and/orGrantham score≧100 and/or phyloP score≧2.013-16. ③MAF(Minor Allele Frequency)≦0.1%;The reason why we focus on ultra rare variants is that shaun et al17 have proved these kinds of variants may contribute more to schizophrenia. In fact, after filtering by serious condition above, all of variants left proved to be very rare (MAF≦0.1%).④Variants within a gene express in braintissue. Brain expression data were extracted using existing database in the BrainSpan: Atlas of the Developing Human Brain (http://hbatlas.org/).
Variants filtering were conducted by KGGSeq7 mentioned above. After filtering, 2895 mutations within 2442 genes in cases and 4484 mutations within 3481 genes in controls were done Gene Ontology (GO) enrichment using GeneMANIA (http://www.genemania.org/).The results of enrichment indicated that two GO (GO: 0006281 and GO:0007049) only present in cases.
Candidate 52 variants within 41 genes were tested using standard Sanger sequencing on an ABI3730xl DNA Analyzer to validate presence of each mutation in the subjects, by designing custom primers (Sigma) based on ~500 bp of genomic sequence flanking each variant.
41 genes from two GO enrichments above were selected for weighted gene co-expression network analysis (WGCNA), which design based on the pair-wise correlations between gene expression profiles across different developmental stages to infer a gene co-expression network. Based on the network, clusters of highly correlate genes were detected. The WGCNA clusters the genes using a measure of topological overlap, based the correlation matrix raised to a power chosen to meet a scale-free topology criterion18, and all of which was done using the WGCNA R package19. All of Brain expression data from different points in life were acquired from a published study (The Human Brain Transcriptome; HBT)20.We use the standard parameters of the software package, with power set to 6 and a minimum module size of 10. In the resultant network, the expression levels of the genes in each module were represented by the first eigenvector of the correlation matrix between the genes (the module “epigene”). Plotting of epigene values was performed using the R package ggplot221.
Samples for imaging
39 of 94 schizophrenic patients were detected with 52 variants in 41 genes in two pathways. So we divided cases into two groups (one group enrichment variants in two pathways, another group without such variants). Next, we checked hippocampus imaging between these three groups (cases with variants, cases without variants and controls).In total, 74 healthy controls, 48 patients without variants and 26 patients with variants underwent MRI scans.
Magnetic resonance imaging (MRI) data acquisition
Participants underwent MRI scans in the Department of Radiology at West China Hospital with a Signa 3.0-T scanner (GE Medical Systems, Milwaukee, WI) with an eight-channel phase array head coil. High-resolution T1 images were acquired by 3D spoiled gradient echo sequence (SPGR). The sequence used in this protocol was as follow: TR=8.5 ms; TE=3.93 ms; flip angle =12°; thickness of slice=1 mm; single shot; field of view (FOV) =24 cm× 24 cm; matrix =256 × 256; size of voxel=0.47×0.47×1 mm3; in total, 156 slices of axial images were collected from a brain. Resting-state functional MRI (fMRI) images were obtained using a gradient–echo echo-planar imaging (EPI) sequence. The subject was instructed to lie inside the scanner, remained relaxed with eyes closed when T2*-weighted fMRI images were acquired with a gradient-echo echo-planar imaging (EPI) sequence: repetition time/echo time = 2000/30 milliseconds; flip angle = 90°; slice thickness = 5 mm (no slice gap), 30 axial slices; 64×64 matrix size; field of view = 240×240 mm2; voxel size = 3.75×3.75×5 mm3. Each fMRI run contained 200 image volumes. During the rsfMRI scanning, Brainwave 2.0 software was used to monitor the head motion of the subject. The head translational movement and rotation were less than 1.5 mm and 1.5°, respectively.
Processing of the data
Hippocampal delineation and analysis
The hippocampi from each brain were traced using MultiTracer (http://bishopw.loni.ucla.edu/MultiTracer/MultiTracer.html)22.The hippocampi were manually traced in coronal brain slices from anterior to posterior contiguously using standard protocol by two trained investigators (ML L and HY), blind to all of the demographic variables. Anatomical landmarks were confirmed in all three orthogonalviewing planes, and contours were drawn onmagnified images (4x) to facilitate the accurate identification of neuroanatomic boundaries and faithful tracking of small-scalefeatures. The hippocampal borders were determined by the temporal horn, choroidal fissure, uncal and ambient cisterns, and the gray-white junction between the subiculum and parahippocampal gyrus (Supplementary Figure 3).Volumes were obtained from the hippocampal surface tracings for use as dependent measures in statistical analyses.
To quantify inter-rater reliability between two different investigators (ML L and H Y), the hippocampus on six brains werer andomly selected to traced and the volume of hippocampus were calculated. The intra-class correlation coefficient for hippocampal volume was 0.89. To establish intra-rater reliability, the hippocampusfrom6 randomly chosen brains repeatedly contoured, the raters were 0.91 and 0.90 for ML L and H Y, respectively.
The volume of bilateral hippocampal subregions including cornuammonis (CA)1-3, CA4-dentate gyrus (DG), subicular complex (SC) were auto calculated using SPM Anantomy Toolbox Version 2.0,from modulated gray matter volume map of every subject by Diffeomorphic Anatomical Registration Through Exponentiated Lie algebra (DARTEL) toolbox in Statistical Parametric Mapping (SPM) 8. The details are as follows: all structural MRI scans were realigned manually according to the anterior commissure-posterior commissure line and were segmented into probability maps of gray matter (GM), white matter (WM), and cerebrospinal fluid using the ‘newsegment’ routine implemented in SPM8. Flow fields and a series of template images were produced by running the ‘DARTEL (create templates)’ routine using imported versions of the GM and WM generated in the previous step. The flow fields as well as the final template images were used to generate smoothed (6 mm full-width at halfmaximum isotropic Gaussian kernel), Jacobian modulated, and spatially normalized GM in Montreal NeurologicalInstitute (MNI) space.
Covariance analysis (ANCOVA) by least significant difference post hoc test was used to analysis the volume of hippocampus (Left and Right, respectively) and hippocampus subregions among three groups with gray matter, sex, age and duration of education as covariates, and significant differences was set at p< 0.05.
rsfMRI image processing
rsfMRI image processing was carried out using Statistical Parametric Mapping (SPM8, http://www.fil.ion.ucl.ac.uk/spm) and Data Processing Assistant for Resting-State fMRI (DPARSF)23. The first 10 time points were removed to allow the fMRI signal to reach steady state. Raw rsfMRI images were first slice time corrected and realigned, and were subsequently unwarped to correct for susceptibility-by-movement interaction. Next, nuisance covariates including six motion parameters, cerebrospinal fluid and white matter signals were regressed out and each image was spatially normalized to the Montreal Neurological Institute (MNI) EPI template. Then, all images were linearly detrended and bandpass filtered (0.01–0.08 Hz) to eliminate the high-frequency physiological noise. Finally, and smoothed with a Gaussian kernel (full-width half maximum = 6mm).
Left and right hippocampus masks were derived from the Wake Forest University Pick atlas software, left and right hippocampal subregions including cornuammonis (CA)1-3, CA4-dentate gyrus (DG), subicular complex (SC)were auto-segment using SPMAnantomy Toolbox Version 2.024, 25 and applied to all preprocessed images after reslicing. The time courses averaged over all voxels of left /right hippocampus and six hippocampal subregions were extracted. Pearson correlation coefficients (r) between time courses of each amygdala and all other voxels were calculated and transformed to Fisher’s z scores to derive functional connectivity maps (FCM). One sample t-test was applied to test the connectivity of the right / left hippocampus within the healthy controls group (t>11 and cluster>500 voxels) (FCM of hippocampus in controls is shown in Supplementary Figure 4).
Statistical tests on the functional connectivity maps of hippocampus across groups were performed using analysis of covariance (ANCOVA) with sex, age and education years, and volume of hippocampus as covariates usingSPM8. The significant threshold were set at P<0.05, corrected for multiple comparisons based on Monte Carlo simulations. Subsequently, the mean Z value of each cluster with a significant FC difference was extracted and were compared by ANOVA followed by post hoc test in SPSS 18.0, across patients with RDVs, patients without RDVs and healthy controls((SPSS Inc., Chicago, IL), significant level of P values were set at less than 0.05.
Relationships Between neuroimaging characters of hippocampus, clinical and neurocognitive variables Partial correlation analysis was used to analyze the relationship between the aberrant neuroimaging characters of hippocampus (volume and functional connectivity) and PANSS scores / logical memory/ spatial working memory with age, sex and educations as covariance. A significant statistical correlation was set at p<0.05, uncorrected.
Supplementary Figure 1| Pipeline of sequencing analysis and alternative strategy for filtering rare variants
A. the pipeline describes the procedure for sequence data analysis. B. alternative strategy for filtering rare variants. All the variants after filtering must be nonsynonymous, very rare, predict damaging, and express in brain.