NDAR provides a single point of access to de-identified autism research data. To download data, you will need an NDAR account with approved access to NDAR or to a connected repository (AGRE, IAN, or the ATP). For NDAR access, you need to be a research investigator sponsored by an NIH-recognized institution with a Federalwide Assurance. See Request Access for more information.

Warning Notice

This is a U.S. Government computer system, which may be accessed and used only for authorized Government business by authorized personnel. Unauthorized access or use of this computer system may subject violators to criminal, civil, and/or administrative action.

All information on this computer system may be intercepted, recorded, read, copied, and disclosed by and to authorized personnel for official purposes, including criminal investigations. Such information includes sensitive data encrypted to comply with confidentiality and privacy requirements. Access or use of this computer system by any person, whether authorized or unauthorized, constitutes consent to these terms. There is no right of privacy in this system.

The filters you select from the various query interfaces are stored here, in the 'Filter Cart'. The database is queried using the filters in your 'Filter Cart'. When multiple filters are defined, they are combined with 'AND' logic, so each additional filter can only narrow the result set.

From the 'Filter Cart' you can inspect each filter that has been defined, and you can also remove filters. The 'Filter Cart' displays the number of filters applied along with the number of subjects identified by the combination of those filters. For example, a GUID filter with two subjects followed by a GUID filter for just one of those subjects would return data only for the subject that appears in both GUID filters.
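The AND logic described above amounts to a set intersection. The sketch below models each filter as the set of subject GUIDs it matches; the GUID strings and the function name `apply_filter_cart` are hypothetical, and this is only an illustration of the behavior, not how NDAR executes its queries.

```python
from functools import reduce
from typing import Iterable


def apply_filter_cart(all_subjects: Iterable[str], filters: list[set[str]]) -> set[str]:
    """Return the subjects that match every filter (AND logic).

    Each filter is modeled as the set of subject IDs it matches. Starting
    from all subjects, each additional filter can only keep or shrink the
    result, never enlarge it.
    """
    return reduce(lambda matched, f: matched & f, filters, set(all_subjects))


# Hypothetical subject IDs reproducing the two-GUID-filter example.
subjects = {"GUID_A", "GUID_B", "GUID_C"}
guid_filter_1 = {"GUID_A", "GUID_B"}  # two subjects
guid_filter_2 = {"GUID_B"}            # one of those subjects
result = apply_filter_cart(subjects, [guid_filter_1, guid_filter_2])
print(sorted(result))  # ['GUID_B']
```

With an empty cart the function returns every subject, which matches the idea that no filters means no restriction.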

If you have a question about the Filter Cart or the underlying filters, please contact the NDA Help Desk.

Grant numbers should be entered in the following format: Activity Code, Institute Abbreviation, and Serial Number. The Grant Type, Support Year, and Suffix should be excluded. For example, grant 1R01MH123456-01A1 should be entered as R01MH123456.
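The normalization described above can be sketched with a regular expression. This is a minimal sketch that assumes the grant-number layout implied by the example: an optional one-digit grant type, a three-character activity code, a two-letter institute abbreviation, a six-digit serial number, then an optional "-YY" support year and suffix such as "A1". The pattern and the function name `to_ndar_grant_format` are illustrative and not part of NDAR.

```python
import re

# Assumed layout, inferred from the example 1R01MH123456-01A1.
GRANT_PATTERN = re.compile(
    r"^(?P<type>\d)?"                 # optional grant (application) type, e.g. "1"
    r"(?P<activity>[A-Z][A-Z0-9]{2})" # activity code, e.g. "R01"
    r"(?P<institute>[A-Z]{2})"        # institute abbreviation, e.g. "MH"
    r"(?P<serial>\d{6})"              # serial number, e.g. "123456"
    r"(?:-(?P<year>\d{2})(?P<suffix>[A-Z]\d+)?)?$"  # optional support year and suffix
)


def to_ndar_grant_format(grant: str) -> str:
    """Keep only Activity Code + Institute Abbreviation + Serial Number."""
    match = GRANT_PATTERN.match(grant.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized grant number: {grant!r}")
    return match["activity"] + match["institute"] + match["serial"]


print(to_ndar_grant_format("1R01MH123456-01A1"))  # R01MH123456
```

An already-normalized entry such as R01MH123456 passes through unchanged, so the function can be applied to mixed input.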

Please enter the name of the data structure to search for or, if your definition does not exist, upload that definition so that it can be appropriately defined for submission. Multiple data structures may be associated with a single Data Expected entry. Please add only one data structure per assessment.

Please provide a reason for the requested submission exemption and the timeframe during which the exemption will be active.

Collection Owners and those with Collection Administrator permission may edit a collection. The following is currently available for editing on this page:

General

The Title, Investigators, and Collection Description may be edited, along with the Collection Phase. For Collection Phase, the options Pre-enrollment, Enrollment, and Completed can be chosen, allowing the Collection Owner to indicate the stage of data collection.

Funding Source

The funding source for the project can be associated with the collection. For NIH-funded grants, linkage to NIH RePORTER information (e.g., R01MH123456) is supported. Projects funded by other sources are listed along with the project URL. Non-NIH-funded projects will become available here so that the data can be linked with the appropriate funding agency.

Supporting Documentation

Any documents related to the project that clarify the data or the acquisition methods used may be uploaded and made available here. By default, these documents are shared with the general public; they can optionally be shared only with qualified researchers.

Clinical Trials

For clinical trials, an option is provided to link to the corresponding trial record on ClinicalTrials.gov.

Collection Summary

Collection Title: SSC total recall project
Collection Investigators: Eichler, Evan
Collection Description: This collection consists of sequencing and variation data resulting from the reanalysis of Whole Exome Sequences from 9047 individual subjects belonging to the Simons Simplex Collection (SSC). Original data were contributed by a collaboration between NDAR Collections 1878 (Eichler Lab, University of Washington), 1936 (Wigler Lab, Cold Spring Harbor Laboratory), and 1985 (State Lab, UCSF). Reanalysis of these data was done by members of the Eichler Lab: sequences were realigned to a common reference genome (human_g1k_v37) and analyzed for possible genomic variants (SNVs, InDels, and CNVs). Details on the analysis and methods can be found in the following individual NDAR Studies: (1) realigned BAM files, NDAR Study 334 (http://ndar.nih.gov/study.html?id=334); (2) unfiltered SNV/InDel variant calls made using GATK, with and without annotations, NDAR Study 348 (http://ndar.nih.gov/study.html?id=348); (3) unfiltered SNV/InDel variant calls made using FreeBayes, with and without annotations, NDAR Study 349 (http://ndar.nih.gov/study.html?id=349); (4) CNV variant calls made using XHMM and CoNIFER, NDAR Study 361 (http://ndar.nih.gov/study.html?id=361).
NDAR
Closed
Shared
$0.00
9,047
0
0

NIH - Contract None



Experiments

To create a new Omics, eye tracking, fMRI, or EEG experiment, press the "+ New Experiment" button. Once an experiment is created, the raw files for that experiment should be provided and associated, through the Experiment_ID, with the metadata defined in the experiments interface.
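One way to picture the Experiment_ID link is as a foreign key from each raw-file record to an experiment's metadata. The Python sketch below is hypothetical: the class names, field names, and file names are illustrative only and do not reflect NDAR's actual submission templates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    experiment_id: int    # hypothetical stand-in for the Experiment_ID
    experiment_type: str  # e.g. "Omics", "eye tracking", "fMRI", "EEG"


@dataclass(frozen=True)
class RawFileRecord:
    file_path: str
    experiment_id: int    # links this raw file to an experiment's metadata


def group_files_by_experiment(experiments, records):
    """Group raw-file paths under their experiment, rejecting unknown IDs."""
    known_ids = {e.experiment_id for e in experiments}
    grouped = defaultdict(list)
    for record in records:
        if record.experiment_id not in known_ids:
            raise ValueError(
                f"{record.file_path}: unknown Experiment_ID {record.experiment_id}"
            )
        grouped[record.experiment_id].append(record.file_path)
    return dict(grouped)


experiments = [Experiment(101, "EEG"), Experiment(102, "fMRI")]
records = [
    RawFileRecord("sub01_rest.edf", 101),
    RawFileRecord("sub01_bold.nii.gz", 102),
    RawFileRecord("sub02_rest.edf", 101),
]
print(group_files_by_experiment(experiments, records))
```

Rejecting records whose ID has no matching experiment mirrors the ordering in the text: the experiment must exist before its raw files can be associated with it.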

ID | Name | Created Date | Status | Type
No records found.

Shared Data

Data structures are listed with the number of subjects submitted and shared.

Genomics Sample Genomics 10060
Genomics Subject Genomics 9047
NGS QA Genomics 9047

Publications

Publications relevant to NDAR data are listed below. Most displayed publications have been associated with the grant within PubMed. Use the "+ New Publication" button to add new publications. Publications are categorized as relevant or not relevant to the data expected. Relevant publications are then linked to the underlying data by selecting the Create Study link. A Study provides the ability to define cohorts, assign subjects, and define outcome measures, and it lists the study type, data analysis, and results. Analyzed data and results are expected to be shared in this way.

PubMed ID | Study | Title | Journal | Authors | Date | Status
No records found.

This tab provides a general status of the data expected to be shared. There are two types of data expected, and a submission exemption may also be requested:

  1. By Relevant Publications: Publications reported for the collection's grant that have a status of "relevant" for sharing are listed first. The grantee is expected to share the data specific to those publications using the NDA Study feature. If a publication is erroneously marked relevant, the PI should simply change its status. When a study is shared, only the outcome measures for its subjects and time points are shared; other data that have not reached the share date, defined below, remain embargoed. To initiate study creation, log in, mark your publication as relevant, and click the link listed to begin.

  2. By Data Structure: The number of subjects expected, received, and shared is provided. Investigators are expected to keep the data they are collecting, the initial submission date, and the initial share date up to date. The NIMH Data Archive shares data when those dates are reached.

  3. Submission Exemption: Those with Administrative or Submission Access to the Collection may request a submission exemption for a defined period by stating the reason and timeframe. Note that the program officer on the grant may review this request.
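The sharing rules in items 1 and 2 can be summarized as a simple predicate. This sketch assumes a hypothetical `DataRecord` carrying a share date and a flag for inclusion in a shared Study's outcome measures; it illustrates the policy as described, not NDA's actual implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataRecord:
    share_date: date               # hypothetical: initial share date set by the investigator
    in_shared_study: bool = False  # hypothetical: part of a shared Study's outcome measures


def is_shared(record: DataRecord, today: date) -> bool:
    """Shared early if part of a shared Study; otherwise embargoed until the share date."""
    return record.in_shared_study or today >= record.share_date


today = date(2024, 1, 15)
embargoed = DataRecord(share_date=date(2024, 6, 1))
study_data = DataRecord(share_date=date(2024, 6, 1), in_shared_study=True)
print(is_shared(embargoed, today), is_shared(study_data, today))  # False True
```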


Relevant Publications
PubMed ID | Study | Title | Journal | Authors | Date
No records found.

Users with privileges to edit the collection can upload their data definitions using this interface. NDA support staff will then follow up with a harmonized data definition to use when submitting additional data.

Data Expected
Data Expected | Targeted Enrollment | Initial Submission | Subjects Shared | Status
Data QA: Approved
genomics/omics: Approved
Structure not yet defined

Associated Studies

Studies that have been defined using data from a Collection are an important criterion for determining the value of the data shared. The Number of Subjects column displays the count of subjects from this Collection that are included in a Study, out of the total number of subjects in that study. The Data Use column indicates whether the study is a primary or a secondary analysis of the data. The State column indicates whether the study is private or shared with the research community.
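To make the "Collection / Total" column concrete, the sketch below parses a cell such as "8190 / 9975" and computes what fraction of a study's subjects come from this Collection. The function names are hypothetical; the cell format is taken from the table below.

```python
def parse_subject_counts(cell: str) -> tuple[int, int]:
    """Split a "Collection / Total" cell such as "8190 / 9975" into two ints."""
    collection, total = (int(part.strip().replace(",", "")) for part in cell.split("/"))
    if not 0 <= collection <= total:
        raise ValueError(f"Inconsistent counts in cell: {cell!r}")
    return collection, total


def collection_share(cell: str) -> float:
    """Fraction of the study's subjects that come from this Collection."""
    collection, total = parse_subject_counts(cell)
    return collection / total


print(f"{collection_share('8190 / 9975'):.1%}")  # 82.1%
print(f"{collection_share('9047 / 9047'):.1%}")  # 100.0%
```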

Study Name | Description | Number of Subjects (Collection / Total) | Data Use | State
Copy Number Variants from SSC Collection ~ 2500 families by two Methods (XHMM and Conifer)
XHMM was run on a set of realigned BAM files from the SSC collection (see NDAR Study 334 for BAM files) using the attached scripts. These scripts calculate depth of coverage using GATK, pull the GATK output from an instance on NDAR's cloud, merge the GATK output into a single matrix, process the read-depth matrix (filter, center), normalize the matrix using principal component analysis (PCA), process the normalized read-depth matrix (filter, z-score), run a hidden Markov model (HMM) on this matrix to identify CNVs in the normalized data, and generate family-level VCFs from the XHMM data. XHMM produces as output the coverage summary tables generated by GATK (sample_interval_statistics, sample_interval_summary, sample_summary, sample_statistics), principal component data files, a genotyped CNV output VCF file, and some example plots and graphics. For this study, the GATK output is available. Additional information about XHMM is available here: http://atgu.mgh.harvard.edu/xhmm/tutorial.shtml
Number of Subjects (Collection / Total): 9041 / 9041; Data Use: Secondary Analysis; State: Shared

Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci
Analysis of de novo CNVs (dnCNVs) from the full Simons Simplex Collection (SSC) (N = 2,591 families) replicates prior findings of strong association with autism spectrum disorders (ASDs) and confirms six risk loci (1q21.1, 3q29, 7q11.23, 16p11.2, 15q11.2-13, and 22q11.2). The addition of published CNV data from the Autism Genome Project (AGP) and exome sequencing data from the SSC and the Autism Sequencing Consortium (ASC) shows that genes within small de novo deletions, but not within large dnCNVs, significantly overlap the high-effect risk genes identified by sequencing. Alternatively, large dnCNVs are found likely to contain multiple modest-effect risk genes. Overall, we find strong evidence that de novo mutations are associated with ASD apart from the risk for intellectual disability. Extending the transmission and de novo association test (TADA) to include small de novo deletions reveals 71 ASD risk loci, including 6 CNV regions (noted above) and 65 risk genes (FDR ≤ 0.1).
Number of Subjects (Collection / Total): 8190 / 9975; Data Use: Secondary Analysis; State: Shared

Variant Recalling (GATK) from Whole Exome Sequencing data for 2415 families in SSC Collection
Whole Exome Sequencing has been completed for ~2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers, with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to call and annotate SNPs and InDels on data from all three collections in a uniform manner using the latest toolchains and algorithms available. Variant calls from this study were generated using GATK, Famseq, and some custom scripts; annotation was provided by SnpEff, dbNSFP, and vcftools. Note that variants were called in batches of ~20 families each. Complete methods, including source code for the pipeline and custom scripts, can be found at: https://github.com/nkrumm/asd-jre-public. The data package for this study comprises the genomics_subject02 and genomics_sample03 structures, which include annotated and unannotated VCF files for each family. Another NDAR Study (349) is available with VCF files generated using FreeBayes (https://ndar.nih.gov/study.html?id=349), and the complete set of BAM files used for variant calling is available in NDAR Study 334 (https://ndar.nih.gov/study.html?id=334).
Number of Subjects (Collection / Total): 8976 / 8976; Data Use: Secondary Analysis; State: Shared

Variant Recalling (FreeBayes) from Whole Exome Sequencing data for 2415 families in SSC Collection
Whole Exome Sequencing has been completed for ~2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers, with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to call and annotate SNPs and InDels on data from all three collections in a uniform manner using the latest toolchains and algorithms available. Variant calls from this study were generated using FreeBayes, Famseq, and some custom scripts; annotation was provided by SnpEff, dbNSFP, and vcftools. Note that variants were called in batches of ~20 families each. Complete methods, including source code for the pipeline and custom scripts, can be found at: https://github.com/nkrumm/asd-jre-public. The data package for this study includes the genomics_sample02 and genomics_sample03 structures, with annotated and unannotated VCF files for each family. Another NDAR Study (348) is available with VCF files generated using GATK (https://ndar.nih.gov/study.html?id=348), and the complete set of BAM files used for variant calling is available in NDAR Study 334 (https://ndar.nih.gov/study.html?id=334).
Number of Subjects (Collection / Total): 8976 / 8976; Data Use: Secondary Analysis; State: Shared

The contribution of mosaic variants to autism spectrum disorder
De novo mutation is highly implicated in autism spectrum disorder (ASD). However, the contribution of post-zygotic mutation to ASD is poorly characterized. We performed both exome sequencing of paired samples and analysis of de novo variants from whole-exome sequencing of 2,388 families. While we find little evidence for tissue-specific mosaic mutation, multi-tissue post-zygotic mutation (i.e., mosaicism) is frequent, with detectable mosaic variation comprising 5.4% of all de novo mutations. We identify three mosaic missense and likely-gene-disrupting mutations in genes previously implicated in ASD (KMT2C, NCKAP1, and MYH10) in probands but none in siblings. We find a strong ascertainment bias for mosaic mutations in probands relative to their unaffected siblings (p = 0.003). We build a model of de novo variation incorporating mosaic variants and errors in classification of mosaic status, and from this model we estimate that 33% of mosaic mutations in probands contribute to 5.1% of simplex ASD diagnoses (95% credible interval 1.3% to 8.9%). Our results indicate a contributory role for multi-tissue mosaic mutation in some individuals with an ASD diagnosis.
Number of Subjects (Collection / Total): 9047 / 9047; Data Use: Secondary Analysis; State: Shared

Complete Realignment of Whole Exome Sequencing data from 2415 families in SSC Collection
Whole Exome Sequencing has been completed for ~2500 families from the Simons Simplex Collection. Sequencing was performed at three individual sequencing centers, with original data submitted to NDAR Collections 1878, 1895, and 1936; subsets of these data have been analyzed by various methods and published. This study represents an effort to realign sequencing data from all three collections in a uniform manner using the latest toolchains and algorithms available, which can be used as a resource for the entire ASD community. Original sequence data have been realigned to a single reference genome (1000 Genomes / GRCh37) using BWA, Picard tools, Samtools, and some custom Python scripts. QC summary data were generated as part of the realignment process using the aforementioned tools in addition to QPLOT and some custom scripts. Complete methods, including source code for the pipeline and custom scripts, can be found at: https://github.com/nkrumm/asd-jre-public. The data package for this study comprises the genomics_subject02, genomics_sample03, and omics_qa01 data structures, which include realigned BAM files and QC files (i.e., QPLOT output and BAM header files). Variant calling and annotation for these data are provided in NDAR Studies 348 (https://ndar.nih.gov/study.html?id=348) and 349 (https://ndar.nih.gov/study.html?id=349).
Number of Subjects (Collection / Total): 9047 / 9047; Data Use: Secondary Analysis; State: Shared

Excess of rare inherited truncating mutations in autism
In order to quantify the effect of private, inherited mutations on autism risk, we generated a callset of both inherited and de novo single nucleotide variants (SNVs) and copy number variants (CNVs) across 2,377 Simons Simplex Collection families. The publicly deposited dataset includes 1,786 parents-child-unaffected sibling "quads", allowing us to compare the burden of inherited and de novo mutations between affected and unaffected siblings in simplex autism families. We find that private, inherited truncating SNV mutations in conserved genes are significantly enriched in probands (odds ratio = 1.14, p = 0.0002) and more likely to be transmitted to children with autism when compared to their unaffected siblings (p < 0.0001). We find that this effect becomes more pronounced with increasing gene conservation (Residual Variation Intolerance Score, RVIS). Likewise, we observe a similar bias for inherited CNVs, specifically for small (<100 kbp), maternally inherited events (p = 9.6 × 10^-3) that are enriched in CHD8 target genes (OR = 3.6, p = 2.0 × 10^-3). We quantified autism spectrum disorder (ASD) risk for de novo and inherited CNVs and SNVs by using a conditional logistic regression model. Independent of de novo mutations, private truncating SNVs and rare, inherited CNVs contribute an increase in risk with odds ratios of 1.11 (p = 0.0002) and 1.23 (p = 0.01), respectively. Our results indicate a statistically independent role for inherited mutations in ASD risk and identify additional high-impact risk candidate genes (e.g., RIMS1, CUL7, LZTR1, and CC2D2A) where transmitted mutations may create a sensitized background for autism but are unlikely to be necessary and sufficient for the disorder.
Number of Subjects (Collection / Total): 8917 / 8917; Data Use: Secondary Analysis; State: Shared

* Data not on individual level
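The XHMM study in the table above describes centering the read-depth matrix, normalizing it with PCA, and z-scoring it before the HMM step. The NumPy sketch below illustrates only those normalization steps, loosely in the spirit of the XHMM tutorial. It assumes a samples × targets matrix and removes a fixed number of top principal components (`n_components`), whereas XHMM itself chooses which components to remove based on variance explained; the filtering steps and the HMM are omitted, and the function name is hypothetical.

```python
import numpy as np


def xhmm_style_normalize(depth: np.ndarray, n_components: int) -> np.ndarray:
    """Center targets, remove top principal components, then z-score samples.

    depth: samples x targets matrix of mean read depth.
    n_components: number of leading components to remove (a fixed, assumed value).
    """
    # 1. Center each target (column) on its mean depth across samples.
    centered = depth - depth.mean(axis=0, keepdims=True)
    # 2. PCA via SVD; subtract the reconstruction from the top components,
    #    which are assumed to capture systematic (e.g. capture/batch) variation.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = n_components
    systematic = (u[:, :k] * s[:k]) @ vt[:k, :]
    residual = centered - systematic
    # 3. Z-score each sample (row) across targets so the HMM sees a common scale.
    mean = residual.mean(axis=1, keepdims=True)
    std = residual.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # guard against division by zero for flat rows
    return (residual - mean) / std


rng = np.random.default_rng(0)
depth = rng.normal(loc=50.0, scale=5.0, size=(10, 30))  # synthetic: 10 samples x 30 targets
z = xhmm_style_normalize(depth, n_components=2)
print(z.shape)  # (10, 30)
```

Removing the leading components is meant to strip coverage variation shared across many samples, leaving sample-specific deviations, such as CNVs, in the resulting z-scores.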