About NDAR


NDAR (the National Database for Autism Research) is an extensible, scalable informatics platform for data relevant to autism spectrum disorder (ASD) at all levels of biological and behavioral organization (molecules, genes, neural tissue, behavior, and social and environmental interactions) and for all data types (text, numeric, image, time series, etc.). NDAR was developed to share data across the entire ASD field and to facilitate collaboration across laboratories, as well as interconnectivity with other informatics platforms. Sharing data, associated tools, and methodologies, rather than just summaries or interpretations of them, can accelerate research progress by allowing re-analysis of data, as well as re-aggregation, integration, and rigorous comparison with other data, tools, and methods. This community-wide sharing requires common data definitions and standards, as well as comprehensive and coherent informatics approaches.

The NDAR team has developed and implemented several tools for data definition, standardization, and validation to help researchers adopt community data standards across all projects and research institutions. NDAR's Data Dictionary, developed after thorough analysis and with input from the whole ASD research community, now comprises over 400 predefined data structures. NDAR's Data Dictionary Tool allows researchers to define their own data structures and works with NDAR's Validation Tool to ensure data quality. NDAR requires only minimal adjustments to the way raw data are entered, and multiple web tutorials and demos are available for researchers who wish to submit their data.
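
To make the pairing of a structure definition with a validation step concrete, here is a minimal Python sketch. It assumes a structure is a list of element definitions, each with an expected type, a required flag, and an optional numeric range; the ElementDef and validate names and the example elements are hypothetical, and this is not NDAR's Data Dictionary format or the Validation Tool's actual rule set.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ElementDef:
    """One element of a hypothetical data structure definition."""
    name: str
    dtype: type                                        # expected Python type of the value
    required: bool = False                             # must the element be present?
    value_range: Optional[Tuple[float, float]] = None  # inclusive (min, max) for numbers


def validate(row: Dict[str, object], structure: List[ElementDef]) -> List[str]:
    """Check one submitted row against a structure; an empty list means it passed."""
    errors: List[str] = []
    for el in structure:
        value = row.get(el.name)
        if value is None:
            if el.required:
                errors.append(f"{el.name}: required element is missing")
            continue
        if not isinstance(value, el.dtype):
            errors.append(
                f"{el.name}: expected {el.dtype.__name__}, got {type(value).__name__}"
            )
            continue
        if el.value_range is not None:
            low, high = el.value_range
            if not low <= value <= high:
                errors.append(f"{el.name}: {value} outside range {low}..{high}")
    return errors


# Hypothetical structure: element names and the age range are illustrative only.
EXAMPLE_STRUCTURE = [
    ElementDef("subject_id", str, required=True),
    ElementDef("age_months", int, required=True, value_range=(0, 1200)),
    ElementDef("notes", str),
]

print(validate({"subject_id": "S001", "age_months": 48}, EXAMPLE_STRUCTURE))  # []
print(validate({"age_months": 5000}, EXAMPLE_STRUCTURE))
```

Returning a list of messages, rather than stopping at the first failure, lets a submitter see every problem in a row at once.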


NDAR is a secure informatics platform for the ASD research community that encompasses the full range of data collected by investigators and combines technologies and policy regimes in order to:

  • define explicitly the nature of the data and how it was collected
  • allow re-aggregation and reanalysis of data
  • assure confidentiality of research subjects
  • promote scientific collaboration
  • provide a convenient schedule for the sharing of descriptive and experimental data
  • promote standardization and harmonization of informatics approaches used within NDAR and across the ASD research community

NDAR combines the function of a data repository, which holds genetic, phenotypic, clinical, and medical imaging data, and the function of a scientific community platform, which defines the standard tools and policies to integrate the computational resources developed by scientific research institutions, private foundations, and other federal and state agencies supporting ASD research. Furthermore, NDAR is working to develop the means to connect relevant repositories together through data federation.

Federation with Other Important Data Repositories

The concept of federated repositories enables data resident in NDAR to be connected with other major public or private autism databases located elsewhere. When such resources are federated, investigators are able to access data, tools, and information across all federated resources from a single point of entry.

The technical architecture of NDAR provides this linkage among resources regardless of their location or ownership, and in ways that respect the policies, authorization, and implementations of the particular institutions and data resources.

NDAR has federated, or is in the process of federating, with the following repositories:

  • Pediatric MRI Data Repository — stores rich phenotypic and imaging data from more than 500 typically developing children, from birth to young adulthood
  • The Autism Tissue Program — a fully funded science program of Autism Speaks that is committed to promoting high-quality brain tissue acquisition, processing, stewardship, thorough supportive clinical data acquisition, and distribution for research
  • The Autism Genetic Resource Exchange — an electronic data repository housing information from more than 1,000 families affected by ASD. AGRE was created by the advocacy group Cure Autism Now and is currently supported by Autism Speaks.
  • The Interactive Autism Network — an online project of the Kennedy Krieger Institute with funding from Autism Speaks, which contains data on 30,000 individuals and families with an ASD diagnosis who have voluntarily submitted information of interest to scientists.

If your research site is interested in federating with NDAR, please contact us at ndarhelp@mail.nih.gov and refer to SOP-06 Establishment of a Federated Data Resource for an introduction to the process for federation.

Sponsoring Organizations

NDAR is sponsored by the National Institutes of Health (NIH), the nation's medical research agency, and is supported by several NIH Institutes and Centers.

NDAR supports the aims of the Interagency Autism Coordinating Committee (IACC), which coordinates all efforts within the agencies of the U.S. Department of Health and Human Services (HHS) concerning autism spectrum disorders (ASD).


NDAR Team

NDAR Director — Dr. Greg Farber, Director, Office of Technology Development and Coordination
NDAR Manager — Mr. Dan Hall, NIMH
Principal Scientist — Dr. Svetlana Novikova, NIMH
Operations Manager — Mr. Brian Koser, NIMH
Principal Analyst, Outreach and Communication — Ms. Gretchen Navidi, NIMH


NDAR News

Recent news articles are also available from the NDAR News web feed.


APR 06, 2015 - JUL 31, 2015

Data Elements Needed to Define Ontological Concepts Now Required — As requested by those using NIMH data for secondary analysis, the NIMH has designated over 2,500 data elements used in defining ontological concepts as "required" or "conditionally required" fields for projects contributing shared data to the NIMH Data Archive (NDAR, NDCT, and RDoCdb). These concepts (e.g., Verbal IQ, Expressive Lexicon, Form Perception) are defined by rules tied directly to values in the data, and they allow querying by defining elements across structures using NDAR's "Query by Concept" tool. For cases where these data were not collected, a coding system for "Missing/NA" is in place. Researchers can check their Data Dictionary structure definitions to see these updates and the Missing/NA coding. Contact us at NDAHelp@mail.nih.gov with any questions, comments on the Query by Concept tool, or suggestions for a new concept or definition.
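
The rule-based idea behind querying by concept can be illustrated with a small Python sketch. It assumes a concept maps each data structure that can define it to one element plus a predicate on that element's value, and that missing values carry a single sentinel code. The structure names, element names, the -999 code, and the 115 threshold are all hypothetical, and this is not the implementation of NDAR's Query by Concept tool.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical "Missing/NA" code; real codes are given in each structure definition.
MISSING_NA = -999

# A concept maps each structure that can define it to
# (element name, rule applied to that element's value).
ConceptRules = Dict[str, Tuple[str, Callable[[float], bool]]]

# Hypothetical concept defined across two differently named structures/elements.
VERBAL_IQ_AT_LEAST_115: ConceptRules = {
    "iq_battery_a": ("verbal_iq", lambda v: v >= 115),
    "iq_battery_b": ("viq_standard", lambda v: v >= 115),
}


def query_by_concept(
    rules: ConceptRules,
    records: Iterable[Tuple[str, str, Dict[str, float]]],
) -> Set[str]:
    """Return subjects whose defining element satisfies the concept in any structure.

    records holds (structure_name, subject_id, {element: value}) tuples.
    Values coded as Missing/NA are skipped instead of being compared.
    """
    matches: Set[str] = set()
    for structure, subject, row in records:
        if structure not in rules:
            continue  # this structure does not define the concept
        element, rule = rules[structure]
        value = row.get(element)
        if value is None or value == MISSING_NA:
            continue
        if rule(value):
            matches.add(subject)
    return matches


records = [
    ("iq_battery_a", "S1", {"verbal_iq": 120}),
    ("iq_battery_b", "S2", {"viq_standard": 118}),
    ("iq_battery_a", "S3", {"verbal_iq": MISSING_NA}),
    ("iq_battery_b", "S4", {"viq_standard": 90}),
    ("unrelated_form", "S5", {"verbal_iq": 130}),
]
print(sorted(query_by_concept(VERBAL_IQ_AT_LEAST_115, records)))  # ['S1', 'S2']
```

Skipping the Missing/NA code explicitly matters most for rules such as "score below 70", where a sentinel like -999 would otherwise be counted as a match.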

MAR 31, 2015 - APR 25, 2015

Analyzing Connectomes in the Cloud with the NIMH Data Archive - Webinar April 24 — In an addendum and encore to our webinar series on cloud computational techniques, Dr. Cameron Craddock presents his work on neuroimaging pipelines at the Child Mind Institute and the Nathan Kline Institute for Psychiatric Research, highlighting the opportunities presented by NIMH Data Archive cloud tools. April 24, 2:00-4:00 PM EDT.

MAR 20, 2015 - APR 30, 2015

Notice Posted on Data Sharing Expectations for NIMH-Funded Clinical Research — On March 17, the NIMH issued a notice describing its expectations for data sharing in future clinical research. Take a look to learn the details of this exciting development.

MAR 17, 2015 - APR 30, 2015

NIH/NIMH Data Archive - March 2015 Newsletter — Topics covered in this issue of the NIH/NIMH Data Archive Newsletter include: Concepts and semantic webs, upcoming webinars, the SSC data release, SRCD2015, and the ABCD study

FEB 01, 2015 - MAR 01, 2015

Release of SSC genomics dataset with data on unfiltered rare and common variants from WES.

We are announcing the release of sequencing and variation data resulting from the reanalysis of Whole Exome Sequences (WES) from subjects in the Simons Simplex Collection (SSC). The original data were contributed by a collaboration among NDAR Collections 1878 (Eichler Lab, University of Washington), 1936 (Wigler Lab, Cold Spring Harbor Laboratory), and 1985 (State Lab, UCSF). Members of the Eichler Lab performed the reanalysis: sequences were realigned to a common reference genome (human_g1k_v37) and analyzed for possible genomic variants (SNVs, InDels, and CNVs).

The resulting dataset, covering 2,415 SSC families (9,047 individual subjects) and now available in NDAR, includes:
  1. realigned BAM files - NDAR Study 334 (http://ndar.nih.gov/study.html?id=334);
  2. unfiltered SNV/InDel variant calls made using GATK, with and without annotations - NDAR Study 348 (http://ndar.nih.gov/study.html?id=348);
  3. unfiltered SNV/InDel variant calls made using FreeBayes, with and without annotations - NDAR Study 349 (http://ndar.nih.gov/study.html?id=349);
  4. CNV calls made using XHMM and CoNIFER - NDAR Study 361 (http://ndar.nih.gov/study.html?id=361).
The entire dataset is also available in NDAR Collection 2042 (https://ndar.nih.gov/edit_collection.html?id=2042).

JAN 01, 2015 - FEB 28, 2015

OmicSearch, Imaging Pipelines and Computational Science Webinars Announced

As an encore to our SfN Workshop, we will be hosting webinars on leveraging the NIH/NIMH Data Repositories' capabilities to advance scientific discovery.

Foundations of Data Exploration and Repeatable Workflows in the Cloud

FreeSurfer (deployed using the NITRC Computational Environment) and ANTs pipelines have been used across a similar cohort of images shared in the NIH/NIMH Data Repositories. Learn how the data were processed and how the software and environment are shared and made available for future collaboration. Data Repositories staff will present on using the miNDAR database for data exploration, results reporting, and the issuance of a Repositories Digital Object Identifier. January 21, 2015 at 2:00 PM (EST)

Demonstration of Omics Query and Computation in the Cloud

Reanalysis of over 2,000 families has been performed in the cloud using raw data previously shared in NDAR. The reanalysis produced sequence alignment files (BAM), unfiltered and annotated variant calls for SNVs and InDels (VCF), and Copy Number Variant calls; the unfiltered variant data alone amount to over 500,000,000 records (~10,000 subjects × ~50,000 variants per exome), which will soon be available for query. Learn how the cloud-based genomics pipeline that produced these results was used, how it can be extended for use in your environment through NDAR's public GitHub repository, and what opportunities exist for community collaboration on these and similar results, using the almost unlimited resources now available in the cloud or in your academic environment. January 28, 2015 at 2:00 PM (EST)
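
The record count quoted above follows from the two approximate figures given in parentheses; as a back-of-the-envelope check using those round numbers:

```latex
N_{\text{records}} \approx N_{\text{subjects}} \times N_{\text{variants per exome}}
  \approx 10^{4} \times \left(5 \times 10^{4}\right) = 5 \times 10^{8}
```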

Cloud Computational Approaches: Priming the Semantic Web

500,000,000 omic alterations from 10,000 sequences, 3,000,000 regions of interest from 1,000 structural images, and 1,000,000 ontological concepts from over 80,000 subjects are now shared, making it possible to apply computational techniques based on semantic web technologies to these data. While the Repositories offer no specific tools, these results are available in an RDF-like format. The data are sparse; however, with the inclusion of data from the RDoC initiative, Repository data may now be combined with other datasets using these techniques for data exploration and hypothesis generation. Learn how these data are being made available and how they can be combined with other available datasets if you are interested in this emerging area of scientific discovery. February 12, 2015 at 3:00 PM (EST)
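
As a rough illustration of why an RDF-like subject-predicate-object layout suits sparse data that must be combined across sources, here is a Python sketch. The predicates, identifiers, and the to_triples and subjects_with helpers are hypothetical; the actual layout of the Repositories' files is not specified here, and a real workflow would more likely use an RDF library and a query language such as SPARQL.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def to_triples(subject: str, record: dict) -> List[Triple]:
    """Flatten one subject's record into subject-predicate-object triples."""
    return [(subject, predicate, str(obj)) for predicate, obj in record.items()]


def subjects_with(triples: List[Triple], predicate: str, obj: str) -> Set[str]:
    """Subjects that have a given predicate/object pair anywhere in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Two hypothetical, sparse sources: not every subject appears in both.
omics = to_triples("SUBJ_A", {"has_variant_in": "GENE_X"}) + to_triples(
    "SUBJ_B", {"has_variant_in": "GENE_Y"}
)
concepts = to_triples("SUBJ_A", {"has_concept": "CONCEPT_P"}) + to_triples(
    "SUBJ_C", {"has_concept": "CONCEPT_P"}
)

# Combining sources is concatenation; a join across them is a set intersection.
graph = omics + concepts
both = subjects_with(graph, "has_variant_in", "GENE_X") & subjects_with(
    graph, "has_concept", "CONCEPT_P"
)
print(both)  # {'SUBJ_A'}
```

Because each fact is a separate triple, a subject absent from one source simply contributes no triples there, so sparse datasets can be merged without padding empty columns.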

Register for all three sessions to learn how to use NIH/NIMH Data Repositories and cloud computation resources to take advantage of scientific opportunities that previously could not have been considered.


DEC 18, 2014 - JAN 31, 2015

NIH/NIMH Data Repositories - December 2014 Newsletter — Topics covered in this issue of the NIH/NIMH Data Repositories Newsletter include: Webinar Series | Winter Submission | New Shared Data

DEC 18, 2014 - JAN 31, 2015

NIH/NIMH Data Repositories - November 2014 Newsletter — Topics covered in this issue of the NIH/NIMH Data Repositories Newsletter include: New Features | RDoC Projects | New Omics Policy | Future Webinars

DEC 17, 2014 - JAN 15, 2015

Data Sharing Among Top Mental Health Developments of 2014 — In NIMH Director Dr. Tom Insel's latest blog post, he names new efforts to increase data sharing and reproducibility, including RDoCdb and NDCT, as number 7 on his list of the top ten developments of 2014 in mental health.

NOV 17, 2014 - NOV 30, 2015

The P-Hacking Problem Potentially Solved by Using Study Functionality — In his recent blog post, NIMH Director Tom Insel, M.D., discusses the problem of P-hacking and offers a potential, partial solution. The Study functionality available through NDAR, NDCT, and RDoCdb lets researchers expose the underlying data and describe their methods and analyses in detail, which could help reduce P-hacking.

NOV 17, 2014 - NOV 30, 2015

NDAR mentioned in November 2014 Issue of Nature — A Nature article mentions NDAR as a successful, well-populated repository.

SEP 24, 2014 - SEP 25, 2014

NIMH Strategic Plan, Including the Implementation of RDoCdb and NDCT Mentioned in the 2014 Autumn Edition of Inside NIMH — NIMH Director, Tom Insel, M.D., discusses data-sharing expansion efforts of the NIMH Data Repositories in the 2014 Autumn edition of Inside NIMH.

SEP 24, 2014 - SEP 25, 2015

NIMH Director's Blog: From My Data to Mined Data — NIMH Director, Tom Insel, M.D., discusses genomic data sharing policies of the NIMH Data Repositories in the latest edition of the NIMH Director's Blog.

SEP 16, 2014 - SEP 25, 2014

New Grants Fund Cross-Lifespan Services Research for ASD — NIH-funded projects aimed at improving access to, and the timeliness of, interventions are expected to submit data to NDAR.

AUG 29, 2014 - DEC 11, 2014

SfN Workshop Announced - Big Data Opportunities Using NIH/NIMH Data Repositories

Symposia Meeting

Date & Time: Thursday, November 13, 2014 9:30am - 4pm
Location: Carlyle Crescent, Alexandria, VA, 1940 Duke St., 2nd Floor
Sponsor Category: University/Non-Profit
Sponsored By: NIMH

Organizer/Moderator: Svetlana I Novikova, PhD

Registration: http://fs30.formsite.com/NDAR/BigDataOpportunities/index.html

The NIH established the National Database for Autism Research (NDAR) in 2008. Since then, de-identified human subjects research data on 77,000 research participants, across hundreds of cognitive, diagnostic, and clinical measures, have been shared. Large datasets secured in the Amazon cloud, including over 8,000 exome sequences and 2,000 structural images (see NDAR Query) as well as event-based fMRI, EEG, and eye-tracking experiments, make NDAR one of the largest data repositories in the neurosciences. Recently, the NIMH Research Domain Criteria (RDoC) and Clinical Trials (NDCT) initiatives have adopted the same data-sharing platform. Together, these platforms provide new and emerging opportunities that previously could not have been considered. This symposium will help educate the neuroscience community on best practices for driving scientific discovery with these resources. In addition, NIH program officers will present on how such big-data scientific efforts are now being funded.


Svetlana Novikova, PhD, NDAR
Email: novikovas@mail.nih.gov
Phone: 301.443.0212
Web: ndar.nih.gov

Dan Hall, MS, NIMH
Email: dan.hall@nih.gov
Phone: 301.467.0823
Web: ndar.nih.gov

MAY 12, 2014 - SEP 25, 2014

Dr. Farber discusses the NIH BRAIN Initiative with NBC4 Washington — Dr. Gregory K. Farber speaks with NBC4 Washington's Doreen Gentzler about the NIH BRAIN Initiative aimed at expanding our understanding of the human brain.

MAY 01, 2014 - SEP 25, 2014

NIMH to use NDAR model for Research Domain Criteria Database (RDoCdb) — The National Institute of Mental Health (NIMH) plans to model the RDoCdb data-sharing infrastructure on NDAR. In an article published in the American Journal of Psychiatry, Thomas R. Insel, M.D., NIMH Director, explains the RDoC approach to classifying mental disorders and how NDAR has provided a useful example of what is needed.

APR 16, 2014 - MAY 16, 2014

NDAR Mentioned in UC San Diego Press Release — NDAR was recently mentioned in a UC San Diego press release on the use of splice variants to uncover associations among autism genes.

APR 11, 2014 - APR 28, 2014

Join us for Query by Concept Webinar in April — NDAR has collaborated with a team at Harvard Medical School, led by Alexa McCray, to implement a search capability based on the phenotype ontology defined in the paper "Modeling the Autism Spectrum Disorder Phenotype". Join us for a discussion on how to use and extend this useful method for querying autism data.


FEB 05, 2014 - APR 07, 2014

EEG, fMRI, Eye Tracking Experiment Definition Released — NDAR has defined the experimental parameters for event/evoked response data, making these data more useful for sharing. Beginning in July, submissions of these data will be expected to use this definition. Register to attend one of our free webinars, which will review how to prepare this neuro-signal recording data for submission to NDAR.

FEB 05, 2014 - APR 07, 2014

Enhanced Query Capabilities — NDAR Query has been extended to allow users to query and download data based on any combination of data elements defined in the autism data dictionary. For more information, and to watch an overview of NDAR's query capabilities and of querying computational results, register for one of our free webinars. We will also cover the process for requesting access to NDAR shared data.


JUN 20, 2013 - JAN 27, 2014

NIMH Director's Blog on Open Data — Please visit Dr. Tom Insel's blog for his latest post on open data.

JUN 13, 2013 - FEB 05, 2014

NDAR Cloud Computation Capability — Rich datasets (e.g., FASTQ and brain imaging) are stored and protected in object-based storage (Amazon S3), enabling parallel data download through NDAR's Download Manager. NDAR now supports the creation of MySQL databases in the Amazon cloud, which NDAR will host for 15 days, or longer as needed. These databases, called miNDARs (miniature NDARs), will contain a table for each data structure in a package. Read-only access to NDAR's S3 objects is granted, and references to those objects are provided within the tables that have associated files (e.g., image03 and genomics_sample03). By providing these databases, NDAR envisions real-time computation against rich datasets that can be initiated without the need to move the objects. A new data structure category, evaluated data, has also been created. Tables for these structures will be created in each miNDAR, allowing computational pipelines to write analyzed data back to the miNDAR database so that NDAR can make these data available, when appropriate, to the general research community.

Following this approach, NDAR is moving from a "store once, download many" approach to an architecture where computation is moved and performed in place.
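
The "compute in place" pattern described above can be sketched with Python's built-in sqlite3 module standing in for the hosted MySQL database. The sketch assumes a package table that stores S3 object references rather than file contents, plus an evaluated-data table that a pipeline writes back to. Apart from the image03 structure name, every table name, column name, and bucket path is hypothetical, and the analysis step is a placeholder that never contacts S3.

```python
import sqlite3

# sqlite3 (in memory) stands in for the hosted MySQL miNDAR; the evaluated-data
# table and all column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE image03 (
        subject_key TEXT NOT NULL,
        image_file  TEXT NOT NULL  -- S3 object reference, not the image itself
    );
    CREATE TABLE evaluated_image_metric (
        subject_key TEXT NOT NULL,
        source_file TEXT NOT NULL,
        metric      REAL
    );
    """
)
conn.executemany(
    "INSERT INTO image03 (subject_key, image_file) VALUES (?, ?)",
    [
        ("SUBJ_A", "s3://example-bucket/SUBJ_A/t1.nii.gz"),
        ("SUBJ_B", "s3://example-bucket/SUBJ_B/t1.nii.gz"),
    ],
)


def analyze(s3_uri: str) -> float:
    """Placeholder for a pipeline step that would read the object in place on S3.

    Returns a dummy number so the sketch runs without network access.
    """
    return float(len(s3_uri))


# Read references from the package table, compute, and write results back.
for subject, uri in conn.execute("SELECT subject_key, image_file FROM image03").fetchall():
    conn.execute(
        "INSERT INTO evaluated_image_metric (subject_key, source_file, metric) VALUES (?, ?, ?)",
        (subject, uri, analyze(uri)),
    )
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM evaluated_image_metric").fetchone()[0])  # 2
```

The point of the design is that only small references and results pass through the database; the large objects stay in S3, where a real pipeline would read them directly.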

Older Items

Older items can be found on the News Archive page.