Harmonization Standards

This page describes, by data type, what data is accepted and how to harmonize it to the NIMH Data Archive (NDA) standard.

Clinical/Phenotypic Data Standards

NDA supports an unlimited amount of clinical, demographic, and phenotypic data associated with human subjects research (see Data Dictionary). To harmonize data across projects, all data submitted to NDA must conform to a standardized data structure as defined in the Data Dictionary. See the steps to data sharing for more information. Researchers should locate existing structures that fit their assessments, and are encouraged to extend these by providing NDA with the information needed to define any new assessments.

Projects submitting data to NDA can provide new assessments when starting up, as part of creating their Data Expected list, in a format similar to this example. Others interested in defining a new assessment can send new definitions to the NDA Help Desk in that format. It also helps to provide an electronic copy of the assessment with any instructions and supporting documentation, such as codebooks. NDA staff will then curate the definition and make it available to the research community for data submission. As part of the NDA Harmonization Standard for clinical/phenotypic data, all projects are expected to provide the Research Subject structure, found here, unless otherwise noted.

Once a structure is defined, the standard for submission of this type of data is to harmonize it to the standard structure as a CSV file with one participant record per row, and upload it using the Validation and Upload Tool.
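
As a rough illustration of the "one participant record per row" layout, the sketch below writes a small CSV in Python. The structure name, version, and the example_score column are hypothetical placeholders, and the two-line header (structure identifier, then column names) is an assumption; the blank template downloaded from the Data Dictionary is the authoritative layout.

```python
import csv
import io

# Hypothetical structure name and version; real values come from the
# template downloaded from the Data Dictionary.
STRUCTURE_SHORT_NAME = "example_assessment"
STRUCTURE_VERSION = "01"

# subjectkey and interview_age are element names mentioned on this page;
# example_score is a placeholder for illustration only.
COLUMNS = ["subjectkey", "interview_age", "example_score"]


def build_submission_csv(records):
    """Return CSV text with one participant record per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed layout: first line identifies the structure,
    # second line names the columns.
    writer.writerow([STRUCTURE_SHORT_NAME, STRUCTURE_VERSION])
    writer.writerow(COLUMNS)
    for record in records:
        writer.writerow([record[column] for column in COLUMNS])
    return buffer.getvalue()


records = [
    {"subjectkey": "GUID_PLACEHOLDER_1", "interview_age": 96, "example_score": 12},
    {"subjectkey": "GUID_PLACEHOLDER_2", "interview_age": 104, "example_score": 15},
]
csv_text = build_submission_csv(records)
print(csv_text)
```

The resulting file would still need to pass the Validation and Upload Tool before submission.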

Neuro-signal Recordings

NDA accepts evoked response/event-based data from EEG, fMRI, eye tracking, MEG, and EGG experiments. Each of these data types has one standard structure in the Data Dictionary that can be used to upload the associated files.

These structures allow you to provide required information and contain a data file element that specifies the path of the associated data files for upload. To provide information on the experimental parameters used to collect the data (e.g. event/task descriptions, acquisition hardware, postprocessing) at the participant-record level, these data types also require you to define an Experiment. This is done in your project's Collection. When the Experiment definition is completed, it is assigned an ID number that must be provided in the "experiment_id" element of the appropriate structure. You can find a tutorial on using the definition tool here.

When successfully harmonized, this data will consist of the CSV submission structure specifying the location of the associated data files, and an associated Experiment in your Collection. The CSV and your associated data files are then uploaded using the Validation and Upload Tool. Additionally, NDA provides imaging QA for submitted imaging files using the FSL FAST/FIRST computational pipelines. These quality assurance results are available to authorized users for query and download under 'Evaluated Data' in the Data Dictionary.
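
Before loading a neurosignal sheet into the Validation and Upload Tool, a local pre-check along the following lines may catch simple problems. This is a sketch only: the column name data_file1 and the function check_recording_rows are hypothetical, and the check assumes the Experiment ID is a plain number and that file paths are relative to a base directory.

```python
import csv
import io
import os
import tempfile


def check_recording_rows(csv_text, base_dir, file_column="data_file1"):
    """Return a list of problems found in a recording submission sheet.

    file_column is a hypothetical name for the structure's data file element.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Every record must carry the ID assigned to its Experiment.
        experiment_id = (row.get("experiment_id") or "").strip()
        if not experiment_id.isdigit():
            problems.append(f"row {row_number}: experiment_id missing or not a number")
        # The associated data file must exist at the specified path.
        relative_path = (row.get(file_column) or "").strip()
        if not relative_path:
            problems.append(f"row {row_number}: no data file path given")
        elif not os.path.isfile(os.path.join(base_dir, relative_path)):
            problems.append(f"row {row_number}: file not found: {relative_path}")
    return problems


# Usage with a temporary directory holding one placeholder data file.
with tempfile.TemporaryDirectory() as base_dir:
    with open(os.path.join(base_dir, "sub01.dat"), "w") as handle:
        handle.write("placeholder")
    sheet = (
        "subjectkey,experiment_id,data_file1\n"
        "GUID_PLACEHOLDER_1,123,sub01.dat\n"
        "GUID_PLACEHOLDER_2,,missing.dat\n"
    )
    problems = check_recording_rows(sheet, base_dir)

print(problems)
```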

Omics Definition

To submit omics data to the NDA, you will first need to use the Experiments tab of your project's Collection to create an Experiment that defines parameters such as molecule, platform, and software. You can find a tutorial on using the definition tool here. Once the Experiment is created, it will be assigned an ID number. You can then use the standard omics definition (genomics_sample structure) to provide required sample information, and enter the Experiment ID in the field of the same name. This CSV file includes a field for specifying the path of the associated omics data files you will upload. Once your Experiment is defined, your CSV populated, and files specified, you can upload this data using the Validation and Upload Tool. Contact us at the NDA Help Desk for more information.

Please note the following omics standards that may differ from other NDA data types:

  • As a standard, NDA accepts only omics data related to the study of ASD or related disorders into the National Database for Autism Research. Projects collecting omics data for other research and submitting phenotypic, imaging, or neurosignal recordings data to NDCT or RDoCdb should share their omics data through another repository; exceptions are considered only on a case-by-case basis.
  • Projects submitting omics data should use the summary data structure Genomics Subject, rather than the standard Research Subject structure mentioned in the section on clinical/phenotypic data.

NDA GUID

GUID Training

The NDA GUID is a universal subject ID that allows researchers to share data specific to a study participant without exposing personally identifiable information (PII), and makes it possible to match participants across labs and research data repositories. The NDA GUID is the subject ID standard developed for autism research and now adopted across mental health. Every data structure in the NDA Data Dictionary includes this identifier (labelled as the element subjectkey). Additionally, the GUID is used by researchers publishing the results of primary or secondary analyses on data shared through NDA to associate subjects to cohorts in an NDA Study. This allows a researcher to link publications directly to raw/analyzed data in NDA (see NDA Study).

The GUID Tool is a Java Web Start application, available in GUI and command-line forms, that you can launch directly from the NDA website. It supports single-subject data entry or bulk GUID generation. Email us at NDAHelp@mail.nih.gov for information on the command-line tool.

Creating a GUID requires an individual's legal name at birth, date of birth, sex, and city/municipality of birth. Because information on the birth certificate is constant over an individual's life, it is very important to enter the information as it appears on the birth certificate. Otherwise, a subject mismatch will occur if the research subject enrolls in other autism research studies and another source is used. When generating GUIDs for twin subjects, the Get GUIDs for Multiple Subjects function must be used in order to prevent a false positive match.
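
The four inputs listed above can be gathered in a small record and checked for completeness before they are entered into the GUID Tool. The sketch below is illustrative only: the class GuidInputs and its field names are hypothetical, and it does not generate a GUID, which only the GUID Tool can do.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GuidInputs:
    """Birth-certificate details needed to request a GUID (hypothetical names)."""
    legal_name_at_birth: str
    date_of_birth: Optional[date]
    sex: str
    city_of_birth: str

    def missing_fields(self):
        """Return the names of fields that are absent or blank."""
        missing = []
        for name, value in vars(self).items():
            if value is None or (isinstance(value, str) and not value.strip()):
                missing.append(name)
        return missing


# Example with placeholder values; the city of birth is left blank on purpose.
inputs = GuidInputs(
    legal_name_at_birth="Jane Example Doe",
    date_of_birth=date(2010, 1, 2),
    sex="F",
    city_of_birth="",
)
missing = inputs.missing_fields()
print(missing)
```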

If you are submitting data to NDA, you can check the box to request access to the GUID Tool when creating your account. Please contact us if you already have an account and need access to the GUID Tool.

You can find more information about this feature on the NDA GUID website.

Resolve Subject Identifiers

The mental health research community has standardized on the NDA GUID for cross-project subject identifiers. However, many other identifiers remain in use by the research community. To resolve the appropriate GUID/subjectkey and ensure that no duplicate subjects exist in data retrieved from NDA, use the Resolve Subject Identifiers interface (single or multiple entries using the CSV template).

If a match is not found, NDA has not yet received that subject identifier. To add subject identifier associations to NDA, include the subject identifier and submit to NDA using one of our Resolve Identifiers data structures (e.g. ndar_subject, genomics_subject). We will then resolve the identifiers for you within NDA allowing us to collectively fix duplicate subject identifiers that are used in different systems. Note that NDA PseudoGUID promotions are automatically applied to the repository. If a source exists that we currently don't support, please contact us at the NDA Help Desk so that we can add it.
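
To illustrate why resolving identifiers removes duplicates, the sketch below groups records from different sources under one GUID. The lookup table, the record fields, and the function resolve_subjectkeys are all hypothetical; in practice the mapping comes from the Resolve Subject Identifiers interface, not from local code.

```python
def resolve_subjectkeys(records, identifier_to_guid):
    """Group records by resolved GUID and collect those with no match.

    identifier_to_guid maps (source, source_id) pairs to a GUID; it stands
    in for results obtained from the Resolve Subject Identifiers interface.
    """
    resolved = {}
    unmatched = []
    for record in records:
        guid = identifier_to_guid.get((record["source"], record["source_id"]))
        if guid is None:
            # No association is known yet for this identifier.
            unmatched.append(record)
        else:
            resolved.setdefault(guid, []).append(record)
    return resolved, unmatched


# Placeholder identifiers; AGRE and SFARI are sources named on this page.
lookup = {
    ("AGRE", "A-001"): "GUID_PLACEHOLDER_1",
    ("SFARI", "S-778"): "GUID_PLACEHOLDER_1",
}
records = [
    {"source": "AGRE", "source_id": "A-001"},
    {"source": "SFARI", "source_id": "S-778"},
    {"source": "AGRE", "source_id": "A-999"},
]
resolved, unmatched = resolve_subjectkeys(records, lookup)
print(len(resolved["GUID_PLACEHOLDER_1"]), len(unmatched))
```

Here the first two records describe the same participant under different source identifiers, so they collapse to a single subject once resolved.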

Data Definition

NDA has worked with the mental health research community to create a Data Dictionary containing standard structures for hundreds of assessments.

Here are a few notes about the data dictionary:

  • You can browse structures by Type (e.g. Omics, Neurosignal Recordings, Clinical Assessments), Source (e.g. NDAR, PediatricMRI, AGRE, NDCT) or Category (e.g. Behavior, IQ) to identify available data structures, as well as refine the results displayed with a text search.
  • By clicking on the name of a data structure, you can view a list of data elements and their attributes, download the detailed definition and a blank template for the submission of data, and see any related URLs.
  • IQ descriptions have been removed by request but are available if needed.
  • The column "Submission" indicates whether NDA currently accepts data uploads of this structure. If Submission is "Not Allowed" a more current measure usually exists.
  • There is a link to a Change History on each structure page that shows all the changes to the structure made within the last six months.
  • The Alias column in the structure displays other names that the Validation and Upload Tool will recognize for a specific data element.
  • NDA supports translation for data values, allowing data to be converted from a lab-specific value to the NDA-recognized value (e.g. Male to M). While this may be helpful to labs that have already collected data using a different set of values, most labs that are not using the standard should consider performing this conversion prior to submission.
  • A web service into our Data Dictionary is available at https://ndar.nih.gov/api/datadictionary/ with no authentication required. Please contact us with any questions.
  • For Autism Centers of Excellence II grantees, the ACE Common Measures Version 2 are used across projects, replacing the original ACE Common Measures, which have been deprecated.
  • Research Subject: This is a general summary structure that allows you to provide one record for each participant indicating the NDAR clinical diagnosis. If a clinical diagnosis is not provided elsewhere, which is typical for control subjects, a diagnosis can be provided using this measure. Additionally, this data structure is used to provide subject identifiers beyond the NDA GUID (e.g. AGRE, Rutgers, SFARI), allowing us to match subjects across repositories (see Resolve Identifiers). The same structure is also used by all projects collecting data not related to ASD.
  • Genomics Subject: This structure is required for Omics submission to NDAR in place of Research Subject. It is similar to Research Subject, but includes other data-/bio-repository identifiers that also allow us to match subjects across these repositories (see Resolve Identifiers).
  • Genetic Test: This structure allows you to specify a genetic test used and its result. It should currently include options for all known genetic tests. Please contact us at the NDA Help Desk if a new test is not represented in this form.
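
For labs that choose to convert their data before submission, the alias and value translation notes above could be applied locally along these lines. This is a sketch under assumptions: the alias names, the "Female" to "F" mapping, and the function harmonize_row are hypothetical; only the "Male" to "M" example comes from this page, and the actual aliases and valid values are listed on each structure page.

```python
# Hypothetical lab column names mapped to dictionary element names.
COLUMN_ALIASES = {"gender": "sex", "age_months": "interview_age"}

# Lab-specific values mapped to recognized values. "Male" -> "M" is the
# example given on this page; "Female" -> "F" is assumed by analogy.
VALUE_TRANSLATIONS = {"sex": {"Male": "M", "Female": "F"}}


def harmonize_row(row):
    """Rename aliased columns and translate lab-specific values."""
    harmonized = {}
    for column, value in row.items():
        element = COLUMN_ALIASES.get(column, column)
        translations = VALUE_TRANSLATIONS.get(element, {})
        # Values without a translation entry pass through unchanged.
        harmonized[element] = translations.get(value, value)
    return harmonized


lab_row = {"subjectkey": "GUID_PLACEHOLDER_1", "gender": "Male", "age_months": "96"}
harmonized_row = harmonize_row(lab_row)
print(harmonized_row)
```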

Data Validation

NDA requires all data to be successfully validated prior to submission. This means that researchers submitting data must use the Validation and Upload Tool, available freely from NDA websites, to check their files against the standards defined in the Data Dictionary structure. A recent version of Java is needed to run the tool (see http://java.com/en/download/manual.jsp). Essentially, the tool allows you to specify where your data is located or drag and drop your files in, and will then inspect your files to determine which data structures they use (short_name and version are used by NDA to identify the data structure), and verify the data in your records conforms to the definition. When your data passes validation, you can then use the same tool to create and upload a submission package recognized by NDA.

A few notes about the details of using the Validation and Upload Tool:

  • Fields marked as Required in the data structure must be a column in your data and cannot be null/blank. If you do not have the data for that element, you must positively identify that the data is not available as defined by the valid values. If the valid values do not provide such an entry, contact us and we will add it.
  • Empty Recommended fields (as indicated in the same "Required" column) will not prevent submission. Please note, however, that all item-level details are expected. The Validation and Upload Tool gives a warning if a recommended field is null and no warning if an optional field is null.
  • The Validation and Upload Tool will test each row to ensure that it is harmonized to the Data Dictionary and will validate that the GUID/subjectkey exists within NDA. Each field must conform to the data element's value range, if one has been defined.
  • The notation of "::" is used to indicate a range. For instance, 0::1200 for interview_age means within a range of 0 to 1200 months old.
  • Associated files (e.g. genomic and imaging files) do not need to be loaded into the tool. When the file is validated upon loading, or you re-run the validation, the tool will check for the existence of the file in the specified location.
  • The NDA is currently piloting programmatic data submission through web services. If you are interested in joining this pilot, please contact the NDA Help Desk.
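
The required-field and range rules in the notes above can be approximated in a few lines of Python, which may help when preparing data before running the tool. This is a sketch, not the tool's actual logic: the definition dictionary, the example_note element, and the assumption that ranged values are integers are all hypothetical; only the "::" notation and the 0::1200 example for interview_age come from this page.

```python
def parse_range(notation):
    """Split a 'low::high' range such as '0::1200' into two integers."""
    low, high = notation.split("::")
    return int(low), int(high)


def validate_row(row, definition):
    """Return (errors, warnings) for one record against a structure definition."""
    errors, warnings = [], []
    for name, rules in definition.items():
        raw = row.get(name)
        value = "" if raw is None else str(raw).strip()
        if not value:
            # Blank Required fields block submission; Recommended ones only warn.
            if rules["required"] == "Required":
                errors.append(f"{name}: required value is blank")
            elif rules["required"] == "Recommended":
                warnings.append(f"{name}: recommended value is blank")
            continue
        if "range" in rules:
            low, high = parse_range(rules["range"])
            try:
                number = int(value)
            except ValueError:
                errors.append(f"{name}: '{value}' is not a whole number")
                continue
            if not low <= number <= high:
                errors.append(f"{name}: {number} is outside {low}::{high}")
    return errors, warnings


# Hypothetical definition for illustration.
definition = {
    "subjectkey": {"required": "Required"},
    "interview_age": {"required": "Required", "range": "0::1200"},
    "example_note": {"required": "Recommended"},
}
row = {"subjectkey": "GUID_PLACEHOLDER_1", "interview_age": "1300", "example_note": ""}
errors, warnings = validate_row(row, definition)
print(errors, warnings)
```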

For information on how to prepare your data for submission to NDA, please review the data submission tutorials here.
