The NIH offers bioinformatics and data management tools for the autism research community to facilitate data sharing and scientific collaboration.
Although data may exist in NDAR, they also exist in the research labs that acquire them and in many other data repositories. To achieve its vision, NDAR had to provide a community-wide data definition while giving investigators and federated repositories the tools and resources to extend that definition as ASD research advances. In this way, NDAR is able to support all research data.
The NDAR data dictionary, now comprising definitions of tens of thousands of data elements, allows investigators to define the data they submit, as well as alternate terms for the same element (i.e., aliasing) and translation across terms. Importantly, the NDAR data dictionary provides those who access the data with clear and precise information about what it is they are accessing.
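As an illustration of aliasing and translation across terms, the following is a minimal Python sketch, not NDAR's implementation. The class names, the element name `interview_age`, and its aliases are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One dictionary entry: a canonical name plus alternate terms (aliases)."""
    name: str
    description: str
    aliases: set[str] = field(default_factory=set)


class DataDictionary:
    """Hypothetical in-memory dictionary; the real NDAR dictionary is far richer."""

    def __init__(self) -> None:
        self._elements: dict[str, DataElement] = {}
        self._alias_index: dict[str, str] = {}  # lower-cased term -> canonical name

    def define(self, element: DataElement) -> None:
        self._elements[element.name] = element
        # Index the canonical name and every alias, ignoring case.
        for term in element.aliases | {element.name}:
            self._alias_index[term.lower()] = element.name

    def resolve(self, term: str) -> DataElement:
        """Translate any known term to its canonical element (KeyError if unknown)."""
        return self._elements[self._alias_index[term.lower()]]

    def translate_record(self, record: dict) -> dict:
        """Rename a record's keys to canonical element names."""
        return {self.resolve(key).name: value for key, value in record.items()}


dictionary = DataDictionary()
dictionary.define(
    DataElement("interview_age", "Age in months at interview", {"age_months", "AgeAtVisit"})
)
record = dictionary.translate_record({"AgeAtVisit": 62})
print(record)
```

Keeping a single alias index means that two labs using different column names for the same measure end up with records keyed by the same canonical element.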
To extend or enhance the NDAR data dictionary, we request that you follow these guidelines:
To have your data included in the NDAR Data Dictionary, you have two options:
Data Dictionary Resources:
Once your data are defined, see the sections on genomics, imaging, or clinical assessments to validate your data with the new definition.
The NDAR Query Tool provides a single search interface to search the data stored in the NDAR Central Repository and data stored in federated data repositories.
Query results can be saved in XML or CSV format. Queries can be formed to unify Clinical Assessments, Imaging, and Genomics data into a single result. Participant data returned in the results are associated with the NDAR Global Unique Identifier (GUID), a universal subject ID that protects personally identifiable information. GUIDs can be saved to an NDAR Study, allowing investigators to define sub-populations within NDAR.
For permission to query autism-relevant data through NDAR, investigators must complete the NDAR Data Access Agreement and the simplified SF-424 (R&R). Please refer to SOP-04 Data Access Permission Request for the complete procedure.
The GUID Tool is a customized software application that generates a Global Unique Identifier for each study participant. The GUID is a universal subject ID that allows researchers to share data specific to a study participant without exposing personally identifiable information (PII). The GUID has been approved by the NIH Office of General Counsel.
The GUID system was conceptualized by the Simons Foundation Autism Research Initiative (SFARI) and was designed, developed, and tested in close collaboration between the SFARI and the NDAR project teams.
The system is implemented as an NDAR Web service; an investigator inputs identifying information about a participant into a client application and sends encrypted information to a server application, which then returns a GUID.
Generic unique identifiers have the potential to link collections of research data, augment the amount and types of data available for individuals, support detection of overlap between collections and facilitate replication of research findings. You may request GUID software through the NDAR portal. Please refer to SOP-08 GUID Generation Permission Request for the complete procedure.
Four pieces of identifying information must be collected from each study participant to generate a valid GUID:
The GUID is generated using a free software application installed at the research site. The four items from the birth certificate are encrypted into a hash code, which is then transmitted to NDAR. NDAR then encrypts the hash code to generate a GUID and sends it back to the research site for use. The personally identifiable information (PII) about each participant remains at the research site.
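The two-stage flow described above (hash at the research site, second transformation at NDAR) can be sketched as follows. This is an assumption-laden illustration, not the GUID Tool's actual algorithm: the use of SHA-256 and HMAC, the `GUID-` prefix, the function names, and the placeholder field names `item_1` through `item_4` are all hypothetical.

```python
import hashlib
import hmac


def client_hash(fields: dict[str, str]) -> str:
    """Runs at the research site: reduce identifying fields to a one-way hash.

    Only this hash leaves the site; the raw values stay local.
    """
    # Normalize case and whitespace so trivial differences do not change the hash.
    canonical = "|".join(
        f"{key}={fields[key].strip().upper()}" for key in sorted(fields)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def server_guid(hash_code: str, server_secret: bytes) -> str:
    """Runs at the server: derive an identifier from the hash using a secret key.

    HMAC stands in here for whatever transformation the real server applies.
    """
    digest = hmac.new(server_secret, hash_code.encode("utf-8"), hashlib.sha256)
    return "GUID-" + digest.hexdigest()[:12].upper()


# Placeholder field names and values; the real tool collects four specific items.
participant = {
    "item_1": "value a",
    "item_2": "value b",
    "item_3": "value c",
    "item_4": "value d",
}
hash_code = client_hash(participant)
guid = server_guid(hash_code, server_secret=b"demo-secret")
print(guid)
```

Because both steps are deterministic, two sites that enter the same four items for a participant receive the same identifier, while neither the hash nor the identifier exposes the underlying values.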
NDAR expects all prospective studies to include a GUID in the data submission. For retrospective studies, the NDAR team understands that the participant data needed to generate a GUID may not be available. Additionally, the informed consent may be inadequate for an investigator to provide data accompanied by a valid GUID. To account for this, NDAR provides the capability to generate pseudo-GUIDs, which are random identifiers that, unlike GUIDs, are not derived from data associated with the research participant. Data without a valid GUID are subject to limitations in NDAR; however, NDAR will accept such data for retrospective studies.
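To make the contrast with a derived GUID concrete, here is a minimal sketch of a random identifier generator. The `PSEUDO-` prefix, the identifier length, and the function name are assumptions for illustration only and do not reflect NDAR's actual pseudo-GUID format.

```python
import secrets


def pseudo_guid(issued: set[str]) -> str:
    """Return a random identifier that is not derived from any participant data."""
    while True:
        candidate = "PSEUDO-" + secrets.token_hex(6).upper()
        # Retry on the (very unlikely) collision with an identifier already issued.
        if candidate not in issued:
            issued.add(candidate)
            return candidate


issued_ids: set[str] = set()
first = pseudo_guid(issued_ids)
second = pseudo_guid(issued_ids)
```

Since nothing about the participant goes into the identifier, the same person enrolled in two collections would receive unrelated pseudo-GUIDs, which is why such records cannot be linked across studies.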
GUID Tool Resources:
NDAR supplies a Validation Tool to assist researchers with the submission of data into the NDAR Central Repository. The tool verifies that submitted data conform to the required format and range values defined in the NDAR Data Dictionary. It imports the NDAR Data Dictionary and validates against it the metadata associated with the files the NDAR user has identified for submission. The tool reports any data discrepancies and warnings. If errors are found, a submission package cannot be created. The tool is a Java Web Start application that runs locally on the user's computer and requires the Java runtime environment to be installed.
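The kind of format and range checking described above can be sketched in a few lines. This example is written in Python for brevity rather than in the tool's own Java, and the rule structure, element names, and messages are hypothetical; it only illustrates the idea that errors block packaging while warnings do not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementRule:
    """Hypothetical dictionary rule: expected type plus an optional value range."""
    kind: type
    required: bool = True
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(record: dict, rules: dict[str, ElementRule]) -> tuple[list[str], list[str]]:
    """Check one record against the rules and return (errors, warnings)."""
    errors: list[str] = []
    warnings: list[str] = []
    for name, rule in rules.items():
        raw = record.get(name)
        if raw is None or raw == "":
            if rule.required:
                errors.append(f"{name}: required value is missing")
            continue
        try:
            value = rule.kind(raw)  # format check: can the value be parsed?
        except (TypeError, ValueError):
            errors.append(f"{name}: {raw!r} is not a valid {rule.kind.__name__}")
            continue
        if rule.minimum is not None and value < rule.minimum:
            errors.append(f"{name}: {value} is below the minimum {rule.minimum}")
        if rule.maximum is not None and value > rule.maximum:
            errors.append(f"{name}: {value} is above the maximum {rule.maximum}")
    # Columns the dictionary does not define are reported but do not block packaging.
    for name in record:
        if name not in rules:
            warnings.append(f"{name}: not defined in the dictionary")
    return errors, warnings


rules = {
    "subject_id": ElementRule(str),
    "age_months": ElementRule(int, minimum=0, maximum=1200),
}
errors, warnings = validate({"subject_id": "S1", "age_months": "-3", "notes": "x"}, rules)
can_package = not errors  # a package is only created when there are no errors
```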
To generate test data, use the utility at http://thedevsite.net/NDAR_AutoXML. Test data are based on the demonstration portal and should be used with the Demonstration Validation Tool.
As NDAR matures, so too will the Validation Tool, allowing NDAR to provide robust, multivariate validation and to enable the community to help NDAR extend this tool.
After thorough analysis of functional genomics data acquisition and storage criteria and community review of needs, NDAR developed a tool to define the relationship between samples and data files as simply as possible. The NDAR Genomics Experiment Definition Tool standardizes the naming of data processing and analysis protocols, requires sufficient detail to be entered, and enforces unambiguous interpretation of the entered information. At the same time, the flexible design of this tool allows users to define a new parameter if the needed parameter is not found in the provided list. New parameters are verified by NDAR personnel. This ensures that data obtained from the raw files are useful to other scientists and that the processing can be easily followed and reproduced as necessary.
For more information about Genomics and NDAR, view our Genomics Standards.
NDAR has adopted the MIPAV XML format for gathering metadata on images to be submitted to NDAR. The MIPAV (Medical Image Processing, Analysis, and Visualization) application enables quantitative analysis and visualization of medical images for numerous modalities such as PET, MRI, CT, or microscopy.
MIPAV contains a module that will process images and generate the MIPAV XML required for imaging data submission to NDAR. For instructions on using this tool, refer to http://ndar.nih.gov/ndarpublicweb/standards.html#Imaging