NIMH Data Archive - Data

Genomics

Neuroimaging

Phenotype¹

New Trial
Clinical Trial

¹ Numbers reported are subjects by age

New Project
Grant/Project Number

Enter the grant number in the following format: Activity Code, Institute Abbreviation, and Serial Number. Exclude the Grant Type, Support Year, and Suffix. For example, grant 1R01MH123456-01A1 should be entered as R01MH123456.
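The trimming rule above can be sketched as a small normalizer. This is an illustrative sketch, not an NDA validator. It assumes the common NIH layout: an optional 1-digit grant type, a 3-character activity code (e.g. R01, RC2), a 2-letter institute abbreviation, and a 6-digit serial number, optionally followed by "-" and a support year and suffix. The function name `normalize_grant_number` is hypothetical.

```python
import re

# Assumed layout of a full NIH grant number, e.g. 1R01MH123456-01A1.
_GRANT_RE = re.compile(
    r"^(?:\d)?"                         # grant type, e.g. "1" (dropped)
    r"(?P<activity>[A-Z][A-Z0-9]{2})"   # activity code, e.g. "R01" or "RC2"
    r"(?P<institute>[A-Z]{2})"          # institute abbreviation, e.g. "MH"
    r"(?P<serial>\d{6})"                # serial number, e.g. "123456"
    r"(?:-\d{2}(?:[A-Z]\d+)*)?$"        # support year and suffix (dropped)
)


def normalize_grant_number(raw: str) -> str:
    """Reduce a full grant number to Activity Code + Institute + Serial Number."""
    candidate = raw.strip().upper().replace(" ", "")
    match = _GRANT_RE.match(candidate)
    if match is None:
        raise ValueError(f"Unrecognized grant number: {raw!r}")
    return match["activity"] + match["institute"] + match["serial"]


print(normalize_grant_number("1R01MH123456-01A1"))  # R01MH123456
print(normalize_grant_number("RC2MH090028-01"))     # RC2MH090028
```

The second call uses the project number listed under Grant Information for this Collection.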

Collection - Use Existing Experiment

To associate an experiment with the current collection, select an experiment from the table below, then click the Associate Experiment button to persist your changes (saving the collection is not required). Note that once an experiment has been associated with two or more collections, the experiment will no longer be editable.

The table search is case-insensitive and targets the Experiment Id, Experiment Name, and Experiment Type columns. The experiment id is searched only when the search term is a number, and results are filtered using a startsWith comparison. When the search term is not numeric, the experiment name is used to filter the results.

Experiment Id	Experiment Name	Experiment Type	Created On

24	HI-NGS_R1	Omics	02/16/2011
475	MB1-10 (CHOP)	Omics	06/07/2016
490	Illumina Infinium PsychArray BeadChip Assay	Omics	07/07/2016
501	PharmacoBOLD Resting State	fMRI	07/27/2016
506	PVPREF	Omics	08/05/2016
509	ABC-CT Resting v2	EEG	08/18/2016
13	Comparison of FI expression in Autistic and Neurotypical Homo Sapiens	Omics	12/28/2010
18	AGRE/Broad Affymetrix 5.0 Genotype Experiment	Omics	01/06/2011
22	Stitching PCR Sequencing	Omics	02/14/2011
26	ASD_Methylation	Omics	03/01/2011
29	Microarray family 03 (father, mother, sibling)	Omics	03/24/2011
37	Standard paired-end sequencing of BCRs	Omics	04/19/2011
38	Illumina Mate-Pair BCR sequencing	Omics	04/19/2011
39	Custom Jumping Libraries	Omics	04/19/2011
40	Custom CapBP	Omics	04/19/2011
41	Immunofluorescence	Omics	05/11/2011
43	Autism brain sample genotyping, Illumina	Omics	05/16/2011
47	ARRA Autism Sequencing Collaboration at Baylor. SOLiD 4 System	Omics	08/01/2011
53	AGRE Omni1-quad	Omics	10/11/2011
59	AGP genotyping	Omics	04/03/2012
60	Ultradeep 454 sequencing of synaptic genes from postmortem cerebella of individuals with ASD and neurotypical controls	Omics	06/23/2012
63	Microemulsion PCR and Targeted Resequencing for Variant Detection in ASD	Omics	07/20/2012
76	Whole Genome Sequencing in Autism Families	Omics	01/03/2013
519	Resting	fMRI	11/08/2016
90	Genotyped IAN Samples	Omics	07/09/2013
91	NJLAGS Axiom Genotyping Array	Omics	07/16/2013
93	AGP genotyping (CNV)	Omics	09/06/2013
106	Longitudinal Sleep Study. H20 200. Channel set 2	EEG	11/07/2013
107	Longitudinal Sleep Study. H20 200. Channel set 3	EEG	11/07/2013
108	Longitudinal Sleep Study. AURA 200	EEG	11/07/2013
105	Longitudinal Sleep Study. H20 200. Channel set 1	EEG	11/07/2013
109	Longitudinal Sleep Study. AURA 400	EEG	11/07/2013
116	Gene Expression Analysis WG-6	Omics	01/07/2014
131	Jeste Lab UCLA ACEii: Charlie Brown and Sesame Street - Project 1	Eye Tracking	02/27/2014
132	Jeste Lab UCLA ACEii: Animacy - Project 1	Eye Tracking	02/27/2014
133	Jeste Lab UCLA ACEii: Mom Stranger - Project 2	Eye Tracking	02/27/2014
134	Jeste Lab UCLA ACEii: Face Emotion - Project 3	Eye Tracking	02/27/2014
145	AGRE/FMR1_Illumina.JHU	Omics	04/14/2014
146	AGRE/MECP2_Sanger.JHU	Omics	04/14/2014
147	AGRE/MECP2_Junior.JHU	Omics	04/14/2014
151	Candidate Gene Identification in familial Autism	Omics	06/09/2014
152	NJLAGS Whole Genome Sequencing	Omics	07/01/2014
154	Math Autism Study - Vinod Menon	fMRI	07/15/2014
155	Resting	fMRI	07/25/2014
156	Speech	fMRI	07/25/2014
159	Emotion	fMRI	07/25/2014
160	syllable contrast	EEG	07/29/2014
167	School-age naturalistic stimuli	Eye Tracking	09/19/2014
44	AGRE/Broad Affymetrix 5.0 Genotype Experiment	Omics	06/27/2011
45	Exome Sequencing of 20 Sporadic Cases of Autism Spectrum Disorder	Omics	07/15/2011
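The search behavior described above for this table (numeric terms matched against the start of the experiment id, other terms matched against the experiment name) can be sketched as follows. The `Experiment` class and `search_experiments` function are hypothetical, and the help text does not say whether name matching is "contains" or "starts with"; this sketch assumes a case-insensitive "contains" match. The sample rows are taken from the table above.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    experiment_id: int
    name: str
    experiment_type: str


def search_experiments(rows: list[Experiment], term: str) -> list[Experiment]:
    """Filter experiment rows following the search rules in the help text."""
    term = term.strip()
    if not term:
        return list(rows)  # an empty search shows every row
    if term.isdecimal():
        # Numeric term: keep ids that start with the digits typed.
        return [r for r in rows if str(r.experiment_id).startswith(term)]
    # Non-numeric term: case-insensitive match on the experiment name.
    # Assumption: substring ("contains") matching.
    needle = term.casefold()
    return [r for r in rows if needle in r.name.casefold()]


rows = [
    Experiment(24, "HI-NGS_R1", "Omics"),
    Experiment(475, "MB1-10 (CHOP)", "Omics"),
    Experiment(501, "PharmacoBOLD Resting State", "fMRI"),
    Experiment(519, "Resting", "fMRI"),
    Experiment(41, "Immunofluorescence", "Omics"),
]
print([r.experiment_id for r in search_experiments(rows, "4")])        # [475, 41]
print([r.experiment_id for r in search_experiments(rows, "resting")])  # [501, 519]
```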

Collection - Add Experiment

Add Supporting Documentation

Funding Source:
URL:

To add an existing Data Structure, enter its title in the search bar. If you need to request changes, select the Structure, then select the indicator "No, it requires changes to meet research needs," and upload a file with the requested changes specific to the selected Data Structure. Your file should follow the Request Changes Procedure. If the Data Structure does not exist, select "Request New Data Structure" and upload the appropriate zip file.

Use/Modify Existing Data Structure

Request New Data Structure

Targeted Enrollment:

Initial Submission Date:

Initial Share Date:

Data Structure Search:

Data Structures:


Request Submission Exemption

Not Eligible

The Data Expected list for this Collection shows some raw data as missing. Contact the NDA Help Desk with any questions.

Please confirm that you will not be enrolling any more subjects and that all raw data has been collected and submitted.

Collection Updated

Your Collection is now in Data Analysis phase and exempt from biannual submissions. Analyzed data is still expected prior to publication or no later than the project end date.


Unable to change the Collection Phase while subjects submitted are less than 90% of the targeted enrollment.

You have requested to move the sharing dates for the following assessments:

Data Expected Item	Original Sharing Date	New Sharing Date

Please provide a reason for this change, which will be sent to the Program Officers listed within this collection:

Explanation must be between 20 and 200 characters in length.

Please press Save or Cancel
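The length rule for the explanation can be checked before saving with a sketch like the one below. The function name is hypothetical, and two details are assumptions not stated in the message: the 20 and 200 character bounds are treated as inclusive, and leading and trailing whitespace is ignored.

```python
def validate_explanation(text: str, min_len: int = 20, max_len: int = 200) -> str:
    """Return the trimmed explanation, or raise if its length is out of range."""
    # Assumptions: bounds are inclusive and surrounding whitespace is ignored.
    cleaned = text.strip()
    if not min_len <= len(cleaned) <= max_len:
        raise ValueError(
            f"Explanation must be between {min_len} and {max_len} characters "
            f"(got {len(cleaned)})."
        )
    return cleaned


print(validate_explanation("  Sequencing delayed by reagent backorder.  "))
```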

Deep sequencing of autism candidate genes in 2000 families from the Simons Simplex Collection (SSC) #1936

General
Experiments (2)
Shared Data
Publications (1)
Data Expected (2)
Associated Studies (9)

Collection Title:	Deep sequencing of autism candidate genes in 2000 families from the Simons Simplex Collection (SSC)
Collection Investigators:	Michael Wigler
Collection Description:	343 families from the Simons Simplex Collection. Each family includes father, mother, a proband and an unaffected sibling.
Data Repository:	NIMH Data Archive
Permission Group:
Collection Creation Date:	04/18/2012
Collection Phase:	Funding Completed
Collection Sub-Phase:	Close Out
Blinded Clinical Trial:	No
Total Funded Amount:	$2,779,842.00
Subjects Shared:	4,558

Chart data:
Genomics: Next Generation Sequencing: sequencing, 5,272
Neuroimaging: no data
Phenotype: Autism Spectrum Severely Affected, 1,421; Typical Control, 4; Autism Spectrum Mildly Affected, 63; Sibling Control, 2,150; Neurological Control, 7; Parental Control, 3,679; Autism Spectrum Affected, 45

Funding Sources:

Funding Source Name	Funding Source URL
NIH - Extramural	None

Supporting Documentation:

File Name	File Type	Description	Audience
sequencing_files_readme_col_1936.pdf	Background	Readme for sequencing files	Qualified Researchers

Grant Information:

Project Number	Project Title	Start Date	End Date	Planned Enrollment	Actual Enrollment	Organization	Funds Obligated
RC2MH090028-01	Deep sequencing of autism candidate genes in 2000 families from the Simons Simple	09/30/2009	08/31/2012	Not Reported	Not Reported	COLD SPRING HARBOR LABORATORY	$2,779,842.00

Clinical Trials:


Collection - General Tab

Fields available for edit on the top portion of the page include:

Collection Title
Investigators
Collection Description
Collection Phase
Funding Source
Clinical Trials

Collection Phase: The current status of a research project submitting data to an NDA Collection, based on the timing of the award and/or the data that have been submitted.

Pre-Enrollment: The default entry made when the NDA Collection is created.
Enrolling: Data have been submitted to the NDA Collection or the NDA Data Expected initial submission date has been reached for at least one data structure category in the NDA Collection.
Data Analysis: Subject level data collection for the research project is completed and has been submitted to the NDA Collection. The NDA Collection owner or the NDA Help Desk may set this phase when they’ve confirmed data submission is complete and submitted subject counts match at least 90% of the target enrollment numbers in the NDA Data Expected. Data submission reminders will be turned off for the NDA Collection.
Funding Completed: The NIH grant award (or awards) associated with the NDA Collection has reached its end date. NDA Collections in Funding Completed phase are assigned a subphase to indicate the status of data submission.
- The Data Expected Subphase indicates that NDA expects more data will be submitted
- The Closeout Subphase indicates the data submission is complete.
- The Sharing Not Met Subphase indicates that data submission was not completed as expected.
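The 90% condition for moving to the Data Analysis phase can be sketched as a simple check. The help text does not say whether the threshold applies to each Data Expected item or to the Collection as a whole; this sketch assumes it applies per item and skips items whose targeted enrollment is 0. The class and function names are hypothetical, and the sample counts are taken from this Collection's Data Expected tables.

```python
from dataclasses import dataclass


@dataclass
class DataExpectedItem:
    name: str
    targeted_enrollment: int
    subjects_submitted: int


def can_enter_data_analysis(items: list[DataExpectedItem], threshold_pct: int = 90) -> bool:
    """True when every Data Expected item has reached the submission threshold."""
    for item in items:
        if item.targeted_enrollment <= 0:
            continue  # assumption: a target of 0 means the item is not expected
        # Integer comparison avoids floating-point rounding at the boundary.
        if item.subjects_submitted * 100 < item.targeted_enrollment * threshold_pct:
            return False
    return True


items = [
    DataExpectedItem("Research Subject and Pedigree", 1360, 4558),
    DataExpectedItem("Genomics/omics", 1360, 4558),
]
print(can_enter_data_analysis(items))  # True
print(can_enter_data_analysis([DataExpectedItem("Hypothetical item", 100, 89)]))  # False
```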

Blinded Clinical Trial Status:

This status is set by a Collection Owner and indicates that the research project is a double-blinded clinical trial. When selected, the public view of Data Expected shows the Data Expected items and the Submission Dates, but not the targeted enrollment and subjects submitted counts.
Targeted enrollment and subjects submitted counts are visible only to NDA Administrators and to users with privileges on the NDA Collection, such as the NDA Collection Owner.
When an NDA Collection that is flagged Blinded Clinical Trial reaches the maximum data sharing date for that Data Repository (see https://nda.nih.gov/nda/sharing-regimen.html), the embargo on Data Expected information is released.
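The visibility rule for blinded clinical trials can be sketched as a function that decides which Data Expected fields a viewer sees. This is a hypothetical model: `viewer_is_privileged` stands in for NDA Administrators and privileged Collection users, and `max_sharing_date` is passed in because it depends on the Data Repository's sharing regimen. The row values are borrowed from this Collection's table for illustration only (this Collection is not a blinded trial).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataExpectedRow:
    name: str
    initial_submission: date
    initial_share: date
    targeted_enrollment: int
    subjects_submitted: int


def visible_row(row: DataExpectedRow, *, blinded: bool, viewer_is_privileged: bool,
                today: date, max_sharing_date: date) -> dict:
    """Return the fields a viewer sees; counts are None while embargoed."""
    embargoed = blinded and not viewer_is_privileged and today < max_sharing_date
    return {
        "name": row.name,
        "initial_submission": row.initial_submission,
        "initial_share": row.initial_share,
        "targeted_enrollment": None if embargoed else row.targeted_enrollment,
        "subjects_submitted": None if embargoed else row.subjects_submitted,
    }


row = DataExpectedRow("Research Subject and Pedigree", date(2010, 7, 15),
                      date(2010, 11, 15), 1360, 4558)
print(visible_row(row, blinded=True, viewer_is_privileged=False,
                  today=date(2024, 1, 1), max_sharing_date=date(2025, 1, 1)))
```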

Funding Source

The organization(s) responsible for providing the funding are listed here.

Supporting Documentation

Users with Submission privileges, as well as Collection Owners, Program Officers, and those with Administrator privileges, may upload and attach supporting documentation. By default, supporting documentation is shared with the general public; however, the option is also available to limit this information to qualified researchers only.

Grant Information

Identifying details are displayed about the Project from which the Collection was derived. You may click the Project Number to view a full report of the Project as captured by the NIH.

Clinical Trials

Any data that is collected to support or further the research of clinical studies will be available here. Collection Owners and those with Administrator privileges may add new clinical trials.

Frequently Asked Questions

How does the NIMH Data Archive (NDA) determine which Permission Group data are submitted into?

During Collection creation, NDA staff determine the appropriate Permission Group based on the type of data to be submitted, the type of access that will be available to data access users, and the information provided by the Program Officer during grant award.
How do I know when an NDA Collection has been created?

When a Collection is created by NDA staff, an email notification will automatically be sent to the PI(s) of the grant(s) associated with the Collection to notify them.
Is a single grant number ever associated with more than one Collection?

The NDA system does not allow a single grant to be associated with more than one Collection; therefore, a grant will be listed in the Grant Information section of only one Collection.
Why is there sometimes more than one grant included in a Collection?

In general, each Collection is associated with only one grant; however, multiple grants may be associated if the grant has multiple competing segments for the same grant number or if multiple different grants are all working on the same project and it makes sense to hold the data in one Collection (e.g., Cooperative Agreements).
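The association rules above (a grant belongs to at most one Collection, while a Collection may hold several grants) can be modeled as a one-to-many mapping. This `GrantRegistry` class is a hypothetical sketch, not part of the NDA system; "U01MH000001" is a made-up grant number used only to show a second grant on the same Collection.

```python
class GrantRegistry:
    """Maps each grant to at most one Collection; a Collection may hold many grants."""

    def __init__(self) -> None:
        self._collection_for_grant: dict[str, int] = {}

    def associate(self, grant: str, collection_id: int) -> None:
        existing = self._collection_for_grant.get(grant)
        if existing is not None and existing != collection_id:
            raise ValueError(f"{grant} is already associated with Collection {existing}")
        self._collection_for_grant[grant] = collection_id

    def grants_for(self, collection_id: int) -> list[str]:
        return sorted(g for g, c in self._collection_for_grant.items() if c == collection_id)


registry = GrantRegistry()
registry.associate("RC2MH090028", 1936)
registry.associate("U01MH000001", 1936)  # hypothetical second grant
print(registry.grants_for(1936))  # ['RC2MH090028', 'U01MH000001']
```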

Glossary

Administrator Privilege

A privilege provided to a user associated with an NDA Collection or NDA Study whereby that user can perform a full range of actions including providing privileges to other users.
Collection Owner

Generally, the Collection Owner is the contact PI listed on a grant. Only one NDA user is listed as the Collection Owner. Most automated emails are sent primarily to the Collection Owner.
Collection Phase
The Collection Phase provides information on data submission rather than grant/project completion, so while the Collection Phase and the grant/project phase may be closely related, they often differ. Collection users with Administrative Privileges are encouraged to edit the Collection Phase. The Program Officer as listed in eRA (for NIH-funded grants) may also edit this field. Changes must be saved by clicking the Save button at the bottom of the page. This field is sortable alphabetically in ascending or descending order. Collection Phase options include:
- Pre-Enrollment: A grant/project has started, but has not yet enrolled subjects.
- Enrolling: A grant/project has begun enrolling subjects. Data submission is likely ongoing at this point.
- Data Analysis: A grant/project has completed enrolling subjects and has completed all data submissions.
- Funding Completed: A grant/project has reached the project end date.
Collection Title

An editable field with the title of the Collection, which is often the title of the grant associated with the Collection.
Grant

Provides the grant number(s) for the grant(s) associated with the Collection. The field is a hyperlink so clicking on the Grant number will direct the user to the grant information in the NIH Research Portfolio Online Reporting Tools (RePORT) page.
Supporting Documentation

Various documents and materials to enable efficient use of the data by investigators unfamiliar with the project and may include the research protocol, questionnaires, and study manuals.
NIH Research Initiative

NDA Collections may be organized by scientific similarity into NIH Research Initiatives to improve the query tool user experience. NIH Research Initiatives map to one or multiple Funding Opportunity Announcements.
Permission Group

Access to shared record-level data in NDA is provisioned at the level of a Permission Group. NDA Permission Groups consist of one or multiple NDA Collections that contain data with the same subject consents.
Planned Enrollment

Number of human subject participants to be enrolled in an NIH-funded clinical research study. The data is provided in competing applications and annual progress reports.
Actual Enrollment

Number of human subjects enrolled in an NIH-funded clinical research study. The data is provided in annual progress reports.
NDA Collection

A virtual container and organization structure for data and associated documentation from one grant or one large project/consortium. It contains tools for tracking data submission and allows investigators to define a wide array of other elements that provide context for the data, including all general information regarding the data and source project, experimental parameters used to collect any event-based data contained in the Collection, methods, and other supporting documentation. It also allows investigators to link underlying data to an NDA Study, defining populations and subpopulations specific to research aims.
Data Use Limitations

Data Use Limitations (DULs) describe the appropriate secondary use of a dataset and are based on the original informed consent of a research participant. NDA only accepts consent-based data use limitations defined by the NIH Office of Science Policy.
Total Subjects Shared

The total number of unique subjects for whom data have been shared and are available for users with permission to access data.

Contact NDA Help Desk

ID	Name	Created Date	Status	Type
66	SSC samples exome sequencing	08/03/2012	Approved	Omics
170	Whole genome sequencing of 8 SSC samples	10/22/2014	Approved	Omics


Collection - Experiments

The number of Experiments included is displayed in parentheses next to the tab name. You may download all experiments associated with the Collection via the Download button. You may view individual experiments by clicking the Experiment Name and add them to the Filter Cart via the Add to Cart button.

Collection Owners, Program Officers, and users with Submission or Administrative Privileges for the Collection may create or edit an Experiment.

Please note: The creation of an NDA Experiment does not necessarily mean that data collected according to the defined Experiment have been submitted or shared.

Frequently Asked Questions

Can an Experiment be associated with more than one Collection?
Yes. See the “Copy” button in the bottom left corner when viewing an experiment. Two actions can be performed via this button:
1. Copy the experiment with the intent to modify it.
2. Associate the experiment with the collection. No modifications can be made to the experiment.

Glossary

Experiment Status

An Experiment must be Approved before data using the associated Experiment_ID may be uploaded.
Experiment ID

The ID number automatically generated by NDA which must be included in the appropriate file when uploading data to link the Experiment Definition to the subject record.
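Because data may only be uploaded against an Approved Experiment, a submitter might pre-check the experiment ids in a file before uploading. The sketch below is hypothetical: it assumes each row is a dict with an `experiment_id` column and that experiment statuses are known locally. Ids 66 and 170 and their Approved status come from this Collection's experiment list; id 500 and the "Pending" status are invented for illustration.

```python
def check_experiment_ids(rows: list[dict], experiments: dict[int, str]) -> list[str]:
    """Return problems for rows whose experiment_id is missing, unknown, or not Approved."""
    problems = []
    for line_no, row in enumerate(rows, start=1):
        raw = str(row.get("experiment_id", "")).strip()
        if not raw.isdecimal():
            problems.append(f"row {line_no}: missing or non-numeric experiment_id")
            continue
        status = experiments.get(int(raw))
        if status is None:
            problems.append(f"row {line_no}: experiment {raw} is not defined for this Collection")
        elif status != "Approved":
            problems.append(f"row {line_no}: experiment {raw} is {status}, not Approved")
    return problems


experiments = {66: "Approved", 170: "Approved", 500: "Pending"}  # 500/"Pending" hypothetical
rows = [
    {"subject_id": "SUBJECT_A", "experiment_id": "66"},
    {"subject_id": "SUBJECT_B", "experiment_id": "171"},
    {"subject_id": "SUBJECT_C", "experiment_id": ""},
    {"subject_id": "SUBJECT_D", "experiment_id": "500"},
]
for problem in check_experiment_ids(rows, experiments):
    print(problem)
```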


Shared Data:

Title	Type	Number of Subjects
Genomics Sample	Genomics	4558
Genomics Subject	Genomics	4558
Research Subject	Clinical Assessments	1360


Collection - Shared Data

This tab provides a quick overview of the Data Structure title, Data Type, and Number of Subjects that are currently Shared for the Collection. The information presented in this tab is automatically generated by NDA and cannot be edited. If no information is visible on this tab, the Collection either has no shared data or its data are private.

The shared data is available to other researchers who have permission to access data in the Collection's designated Permission Group(s). Use the Download button to add all shared data from the Collection to the Filter Cart.

Frequently Asked Questions

How will I know if another researcher uses data that I shared through the NIMH Data Archive (NDA)?

To see which of the data your project submitted are being used by a study, go to the Associated Studies tab of your Collection. Alternatively, you may review an NDA Study Attribution Report, available on the General tab.
Can I get a supplement to share data from a completed research project?

Often it becomes more difficult to organize and format data electronically after the project has been completed and the information needed to create a GUID may not be available; however, you may still contact a program staff member at the appropriate funding institution for more information.
Can I get a supplement to share data from a research project that is still ongoing?

Unlike completed projects where researchers may not have the information needed to create a GUID and/or where the effort needed to organize and format data becomes prohibitive, ongoing projects have more of an opportunity to overcome these challenges. Please contact a program staff member at the appropriate funding institution for more information.

Glossary

Data Structure

A defined organization and group of Data Elements to represent an electronic definition of a measure, assessment, questionnaire, or collection of data points. Data structures that have been defined in the NDA Data Dictionary are available at https://nda.nih.gov/general-query.html?q=query=data-structure
Data Type

A grouping of data by similar characteristics such as Clinical Assessments, Omics, or Neurosignal data.
Shared

The term 'Shared' generally means available to others; however, there are some slightly different meanings based on what is Shared. A Shared NDA Study is viewable and searchable publicly regardless of the user's role or whether the user has an NDA account. A Shared NDA Study does not necessarily mean that data used in the NDA Study have been shared, as this is determined independently. Data are shared according to the schedule defined in a Collection's Data Expected Tab and/or in accordance with data sharing expectations in the NDA Data Sharing Terms and Conditions. Additionally, Supporting Documentation uploaded to a Collection may be shared independent of whether data are shared.


Collection Owners and those with Collection Administrator permission may edit a collection. The following is currently available for editing on this page:

Publications

Publications relevant to NDA data are listed below. Most displayed publications have been associated with the grant within PubMed. Use the "+ New Publication" button to add new publications. Publications are categorized as relevant or not relevant to the Data Expected. Relevant publications are then linked to the underlying data by selecting the Create Study link. An NDA Study provides the ability to define cohorts, assign subjects, and define outcome measures, and it lists the study type, data analysis, and results. Analyzed data and results are expected to be shared in this way.

PubMed ID	Study	Title	Journal	Authors	Date	Status
22542183	Study (318)	De novo gene disruptions in children on the autistic spectrum.	Neuron	Iossifov, Ivan; Ronemus, Michael; Levy, Dan; Wang, Zihua; Hakker, Inessa; Rosenbaum, Julie; Yamrom, Boris; Lee, Yoon-Ha; Narzisi, Giuseppe; Leotta, Anthony; Kendall, Jude; Grabowska, Ewa; Ma, Beicong; Marks, Steven; Rodgers, Linda; Stepansky, Asya; Troge, Jennifer; Andrews, Peter; Bekritsky, Mitchell; Pradhan, Kith; Ghiban, Elena; Kramer, Melissa; Parla, Jennifer; Demeter, Ryan; Fulton, Lucinda L; Fulton, Robert S; Magrini, Vincent J; Ye, Kenny; Darnell, Jennifer C; Darnell, Robert B; Mardis, Elaine R; Wilson, Richard K; Schatz, Michael C; McCombie, W Richard; Wigler, Michael	April 26, 2012	Relevant


Collection - Publications

The number of Publications is displayed in parentheses next to the tab name. Clicking on any of the Publication Titles will open the Publication in a new internet browsing tab.

Collection Owners, Program Officers, and users with Submission or Administrative Privileges for the Collection may mark a publication as either Relevant or Not Relevant in the Status column.

Frequently Asked Questions

How can I determine if a publication is relevant?

Publications are considered relevant to a collection when the data shared is directly related to the project or collection.
Where does the NDA get the publications?

NDA retrieves publications from PubMed, an online library containing journals, articles, and medical research, sponsored by the NIH and the National Library of Medicine (NLM).

Glossary

Create Study

A link to the Create an NDA Study page that can be clicked to start creating an NDA Study with information such as the title, journal and authors automatically populated.
Not Determined Publication

Indicates that the publication has not yet been reviewed and/or marked as Relevant or Not Relevant so it has not been determined whether an NDA Study is expected.
Not Relevant Publication

A publication that is not based on data related to the aims of the grant/project associated with the Collection, or is not based on any data (such as a review article), and, therefore, an NDA Study is not expected to be created.
PubMed

PubMed provides citation information for biomedical and life sciences publications and is managed by the U.S. National Institutes of Health's National Library of Medicine.
PubMed ID

The PubMed ID is the unique ID number for the publication as recorded in the PubMed database.
Relevant Publication

A publication that is based on data related to the aims of the grant/project associated with the Collection and, therefore, an NDA Study is expected to be created.
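The three publication statuses above determine whether an NDA Study is expected, which makes it straightforward to list Relevant publications that still need a Study. This is a hypothetical sketch: the class and function names are illustrative, PubMed ID 22542183 and Study 318 come from this Collection's Publications table, and PubMed IDs 12345678 and 87654321 are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PublicationStatus(Enum):
    RELEVANT = "Relevant"
    NOT_RELEVANT = "Not Relevant"
    NOT_DETERMINED = "Not Determined"


@dataclass
class Publication:
    pubmed_id: int
    status: PublicationStatus
    study_id: Optional[int] = None


def study_expected(pub: Publication) -> Optional[bool]:
    """True or False per the glossary; None while the publication is Not Determined."""
    if pub.status is PublicationStatus.RELEVANT:
        return True
    if pub.status is PublicationStatus.NOT_RELEVANT:
        return False
    return None


def missing_studies(pubs: list[Publication]) -> list[int]:
    """PubMed IDs of Relevant publications not yet linked to an NDA Study."""
    return [p.pubmed_id for p in pubs if study_expected(p) and p.study_id is None]


pubs = [
    Publication(22542183, PublicationStatus.RELEVANT, study_id=318),
    Publication(12345678, PublicationStatus.RELEVANT),        # hypothetical
    Publication(87654321, PublicationStatus.NOT_DETERMINED),  # hypothetical
]
print(missing_studies(pubs))  # [12345678]
```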


Data Expected List: Mandatory Data Structures

These data structures are mandatory for your NDA Collection. Please update the Targeted Enrollment number to accurately represent the number of subjects you expect to submit for the entire study.

For NIMH HIV-related research that involves human research participants: Select the dictionary or dictionaries most appropriate for your research. If your research does not require all three data dictionaries, just ignore the ones you do not need. There is no need to delete extra data dictionaries from your NDA Collection. You can adjust the Targeted Enrollment column in the Data Expected tab to “0” for those unnecessary data dictionaries. At least one of the three data dictionaries must have a non-zero value.

Data Expected	Targeted Enrollment	Initial Submission	Subjects Submitted	Initial Share	Subjects Shared	Status
Research Subject and Pedigree	1,360	07/15/2010	4,558	11/15/2010	4,558	Approved
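The HIV-related guidance above (unused dictionaries may stay on the list with a Targeted Enrollment of 0, but at least one of the three must be non-zero) can be expressed as a small check. This is a sketch under stated assumptions: the dictionary keys `hiv_dictionary_a`/`_b`/`_c` are hypothetical placeholders, since the actual dictionary names are not given here.

```python
def check_hiv_targets(targets: dict[str, int]) -> None:
    """Validate Targeted Enrollment for the three NIMH HIV data dictionaries."""
    if len(targets) != 3:
        raise ValueError("expected exactly three HIV data dictionaries")
    if any(n < 0 for n in targets.values()):
        raise ValueError("Targeted Enrollment cannot be negative")
    # Unused dictionaries may be set to 0, but not all three.
    if all(n == 0 for n in targets.values()):
        raise ValueError("at least one HIV data dictionary needs a non-zero target")


# Hypothetical dictionary names; only the first is used by this project.
check_hiv_targets({"hiv_dictionary_a": 120, "hiv_dictionary_b": 0, "hiv_dictionary_c": 0})
```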

To create your project's Data Expected list, use the "+New Data Expected" button to add existing Data Structures (or request changes to them) and to request new Data Structures that are not in the NDA Data Dictionary.

If the Structure you need already exists, locate it and specify your dates and enrollment when adding it to your Data Expected list. If the Structure requires changes to meet your needs, select the indicator stating "No, it requires changes to meet research needs," and upload a file containing your requested changes.

If the structure you need is not yet defined in the Data Dictionary, you can select "Upload Definition" and attach the necessary materials to request its creation.

When selecting the expected dates for your data, make sure to follow the standard Data Sharing Regimen and choose dates within the date ranges that correspond to your project start and end dates.

Please visit the Completing Your Data Expected Tutorial for more information.

Data Expected List: Data Structures per Research Aims

These data structures are specific to your research aims and should list all data structures in which data will be collected and submitted for this NDA Collection. Please update the Targeted Enrollment number to accurately represent the number of subjects you expect to submit for the entire study.

Data Expected	Targeted Enrollment	Initial Submission	Subjects Submitted	Initial Share	Subjects Shared	Status
Genomics/omics	1,360	07/15/2010	4,558	11/15/2010	4,558	Approved

Structure not yet defined

No Status history for this Data Expected has been recorded yet


Collection - Data Expected

The Data Expected tab displays the list of all data that NDA expects to receive in association with the Collection, as defined by the contributing researcher, as well as the dates for the expected initial upload of the data and when it is first expected to be shared with the research community. Above the primary table of Data Expected, any publications determined to be relevant to the data within the Collection are also displayed; members of the contributing research group can use these to define NDA Studies, connecting those papers to underlying data in NDA.

The tab is used both as a reference for those accessing shared data, providing information on what is expected and when it will be shared, and as the primary tracking mechanism for contributing projects. It is used by contributing primary researchers, secondary researchers, and NIH Program and Grants Management staff.

Researchers who are starting their project need to update their Data Expected list to include all the Data Structures they are collecting under their grant and set their initial submission and sharing schedule according to the NDA Data Sharing Regimen.

To add existing Data Structures from the Data Dictionary, to request new Data Structures that are not in the Dictionary, or to request changes to existing Data Structures, click "+New Data Expected".

For step-by-step instructions on how to add existing Data Structures, request changes to an existing Structure, or request a new Data Structure, please visit the Completing Your Data Expected Tutorial.

If you are a contributing researcher creating this list for the first time, or making changes to the list as your project progresses, please note the following:

Although items you add to the list and changes you make are displayed, they are not committed to the system until you Save the entire page using the "Save" button at the bottom of your screen. Please Save after every change to ensure none of your work is lost.
If you attempt to add a new structure, the title you provide must be unique - if another structure exists with the same name your change will fail.
Adding a new structure to this list is the only way to request the creation of a new Data Dictionary definition.

Frequently Asked Questions

What is an NDA Data Structure?

An NDA Data Structure is composed of multiple Data Elements that together make up an electronic definition of an assessment, measure, questionnaire, or similar instrument; each such instrument will have a corresponding Data Structure.
What is the NDA Data Dictionary?

The NDA Data Dictionary is composed of electronic definitions known as Data Structures.
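The relationship between Data Structures and Data Elements can be pictured with a minimal model in which a structure is a named, versioned list of elements and a record is checked against those definitions. This is a hypothetical sketch, not the NDA Data Dictionary's actual schema: the fields kept here (name, type, required flag) are a simplification, and the structure name `example_questionnaire` and its element names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    name: str
    data_type: type          # e.g. int, float, str
    required: bool = False


@dataclass
class DataStructure:
    short_name: str
    version: str
    elements: list[DataElement] = field(default_factory=list)

    def validate(self, record: dict) -> list[str]:
        """Check one record (one row of data) against the element definitions."""
        errors = []
        known = {e.name for e in self.elements}
        for element in self.elements:
            value = record.get(element.name)
            if value is None:
                if element.required:
                    errors.append(f"{element.name}: required value missing")
            elif not isinstance(value, element.data_type):
                errors.append(f"{element.name}: expected {element.data_type.__name__}")
        # Columns that no Data Element defines are also reported.
        errors.extend(f"{key}: not defined in {self.short_name}" for key in record if key not in known)
        return errors


structure = DataStructure("example_questionnaire", "01", [
    DataElement("subject_id", str, required=True),
    DataElement("age_months", int, required=True),
    DataElement("q1_score", int),
])
print(structure.validate({"subject_id": "SUBJECT_A", "age_months": "120", "extra_field": 1}))
```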

Glossary

Analyzed Data

Data specific to the primary aims of the research being conducted (e.g. outcome measures, other dependent variables, observations, laboratory results, analyzed images, volumetric data, etc.) including processed images.
Data Item

Items listed on the Data Expected list in the Collection which may be an individual and discrete Data Structure, Data Structure Category, or Data Structure Group.
Data Structure

A defined organization and group of Data Elements to represent an electronic definition of a measure, assessment, questionnaire, or collection of data points. Data structures that have been defined in the NDA Data Dictionary are available at https://nda.nih.gov/general-query.html?q=query=data-structure
Data Structure Category

An NDA term describing the affiliation of a Data Structure to a Category, which may be disease/disorder or diagnosis related (Depression, ADHD, Psychosis), specific to data type (MRI, eye tracking, omics), or type of data (physical exam, IQ).
Data Structure Group

A Data Item listed on the Data Expected tab of a Collection that indicates a group of Data Structures (e.g., ADOS or SCID) for which data may be submitted instead of a specific Data Structure identified by version, module, edition, etc. For example, the ADOS Data Structure Group includes every ADOS Data Structure such as ADOS Module 1, ADOS Module 2, ADOS Module 1 - 2nd Edition, etc. The SCID Data Structure Group includes every SCID Data Structure such as SCID Mania, SCID V Mania, SCID PTSD, SCID-V Diagnosis, and more.
Evaluated Data

Evaluated Data, a new Data Structure category, is analyzed data resulting from the use of computational pipelines in the Cloud that can be uploaded directly back to a miNDAR database. Evaluated Data is expected to be listed as a Data Item in the Collection's Data Expected Tab.
Imaging Data

Imaging+ is an NDA term which encompasses all imaging related data including, but not limited to, images (DTI, MRI, PET, Structural, Spectroscopy, etc.) as well as neurosignal data (EEG, fMRI, MEG, EGG, eye tracking, etc.) and Evaluated Data.
Initial Share Date

Initial Submission and Initial Share dates should be populated according to the NDA Data Sharing Terms and Conditions. Any modifications to these will go through the approval processes outlined above. Data will be shared with authorized users upon publication (via an NDA Study) or 1-2 years after the grant end date specified on the first Notice of Award, as defined in the applicable Data Sharing Terms and Conditions.
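The sharing timeline above can be sketched as date arithmetic: data are due to be shared upon publication or a set number of years after the grant end date. This sketch assumes that whichever comes first applies, and takes the number of years (1 or 2) as a parameter because it depends on the applicable Data Sharing Terms and Conditions. The function names are hypothetical, and the example dates (grant end 08/31/2012, publication 04/26/2012) are borrowed from this Collection's tables for illustration only.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def latest_initial_share(grant_end: date, years_after_end: int,
                         publication: Optional[date] = None) -> date:
    """Assumed rule: the earlier of the publication date and grant end + N years."""
    deadline = add_years(grant_end, years_after_end)
    if publication is not None and publication < deadline:
        return publication
    return deadline


print(latest_initial_share(date(2012, 8, 31), 1))                               # 2013-08-31
print(latest_initial_share(date(2012, 8, 31), 1, publication=date(2012, 4, 26)))  # 2012-04-26
```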
Initial Submission Date

Initial Submission and Initial Share dates should be populated according to the NDA Data Sharing Terms and Conditions. Any modifications to these will go through the approval processes outlined above. Data for all subjects is not expected on the Initial Submission Date and modifications may be made as necessary based on the project's conduct.
Research Subject and Pedigree

An NDA created Data Structure used to convey basic information about the subject such as demographics, pedigree (links family GUIDs), diagnosis/phenotype, and sample location that are critical to allow for easier querying of shared data.
Submission Cycle

The NDA has two Submission Cycles per year: January 15 and July 15.
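Given the two fixed dates, finding the next Submission Cycle is simple date arithmetic. A minimal Python sketch, assuming a cycle date that falls on the current day still counts as upcoming; `next_submission_cycle` is a hypothetical helper, not an NDA tool.

```python
from datetime import date

# Submission Cycle dates from the text: January 15 and July 15 each year.
CYCLE_DATES = ((1, 15), (7, 15))


def next_submission_cycle(today: date) -> date:
    """Return the first Submission Cycle date on or after `today`."""
    for month, day in CYCLE_DATES:
        candidate = date(today.year, month, day)
        if candidate >= today:
            return candidate
    # Past July 15: the next cycle is January 15 of the following year.
    month, day = CYCLE_DATES[0]
    return date(today.year + 1, month, day)


print(next_submission_cycle(date(2024, 3, 1)))   # 2024-07-15
print(next_submission_cycle(date(2024, 7, 15)))  # 2024-07-15
print(next_submission_cycle(date(2024, 11, 2)))  # 2025-01-15
```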
Submission Exemption

An interface for notifying NDA that data will not be submitted during the current or upcoming submission cycle.


Collection Owners and users with Collection Administrator permission may edit a collection. The following is currently available for editing on this page:

Associated Studies

Studies that have been defined using data from a Collection are an important indicator of the value of the shared data. The Collection/Study Subjects column displays the number of subjects from this Collection that are included in a Study, out of the total number of subjects in that Study. The Data Usage column indicates whether the study is a primary or a secondary analysis of the data. The State column indicates whether the study is private or shared with the research community.

Study Name	Abstract	Collection/Study Subjects	Data Usage	State
Hotspots of missense mutation identify neurodevelopmental disorder genes and functional domains	Although de novo missense mutations have been predicted to account for more cases of autism than gene-truncating mutations, most research has focused on the latter. We identified the properties of de novo missense mutations in patients with neurodevelopmental disorders (NDDs) and highlight 35 genes with excess missense mutations. Additionally, 40 amino acid sites were recurrently mutated in 36 genes, and targeted sequencing of 20 sites in 17,600 NDD patients identified 21 new patients with identical missense mutations. One recurrent site (p.Ala636Thr) occurs in a glutamate receptor subunit, GRIA1. This same amino acid substitution in the homologous but distinct mouse glutamate receptor subunit Grid2 is associated with Lurcher ataxia. Phenotypic follow-up in five individuals with GRIA1 mutations shows evidence of specific learning disabilities and autism. Overall, we find significant clustering of de novo mutations in 200 genes, highlighting specific functional domains and synaptic candidate genes important in NDD pathology.	1/18812	Primary Analysis	Shared
Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci	Analysis of de novo CNVs (dnCNVs) from the full Simons Simplex Collection (SSC) (N = 2,591 families) replicates prior findings of strong association with autism spectrum disorders (ASDs) and confirms six risk loci (1q21.1, 3q29, 7q11.23, 16p11.2, 15q11.2-13, and 22q11.2). The addition of published CNV data from the Autism Genome Project (AGP) and exome sequencing data from the SSC and the Autism Sequencing Consortium (ASC) shows that genes within small de novo deletions, but not within large dnCNVs, significantly overlap the high-effect risk genes identified by sequencing. Alternatively, large dnCNVs are found likely to contain multiple modest-effect risk genes. Overall, we find strong evidence that de novo mutations are associated with ASD apart from the risk for intellectual disability. Extending the transmission and de novo association test (TADA) to include small de novo deletions reveals 71 ASD risk loci, including 6 CNV regions (noted above) and 65 risk genes (FDR ≤ 0.1).	4111/9975	Secondary Analysis	Shared
Recurrent de novo mutations implicate novel genes underlying simplex autism risk.	Autism spectrum disorder (ASD) has a strong but complex genetic component. Here we report on the resequencing of 64 candidate neurodevelopmental disorder risk genes in 5,979 individuals: 3,486 probands and 2,493 unaffected siblings. We find a strong burden of de novo point mutations for these genes and specifically implicate nine genes. These include CHD2 and SYNGAP1, genes previously reported in related disorders, and novel genes TRIP12 and PAX5. We also show that mutation carriers generally have lower IQs and enrichment for seizures. These data begin to distinguish genetically distinct subtypes of autism important for aetiological classification and future therapeutics.	1888/6400	Primary Analysis	Shared
The evolution and population diversity of human-specific segmental duplications	Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (n=80 genes/33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed “core duplicons”, and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (e.g., TCAF1/2), we highlight ten gene families (e.g., ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing, and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.	743/6360	Primary Analysis	Shared
Mitochondrial DNA mutations in Autism Spectrum Disorder	Mitochondrial dysfunction is frequently observed in Autism Spectrum Disorders (ASD). Thus, variations in the mitochondrial DNA (mtDNA) sequences may contribute to increased ASD risks. In the current study, we evaluated mtDNA variations, including homoplasmy and heteroplasmy, in 903 ASD individuals along with their mothers and non-ASD siblings by using off-target reads from whole-exome sequencing data sets of Simons Foundation Autism Research Initiative (SFARI) Simons Collection available on NDAR. We found that heteroplasmic mutations in ASD individuals were enriched at non-polymorphic mtDNA sites (P = 0.0015) compared to their non-ASD siblings, which were more likely to confer deleterious effects than heteroplasmies at polymorphic mtDNA sites. Accordingly, we observed a ~1.5-fold enrichment of nonsynonymous mutations as well as a ~2.2-fold enrichment of predicted pathogenic mutations (P < 0.003) in ASD individuals compared to their non-ASD siblings. Our genetic findings substantiate pathogenic mtDNA mutations as a potential cause for ASD and synergize with recent work calling attention to their unique metabolic phenotypes for diagnosis and treatment of ASD.	2709/2709	Secondary Analysis	Shared
Transmission disequilibrium of small CNVs in simplex autism.	Cohorts: 411 ASD Quads from Simons Simplex Collection; 177 Quads from Sanders et al. (PubMed ID: 22495306); 166 Quads from I. Iossifov et al. (PubMed ID: 22542183); 71 Quads from O'Roak et al. (PubMed ID: 22495309). Publication Abstract: We searched for disruptive, genic rare copy-number variants (CNVs) among 411 families affected by sporadic autism spectrum disorder (ASD) from the Simons Simplex Collection by using available exome sequence data and CoNIFER (Copy Number Inference from Exome Reads). Compared to high-density SNP microarrays, our approach yielded ~2× more smaller genic rare CNVs. We found that affected probands inherited more CNVs than did their siblings (453 versus 394, p = 0.004; odds ratio [OR] = 1.19) and that the probands' CNVs affected more genes (921 versus 726, p = 0.02; OR = 1.30). These smaller CNVs (median size 18 kb) were transmitted preferentially from the mother (136 maternal versus 100 paternal, p = 0.02), although this bias occurred irrespective of affected status. The excess burden of inherited CNVs among probands was driven primarily by sibling pairs with discordant social-behavior phenotypes (p < 0.0002, measured by Social Responsiveness Scale [SRS] score), which contrasts with families where the phenotypes were more closely matched or less extreme (p > 0.5). Finally, we found enrichment of brain-expressed genes unique to probands, especially in the SRS-discordant group (p = 0.0035). In a combined model, our inherited CNVs, de novo CNVs, and de novo single-nucleotide variants all independently contributed to the risk of autism (p < 0.05). Taken together, these results suggest that small transmitted rare CNVs play a role in the etiology of simplex autism. Importantly, the small size of these variants aids in the identification of specific genes as additional risk factors associated with ASD.	462/1644	Secondary Analysis	Shared
Brain-based sex differences in autism spectrum disorder across the lifespan: A systematic review of structural MRI, fMRI, and DTI findings	Females with autism spectrum disorder (ASD) have been long overlooked in neuroscience research, but emerging evidence suggests they show distinct phenotypic trajectories and age-related brain differences. Sex-related biological factors (e.g., hormones, genes) may play a role in ASD etiology and have been shown to influence neurodevelopmental trajectories. Thus, a lifespan approach is warranted to understand brain-based sex differences in ASD. This systematic review on MRI-based sex differences in ASD was conducted to elucidate variations across the lifespan and inform biomarker discovery of ASD in females. We identified articles through two database searches. Fifty studies met criteria and underwent integrative review. We found that regions expressing replicable sex-by-diagnosis differences across studies overlapped with regions showing sex differences in neurotypical (NT) cohorts, in particular regions showing NT male>female volumes. Furthermore, studies investigating age-related brain differences across a broad age-span suggest distinct neurodevelopmental patterns in females with ASD. Qualitative comparison across youth and adult studies also supported this hypothesis. However, many studies collapsed across age, which may mask differences. Furthermore, accumulating evidence supports the female protective effect in ASD, although only one study examined brain circuits implicated in “protection.” When synthesized with the broader literature, brain-based sex differences in ASD may come from various sources, including genetic and endocrine processes involved in brain “masculinization” and “feminization” across early development, puberty, and other lifespan windows of hormonal transition. Furthermore, sex-related biology may interact with peripheral processes, in particular the stress axis and brain arousal system, to produce distinct neurodevelopmental patterns in males and females with ASD. Future research on neuroimaging-based sex differences in ASD would benefit from a lifespan approach in well-controlled and multivariate studies. Possible relationships between behavior, sex hormones, and brain development in ASD remain largely unexamined.	3/759	Secondary Analysis	Shared
Identification of differentially methylated regions (DMRs) and cytosine sites (DMCs) in DNA methylation data of autism cases and unaffected siblings	We compared blood-based DNA methylation profiles between children with autism spectrum disorder (ASD) and carefully matched, unrelated neurotypical control children. Using a sequencing-based method, we identified ASD-specific differentially methylated regions (DMRs) and cytosine sites (DMCs). We carried out comparative analyses with datasets from the NDA Collection 1650 (SFARI - DNA Methylation Analysis Cohort) that measured blood DNA methylation in ASD using microarray technology. We also identified DMRs and DMCs using metilene and minfi pipelines in the DNAm datasets from the NDA Collection 1650.	292/728	Secondary Analysis	Shared
Phenotypic subtyping and re-analysis of existing methylation data from autistic probands in simplex families reveal ASD subtype-associated differentially methylated genes and biological functions	Autism spectrum disorder (ASD) describes a group of neurodevelopmental disorders with core deficits in social communication and manifestation of restricted, repetitive, and stereotyped behaviors. Despite the core symptomatology, ASD is extremely heterogeneous with respect to the severity of symptoms and behaviors. This heterogeneity presents an inherent challenge to all large-scale genome-wide 'omics analyses. In the present study, we address this heterogeneity by stratifying ASD probands from simplex families according to severity of behavioral scores on the Autism Diagnostic Interview-Revised diagnostic instrument, followed by re-analysis of existing DNA methylation data from individuals in three ASD subphenotypes in comparison to that of their respective unaffected siblings. We demonstrate that subphenotyping of cases enables the identification of over 1.6 times the number of statistically significant differentially methylated genes (DMGs) between cases and controls, compared to that identified when all cases are combined. Our analyses also reveal ASD-related neurological functions and comorbidities that are enriched among DMGs in each phenotypic subgroup but not in the combined case group. These findings may aid in the development of subtype-directed diagnostics and therapeutics.	70/584	Secondary Analysis	Shared

* Data are not at the individual level
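Each Collection/Study Subjects cell in the table is a ratio of two counts: subjects from this Collection over all subjects in the Study. As an illustrative sketch (the helper name `collection_share` is hypothetical), the fraction of a Study's subjects contributed by this Collection can be computed from the cell text:

```python
def collection_share(cell: str) -> float:
    """Parse a 'collection/study' subjects cell and return the collection's share."""
    # Unpacking raises ValueError if the cell does not have exactly two parts.
    collection_subjects, study_subjects = (int(part) for part in cell.split("/"))
    if study_subjects <= 0 or not 0 <= collection_subjects <= study_subjects:
        raise ValueError(f"unexpected subject counts: {cell!r}")
    return collection_subjects / study_subjects


# Example cells taken from the table above.
for cell in ("4111/9975", "2709/2709", "1/18812"):
    print(cell, f"{collection_share(cell):.1%}")
```

A value of 100% (as in 2709/2709) means every subject in the Study comes from this Collection.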


Collection - Associated Studies

Clicking a Study Name opens the study details in a new browser tab. The Abstract column provides background on the study, as supplied by the Collection Owner.

Primary v. Secondary Analysis: The Data Usage column contains one of these two values. An associated study listed as Primary Analysis indicates that at least some, and potentially all, of the data used were originally collected by the creator of the NDA Study. Secondary Analysis indicates that the Study owner was not involved in collecting the data, which may be used as supporting data.
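The Primary/Secondary distinction described above reduces to a single membership test. The sketch below is hypothetical: it assumes you know who originally collected each dataset a Study uses, an input invented here for illustration rather than a field NDA exposes under that name.

```python
from enum import Enum


class DataUsage(Enum):
    PRIMARY = "Primary Analysis"
    SECONDARY = "Secondary Analysis"


def classify_data_usage(study_creator: str, data_collectors: set[str]) -> DataUsage:
    """Classify an NDA Study by who collected the data it uses.

    Primary Analysis: the Study creator collected at least some of the data.
    Secondary Analysis: the Study creator collected none of it.
    `data_collectors` is a hypothetical input naming the original collectors.
    """
    if study_creator in data_collectors:
        return DataUsage.PRIMARY
    return DataUsage.SECONDARY


print(classify_data_usage("lab_a", {"lab_a", "lab_b"}).value)  # Primary Analysis
print(classify_data_usage("lab_c", {"lab_a", "lab_b"}).value)  # Secondary Analysis
```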

Private v. Shared State: A private study is available only to users who have access to the Collection. A shared study is accessible to the general public.

Frequently Asked Questions

How do I associate a study with my collection?

Studies are associated with the Collection automatically when data from the Collection are defined in the Study.

Glossary

Associated Studies Tab

A tab in a Collection that lists the NDA Studies that have been created using data from that Collection including both Primary and Secondary Analysis NDA Studies.
