Many government agencies, including the NIH and DARPA, have efforts underway to encourage scientific data sharing; however, the research community has been slow to adopt the practice. Our discussions with researchers make the reason clear: sharing their data brings them no clear benefit and some clear risks. We cannot mitigate the risks, but this proposal describes a platform that will bring a benefit to the neurosciences.
Neuroscience encompasses a broad range of disciplines and interests, and researchers accordingly use a broad range of techniques and approaches. Each technique offers a unique window onto the brain and other excitable tissue and is distinct from the rest in terms of spatial coverage, temporal resolution, chemical specificity, or other parameters. A complete understanding of the brain cannot be achieved without an understanding of how these disparate viewpoints can be integrated. Here we discuss a ubiquitous and underutilized portion of many neuroscience data sets that can serve as the common thread tying all of these data together. Properly aggregated, these data can be used to explore basic questions in neuroscience that are currently intractable, and they will accelerate overall discovery.
“Spontaneous” is a term that encompasses a variety of names and descriptions for unstimulated activity, that is, activity at rest. It is widely understood that spontaneous activity in the brain varies with parameters and conditions, including location and time. To make quantitative observations of brain function against this variable backdrop, neuroscientists typically dedicate a portion of their data collection to spontaneous activity (e.g., the last 500 ms of a 1000 ms recording, or every other slide in histochemistry). These contemporaneous data are used to normalize the experimental observations and are then typically ignored, even though they can make up over half of the data collected. We propose to harvest observations of spontaneous activity from data sets currently held in publicly accessible archives and to build a shared resource from them.
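As an illustration of the kind of harvesting described above, the following minimal Python sketch separates the spontaneous tail of a single recording from the evoked portion and uses it as a baseline. It is a sketch under stated assumptions, not a description of any particular laboratory's pipeline: the function names, the fixed sampling rate, and the choice of z-scoring as the normalization are all hypothetical, since the normalization actually used varies from study to study.

```python
from statistics import mean, stdev


def split_recording(samples, sample_rate_hz, spontaneous_ms):
    """Return (evoked, spontaneous), where spontaneous is the final
    `spontaneous_ms` milliseconds of the recording.

    Assumes the spontaneous segment sits at the end of the trace, as in
    the "last 500 ms of a 1000 ms recording" example.
    """
    n_spont = int(round(sample_rate_hz * spontaneous_ms / 1000))
    if not 0 < n_spont < len(samples):
        raise ValueError("spontaneous segment must be shorter than the recording")
    return samples[:-n_spont], samples[-n_spont:]


def normalize_to_baseline(evoked, spontaneous):
    """Z-score the evoked samples against the spontaneous baseline.

    Z-scoring is an assumed normalization chosen for illustration only.
    """
    mu = mean(spontaneous)
    sigma = stdev(spontaneous)
    if sigma == 0:
        raise ValueError("baseline has zero variance; cannot z-score")
    return [(x - mu) / sigma for x in evoked]


# Synthetic placeholder trace: 1000 samples at 1 kHz (a 1000 ms recording).
trace = [5.0] * 500 + [1.0, 3.0] * 250
evoked, spontaneous = split_recording(trace, sample_rate_hz=1000, spontaneous_ms=500)
normalized = normalize_to_baseline(evoked, spontaneous)
```

In this picture, the `spontaneous` segment is the part that is usually discarded after normalization and is the part the proposed resource would retain.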
At a minimum, each data set will be associated with basic information, including the researcher, the subject’s species, and the experimental preparation. From this starting point, we can add detail to existing brain atlases or build new maps. Scientific experiments are typically well documented, so each data set should be accompanied by voluminous metadata, and additional metadata makes a wider range of studies possible. For instance: What is the effect of ketamine on the activity in a given region of the brain? How does it differ from the effect of pentobarbital? How does the resting cortical activity in a particular genetic mutant differ from that of the wild type? Researchers will also be able to rapidly identify “holes” in the data in order to justify their requests for grant funding.
Our data sets would be linked to the source information, including the original (whole) data set, the metadata (used to create a search index), and the associated research paper. The topic of the research paper and select metadata would also be used to perform a targeted search of PubMed to find related resources for detailed research.
We will start by making our collection of spontaneous data available to institutions by subscription. Our minimum viable product (MVP) consists simply of cleaning and archiving the available data. As researchers find value in the resource (e.g., if their own data are cited or their own research is accelerated), we anticipate that their willingness to facilitate our efforts will increase. We anticipate that help from the research community will be necessary to fully document the metadata and to translate some aspects of the available data sets. As our collection grows, we will be able to create new products for education and research.
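The metadata-driven questions described above (comparing anesthetics within a brain region, or locating “holes” in coverage) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a design for the platform: the record fields beyond researcher, species, and preparation, the function names, and the example records are all hypothetical placeholders and do not describe any real data set.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SpontaneousRecord:
    # Minimum information named in the proposal.
    researcher: str
    species: str
    preparation: str
    # Additional metadata; these field names are assumptions for illustration.
    brain_region: str = "unknown"
    anesthetic: str = "unknown"
    genotype: str = "unknown"


def select(records, **criteria):
    """Return the records whose metadata match every given field=value pair."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


def coverage_holes(records, field_a, field_b):
    """List (field_a, field_b) value pairs that no record covers.

    Only values observed somewhere in `records` are considered, so this
    finds gaps relative to the collection itself.
    """
    seen = {(getattr(r, field_a), getattr(r, field_b)) for r in records}
    values_a = sorted({a for a, _ in seen})
    values_b = sorted({b for _, b in seen})
    return [pair for pair in product(values_a, values_b) if pair not in seen]


# Placeholder records, invented purely to exercise the functions.
records = [
    SpontaneousRecord("Lab A", "rat", "in vivo", brain_region="V1", anesthetic="ketamine"),
    SpontaneousRecord("Lab B", "rat", "in vivo", brain_region="V1", anesthetic="pentobarbital"),
    SpontaneousRecord("Lab C", "rat", "in vivo", brain_region="S1", anesthetic="ketamine"),
]

ketamine_v1 = select(records, brain_region="V1", anesthetic="ketamine")
holes = coverage_holes(records, "brain_region", "anesthetic")
```

With these placeholder records, `holes` contains the single pair for which no spontaneous data exist (region S1 under pentobarbital), which is the kind of gap a researcher might cite when justifying a funding request.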