class: center, middle, inverse, title-slide

# Finding and obtaining open datasets

---
layout: true

<div class="my-footer">
<span>
<img src="../images/dda_logo.png" alt="Danish Diabetes Academy", width="90">
<img src="../images/SDCA_logo.png" alt="Steno Diabetes Center Aarhus", width="55">
<img src="../images/au_logo_black.png" alt="Aarhus University", width="140">
</span>
</div>

---

## Where is Open Data in the Open Science Universe?

- Open data is only a small part of the open science movement

.center[
<img src="../images/OpenUniverse.png" width="80%" height="80%" />
]

.footnote[Image source: Foster Open Science [(www.fosteropenscience.eu/resources)](https://www.fosteropenscience.eu/resources).]

---

## Different types of open data

.pull-left[
- Open data is not only about datasets you can download from the internet
- Accessibility is only one part of openness
]

.footnote[
[1] Go FAIR Initiative (https://www.go-fair.org/fair-principles/)
]

---

## Different types of open data

.pull-left[
- Open data is not only about datasets you can download from the internet
- Accessibility is only one part of openness
- Open data should be FAIR[1]:
  - Findable
  - Accessible
  - Interoperable
  - Reusable
]

.pull-right[
<img src="../images/gofair.png" width="80%" height="80%" />
]

.footnote[
[1] Go FAIR Initiative (https://www.go-fair.org/fair-principles/)
]

---

## FAIR data[1]

- **Findable**: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers.
- **Accessible**: Once users find the required data, they need to know **how** the data can be accessed, possibly including authentication and authorisation.
- **Interoperable**: The data usually need to interoperate with applications or workflows for analysis, storage, and processing.
(Meta)data should use a formal, accessible, shared language or format.
- **Reusable**: Data and metadata should be well-described so that they can be replicated and/or combined in different settings.

.footnote[
[1] Go FAIR Initiative (https://www.go-fair.org/fair-principles/)
]

---

## Finding Open data

- Starting points:
  - Data resources known in your network
  - Publication with link to data source / repository
  - Search in public repositories

---

## Known data resources

- Wide range of options between fully closed and fully open (FAIR)
  - Closed

---

## Known data resources

- Wide range of options between fully closed and fully open (FAIR)
  - Closed

.pull-right[
<img src="../images/CPRD.png" width="90%" height="90%" />
]

  - Commercial / Paid access
    - e.g. [(CPRD)](https://www.cprd.com/)

---

## Known data resources

- Wide range of options between fully closed and fully open (FAIR)
  - Closed
  - Commercial / Paid access
    - e.g. [(CPRD)](https://www.cprd.com/)
  - Data sharing within a project/collaboration (restricted)

---

## Known data resources

- Wide range of options between fully closed and fully open (FAIR)
  - Closed

.pull-right[
<img src="../images/whitehall.png" width="90%" height="90%" />
]

  - Commercial / Paid access
    - e.g. [(CPRD)](https://www.cprd.com/)
  - Data sharing within a project/collaboration (restricted)
  - Gated Data Sharing (Application + Evaluation of proposal, processing fee)
    - e.g. [(Whitehall II Study)](https://www.ucl.ac.uk/epidemiology-health-care/research/epidemiology-and-public-health/research/whitehall-ii/data-sharing)

---

## Known data resources

- Wide range of options between fully closed and fully open (FAIR)
  - Closed

.pull-right[
<img src="../images/ELSA.png" width="90%" height="90%" />
]

  - Commercial / Paid access
    - e.g. [(CPRD)](https://www.cprd.com/)
  - Data sharing within a project/collaboration (restricted)
  - Gated Data Sharing (Application + Evaluation of proposal, processing fee)
    - e.g.
[(Whitehall II Study)](https://www.ucl.ac.uk/epidemiology-health-care/research/epidemiology-and-public-health/research/whitehall-ii/data-sharing)
  - Only registration required
    - e.g. English Longitudinal Study of Ageing [(ELSA)](https://www.elsa-project.ac.uk/data-and-documentation), accessible via the [(UK Data Service)](https://ukdataservice.ac.uk/)

---

## Known data resources

- Wide range of options between fully closed and fully open (FAIR)
  - Closed

.pull-right[
<img src="../images/NHANES.png" width="90%" height="90%" />
]

  - Commercial / Paid access
    - e.g. [(CPRD)](https://www.cprd.com/)
  - Data sharing within a project/collaboration (restricted)
  - Gated Data Sharing (Application + Evaluation of proposal, processing fee)
    - e.g. [(Whitehall II Study)](https://www.ucl.ac.uk/epidemiology-health-care/research/epidemiology-and-public-health/research/whitehall-ii/data-sharing)
  - Only registration required
    - e.g. English Longitudinal Study of Ageing [(ELSA)](https://www.elsa-project.ac.uk/data-and-documentation), accessible via the [(UK Data Service)](https://ukdataservice.ac.uk/)
  - No registration required
    - e.g. [(NHANES)](https://wwwn.cdc.gov/nchs/nhanes/)

---

## Finding Open data

- Starting points:
  - Data resources known in your network
  - Publication with link to data source / repository
  - Search in public repositories

---

## Publications with links to data

Journals increasingly encourage publication of (links to) data.

Let's have a look at the PLOS journals:

- Policy requiring researchers to share the data underlying their results or to state why this is not possible
- But: Are authors complying with these requirements?
[(PLOS Medicine: Diabetes)](https://journals.plos.org/plosmedicine/search?filterJournals=PLoSMedicine&filterSubjects=Medicine+and%20health%20sciences&filterArticleTypes=Research%20Article&q=diabetes&page=1)

---

## Publications with links to data

Journals increasingly encourage publication of (links to) data.

Let's have a look at the PLOS journals:

- Policy requiring researchers to share the data underlying their results or to state why this is not possible
- But: Are authors complying with these requirements?
[(Review by Federer et al.)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0194768)

.center[
<img src="../images/FedererPLOSONE.png" width="85%" height="85%" />
]

---

## Review of PLOS data statements (Federer et al. 2018)

.center[
![](https://journals.plos.org/plosone/article/figure/image?id=10.1371/journal.pone.0194768.t001&size=medium)
]

---

## Finding Open data

- Starting points:
  - Data resources known in your network
  - Publication with link to data source / repository
  - Search in public repositories

---

## Figshare

[Figshare: Diabetes](https://figshare.com/search?q=diabetes&sortBy=posted_date&sortType=desc&licenses=1,2,47,52,3,40,49&itemTypes=3&categories=7,48)

.center[
<img src="../images/Figshare.png" width="90%" height="90%" />
]

---

## Dryad

[Dryad: Diabetes](https://datadryad.org/search?utf8=%E2%9C%93&q=diabetes)

.center[
<img src="../images/Dryad.png" width="90%" height="90%" />
]

---

## GitHub collection

[GitHub: Awesome Public Datasets](https://github.com/awesomedata/awesome-public-datasets)

.center[
<img src="../images/Github.png" width="90%" height="90%" />
]

---

## Rigsarkivet, the Danish National Archives (overview and access info)

[Rigsarkivet: Sundhed (health)](https://www.sa.dk/da/forskning-rigsarkivet/rigsarkivet-sundhed/)

.center[
<img src="../images/Rigsarkivet.png" width="90%" height="90%" />
]

---

## OpenNeuro (MRI and fMRI images)

[OpenNeuro: Diabetes](https://openneuro.org/search/diabetes)

.center[
<img src="../images/OpenNeuro.png" width="90%" height="90%" />
]

---

## How are you allowed to use data you find?

- First important step: know who 'owns' the data and what they allow you to do with it
  - Public domain: there is no owner; you are allowed to use the data in any way
  - Protected by copyright: the owner gives you a license to use the data in a certain way
- Different Open Licenses

.pull-left[
<img src="../images/CC_licenses.png" width="70%" height="70%" />
]

.pull-right[
<img src="../images/CClicense_range.png" width="30%" height="30%" />
]

---

## Different Open Licenses

.pull-left[
For any type of 'work', including databases:

- Creative Commons:
  - CC BY-NC (Attribution-NonCommercial)
  - CC BY-ND (Attribution-NoDerivatives)
  - CC BY-SA (Attribution-ShareAlike)
  - CC BY (only Attribution required)
  - CC0 (= placing something in the public domain)
- Open Data Commons Licenses:
  - ODC-By (Attribution required)
  - PDDL (= placing a database in the public domain)
]

.pull-right[
Mostly for Open Source Software:

- GNU GPL
- MIT
]

Finding a suitable license for your data: [(Choose a license)](https://choosealicense.com/) or [(Creative Commons Chooser)](https://chooser-beta.creativecommons.org/)

---

## Summary

- Open Data is only a part of the Open Science Universe
- Open Data should be FAIR (but is often only partly so)
- There are many different ways of finding Open Data; none is ideal (yet)
- Be mindful of the license attached to a dataset

---

## Links and references

.pull-left[
**General Resources**

- [Go Fair] (International initiative to promote FAIR data)
- [Foster Open Science] (EU project with general resources on Open Science)
- [Open Science Framework] (Resources for Open Science)
- [Center for Open Science] (Resources for Open Science)
]

.pull-right[
**Data Repositories**

- [Dryad] (Mostly Manuscript-linked)
- [Figshare] (Mostly Manuscript-linked)
- [European Data Portal] (Mostly high aggregation level)
- [NIH data repositories] (Links to topic specific repositories)
- [YODA project] (Request access to RCT data)
- [Project Datasphere] (Cancer Research databases)
- [Nature recommended data repositories]
- [ClinicalStudyDataRequest] (Request access to RCT data)
]

---
class: center

# Thank you!

[Open Science Framework]: https://osf.io/
[European Data Portal]: https://www.europeandataportal.eu/
[GitHub]: https://github.com/
[Dryad]: https://datadryad.org/
[Figshare]: https://figshare.com/
[Center for Open Science]: https://cos.io/
[Choose a License]: https://choosealicense.com/
[Creative Commons]: https://creativecommons.org/
[Go Fair]: https://www.go-fair.org/fair-principles/
[EU Turning FAIR into reality]: https://ec.europa.eu/info/sites/info/files/turning_fair_into_reality_0.pdf
[Plan S]: https://www.coalition-s.org/
[Foster Open Science]: https://www.fosteropenscience.eu/
[NIH data repositories]: https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
[UK Data Archive]: https://data-archive.ac.uk/find/archive-catalogue
[YODA project]: https://yoda.yale.edu/
[Project Datasphere]: https://www.projectdatasphere.org/projectdatasphere/html/home
[ClinicalStudyDataRequest]: https://www.clinicalstudydatarequest.com/
[Nature recommended data repositories]: https://www.nature.com/sdata/policies/repositories