Finding and obtaining open datasets

Where is Open Data in the Open Science Universe?

  • Open data is only a small part of the open science movement

Image source: FOSTER Open Science (www.fosteropenscience.eu/resources).

Different types of open data

  • Open data is not only about datasets you can download from the internet

  • Accessibility is only one part of Openness

  • Open data should be FAIR[1]:

    • Findable
    • Accessible
    • Interoperable
    • Reusable

FAIR data[1]

  • Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers.

  • Accessible: Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.

  • Interoperable: The data usually need to interoperate with applications or workflows for analysis, storage, and processing. (Meta)data should use a formal, accessible, shared language or format.

  • Reusable: Data and metadata should be well-described so that they can be replicated and/or combined in different settings.
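
One way to make the four principles concrete is a rough self-check on a dataset's metadata record. The sketch below is only an illustration: the field names (`identifier`, `description`, `access_url`, `format`, `license`) and the set of "open" formats are assumptions made for this example, not part of the FAIR principles themselves.

```python
# Assumed examples of widely shared, non-proprietary formats.
OPEN_FORMATS = {"csv", "tsv", "json", "xml"}


def fair_check(metadata: dict) -> dict:
    """Return a rough yes/no answer for each FAIR principle.

    The metadata keys used here are hypothetical; real repositories
    each use their own metadata schema.
    """
    return {
        # Findable: an identifier plus a description that can be searched
        "findable": bool(metadata.get("identifier")) and bool(metadata.get("description")),
        # Accessible: the record states where the data can be retrieved
        "accessible": bool(metadata.get("access_url")),
        # Interoperable: the data use a shared format
        "interoperable": str(metadata.get("format", "")).lower() in OPEN_FORMATS,
        # Reusable: a license tells others what they may do with the data
        "reusable": bool(metadata.get("license")),
    }


record = {
    "identifier": "doi:10.1234/example",  # made-up identifier for illustration
    "description": "Example dataset",
    "access_url": "https://example.org/data.csv",
    "format": "CSV",
    "license": "",  # no license stated
}
print(fair_check(record))
```

In this example the record states no license, so only the `reusable` check fails.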

Finding Open data

  • Starting points:
    • Data Resources known in your network
    • Publication with link to data source / repository
    • Search in public repositories

Known data resources

  • Wide range of options between fully closed and fully open (FAIR)
    • Closed
    • Commercial / Paid access
    • Data sharing within a project/collaboration (restricted)
    • Gated Data Sharing (Application + Evaluation of proposal, processing fee)
    • Only registration required
    • No registration required

Finding Open data

  • Starting points:
    • Data Resource known in your network
    • Publication with link to data source / repository
    • Search in public repositories

Journals increasingly encourage publication of (links to) data

Let's have a look at the PLOS journals:

  • Policy requiring researchers to share the data underlying their results or to state why this is not possible
  • But: Are authors complying with these requirements? (PLOS Medicine: Diabetes)

Journals increasingly encourage publication of (links to) data

Let's have a look at the PLOS journals:

  • Policy requiring researchers to share the data underlying their results or to state why this is not possible
  • But: Are authors complying with these requirements? (Review by Federer et al.)

Review of PLOS data statements (Federer 2018)

Finding Open data

  • Starting points:
    • Data Resource known in your network
    • Publication with link to data source / repository
    • Search in public repositories

Figshare

Figshare: Diabetes
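
The same kind of search can also be scripted. The following is a minimal sketch, assuming Figshare's public API (version 2) with its `articles/search` endpoint and the body fields `search_for`, `item_type`, and `page_size`; the value `3` for datasets is likewise an assumption to verify against the current API documentation. The function names are made up for this example.

```python
import json
import urllib.request

# Assumed endpoint of the public Figshare API (v2).
FIGSHARE_SEARCH_URL = "https://api.figshare.com/v2/articles/search"


def build_search_request(term: str, page_size: int = 10) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that searches for datasets."""
    body = {
        "search_for": term,
        "item_type": 3,  # assumed code for the item type "dataset"
        "page_size": page_size,
    }
    return urllib.request.Request(
        FIGSHARE_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_datasets(term: str, page_size: int = 10) -> list:
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(term, page_size)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `search_datasets("diabetes", page_size=5)` requires network access; building the request separately makes it possible to inspect what would be sent first.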

Dryad

Dryad: Diabetes

GitHub Collection

GitHub: Awesome Public Datasets

Rigsarkivet (Overview and access info)

Rigsarkivet: Sundhed

OpenNeuro (MRI and fMRI images)

OpenNeuro: Diabetes

How are you allowed to use data you find?

  • First important step: know who 'owns' the data and what they allow you to do with it
    • Public domain: there is no owner, so you may use the data in any way
    • Copyright-protected data: the owner can give you a license to use the data in a certain way
  • Different Open Licenses

Different Open Licenses

For any type of 'work', including databases:

  • Creative Commons:
    • CC BY-NC (Attribution-NonCommercial)
    • CC BY-ND (Attribution-NoDerivatives)
    • CC BY-SA (Attribution-ShareAlike)
    • CC BY (only Attribution required)
    • CC0 (= placing something in the public domain)
  • Open Data Commons Licenses:
    • ODC-By (Attribution required)
    • PDDL (= placing a database in the public domain)

Mostly for Open Source Software:

  • GNU GPL
  • MIT

Finding a suitable license for your data: (Choose a license) or (Creative Commons Chooser)
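
The licenses listed above can be summarised as a small lookup table. This is a simplified sketch: the four yes/no fields and the helper `allowed_for` are illustrative names chosen here, and the full license text remains the authoritative source for what is permitted.

```python
from typing import NamedTuple


class Terms(NamedTuple):
    attribution: bool     # you must credit the creator
    commercial_use: bool  # commercial reuse is allowed
    derivatives: bool     # sharing adapted versions is allowed
    share_alike: bool     # adaptations must carry the same license


# Simplified summary of the licenses named on this slide.
LICENSES = {
    "CC BY-NC": Terms(True, False, True, False),
    "CC BY-ND": Terms(True, True, False, False),
    "CC BY-SA": Terms(True, True, True, True),
    "CC BY": Terms(True, True, True, False),
    "CC0": Terms(False, True, True, False),
    "ODC-By": Terms(True, True, True, False),
    "PDDL": Terms(False, True, True, False),
}


def allowed_for(license_name: str, commercial: bool, modify: bool) -> bool:
    """Check a planned use (commercial? modified?) against the summary table."""
    terms = LICENSES[license_name]
    if commercial and not terms.commercial_use:
        return False
    if modify and not terms.derivatives:
        return False
    return True


print(allowed_for("CC BY-NC", commercial=True, modify=False))
print(allowed_for("CC BY-ND", commercial=True, modify=True))
print(allowed_for("CC0", commercial=True, modify=True))
```

The first two calls describe uses the license does not cover (commercial use under CC BY-NC, sharing a modified version under CC BY-ND); the third is unrestricted.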

Summary

  • Open Data is only a part of the Open Science Universe
  • Open Data should be FAIR (but often is only in part)
  • There are many different ways of finding Open Data, but none is ideal (yet)
  • Be mindful of the license attached to a dataset

General Resources

Data Repositories

Thank you!
