
Machine-Centric Science

Donny Winston
Stories about the FAIR principles in practice, for scientists who want to compound their impacts, not their errors.

Episodes

Sandra Gesing

An interview about FAIR software, workflows, and virtual research environments (VREs) / science gateways with Sandra Gesing, currently a Senior Research Scientist and Scientific Outreach and Diversity, Equity, and Inclusion (DEI) Lead at the Discovery Partners Institute at the University of Illinois, Chicago. https://galaxyproject.org/ https://dpi.uillinois.edu/ https://sciencegateways.org/ https://www.rd-alliance.org/groups/fair-virtual-research-environments-wg...

Feb 17, 2023 | 41 min | Ep. 28

Christophe Blanchi

https://doi.org/20.500.14132/chris --> https://doi.org/20.500.14132/chris?noredirect --> https://www.dona.net/team/christophe-blanchi Digital Object Identifier Resolution Protocol (DO-IRP): https://www.dona.net/sites/default/files/2022-06/DO-IRPV3.0--2022-06-30.pdf...

Jan 18, 2023 | 1 hr 12 min | Ep. 27

Vineeth Venugopal

https://en.wikipedia.org/wiki/Interatomic_potential

Oct 31, 2022 | 59 min | Ep. 26

walk-and-talk: DIKW pyramid/hierarchy

DIKW pyramid / DIKW hierarchy - https://en.wikipedia.org/wiki/DIKW_pyramid "Data becomes information when it is stored *in* a given *formation*." From B. Fong and D. I. Spivak, “Seven Sketches in Compositionality: An Invitation to Applied Category Theory,” Ch. 3 - Databases, arXiv, Oct. 12, 2018. doi: 10.48550/arXiv.1803.05316. "There are only three things we can do with data. We can accrete data by adding it to an existing collection, reduce data by discarding information from an existing colle...

Sep 27, 2022 | 9 min | Ep. 25

I Fought the Law

`.split()`s on strings and `filter`s on `None` I fought the Law and the Law won I fought the Law and the Law won I needed spec compliance; I got none I fought the Law and the Law won I fought the Law and the Law won I varied my output with the latest fad Breakin' every downstream run Needed Postel more than I ever had I fought the Law and the Law won I fought the Law and the Scatterin' parsing like a shotgun I fought the Law and the Law won I fought the Law and the Law won I lost robustness and ...

Sep 07, 2022 | 1 min | Ep. 24

Martynas Jusevičius

- Linked Data - Project Jupyter (Notebook, Lab, etc.) - UI Blocks: Block Protocol - Personal Knowledge Graphs: Roam, Logseq, Obsidian - Solid: decentralized data stores - Resource Description Framework (RDF) - Twitter: Martynas, AtomGraph - LinkedDataHub (Apache-2.0 license) - AtomGraph: Website, GitHub...

Aug 29, 2022 | 30 min | Ep. 23

FAIR-Enabling Services

I was thinking about FAIR-enabling resources and wanted to distinguish between things that actually have to be running in order for data to be alive and for you to actually find it, access it, interoperate with it, and reuse it, versus "one-time" things that those services will need.

Aug 19, 2022 | 10 min | Ep. 22

Stuck Data Mining Again (Lodi)

Just about a week ago, I set out to download. Seekin' supplementary data, lookin' for a pot of gold. Things got bad, and things got worse, I guess you will know the tune. Oh lord, stuck data mining again. Rode in on semantics, I'll be hand-waving out if I go. Trying controlled vocabularies, must've been seven of 'em or more. No corresponding authors have replied to my emails yet. Oh lord, I'm stuck data mining again. The man from Stack Overflow said I was on my way. My code kept raising exceptio...

Aug 09, 2022 | 2 min | Ep. 21

Don't Silo Me In

Oh give me mappings, lots of mappings, with resolving URIs. Don’t silo me in. Let me prance through semantics of namespaces that I love. Don’t silo me in. Let me use an open protocol to access these bytes, and for metadata promise me you’ll keep on the lights. Authenticate me repeatedly, but give clear usage rights. Don’t silo me in. Just give me data bare. Let me reuse my old CPUs and mint my URIs. With my own software, let me wander over yonder with least surprise. I want to probe the provenan...

Aug 04, 2022 | 1 min | Ep. 20

Shreyas Cholia

* [Materials Project](https://materialsproject.org/) * [Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE)](https://ess-dive.lbl.gov/) * [National Microbiome Data Collaborative (NMDC)](https://microbiomedata.org/) * [W3C Provenance (PROV) specs](https://www.w3.org/TR/prov-overview/) * [Research Equals (R=)](https://www.researchequals.com/) * [JSON-LD](https://json-ld.org/) * [Ecological Metadata Language (EML)](https://eml.ecoinformatics.org/) * [DataCite](https...

Jul 29, 2022 | 30 min | Ep. 19

Patrick Huck

Materials Project (MP) website: https://materialsproject.org/ Novel Materials Discovery (NOMAD) Laboratory: https://nomad-lab.eu/ Contributor Roles Taxonomy: https://credit.niso.org/ Authentication resources (FAIR A1.2): - https://portier.github.io/using.html - https://github.com/simov/grant - https://docs.konghq.com/ U.S. Department of Energy resources: - Office of Scientific and Technical Information (OSTI) Data ID Service: https://www.osti.gov/data-services - https://www.energy.gov/science/of...

Jul 21, 2022 | 53 min | Ep. 18

R1.3: metadata and data meet domain-relevant community standards

Linked Open Vocabularies (LOV): https://lov.linkeddata.es/dataset/lov/ FAIRSharing: https://fairsharing.org/ PageRank of Linked Open Vocabularies (LOV): https://donnywinston.com/posts/pagerank-of-linked-open-vocabularies-lov/ Principles of Open Scholarly Infrastructure (POSI): https://openscholarlyinfrastructure.org/

Jun 20, 2022 | 8 min | Ep. 16

R1.2: Metadata and data are associated with detailed provenance

https://www.w3.org/TR/prov-dm/#dfn-provenance # Component 1: Entities/Activities: Type: Entity Type: Activity Relation: Generation/Invalidation (E-Act) Relation: Usage (Act-E) Relation: Communication (Act1-[E]-Act2) Relation: Trigger/Starter of Start of Act (trigger E, starter Act) Relation: Trigger/Ender of End of Act (trigger E, ender Act) # Component 2: Derivations: Relation: Derivation (E-E, E-Act) Relation: Revision (E-E) Relation: Quotation (E-E) Relation: Primary Source (E-E) #...

Jun 02, 2022 | 7 min | Ep. 15

R1.1: (Meta)data are released with a clear and accessible data usage license

The Creative Commons suite of licenses: CC0, CC BY, CC BY-SA, CC BY-ND, CC BY-NC, CC BY-NC-SA, CC BY-NC-ND. Code licenses: Server Side Public License, Affero GPL (AGPL), Lesser GPL (LGPL), Mozilla Public License (MPL), Business Source License (used e.g. by Sentry, <https://github.com/getsentry/sentry/blob/master/LICENSE>), Elastic License (for Elasticsearch), Apache 2.0, BSD, MIT. Spectrum of user freedom and redistributor freedom. "The CRAPL: An academic-strength open source license": <...

May 25, 2022 | 12 min | Ep. 14

I2: (Meta)data use vocabularies that follow the FAIR principles

Heather Hedden, "Foundation for a Knowledge Graph Taxonomy Design Best Practices", slides at https://zenodo.org/record/6510205 Teodora Petkova, "The Dialogic Potential of the Web of Data", slides at https://zenodo.org/record/6518557 https://en.wikipedia.org/wiki/Bohm_Dialogue Tim Berners-Lee's bag of chips https://www.w3.org/TR/vocab-dcat-2/#Class:Dataset https://schema.org/Dataset

May 04, 2022 | 6 min | Ep. 11

A2. Metadata are accessible, even when the data are no longer available

Archival Resource Key (ARK) specification (section on policy metadata): https://datatracker.ietf.org/doc/html/draft-kunze-ark-34#section-5.1.1 . Permanence Levels and the Archives for NIH NLM's Permanent Web Documents: https://www.nlm.nih.gov/pubs/techbull/ma05/ma05_archive.html .

Apr 19, 2022 | 5 min | Ep. 9

A1: (Meta)data are retrievable by their identifier using a standardized communication protocol

You want to avoid protocols with limited implementation, poor documentation, and, when possible, components involving human intervention. It may not be possible to provide secure access through a fully mechanized protocol like HTTP, for example, for highly sensitive data. However, the protocol must be clear and explicit in the metadata, whether it involves a verbal request, email, telephone number, Slack username, et cetera. The important thing is that the communication protocol for how to acces...

Mar 29, 2022 | 3 min | Ep. 6
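
The A1 episode's point that the access protocol should be explicit in the metadata, even when a human is involved, could be sketched roughly as follows. This is only an illustration: the `AccessInfo` record, its field names, the `MECHANIZED_PROTOCOLS` set, and the example.org addresses are all hypothetical and do not come from the episode or from any FAIR specification.

```python
from dataclasses import dataclass

# Assumption for this sketch: protocols a program can follow
# without a human in the loop.
MECHANIZED_PROTOCOLS = {"https", "http", "ftp"}


@dataclass(frozen=True)
class AccessInfo:
    """Hypothetical metadata fragment stating how to request the data."""
    protocol: str  # e.g. "https", "email", "telephone"
    contact: str   # URL, email address, phone number, ...


def describe_access(info: AccessInfo) -> str:
    """Return an explicit instruction for obtaining the data."""
    if info.protocol.lower() in MECHANIZED_PROTOCOLS:
        return f"machine-retrievable via {info.protocol}: {info.contact}"
    return f"human-mediated request via {info.protocol}: {info.contact}"


# Placeholder records: one openly retrievable, one highly sensitive.
open_record = AccessInfo(protocol="https", contact="https://example.org/dataset/42")
sensitive_record = AccessInfo(protocol="email", contact="data-steward@example.org")

print(describe_access(open_record))
print(describe_access(sensitive_record))
```

Either way, a consumer reading the metadata learns exactly how to ask for the data, which is the property the episode emphasizes.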

F4: (Meta)data are registered or indexed in a searchable resource

The goal here is leverage: increasing the ratio of machine action to user action in getting to the data that they want. Otherwise, your data is technically findable, but it's going to require a lot of user action. They might have to do a full data download, scan through a full table, scroll through a long webpage, and it's unlikely that they're going to actually find what they need, because they're just not going to put in that much effort. So you really want indexing. You want this leverage to ...

Mar 22, 2022 | 7 min | Ep. 5
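
As a rough illustration of the leverage the F4 episode describes, the sketch below contrasts a full scan with a lookup in a small inverted index. The dataset identifiers and descriptions are made up for the example, and a real searchable resource would do far more (ranking, stemming, fielded queries).

```python
from collections import defaultdict

# Hypothetical metadata records: identifier -> free-text description.
records = {
    "ds-1": "band gap measurements for perovskite thin films",
    "ds-2": "soil microbiome sequencing reads",
    "ds-3": "perovskite solar cell degradation data",
}


def full_scan(term):
    """Without an index: every record is read for every query."""
    return sorted(rid for rid, text in records.items() if term in text.split())


def build_index(recs):
    """Build an inverted index once: word -> set of record identifiers."""
    index = defaultdict(set)
    for rid, text in recs.items():
        for word in text.split():
            index[word].add(rid)
    return index


index = build_index(records)


def lookup(term):
    """With an index: one dictionary access per query term."""
    return sorted(index.get(term, set()))


print(lookup("perovskite"))
```

Both functions return the same identifiers; the difference is that the indexed lookup shifts the work from each user query to a one-time machine step.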

F2: Data are described with rich metadata

Kinds of metadata - "intrinsic" (machine-defined or machine-controlled; immutable) and "extrinsic" (user-defined or user-controlled). Other-than-technical interoperability. "Quality" in the eye of the beholder / data consumer. Analogy to web-browser feature detection, and application to search engine "rich results".

Mar 08, 2022 | 3 min | Ep. 3