An interview about FAIR software, workflows, and virtual research environments (VREs) / science gateways with Sandra Gesing, currently a Senior Research Scientist and Scientific Outreach and Diversity, Equity, and Inclusion (DEI) Lead at the Discovery Partners Institute at the University of Illinois, Chicago. https://galaxyproject.org/ https://dpi.uillinois.edu/ https://sciencegateways.org/ https://www.rd-alliance.org/groups/fair-virtual-research-environments-wg...
Feb 17, 2023•41 min•Ep. 28
https://doi.org/20.500.14132/chris --> https://doi.org/20.500.14132/chris?noredirect --> https://www.dona.net/team/christophe-blanchi Digital Object Identifier Resolution Protocol (DO-IRP): https://www.dona.net/sites/default/files/2022-06/DO-IRPV3.0--2022-06-30.pdf...
Jan 18, 2023•1 hr 12 min•Ep. 27
https://en.wikipedia.org/wiki/Interatomic_potential
Oct 31, 2022•59 min•Ep. 26
DIKW pyramid / DIKW hierarchy - https://en.wikipedia.org/wiki/DIKW_pyramid "Data becomes information when it is stored *in* a given *formation*." From B. Fong and D. I. Spivak, “Seven Sketches in Compositionality: An Invitation to Applied Category Theory,” Ch. 3 - Databases, arXiv, Oct. 12, 2018. doi: 10.48550/arXiv.1803.05316. "There are only three things we can do with data. We can accrete data by adding it to an existing collection, reduce data by discarding information from an existing colle...
Sep 27, 2022•9 min•Ep. 25
`.split()`s on strings and `filter`s on `None` I fought the Law and the Law won I fought the Law and the Law won I needed spec compliance; I got none I fought the Law and the Law won I fought the Law and the Law won I varied my output with the latest fad Breakin' every downstream run Needed Postel more than I ever had I fought the Law and the Law won I fought the Law and the Scatterin' parsing like a shotgun I fought the Law and the Law won I fought the Law and the Law won I lost robustness and ...
Sep 07, 2022•1 min•Ep. 24
- Linked Data - Project Jupyter (Notebook, Lab, etc.) - UI Blocks: Block Protocol - Personal Knowledge Graphs: Roam, Logseq, Obsidian - Solid: decentralized data stores - Resource Description Framework (RDF) - Twitter: Martynas, AtomGraph - LinkedDataHub (Apache-2.0 license) - AtomGraph: Website, GitHub...
Aug 29, 2022•30 min•Ep. 23
I was thinking about FAIR-enabling resources and wanted to distinguish between things that actually have to be running in order for data to be alive and for you to actually find it, access it, interoperate with it, and reuse it, versus "one-time" things that those services will need.
Aug 19, 2022•10 min•Ep. 22
Just about a week ago, I set out to download. Seekin' supplementary data, lookin' for a pot of gold. Things got bad, and things got worse, I guess you will know the tune. Oh lord, stuck data mining again. Rode in on semantics, I'll be hand-waving out if I go. Trying controlled vocabularies, must've been seven of 'em or more. No corresponding authors have replied to my emails yet. Oh lord, I'm stuck data mining again. The man from Stack Overflow said I was on my way. My code kept raising exceptio...
Aug 09, 2022•2 min•Ep. 21
Oh give me mappings, lots of mappings, with resolving URIs. Don’t silo me in. Let me prance through semantics of namespaces that I love. Don’t silo me in. Let me use an open protocol to access these bytes, and for metadata promise me you’ll keep on the lights. Authenticate me repeatedly, but give clear usage rights. Don’t silo me in. Just give me data bare. Let me reuse my old CPUs and mint my URIs. With my own software, let me wander over yonder with least surprise. I want to probe the provenan...
Aug 04, 2022•1 min•Ep. 20
* [Materials Project](https://materialsproject.org/) * [Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE)](https://ess-dive.lbl.gov/) * [National Microbiome Data Collaborative (NMDC)](https://microbiomedata.org/) * [W3C Provenance (PROV) specs](https://www.w3.org/TR/prov-overview/) * [Research Equals (R=)](https://www.researchequals.com/) * [JSON-LD](https://json-ld.org/) * [Ecological Metadata Language (EML)](https://eml.ecoinformatics.org/) * [DataCite](https...
Jul 29, 2022•30 min•Ep. 19
Materials Project (MP) website: https://materialsproject.org/ Novel Materials Discovery (NOMAD) Laboratory: https://nomad-lab.eu/ Contributor Roles Taxonomy: https://credit.niso.org/ Authentication resources (FAIR A1.2): - https://portier.github.io/using.html - https://github.com/simov/grant - https://docs.konghq.com/ U.S. Department of Energy resources: - Office of Scientific and Technical Information (OSTI) Data ID Service: https://www.osti.gov/data-services - https://www.energy.gov/science/of...
Jul 21, 2022•53 min•Ep. 18
The FAIR Implementation Profile (FIP) Ontology: https://w3id.org/fair/fip/terms/FIP-Ontology
Jul 15, 2022•10 min•Ep. 17
Linked Open Vocabularies (LOV): https://lov.linkeddata.es/dataset/lov/ FAIRSharing: https://fairsharing.org/ PageRank of Linked Open Vocabularies (LOV): https://donnywinston.com/posts/pagerank-of-linked-open-vocabularies-lov/ Principles of Open Scholarly Infrastructure (POSI): https://openscholarlyinfrastructure.org/
Jun 20, 2022•8 min•Ep. 16
https://www.w3.org/TR/prov-dm/#dfn-provenance # Component 1: Entities/Activities: Type: Entity Type: Activity Relation: Generation/Invalidation (E-Act) Relation: Usage (Act-E) Relation: Communication (Act1-[E]-Act2) Relation: Trigger/Starter of Start of Act (trigger E, starter Act) Relation: Trigger/Ender of End of Act (trigger E, ender Act) # Component 2: Derivations: Relation: Derivation (E-E, E-Act) Relation: Revision (E-E) Relation: Quotation (E-E) Relation: Primary Source (E-E) #...
Jun 02, 2022•7 min•Ep. 15
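As a rough illustration of the entity/activity relations listed for Ep. 15, the sketch below stores a few provenance statements as plain triples and follows derivation links upstream. The relation names (`wasGeneratedBy`, `used`, `wasDerivedFrom`) come from the W3C PROV vocabulary; the `ex:` identifiers, the `STATEMENTS` list, and the `sources_of` helper are hypothetical and only meant to show the shape of the data.

```python
# Each statement is (subject, relation, object), using W3C PROV relation names.
# The "ex:" identifiers are made up for illustration.
STATEMENTS = [
    ("ex:cleaned.csv", "wasGeneratedBy", "ex:cleaning-run"),  # Generation (E-Act)
    ("ex:cleaning-run", "used", "ex:raw.csv"),                # Usage (Act-E)
    ("ex:cleaned.csv", "wasDerivedFrom", "ex:raw.csv"),       # Derivation (E-E)
    ("ex:plot.png", "wasDerivedFrom", "ex:cleaned.csv"),      # Derivation (E-E)
]


def sources_of(entity, statements):
    """Follow wasDerivedFrom links transitively and return the upstream entities."""
    found = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in statements:
            if subject == current and relation == "wasDerivedFrom" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


if __name__ == "__main__":
    print(sources_of("ex:plot.png", STATEMENTS))
```

A real application would more likely use an RDF library and the PROV-O terms directly; the triple list here just keeps the example dependency-free.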
The Creative Commons suite of licenses: CC0, CC BY, CC BY-SA, CC BY-ND, CC BY-NC, CC BY-NC-SA, CC BY-NC-ND. Code licenses: Server Side Public License, Affero GPL (AGPL), Lesser GPL (LGPL), Mozilla Public License (MPL), Business Source License (used e.g. by Sentry, <https://github.com/getsentry/sentry/blob/master/LICENSE>), Elastic License (for Elasticsearch), Apache 2.0, BSD, MIT. Spectrum of user freedom and redistributor freedom. "The CRAPL: An academic-strength open source license": <...
May 25, 2022•12 min•Ep. 14
* https://queryunderstanding.com * http://contentunderstanding.com * https://www.w3.org/TR/json-ld11-framing/ * https://www.w3.org/TR/shacl/ * https://jasonformat.com/islands-architecture/ * https://www.hydra-cg.com/spec/latest/core/
May 18, 2022•9 min•Ep. 13
In the W3C Provenance Ontology: https://www.w3.org/TR/prov-o/#wasDerivedFrom The HTML Anchor Element: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a
May 12, 2022•6 min•Ep. 12
Heather Hedden, "Foundation for a Knowledge Graph Taxonomy Design Best Practices", slides at https://zenodo.org/record/6510205 Teodora Petkova, "The Dialogic Potential of the Web of Data", slides at https://zenodo.org/record/6518557 https://en.wikipedia.org/wiki/Bohm_Dialogue Tim Berners-Lee's bag of chips https://www.w3.org/TR/vocab-dcat-2/#Class:Dataset https://schema.org/Dataset
May 04, 2022•6 min•Ep. 11
GUPRIs, RDF, RDFS, OWL, SHACL, JSON, JSON-LD, JSON Schema, ActivityPub, "fediverse", XMPP, SMTP.
Apr 27, 2022•8 min•Ep. 10
Archival Resource Key (ARK) specification (section on policy metadata): https://datatracker.ietf.org/doc/html/draft-kunze-ark-34#section-5.1.1 . Permanence Levels and the Archives for NIH NLM's Permanent Web Documents: https://www.nlm.nih.gov/pubs/techbull/ma05/ma05_archive.html .
Apr 19, 2022•5 min•Ep. 9
A brief dip into the world of HTTP auth. The Authorization request header. The WWW-Authenticate response header. Basic authentication. Bearer-based authentication. Authenticating securely. Shared secrets versus asymmetric cryptography (for non-repudiation).
Apr 13, 2022•5 min•Ep. 8
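For the HTTP auth topics in Ep. 8, a minimal sketch of how the two schemes look on the wire may help. It only builds header values with the standard library; the helper names are hypothetical, and the challenge parser is deliberately naive (it assumes a single challenge in the `WWW-Authenticate` value).

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build an Authorization request header for the Basic scheme."""
    # Basic credentials are only base64-encoded, not encrypted,
    # so they should travel over HTTPS.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(token: str) -> dict[str, str]:
    """Build an Authorization request header for the Bearer scheme."""
    return {"Authorization": f"Bearer {token}"}


def parse_challenge_scheme(www_authenticate: str) -> str:
    """Return the scheme name from a WWW-Authenticate response header value.

    Naive: assumes the value holds exactly one challenge.
    """
    return www_authenticate.split(" ", 1)[0]


if __name__ == "__main__":
    print(basic_auth_header("Aladdin", "open sesame"))
    print(bearer_auth_header("example-token"))
    print(parse_challenge_scheme('Basic realm="data"'))
```

Both schemes rely on a shared secret that the server can also see, which is why neither gives non-repudiation on its own.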
Protocol versus implementation. HTTP, SMTP, Zulip.
Apr 05, 2022•4 min•Ep. 7
You want to avoid protocols with limited implementation, poor documentation, and, when possible, components involving human intervention. It may not be possible to provide secure access through a fully mechanized protocol like HTTP, for example, for highly sensitive data. However, the protocol must be clear and explicit in the metadata, whether it involves a verbal request, email, telephone number, Slack username, et cetera. The important thing is that the communication protocol for how to acces...
Mar 29, 2022•3 min•Ep. 6
The goal here is leverage: increasing the ratio of machine action to user action in getting users to the data that they want. Otherwise, your data is technically findable, but it's going to require a lot of user action. Users might have to do a full data download, scan through a full table, or scroll through a long webpage, and it's unlikely that they're going to actually find what they need, because they're just not going to put in that much effort. So you really want indexing. You want this leverage to ...
Mar 22, 2022•7 min•Ep. 5
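The leverage argument in Ep. 5 can be sketched by comparing a full scan with a lookup against an index built once. The records, field names, and helper functions below are hypothetical; the point is only that the index shifts work from each query (user action) to a one-time build step (machine action).

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"id": "rec-1", "keywords": ["perovskite", "band gap"]},
    {"id": "rec-2", "keywords": ["soil", "microbiome"]},
    {"id": "rec-3", "keywords": ["perovskite", "synthesis"]},
]


def scan(records, keyword):
    """Full scan: every query walks the whole collection."""
    return [r["id"] for r in records if keyword in r["keywords"]]


def build_index(records):
    """Build an inverted index once: keyword -> list of record ids."""
    index = defaultdict(list)
    for record in records:
        for keyword in record["keywords"]:
            index[keyword].append(record["id"])
    return dict(index)


def lookup(index, keyword):
    """Indexed query: a single dictionary access."""
    return index.get(keyword, [])


INDEX = build_index(RECORDS)

if __name__ == "__main__":
    print(scan(RECORDS, "perovskite"))
    print(lookup(INDEX, "perovskite"))
```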
Literature references with and without DOIs. Tables of data in articles with and without unique identifiers in each row for what that row is about. The magic of including identifiers in the metadata you share. The Data Catalog (DCAT) Vocabulary: https://www.w3.org/TR/vocab-dcat-2/
Mar 15, 2022•3 min•Ep. 4
Kinds of metadata - "intrinsic" (machine-defined or machine-controlled; immutable) and "extrinsic" (user-defined or user-controlled). Other-than-technical interoperability. "Quality" in the eye of the beholder / data consumer. Analogy to web-browser feature detection, and application to search engine "rich results".
Mar 08, 2022•3 min•Ep. 3
HTTP URLs: orcid.org, doi.org, uniprot.org. Archival Resource Keys (ARKs). Meta-resolvers: identifiers.org, n2t.net.
Mar 01, 2022•7 min•Ep. 2
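As a small illustration of the resolvers named in Ep. 2, the sketch below turns an identifier into an HTTP URL, either through the identifier's own resolver or through a meta-resolver using the compact `prefix:id` form. The dictionaries and function names are hypothetical, the sample ORCID iD and UniProt accession are placeholders, and which prefixes a given meta-resolver actually accepts should be checked against its own documentation.

```python
# Resolvers that serve one identifier scheme directly.
DIRECT_RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
}

# Meta-resolvers that route "prefix:id" compact identifiers onward.
META_RESOLVERS = {
    "identifiers.org": "https://identifiers.org/",
    "n2t.net": "https://n2t.net/",
}


def direct_url(prefix: str, local_id: str) -> str:
    """Build a URL on the scheme's own resolver, e.g. doi.org for DOIs."""
    return DIRECT_RESOLVERS[prefix] + local_id


def meta_url(resolver: str, prefix: str, local_id: str) -> str:
    """Build a URL on a meta-resolver using the compact prefix:id form."""
    return f"{META_RESOLVERS[resolver]}{prefix}:{local_id}"


if __name__ == "__main__":
    print(direct_url("doi", "10.48550/arXiv.1803.05316"))
    print(direct_url("orcid", "0000-0002-1825-0097"))
    print(meta_url("identifiers.org", "uniprot", "P12345"))
    print(meta_url("n2t.net", "ark", "/12345/example"))
```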
A rundown of what I'm planning: FAIRdowns, inside the Box, and FIP calls, oh my!
Feb 22, 2022•1 min•Ep. 1