VOEventDB

and

Sustainable Software

Tim Staley / 4pisky.org

Hotwiring the Transient Universe V

Villanova, PA, Oct 2016

Sustainable software?

Buzzword alert

Sustainable software?

  • can be used by others
  • can be reused outside original context
  • can be modified by other devs
  • is robust to changing dependencies

NB: kind of a Platonic ideal.

Why is this relevant?

  • There have always been large multi-person / long-term software projects in astronomy.
  • (I think) we're seeing an explosion in the number of 'smaller' public codes (cf ASCL)
  • This is a really good thing, but comes with difficulties of success
  • The easier it is to evaluate, re-use, modify and recycle these codes, the better.

What I'll try to cover

  • VOEventDB - what it's for, what it does
  • A few items on the 'sustainable software checklist'
  • (Python) Tooling to make your life easier

VOEventDB, in brief

Context

  • VOEvent is a standardised format for astronomical transient alerts.
  • NASA-GCN have been transmitting alerts in this format for over 2 years.
  • Previously, there was no public archive for alerts in this format.

  • The VOEvent standard has always referred to a 'registry' of 'repositories', so there was a clear gap to fill.

This is a problem!

  • Difficult to plan a follow-up strategy if you can't back-test it
  • No way to know what's out there, what the rates are like
  • Impossible to check for missed alerts

... also helps with converting web-pages into VOEvent feeds.

VOEventDB: Spec

  • Store raw VOEvent XML, provide XML content at a persistent URL
  • Store a common subset of VOEvent metadata in regular database schema
  • Make queries based on this common subset
  • Including spatial (cone-search) and citation-based queries
  • 'RESTful' web-API
  • Python client-library for remote-queries
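As a rough illustration of the 'raw XML plus a common metadata subset' idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (voevent, ivorn, stream, role, xml) and the example rows are assumptions made for this sketch, not the actual VOEventDB schema, which runs on Postgres.

```python
import sqlite3

# Hypothetical, much-simplified table: a few metadata columns plus the raw packet.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE voevent (
        id     INTEGER PRIMARY KEY,
        ivorn  TEXT UNIQUE NOT NULL,  -- packet identifier
        stream TEXT NOT NULL,         -- e.g. 'nasa.gsfc.gcn/SWIFT'
        role   TEXT NOT NULL,         -- e.g. 'observation' or 'test'
        xml    TEXT NOT NULL          -- raw VOEvent XML, stored untouched
    )
    """
)

# Made-up rows, for illustration only.
rows = [
    ("ivo://example/streamA#1", "example/streamA", "observation", "<VOEvent/>"),
    ("ivo://example/streamA#2", "example/streamA", "test", "<VOEvent/>"),
    ("ivo://example/streamB#1", "example/streamB", "observation", "<VOEvent/>"),
]
conn.executemany(
    "INSERT INTO voevent (ivorn, stream, role, xml) VALUES (?, ?, ?, ?)", rows
)


def stream_counts(connection, role=None):
    """Count packets per stream, optionally restricted to one role."""
    sql = "SELECT stream, COUNT(*) FROM voevent"
    params = ()
    if role is not None:
        sql += " WHERE role = ?"
        params = (role,)
    sql += " GROUP BY stream"
    return dict(connection.execute(sql, params).fetchall())


all_counts = stream_counts(conn)
observation_counts = stream_counts(conn, role="observation")
```

Because the metadata lives in ordinary columns, filtering and counting are plain SQL, while the raw XML stays available for anything the common subset does not cover.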

Reusable, decentralized

  • Agnostic about inputs and outputs
  • Easy for any team to set up their own local repository

Schema

[Figure: database schema diagram]

Implementation

  • Postgres + SQLAlchemy
  • Spatial queries powered by the q3c Postgres extension.
  • Flask-powered RESTful interface
  • Partially-autogenerated documentation.
  • Extensive test-suite using pytest fixtures.
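To make the spatial (cone-search) query concrete, here is a minimal pure-Python sketch of what such a query computes: select the events whose position lies within a given angular radius of a target. This is not how VOEventDB does it (the slides say the work is done inside Postgres by the q3c extension, which can use an index rather than scanning every row); the function names and example positions are made up for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(events, ra, dec, radius_deg):
    """Return the events lying within radius_deg of the position (ra, dec)."""
    return [e for e in events
            if angular_separation_deg(ra, dec, e["ra"], e["dec"]) <= radius_deg]


# Made-up positions, for illustration only.
events = [
    {"name": "a", "ra": 10.0, "dec": 20.0},
    {"name": "b", "ra": 10.5, "dec": 20.0},
    {"name": "c", "ra": 190.0, "dec": -20.0},
]
matches = cone_search(events, ra=10.0, dec=20.0, radius_deg=1.0)
```

A brute-force scan like this is fine for a handful of events, but it is linear in the number of rows, which is why an indexed in-database approach matters for an archive holding over a million packets.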

Getting started: Client installation

In [3]:
!pip install voeventdb.remote --quiet 
In [26]:
import voeventdb.remote.apiv1 as api
api.count()
Out[26]:
1271241
In [19]:
api.map_stream_count()
Out[19]:
{u'com.dc3/dc3.broker': 22570,
 u'nasa.gsfc.gcn/AGILE': 6174,
 u'nasa.gsfc.gcn/AMON': 4,
 u'nasa.gsfc.gcn/CALET': 79,
 u'nasa.gsfc.gcn/CAlet': 1,
 u'nasa.gsfc.gcn/COUNTERPART': 113,
 u'nasa.gsfc.gcn/Fermi': 41556,
 u'nasa.gsfc.gcn/GRO': 6569,
 u'nasa.gsfc.gcn/HETE': 6014,
 u'nasa.gsfc.gcn/INTEGRAL': 33120,
 u'nasa.gsfc.gcn/IPN': 486,
 u'nasa.gsfc.gcn/KONUS': 449,
 u'nasa.gsfc.gcn/MAXI': 6369,
 u'nasa.gsfc.gcn/MOA': 1553,
 u'nasa.gsfc.gcn/SNEWS': 44,
 u'nasa.gsfc.gcn/SUZAKU': 17,
 u'nasa.gsfc.gcn/SWIFT': 1117763,
 u'nasa.gsfc.gcn/UNRECOGNIZED_TYPE': 2,
 u'nvo.caltech/voeventnet/catot': 66,
 u'nvo.caltech/voeventnet/mlsot': 147,
 u'svomcgft.naoc/VOEVENTTEST': 3091,
 u'voevent.4pisky.org/ALARRM-OBSTEST': 5780,
 u'voevent.4pisky.org/ALARRM-REQUEST': 42,
 u'voevent.4pisky.org/ASASSN': 1747,
 u'voevent.4pisky.org/GAIA': 1272,
 u'voevent.4pisky.org/TEST': 10,
 u'voevent.4pisky.org/TEST-RESPONSE': 14,
 u'voevent.4pisky.org/TEST-TRIGGER': 14,
 u'voevent.4pisky.org/voevent-broadcast': 7761,
 u'voevent.4pisky.org/voevent-receive': 7724,
 u'voevent.phys.soton.ac.uk/AMI-REQUEST': 84}
In [20]:
filters={api.FilterKeys.role:'observation'}
api.map_stream_count(filters)
Out[20]:
{u'nasa.gsfc.gcn/AMON': 4,
 u'nasa.gsfc.gcn/CALET': 79,
 u'nasa.gsfc.gcn/CAlet': 1,
 u'nasa.gsfc.gcn/COUNTERPART': 113,
 u'nasa.gsfc.gcn/Fermi': 8072,
 u'nasa.gsfc.gcn/INTEGRAL': 1111,
 u'nasa.gsfc.gcn/IPN': 486,
 u'nasa.gsfc.gcn/KONUS': 449,
 u'nasa.gsfc.gcn/MAXI': 269,
 u'nasa.gsfc.gcn/MOA': 1553,
 u'nasa.gsfc.gcn/SUZAKU': 17,
 u'nasa.gsfc.gcn/SWIFT': 1042121,
 u'nvo.caltech/voeventnet/catot': 66,
 u'nvo.caltech/voeventnet/mlsot': 147,
 u'voevent.4pisky.org/ASASSN': 1747,
 u'voevent.4pisky.org/GAIA': 1272}

Back up a moment...

In [5]:
!pip install voeventdb.remote
Collecting voeventdb.remote
Collecting requests (from voeventdb.remote)
  Using cached requests-2.11.1-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): six in /home/staley/.virtualenvs/jupyterpres/lib/python2.7/site-packages (from voeventdb.remote)
Requirement already satisfied (use --upgrade to upgrade): simplejson in /home/staley/.virtualenvs/jupyterpres/lib/python2.7/site-packages (from voeventdb.remote)
Requirement already satisfied (use --upgrade to upgrade): astropy in /home/staley/.virtualenvs/jupyterpres/lib/python2.7/site-packages (from voeventdb.remote)
Requirement already satisfied (use --upgrade to upgrade): pytz in /home/staley/.virtualenvs/jupyterpres/lib/python2.7/site-packages (from voeventdb.remote)
Requirement already satisfied (use --upgrade to upgrade): iso8601 in /home/staley/.virtualenvs/jupyterpres/lib/python2.7/site-packages (from voeventdb.remote)
Requirement already satisfied (use --upgrade to upgrade): numpy>=1.7.0 in /home/staley/.virtualenvs/jupyterpres/lib/python2.7/site-packages (from astropy->voeventdb.remote)
Installing collected packages: requests, voeventdb.remote
Successfully installed requests-2.11.1 voeventdb.remote-1.0.0

What just happened?

  • Pip fetched the relevant source-code package from the Python Package Index
  • Read setup.py, parsed the list of dependencies
  • Checked what's currently installed, fetched those missing (possibly from the local cache).
  • Installed each of those dependencies in turn,
  • then installed our package
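The steps above can be sketched as a toy 'install plan': walk the dependency list, skip anything already installed, and put each package after the things it depends on. This is only an illustration of the idea, not pip's actual resolver (it ignores version constraints entirely). The dependency names are taken from the setup.py and pip output in these slides; plan_install is a hypothetical helper.

```python
def plan_install(package, requirements, installed):
    """Return the packages to install, dependencies first, skipping installed ones.

    requirements maps each package name to the list of packages it depends on.
    """
    order = []
    seen = set(installed)

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in requirements.get(name, []):
            visit(dep)
        order.append(name)  # appended only after all of its dependencies

    visit(package)
    return order


requirements = {
    "voeventdb.remote": ["iso8601", "pytz", "requests", "simplejson", "astropy", "six"],
    "astropy": ["numpy"],
}
installed = {"six", "simplejson", "astropy", "pytz", "iso8601", "numpy"}
plan = plan_install("voeventdb.remote", requirements, installed)
```

With the environment shown in the pip log, only requests is missing, so the plan is requests followed by voeventdb.remote, matching the 'Installing collected packages' line.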

Packaging

  • Encourages re-use as a component
  • Removes 'install friction': just add a package to your requirements list
  • Adoption has historically been slowed by a fragmented ecosystem and a lack of good docs.
  • Good, short, up-to-date tutorial on packaging your code: http://python-packaging.readthedocs.io/

One snag...

setup.py:

In [ ]:
#!/usr/bin/env python
from setuptools import setup, find_packages

install_requires = [
    'iso8601',
    'pytz',
    'requests',
    'simplejson',
    'astropy',
    'six',
]
packages = find_packages()
setup(
    name="voeventdb.remote",
    version="0.1",
    description="Client-lib for remote queries...",
    author="Tim Staley",
    author_email="[email protected]",
    url="https://github.com/timstaley/voeventdb.remote",
    packages=packages,
    install_requires=install_requires,
)

Package Versioning

Versioneer:

  • Adds a standalone Python module to your codebase
  • Automatically sets version number according to most recent git-tag
  • Git commit-id also available as a string in your library.
  • Super convenient, keeps everything in sync
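To show the idea of deriving a version from the most recent git tag, here is a small sketch that parses a string shaped like the output of `git describe --tags --long`. It is not Versioneer's code: the function name, the regular expression and the exact output format are assumptions for illustration, loosely modelled on the 'tag, plus distance and commit-id' style of version string.

```python
import re

# Matches strings shaped like the output of `git describe --tags --long`,
# e.g. '1.0.0-3-g02b727d': <tag>-<commits since tag>-g<short commit id>.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<commit>[0-9a-f]+)$")


def version_from_describe(describe):
    """Turn a `git describe --tags --long` style string into a version string."""
    match = DESCRIBE_RE.match(describe)
    if match is None:
        raise ValueError("unrecognised describe string: %r" % describe)
    tag = match.group("tag")
    distance = int(match.group("distance"))
    if distance == 0:
        return tag  # exactly on a tagged commit: the version is the tag
    # Otherwise record how far past the tag we are, and at which commit.
    return "%s+%d.g%s" % (tag, distance, match.group("commit"))


release = version_from_describe("1.0.0-0-g02b727d")
development = version_from_describe("1.0.0-3-g02b727d")
```

The point is that the tag in version control is the single source of truth: a tagged commit yields a clean release number, and anything built from a later commit is visibly marked as such.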

setup.py with Versioneer:

In [ ]:
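#!/usr/bin/env python
from setuptools import setup

import versioneer

setup(
    name="voeventdb.remote",
    # Version string derived from the most recent git tag
    version=versioneer.get_version(),
    # Versioneer's build commands, which embed the version in built packages
    cmdclass=versioneer.get_cmdclass(),
    # ... remaining arguments as before
)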
In [22]:
import voeventdb.remote
print("Git tag:", voeventdb.remote.__version__)
print("Git commit-id:", voeventdb.remote.__versiondict__['full-revisionid'])
Git tag: 1.0.0
Git commit-id: 02b727d168797a9ae9bc6835c15b37e384ea1557

Documentation

Minimal docs:

  • Description of what your package does (+ links for context!)
  • One or two brief usage examples
  • One big README is typically fine

Extended docs:

Read The Docs:

  • Free hosting for Sphinx-generated documentation
  • Links to a Github repository
  • Every git-push results in a new documentation build

Documenting examples

  • Examples are very useful... until the code changes and they go stale
  • Python notebooks are a great format for writing examples - but tricky to publish.

Documenting examples with nbsphinx & RTD

  • nbsphinx lets you generate docs from notebooks.
  • The notebooks are re-run with every docs-build - so if the examples are broken, you'll notice.
  • This is how the voeventdb client-docs are generated.
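For reference, a minimal sketch of the notebook-related part of a Sphinx conf.py using nbsphinx. Whether the voeventdb client docs use exactly these settings is not stated in these slides, so treat the values as one reasonable configuration rather than the project's own.

```python
# conf.py (Sphinx configuration file): only the notebook-related settings are shown.

# Enable the nbsphinx extension so that .ipynb files are treated as source documents.
extensions = [
    "nbsphinx",
]

# Re-run every notebook on each build, even if it already has stored outputs.
# (The default, "auto", only executes notebooks that contain no outputs.)
nbsphinx_execute = "always"

# Fail the build if a notebook cell raises an exception (this is the default).
nbsphinx_allow_errors = False

# Keep build output and Jupyter's autosave copies out of the documentation.
exclude_patterns = ["_build", "**.ipynb_checkpoints"]
```

With execution forced on and errors not allowed, an example that no longer runs breaks the docs build, which is what makes stale examples noticeable.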

Deployment & Hosting

For multi-component systems, deployment details are crucial.

[Figure: deployment diagram]

  • Deployments scripted with Ansible
  • Lack of software development training for grad-students

    (What do we drop, to replace with software-carpentry?)

  • Lack of long-term career-path for 'research software engineers'

(This is changing, slowly, e.g. UCL's RSE team)

Summary

VOEventDB

  • Provides a 'turn-key' queryable repository for transient alerts.
  • Can be used as a remote service
  • Or run your own
  • Overview paper: arXiv:1606.03735

Packaging

  • Make use of your packaging ecosystem
  • Think about use of your code as a component
  • Keep versioning information in your version control system! - automate package versioning

Documentation

  • Minimum: description + example usage + install requirements
  • Documentation goes stale - test your examples
  • In Python, notebooks are a great format for this - try nbsphinx!

Deployment

  • Docs are a start, but easily go out of date
  • Automate!

Fin

Thanks!