Doug's ramblings
Finding Messier objects in DBPedia via Spark

So, I was in the middle of planning a post discussing various ways of searching for astronomical objects - in particular M31 - in DBPedia, when along comes an announcement of the Spark JavaScript library.

So, instead of working on the post, I created a simple web page to test it out, and it works. Lo and behold, I give you a dynamically-generated list of Messier Objects. The page took about 20 minutes to create, and most of that was faffing around with the CSS and deciding exactly what properties I wanted to show.
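For the curious, here is a rough sketch of the kind of query that could sit behind such a list. It is not the code from my page (Spark is a JavaScript library, and this is plain Python); it only builds a SPARQL query and the matching request URL for the public DBPedia endpoint. The assumptions are mine: that Messier objects can be picked out via the category dbc:Messier_objects, that the endpoint accepts a `format` parameter, and the helper names `build_query_url` and `labels_from_results` are made up for illustration.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: Messier objects are the resources filed under the
# Wikipedia category "Messier objects" (dbc:Messier_objects).
MESSIER_QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?object ?label WHERE {
  ?object dct:subject dbc:Messier_objects ;
          rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
ORDER BY ?label
"""


def build_query_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Return a GET URL asking the endpoint for SPARQL JSON results.

    Assumption: the endpoint honours a `format` query parameter.
    """
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


def labels_from_results(results):
    """Pull (uri, label) pairs out of a SPARQL JSON results document."""
    return [
        (binding["object"]["value"], binding["label"]["value"])
        for binding in results["results"]["bindings"]
    ]
```

Fetching the URL from `build_query_url(MESSIER_QUERY)` and passing the decoded JSON to `labels_from_results` should give the raw material for a list; which properties to show for each object is a separate decision.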

This is cool. Not quite as cool as going for a ride into space, but still, not bad at all.

More to come later…

Astronomy and the Semantic Web

Astronomy, Semantics and Linked Data

So why, you might ask, am I - an Astrophysicist - interested in semantics, specifically the semantic web and linked data? Am I not just buying into the AI hype of the 80s, repurposed for the 21st century? It’s quite possible, but if we ignore some of the more extreme claims made about this approach, there is already plenty that we can do to help us in our work, as shown by the Virtual Solar and Terrestrial Observatory. I encourage you to read Jim Hendler’s Nature Network blog article “What is the Semantic Web really all about?” for an overview of this area.


An obvious place to start is the literature, since we are fortunate to have two major portals for accessing this data online - pre-prints at arXiv and published material at ADS - which are already inter-linked and provide a wealth of data that we can take advantage of. There are a number of interesting developments in this space - some more “semanticy” than others - including (but not limited to) ADS labs, Mendeley, Zotero and arxivsorter - and then there’s work that I am involved in, funded by the US VAO project, to combine literature and data into a bowl of semantic goodness at the ADS. I’ll leave further discussion of this to a later post, since what I really wanted to talk about was Outreach.

Astronomy Outreach

I have been convinced for several years now that Semantic technologies can, and should, be used to improve access to Education and Outreach efforts in Astronomy and Astrophysics. The application of simple classification schemes[1] such as the Ontology of Astronomical Object Types to data collections including Astronomy Picture of the Day, astronomical images on flickr, images from the Las Cumbres Observatory Global Telescope Network and the press releases from observatories like Chandra can, I believe, enhance the reader’s understanding and enjoyment of the material, as well as providing more avenues for enquiry and understanding.

In my spare time I’ve explored several avenues - such as semantic tagging with Astro MOAT - and have grand visions, but have yet to get anything really working.

A case study

One of the first examples in this area (that I know of) has been provided by Stuart Lowe, who combined observational data from the Las Cumbres Observatory telescope network with the education data provided by the UK Government to look at where the users of the telescopes are, at least for schools in England and Wales.

For this particular example I’m sure Stuart would have found it easier just to be able to access simple tabular data sets[2] rather than learn a whole new technology stack[3]. In fact, I think this is often true of many “simple” semantic applications; there’s always a simpler approach if you already understand the format and layout of the data you are given. These semantic technologies come into their own when you do not know, ahead of time, exactly what you want or how the data is stored[4]. In this regard I see Semantics as being synonymous with “Data Integration and Exploration”, but I don’t really see this particular Three-Letter Acronym becoming very popular.

I have been noodling around trying to use the vocabularies we are writing for the ADS project - in particular those dealing with observational data - to try and model the LCOGTN data such as this observation of 30 Doradus by Shoeburyness High School. This work has fallen by the wayside, so I’m hoping to write up my steps following the approach of Keith Alexander’s LOD by hand, as a way of re-starting the work. If I do, you’ll be the first to know :-)

  1. Here simple refers to the fact that the scheme does not rely on the more expressive parts of Semantic technologies, such as application of rules or logical inference capabilities; I know that creation of these schemes is not an easy job. 

  2. There are arguments that you do not need data in RDF - which we can assume for the purpose of this post to be magical pixie dust that makes everything just work - to take advantage of semantic technologies (e.g. the blog and twitter feed of Kingsley Idehen), but at present most of the supporting infrastructure and documentation assume the presence of RDF. 

  3. This includes, but is not limited to, RDF, SPARQL and the linked data cloud.

  4. This does not come for free, unfortunately, since someone, somewhere, has to do all the curation and provide access to the data.