The Census Dissemination Blog

September 5, 2011

Census Aggregate SDMX-RDF Linked Data

Filed under: SDMX — richwiseman @ 10:52 am

The 2001 Census Aggregate data is a large and complex UK data set. In addition to providing a straightforward data extract in CSV (comma-separated values) format, the CAIRD II project aimed to provide enhanced metadata to academics and researchers.

In line with the open data agenda, we want to make the data more accessible and useful to app and website developers. Using an open format rather than a proprietary one makes the data easier for developers to work with. Third-party software and code libraries, many of which are free, greatly speed up and simplify the work of retrieving and incorporating the data.

The SDMX organisation provides a number of suitable open formats for describing statistical data. Some of this work has been adapted for use as RDF Linked Data by the publishing statistical data Google group, and some of it is described by Jeni Tennison in her blog. The SDMX-RDF work expands the usefulness of SDMX data by describing the concepts of SDMX using the linked data framework, RDF. SDMX-RDF provides a ready-to-use framework into which existing SDMX data can be readily connected.
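
As a rough illustration of what describing a census figure in RDF can look like, the sketch below writes one observation as N-Triples using only the Python standard library. The qb: and sdmx- property URIs come from the published RDF Data Cube and SDMX-RDF vocabularies; the example.org URIs, the area code and the count are made up for illustration and are not our actual identifiers or data.

```python
# Minimal sketch: one census observation as N-Triples (stdlib only).
# The example.org URIs, area code and count are hypothetical.

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
BASE = "http://example.org/census/2001/"  # hypothetical base URI


def uri(value):
    """Wrap a URI string in the angle brackets N-Triples expects."""
    return "<" + value + ">"


def observation_triples(obs_id, dataset_id, area_code, count):
    """Return the N-Triples lines describing a single observation."""
    obs = uri(BASE + "observation/" + obs_id)
    rows = [
        (obs, uri(RDF_TYPE), uri(QB + "Observation")),
        (obs, uri(QB + "dataSet"), uri(BASE + "dataset/" + dataset_id)),
        (obs, uri(SDMX_DIM + "refArea"), uri(BASE + "area/" + area_code)),
        (obs, uri(SDMX_MEASURE + "obsValue"), '"%d"^^%s' % (count, uri(XSD_INT))),
    ]
    return ["%s %s %s ." % row for row in rows]


triples = observation_triples("obs1", "example-table", "AREA0001", 1234)
print("\n".join(triples))
```

Each printed line is one subject, predicate, object statement, which is the form that generic RDF libraries can load without any census-specific code.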

RDF provides the capability to link our data to other related information on the linked data web and also allows other providers to link to our information.
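
A link of this kind is itself just one more RDF statement. The sketch below, which assumes owl:sameAs as the linking property and uses placeholder URIs for both our area and the other provider's, shows how small such a link is.

```python
# Minimal sketch: linking a local area URI to another provider's URI.
# Both URIs are hypothetical placeholders.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri, external_uri):
    """Return one N-Triples line stating that both URIs name the same thing."""
    return "<%s> <%s> <%s> ." % (local_uri, OWL_SAME_AS, external_uri)


link = same_as_triple(
    "http://example.org/census/2001/area/AREA0001",
    "http://example.com/other-provider/area/AREA0001",
)
print(link)
```

Another provider can publish the same kind of statement pointing back at our URI, which is how links in both directions arise.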

Geography information for the UK census can pose unexpected challenges. There are several hundred thousand areas which, when expressed as RDF data, expand to one file of 2.8 million lines of text (or 235,000 files of 16 lines of text each).
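
The two figures above imply different line counts per area: 235,000 files of 16 lines would total 3,760,000 lines, against 2.8 million in the single file. One plausible reason, and it is only an assumption here, is that each standalone file repeats header lines (such as prefix declarations) that the combined file states once. The sketch below illustrates that effect with invented header and body sizes, and writes areas through a generator so the whole data set never has to sit in memory.

```python
# Sketch: serialising areas as one combined file versus one file per area.
# HEADER, the area codes and the properties are invented for illustration.

HEADER = [
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix ex: <http://example.org/census/area/> .",  # hypothetical
    "",
]


def area_lines(code, label):
    """Yield the Turtle lines describing a single area."""
    yield "ex:%s" % code
    yield '    rdfs:label "%s" .' % label
    yield ""


def combined_file(areas):
    """Yield lines for a single file: header once, then every area."""
    for line in HEADER:
        yield line
    for code, label in areas:
        for line in area_lines(code, label):
            yield line


def separate_files(areas):
    """Yield (filename, lines) pairs: the header is repeated in each file."""
    for code, label in areas:
        yield code + ".ttl", HEADER + list(area_lines(code, label))


areas = [("A%04d" % n, "Area %d" % n) for n in range(1, 1001)]
combined_count = sum(1 for _ in combined_file(areas))
separate_count = sum(len(lines) for _, lines in separate_files(areas))
print(combined_count, separate_count)
```

With these invented sizes the per-area files carry roughly twice as many lines in total as the combined file, purely because of the repeated header.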

We expect to gain access to the 2011 census data in the second half of 2012. To maximise the benefit and value of the latest census, we would like to be able to compare its data with the 2001, and even the 1991 and 1981, censuses. The questions, geographic areas and predefined options change from census to census and between England, Northern Ireland, Scotland and Wales. Metadata and comparability information are often requested, and we expect them to be very useful to researchers. Using the RDF Linked Data framework will give us a direct way to indicate comparability within and across censuses.
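
One way comparability could be expressed, offered as a sketch rather than a description of our final design, is with the SKOS mapping properties skos:exactMatch and skos:closeMatch between category URIs from different censuses. The category URIs and the mappings below are invented for illustration.

```python
# Sketch: comparability links between census categories as N-Triples.
# The category URIs and the chosen mappings are hypothetical.

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/census/"  # hypothetical base URI

# (2001 category, 2011 category, SKOS mapping property)
MAPPINGS = [
    ("2001/category/cat-a", "2011/category/cat-a", "exactMatch"),
    ("2001/category/cat-b", "2011/category/cat-b-revised", "closeMatch"),
]


def comparability_triples(mappings):
    """Yield one N-Triples line per mapping, rejecting unknown relations."""
    for old, new, relation in mappings:
        if relation not in ("exactMatch", "closeMatch"):
            raise ValueError("unsupported mapping: " + relation)
        yield "<%s%s> <%s%s> <%s%s> ." % (BASE, old, SKOS, relation, BASE, new)


links = list(comparability_triples(MAPPINGS))
print("\n".join(links))
```

Distinguishing exact from close matches lets a researcher decide whether two figures can be compared directly or only with care.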

CDU team.

1 Comment »

  1. [...] CDU have published an interesting blog post explaining what they are doing with Linked Data and how they hope to apply it to the 2011 census. [...]

    Pingback by Census Aggregate Linked Data | mimasld — September 5, 2011 @ 3:03 pm

