Open Data's Potential for Political History

Author: Milligan, Ian

The recent trend of "open government" initiatives has provided an exciting new source of material for digital humanities researchers. Large datasets allow these scholars to engage in "distant reading" and to provide context in ways not previously possible. In this article, the author gives examples of tools researchers can use to expand their understanding of Canada's political history and of the changing nature of parliamentary institutions and debates. He concludes with suggestions for getting the maximum benefit from these data releases.


What could we learn if we read every word of the federal Hansard and explored how the frequency of various 'topics' rose and fell over time? What trends might we see if we knew the occupation of every candidate for office since 1867? What previously unknown value might be discovered in extremely large datasets like these? The answers to all of these questions are promising.

New and newly digitized datasets from parliamentary sources offer considerable potential for historians, political scientists, and other researchers interested in political history. Two developments combine to offer new opportunities for understanding the past: the rise of the digital humanities (a loosely defined grouping of humanities scholars who explore the possibilities of new media and emerging technologies and who are developing fascinating methods for analyzing large quantities of information) and a series of exciting releases in the 'open data' sphere. In this piece, I highlight some of the possibilities that large datasets present to people interested in parliamentary history, and conclude with suggestions about what governments and funding agencies can do to support this emerging field of research.

Open Government and the Digital Humanities

'Open data' is the idea that data should be made publicly available for anyone to use for any purpose, including reusing the data, modifying it, and building platforms upon it. 'Open data' is closely tied to the concept of 'open government': the idea that the people of a country should be able to access, read, and manipulate (in their own applications and on their own terms) the data that their country generates. The current federal government moved aggressively in this direction with the 2011 launch of the Open Government Initiative. (1) When people think of 'open data,' historical research probably does not immediately come to mind. Most open data releases tend towards the scientific, the technical, or the immediately applicable: bus route information, for example, or geospatial data on zoning and infrastructure. However, some of these releases are increasingly relevant to historians, including those mentioned above, such as data on all candidates for federal political office and the frequency of words appearing in transcripts of parliamentary debates.
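To make the candidate example concrete, here is a minimal sketch of the kind of question such a release could answer: how the occupations of candidates shift from decade to decade. It assumes the data arrives as a CSV file; the column names `election_year`, `candidate` and `occupation`, and every row in the sample, are hypothetical placeholders invented for illustration, not the schema or contents of any actual government release.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical extract of an open-data release listing federal candidates.
# Column names and rows are invented for illustration; a real release will
# have its own schema and must be inspected before use.
SAMPLE_CSV = """election_year,candidate,occupation
1872,Candidate A,lawyer
1878,Candidate B,farmer
1878,Candidate C,lawyer
1965,Candidate D,teacher
1968,Candidate E,lawyer
1968,Candidate F,teacher
"""

def occupations_by_decade(csv_text):
    """Count candidate occupations, grouped by the decade of the election year."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        decade = int(row["election_year"]) // 10 * 10
        # Normalize case and whitespace so "Lawyer " and "lawyer" are counted together.
        counts[decade][row["occupation"].strip().lower()] += 1
    return counts

if __name__ == "__main__":
    for decade, tally in sorted(occupations_by_decade(SAMPLE_CSV).items()):
        print(decade, tally.most_common())
```

Grouping by decade rather than by single election keeps sparse years from dominating; with real data, occupation labels would likely need far more cleaning than lower-casing.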

Before initiatives of this kind, many humanists could not access such large arrays of information. The era of the digital humanities, however, has opened up exciting new possibilities for analysis. In English literature, for example, the literary scholar Franco Moretti argues for "distant reading" to help understand the rise of the Victorian novel: rather than focusing on a corpus of two hundred or so books, we can use computational methods to study tens of thousands of novels at once. (2) Reading individual books remains important for testing theories and exploring prose, but we cannot read all of them; distant reading lets us place the ones we do read in a wider context.
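One of the simplest distant-reading measures is how often a word appears across a whole corpus, tracked over time. The sketch below assumes the texts are already grouped by year in memory; the toy sentences are invented placeholders, not real transcripts, and the function name `relative_frequency` is hypothetical.

```python
import re
from collections import Counter

# Toy corpus keyed by year. These sentences are invented placeholders; in
# practice each value would be the full text of that year's documents.
corpus = {
    1994: "The deficit dominated debate. The deficit must fall.",
    1995: "Members discussed the deficit and unemployment.",
    1996: "Unemployment and health care were raised.",
}

def relative_frequency(texts_by_year, term):
    """Return {year: occurrences of term per 1,000 words}, in year order."""
    term = term.lower()
    result = {}
    for year, text in sorted(texts_by_year.items()):
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(words)
        result[year] = 1000 * counts[term] / len(words) if words else 0.0
    return result

if __name__ == "__main__":
    for year, rate in relative_frequency(corpus, "deficit").items():
        print(f"{year}: {rate:.1f} per 1,000 words")
```

The counts are normalized per 1,000 words because the volume of text can vary a great deal from year to year; raw counts alone would partly measure how much was written rather than how prominent a term was.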

Using a few parliamentary datasets as examples, let's see some of what a digital humanist can do with access to all of this data.

Topic Modeling and Distantly Reading "Hansard," 1994-2012

The federal government has made full transcripts of its debates since 1994 available online. (3) The transcripts form a relatively large, but not insurmountable, body of full-text data: 800 megabytes of plain text. Yet it would be nearly impossible to read all of this text, especially if you wanted to do anything else with your time!
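A rough calculation shows why. Only the 800 MB figure below comes from the text; the bytes-per-word, reading-speed and working-time constants are assumptions chosen for illustration, so the result is an order-of-magnitude estimate rather than a measurement.

```python
# Back-of-envelope estimate of how long one person would need to read the
# Hansard plain text. Only CORPUS_BYTES comes from the article; every other
# constant is an assumption chosen for illustration.
CORPUS_BYTES = 800 * 10**6   # 800 megabytes (decimal)
BYTES_PER_WORD = 6           # assumption: ~5 letters per English word plus a space
WORDS_PER_MINUTE = 250       # assumption: a typical adult silent-reading speed
HOURS_PER_DAY = 8            # assumption: reading as a full-time job
WORKING_DAYS_PER_YEAR = 250  # assumption

def reading_estimate():
    """Return (approximate words, reading hours, full-time working years)."""
    words = CORPUS_BYTES / BYTES_PER_WORD
    hours = words / WORDS_PER_MINUTE / 60
    years = hours / HOURS_PER_DAY / WORKING_DAYS_PER_YEAR
    return words, hours, years

if __name__ == "__main__":
    words, hours, years = reading_estimate()
    print(f"~{words / 1e6:.0f} million words, ~{hours:,.0f} hours, ~{years:.1f} working years")
```

Under these assumptions it prints roughly 133 million words, about 8,889 hours, or about 4.4 years of full-time reading.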

We can, of course, query it with full-text search. Many of us have been running these kinds of searches for years, to good effect in published scholarship on parliamentary history. But meaningful full-text searching is difficult: a researcher must know what to look for with a fairly high degree of certainty. Searching with colloquial keywords or shorthand terms, or failing to anticipate a single typographical error in the source, can lead to many missed results, so a researcher often needs to know a great deal about a topic before reaching the search bar. Moreover, full-text searches in some search engines can skew results because of the algorithms underlying the search function, which rank results in ways that most scholars do not understand. (4) If, however, a scholar is looking for specific discussions, such as those naming a particular labour strike or a specific piece of legislation, full-text search can be extremely useful. To try a full-text search of Hansard, visit housechamberbusiness/ChamberHome.aspx and click on "Search and Browse by Subject" in the left-hand column.
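The problem of missed results is easy to demonstrate. In the sketch below, the three lines are invented for illustration (not real Hansard text): one spells the search term correctly, one contains a deliberate misspelling, and one uses a shorthand. An exact whole-word search finds only the first; a fuzzy match built on Python's standard `difflib` catches the typo but still misses the shorthand. The function names and the 0.85 similarity cutoff are illustrative choices, not features of any Hansard search tool.

```python
import difflib
import re

# Invented example lines for illustration only (not real Hansard text).
documents = [
    "The member raised the question of unemployment insurance.",
    "We must reform unemployment insurence before the session ends.",  # deliberate misspelling
    "The UI program needs attention.",  # shorthand for the same program
]

def exact_search(docs, term):
    """Indices of documents containing `term` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [i for i, doc in enumerate(docs) if pattern.search(doc)]

def fuzzy_search(docs, term, cutoff=0.85):
    """Indices of documents containing a word whose similarity to `term` is at least `cutoff`."""
    target = term.lower()
    hits = []
    for i, doc in enumerate(docs):
        words = re.findall(r"[a-z]+", doc.lower())
        if difflib.get_close_matches(target, words, n=1, cutoff=cutoff):
            hits.append(i)
    return hits

if __name__ == "__main__":
    print("exact:", exact_search(documents, "insurance"))  # [0]: the misspelled line is missed
    print("fuzzy:", fuzzy_search(documents, "insurance"))  # [0, 1]: the "UI" line is still missed
```

Fuzzy matching recovers spelling variants, but no string-similarity threshold links "UI" to "insurance"; the researcher still has to know the shorthand exists and search for it separately.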

Researchers can repurpose the plain text used in subject searches to manipulate and explore these Hansard records...
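The section heading refers to topic modeling, and this excerpt does not say which algorithm or software was used. As one hedged illustration, the sketch below fits latent Dirichlet allocation (LDA), a widely used topic model, by collapsed Gibbs sampling. The four "speeches" and the stop-word list are invented for illustration, and the hyperparameters (`alpha`, `beta`, 200 iterations) are arbitrary defaults. In practice researchers generally use an established toolkit such as MALLET or gensim rather than a hand-written sampler.

```python
import random
import re

# Hypothetical toy "speeches", invented for illustration; a real study would
# split the Hansard plain text into one document per speech or per sitting day.
speeches = [
    "the budget deficit and federal debt must fall",
    "tax cuts will reduce the deficit and the debt",
    "farmers need support for wheat and grain prices",
    "grain exports help farmers across the prairies",
]
STOP_WORDS = {"the", "and", "must", "will", "for", "need", "help", "across"}

def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (top words per topic, doc-topic proportions)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k overall
    assignments = []
    for d, doc in enumerate(docs):            # start from a random topic per token
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -topic_word[k][v])[:4]]
        for k in range(K)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return top_words, theta

if __name__ == "__main__":
    docs = [tokenize(s) for s in speeches]
    topics, theta = lda_gibbs(docs, num_topics=2)
    for k, words in enumerate(topics):
        print(f"topic {k}: {', '.join(words)}")
```

On a corpus this small the split into topics is not guaranteed to be clean; the method only becomes informative across thousands of documents, where recurring word co-occurrences stand out.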

