Curated Data-Driven Narrative

“There are two goals when presenting data: convey your story and establish credibility.” (Edward Tufte)


The objective of this project was to analyse data from publicly available datasets in order to set a narrative in motion with accompanying visualisations. The Central Statistics Office’s 2017 introduction of an online application based around baby names in Ireland served as inspiration for this assignment. My paternal grandparents were Polish and moved to Ireland shortly after World War 2. In keeping with Polish tradition, my parents christened me Antonina in 1979, and my brother Karol in 1981. Both were highly uncommon names in Ireland in that era. In more recent years, I have noticed an increase in the use of both names. This assignment focuses on these two names and measures the popularity of their usage over the years in Ireland. In tandem, it examines and charts the presence of Polish people in Ireland. These findings are presented side by side, connections between them are drawn out, and any outliers or factors that arise along the data journey are examined.

Thanks to the popular and intuitive CSO application Baby Names of Ireland, there is significant data on hand pertaining to the subject. It was necessary to filter the Girls/Boys Names in Ireland with 3 or more Occurrences by Name, Year and Statistic (1964 – 2017) datasets down to the two names of interest (Antonina and Karol) and to a fixed time range. The range of 1970 to 2017 was chosen, as the years before 1970 held no relevant data. This data was then downloaded in CSV format and structured within an Excel file to prepare it for visual rendering. Experimenting with the various capabilities of Tableau, Datahero, Voyant and RAW was an educational exercise that emphasised how different tools suit specific data and purposes. Ultimately, Datawrapper proved to be the best fit for the data being interrogated and showed itself to be an intuitive and user/reader friendly option. The column format below was chosen for the initial analysis of the data for its ability to display the information proportionally and convey to the audience the concrete number of babies given each name in a given year.

Fig. 1: Babies named Antonina from 1970 – 2017 in Ireland. Get the data here.

Fig. 2: Babies named Karol from 1970 – 2017 in Ireland. Get the data here.
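The filtering step described above was carried out in Excel, but it can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (Name, Year, Count) are hypothetical and may not match the real CSO export, and only the Antonina figures for 2007 and 2017 come from the data discussed in this post. The other rows are invented placeholders.

```python
import csv
import io

# Hypothetical extract; the real CSO export may use different column names.
# Only the Antonina rows (2007: 8, 2017: 19) come from this post. The Karol
# and Aoife rows are invented placeholders to exercise the filter.
SAMPLE_CSV = """Name,Year,Count
Antonina,2007,8
Antonina,2017,19
Karol,1969,3
Karol,1979,12
Aoife,2007,500
"""


def filter_names(csv_text, names, first_year, last_year):
    """Return {name: {year: count}} for the chosen names and year range."""
    result = {name: {} for name in names}
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["Year"])
        # Keep a row only if it matches a chosen name and falls in range.
        if row["Name"] in result and first_year <= year <= last_year:
            result[row["Name"]][year] = int(row["Count"])
    return result


selected = filter_names(SAMPLE_CSV, ["Antonina", "Karol"], 1970, 2017)
print(selected)
```

Rows for other names, and rows outside 1970 to 2017, are simply dropped, which mirrors the boundaries chosen for the charts.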

Information on the number of Polish people in Ireland over the same time period proved more elusive (concrete data on the years prior to 2000 proved impossible to locate), and the decision was made to base the interrogation upon the allocation of PPS numbers in the State. The Department of Employment Affairs and Social Protection publishes data on the allocation of PPS numbers via a collective chart of non-nationals covering 2000-2009 and, individually, for the years thereafter. It was necessary to gather the data pertaining to Polish nationals into one complete Excel sheet. In order to create a streamlined comparison for this study, and in the interest of full disclosure and transparency, the decision was taken to include the earlier years from 1970 and enter them as null entries, as seen in Fig. 3.

Fig. 3: Allocation of PPS numbers to Polish people, 1970 – 2017. Get the data here.
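The decision to carry the earlier years as null entries can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the method actually used (the sheet was assembled by hand in Excel): the function name `pad_years` is hypothetical, and only the 2006 figure of 93,787 is taken from this post.

```python
def pad_years(known, first_year, last_year):
    """Return (year, value) pairs for every year in the range.

    Years without data are given None rather than 0, so that "no data
    located" is never mistaken for "no PPS numbers allocated".
    """
    return [(year, known.get(year)) for year in range(first_year, last_year + 1)]


# Only the 2006 value comes from this post; a full sheet would hold one
# entry for each year from 2000 onwards.
known_pps = {2006: 93787}
series = pad_years(known_pps, 1970, 2017)

print(series[0])   # first year in the range, no data
print(len(series)) # one entry per year
```

Using None instead of zero keeps the timeline aligned with the baby name charts while staying transparent about where the data simply does not exist.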

Fig. 1 shows no recorded registrations of the name Antonina in 1979. The name falls under what the CSO deems a “limit of discretion/uncertainty”, indicating that fewer than 3 babies were given the name in each of the years preceding 2007, and the name was therefore omitted by the CSO for confidentiality reasons. Antonina enters the chart in 2007, with 8 girls registered that year, and peaks in popularity in 2017 at 19.

More variation is evident in the second visualisation, Fig. 2, which shows a significant spike in the popularity of the name Karol in 1979. The spike continues over several years before the name wanes, then re-emerges in 2005 in a steadier guise.

Fig. 3 illustrates the impact of EU membership on the number of Polish people in Ireland over the years. Poland gained EU membership in 2004, and Ireland granted Polish citizens full access to its labour market at that time. The inflow peaked in 2006, when 93,787 PPS numbers were allocated to Polish nationals, and allocations remained relatively high and sustained until the recession in 2008.

The use of line graphs in Figs. 4 and 5 helps to show the shape of the trends over time. The dual peak in popularity of both names between 2004 and 2017 is undeniable. A similar peak is observed in Fig. 5, which visualises the arrival of Polish people on the shores of Ireland. The striking similarity of the data, seen through the narrative structure of the line graph, strongly suggests a connection between the occurrences, although no formal statistical test was carried out.

Fig. 4: Comparison of Antonina and Karol from 1970 – 2017

Fig. 5: Allocation of PPS numbers to Polish people in Ireland 1970 – 2017
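The visual similarity between the name counts and the PPS allocations could also be checked numerically. The sketch below computes a Pearson correlation coefficient over the years for which both series have data. It was not part of the original analysis, the function names are hypothetical, and the two small series are invented placeholders, not the real CSO or PPS figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate_series(a, b):
    """Correlate two {year: value} series, skipping years with None."""
    years = sorted(y for y in a if a[y] is not None and b.get(y) is not None)
    return pearson([a[y] for y in years], [b[y] for y in years])


# Invented placeholder values, NOT real data; 2007 is skipped because
# one of the series has no value for that year.
names_placeholder = {2004: 2, 2005: 4, 2006: 6, 2007: None}
pps_placeholder = {2004: 10, 2005: 20, 2006: 30, 2007: 25}

r = correlate_series(names_placeholder, pps_placeholder)
print(round(r, 3))
```

A coefficient close to 1 would indicate that the two series rise and fall together. Even then it would only be a hint of a connection, not proof of causation.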

An interesting outlier is evident in the earlier years of Fig. 2 and Fig. 4 and warrants further attention. An examination of the events in Ireland at that time points to a link between the 1979 visit of the Polish Pope John Paul II (Karol Józef Wojtyła) and the peak in the name Karol. The Papal visit had a huge impact upon the people of Ireland, and this is certainly reflected in the choice hundreds of Irish families made in giving their baby boys a name that was uncommon for the time. This impact is concretely supported by the data and corresponding visualisation. The name Karol lingered in residual popularity for several years following 1979, before the visit faded from the public consciousness to some extent, and the name shows a sharp downturn in usage in 1984.

Additionally, in Fig. 2 and Fig. 4, a more subtle increase in the name Karol may be seen within the second spike in popularity (after the advent of the Polish arrival in Ireland). This outlier may well have remained undetected without the aid of the visualisation. In 2011, we see a noticeable increase in the presence of the name. An investigation into cultural events at the time points to a specific influence: the formal beatification of Pope John Paul II took place in 2011 and appears to have influenced the baby naming choices of that year. While it may be said that the spike is too minor to support a solid conclusion, it would be negligent not to highlight the finding in light of its relevance to this project.

Data visualisation expert Edward Tufte famously critiqued the adage “Correlation does not imply causation.” The peak in popularity of the name Karol seen in the years surrounding the Papal visit is highly evident. The increase in registration of the name in the year of the beatification, combined with the fact of the Pope’s Polish heritage and the presence of a Polish population in Ireland, highlights a subtle further correlation and implies causation between certain baby naming practices and Papal influence in Ireland in this instance.

Edward Tufte, Correlation/Causation, print on canvas, via

Finally, in order to further emphasise the definitive influence of the 1979 visit upon baby names in Ireland, a supplementary investigation into two additional names was conducted.

Fig. 6: Babies named John/Paul from 1970 – 2017 in Ireland.

The line graph in Fig. 6 shows the popularity of the names John and Paul. Its spikes directly mirror the arc of the name Karol across the same years as seen in Fig. 4, indicating, once again, the influence the Papal visit exerted. Tufte states: “Empirically observed covariation (correlation) is necessary but not a sufficient condition for causality. Correlation is not causation, but it sure is a hint.”

In charting the popularity of specific names over a time frame in Ireland, and placing them in the context of economic and cultural shifts and occurrences, this data story has elucidated some specific instances of influences upon baby naming practices in society. Additionally, it showcases the manner in which large datasets may be mined and rendered in order to produce engaging narratives and visualisations for a wider audience.





“Baby Names of Ireland.” Central Statistics Office. Accessed 15 Feb. 2018.

Brule, Joshua. “A causation coefficient and taxonomy of correlation/causation relationships.” Semantic Scholar, 05 Aug. 2017. Accessed 18 Mar. 2018.

“Design Principles.” Data Depiction. Accessed 17 Mar. 2018.

Edward Tufte. Accessed 10 Mar. 2018.

“Enrich your story with charts, in seconds.” Datawrapper. Accessed 14 Mar. 2018.

“Statistics on Personal Public Service Numbers Issued.” Department of Employment Affairs and Social Protection, 22 Mar. 2017. Accessed 16 Feb. 2018.



Digital Preservation and ‘The Crossing’

A widespread misconception is that, with the dawn of the digital, whatever is captured in this medium will accompany us into the future with no expiration date. But where is the guarantee that we will be able to read the news of today on the computers of tomorrow? The 12th International Conference on Digital Preservation was held in November 2015 in North Carolina. Participants were asked why they thought digital preservation was important. The answer is not immediately obvious to the majority of people.

Attendee Alice Sara Prael, of the John F. Kennedy Presidential Library and Museum, puts it succinctly: “Our cultural heritage is being saved in digital formats now and it used to be that history was saved by what didn’t get thrown away and that’s not a strategy that works anymore because if we leave our digital (as it is) and hope it will live for another hundred years, it won’t.” (Coursera)

Elaine Harrington, in the UCC October 2017 colloquium, touches on this matter. Researchers are, naturally enough, focused on the research they are doing and on documenting this content. However, there is a question around how we can keep this data secure moving into the future. It needs to remain accessible, whether it be news sources, data, or scientific research.

Adrienne LaFrance writes in her article Raiders of the Lost Web that “today’s great library is being destroyed even as it is being built.” She tells the story of how, in 1985, a budding journalist named Kevin Vaughan was haunted by a story he came across of a school bus collision in 1961 in Colorado, where 20 children lost their lives. In 1992, then an investigative journalist himself, he tracked down the name of the bus driver involved. In 2006, his editors at the Rocky Mountain News agreed to let him pursue the trail. The series he wrote came to be known as The Crossing.

Colorado was still suffering the ramifications of the 1999 Columbine shooting. The far reaching story of The Crossing elicited a mass wave of empathy in the region. Vaughan’s emotive narrative had significant impact on the lives of people in Colorado in their identification with the terrible loss. Vaughan assumed his piece would live forever. Its digital presence would guarantee this. Or so he thought…

In 2008, Vaughan was named a finalist for the Pulitzer Prize for his work on the 34-part multimedia series. In 2009, the Rocky Mountain News went out of business, and the website followed soon after. The Crossing, and the tale it told, disappeared and fell victim to the passage of time once again.

“There’s this gradual trend toward more and more access, and of course electronic media provides the easiest and cheapest access to information that we’ve ever had on the planet…but it’s also the most easily lost. We’ve always had this tradeoff between permanence and accessibility.” (Edward McCain, digital curator of journalism and founder of Dodging the Memory Hole at the Donald W. Reynolds Journalism Institute and University of Missouri Libraries.) (“How To Preserve”)

Vaughan’s initial research had been conducted from work resurrected from dusty old boxes in forgotten warehouses. But with the collapse of the paper and the resulting disappearance of the website, his own digital research and documentation vanished without a trace.

Abby Smith Rumsey, writer and digital historian, says, “There are now no passive means of preserving digital information. In other words if you want to save something online, you have to decide to save it. Ephemerality is built into the very architecture of the web, which was intended to be a messaging system, not a library.” (“Raiders”)

Six years after The Crossing disappeared, Vaughan re-introduced it to the Web. By serendipity alone, he had a backup. After The Crossing was published, he had been asked to introduce some presentations on it, and had four DVDs made to serve this purpose. The series had been built using HTML4, and he had what he needed to rebuild the site. His son, who was studying electrical engineering and computer science at the time, took the project on and rebuilt it according to the new standards and software now available.

Obsolescence is embedded into technology. When Sir Tim Berners-Lee formulated the web in 1989, he saw it as a forum for the good of mankind, where data was available for research, and where all of this information could be networked and democratized for the people of the world. But more and more, we are seeing useful ideas and research disappear.

The problem lies in large part with the fact that there is no one trusted digital vault in which content can be stored and kept for future access. “Terms of service for nearly every free platform…make absolutely no promise regarding digital preservation or even the return of content to users in event of business failure or (elimination of) service.” (Elaine Harrington)

The obstacles are manifold: matters of privacy, staffing and skillsets, storage space, issues of access. The sheer volume of information we now produce is in direct competition with our capacity to preserve it and archive it for future use.



Coursera. “Why Is Digital Preservation Important?” Coursera (n.d.) Web. Nov. 2017.

Hare, Kristen. “How to Preserve Your Work Before the Internet Eats It.” Poynter, 14 Jan. 2016. Web. Nov. 2017.

Harrington, Elaine. “Preserving the Libraries of the Future.” Digital Humanities Research Colloquium, 18 Oct 2017, DH Active Learning Space, UCC. Guest Presentation.

LaFrance, Adrienne. “Raiders of the Lost Web.” The Atlantic, 14 Oct. 2015. Web. Nov. 2017.




The Cost of Free

Almost twenty years ago, in The Control Revolution, Andrew Shapiro outlined two potential paths that the Internet could take. The first was a more positive tale of an “increased individual freedom.” However, the second had a more cautionary tone. It warned of institutions harnessing the power of the network and exerting their influence over us as consumers.

Aleks Krotoski
By Paul Downey via Wikimedia Commons

Aleks Krotoski, in The Cost of Free, an episode of The Virtual Revolution broadcast by BBC 2, reports on the development of the World Wide Web over the last twenty years and mirrors this cautionary tone. Users access vast, incalculable amounts of information on a daily basis, and the majority of us take this great ‘commodity’ for granted. We spend countless hours on Google, Facebook, Twitter and the like. Krotoski argues that there is a heavy price to pay for these interactions. Douglas Rushkoff, author of Life Inc., states: “The product on line is not the content. The product on line is you.”

When Tim Berners-Lee invented the World Wide Web in 1989 he saw it as an open forum, without boundaries, where information could be shared freely. Stephen Fry, in Krotoski’s The Cost of Free, furthers this notion, saying: “It seemed like a new democracy, of people coming together.”

Stephen Fry
By Marco Raaphorst from The Hague via Wikimedia Commons

However, in 1994, the United States Congress lifted the injunction on web commerce, and change came about rapidly. It was change that was to affect us all deeply. That free content that is available to us on tap? We receive it due to our willingness to sign away our personal data. All those minutiae that may be seen as having little value in the moment are, in fact, priceless. The surveillance that we are constantly under is the price we pay for the ‘free’ services we access on an almost constant basis. It is our personal information that is being traded.

AdWords is the model implemented by Google whereby advertisers can target and filter their audience. Google has arguably become the most powerful company in the world simply by using our search preferences and refining its advertising models.


What Google deems us to be interested in is what we find in our searches. A barrier has been erected against the discovery of new things. Krotoski proposes that this system denies us the very ‘serendipity’ that the web originally offered. As the algorithm gets to ‘know’ us more, we are cutting off and marginalizing our options and confining them to the direction Google wants us to take. Eric Schmidt, CEO of Google, in an attempt to put a jaunty, positive spin on the process, utilizes a neat turn of phrase to describe it: “It’s not a broadcast mechanism. It’s a narrowcast mechanism.”

Eric Schmidt
By Guillaume Paumier via Wikimedia Commons

It could be argued that ultimately the use of targeted advertising will lead to the de-personalisation and homogeneity of the audience and consumer. There are implications looking to the future as to how we will identify ourselves, but we must also look at and consider the vast reserve of information that is being stored indefinitely, where it is being held and who has access. And how could it potentially be used?


The Cost of Free. The Virtual Revolution. Dir. Dan Kendall. BBC, 2010.

Lessig, Lawrence. The Future of Ideas. Random House, 2001.



Jerome McGann

McGann reports that there is an educational emergency as a result of the growth of the digital. He draws a very pertinent link between our current situation and that of the humanists in the 15th century in outlining the upheaval we are faced with and the reassessment we must make of all of the tools and methods at the core of our current knowledge production.

Jerome McGann via YouTube

McGann predicts an entire re-editing of our archive of cultural works within a network of digital storage and access in the next fifty years. A main concern of his is that the current educational system is not equipped to undertake this overhaul. It is interesting to note that those who have the most at stake in this movement are the least involved. He despairs that “not a person in the room seemed to know what TEI was” at a meeting of the editorial board of Critical Inquiry. He refers to an apartheid being in place between literary and cultural studies and calls for an intertwining of the two moving forward in education, particularly in the US.

Print culture, which has been to some extent relegated since the proliferation of the digital, is here given justified praise. He is of the opinion that we must reengage with print culture on our journey into the digital realm, looking specifically at the bibliographical interface and its mode of organization. Using the bibliography as the launch pad, we can then progress beyond traditional conventions, and the digital can build and feed upon this original format. McGann envisions an exciting quantum world thus becoming available, encompassing ideas and theories which are by their very nature “inexhaustible,” ever changing and growing.


McGann, Jerome. “A Note on the Current State of Humanities Scholarship.” Critical Inquiry, vol. 30, no. 2, 2004, pp. 409–413.

“Professor Jerome McGann – Truth and Method.” YouTube, uploaded by Crassh Cambridge, 15 May 2015.




My View on the Digital Humanities Manifesto 2.0

Through my reading of the Manifesto, it is apparent that there is a need for a redefinition of the connotation of ‘success’ as we traditionally view it. As the Manifesto puts forth, process is the new god, not product. We need to break the norm of only valuing the end result, and place value on the entire process of learning, discovery, creation etc. I enjoyed the point made that the university library must hence be viewed as a lab, and traditional hierarchy must be broken with in order for the student to be recast as scholar, and vice versa.

This reinforces the idea put forth in the Manifesto of the importance of collaboration and community in Digital Humanities moving forward as an inclusive model, where we can all learn equally from one another.

If one of the aims of Digital Humanities is to be inclusive, then on a positive note, the formatting of the article is certainly more inviting and engaging to a wider audience, in comparison to a traditional humanities document which can be austere and intimidating in tone and format. I personally must admit to finding the overall look a little amateurish, as if the tools used were ACTUAL paper, paste and scissors!

I had a definite sense of multiple authorship throughout the article, but not to the extent of a hundred plus contributors. In that regard, the editing work must be applauded, but I do wonder to what extent the editors had a say in the shaping of this Manifesto, and in making their individual voices or feelings heard. It is very difficult to edit 100% objectively; one could argue it is impossible.
