Curated Data-Driven Narrative

There are two goals when presenting data: convey your story and establish credibility.
Edward Tufte

The objective of this project was to analyse publicly available datasets and build a narrative around them, accompanied by visualisations. The Central Statistics Office’s 2017 launch of an online application on baby names in Ireland served as inspiration for this assignment. My paternal grandparents were Polish and moved to Ireland shortly after World War II. In keeping with Polish tradition, my parents christened me Antonina in 1979 and my brother Karol in 1981. Both names were highly uncommon in Ireland at that time. In recent years, however, I have noticed the names Antonina and Karol becoming more common. This assignment focuses on these two names and measures their popularity in Ireland over the years. In tandem, it examines and charts the presence of Polish people in Ireland. The two sets of findings are presented side by side to bring out the connections between them, and any outliers or other factors that arise along the way are examined.

Thanks to the popular and intuitive CSO Baby Names of Ireland application, a substantial amount of data on the subject is available. The Girls’ and Boys’ Names in Ireland with 3 or More Occurrences by Name, Year and Statistic (1964 – 2017) datasets had to be filtered down to the two names in question (Antonina and Karol) and to a fixed time range. The range 1970 to 2017 was chosen because the years before 1970 held little or no relevant data. The filtered data was then downloaded in CSV format and structured in Excel to prepare it for visualisation. Experimenting with the capabilities of Tableau, Datahero, Voyant and RAW was instructive, showing how particular tools suit particular kinds of data and purposes. Ultimately, Datawrapper proved the best fit for the data being examined, and it is an intuitive option for both users and readers. A column chart was chosen for the initial analysis (Figs. 1 and 2) because it displays the information proportionally and conveys to the audience concrete numbers of names in a given year.

Fig. 1: Babies named Antonina from 1970 – 2017 in Ireland.

Fig. 2: Babies named Karol from 1970 – 2017 in Ireland.
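The filtering step described above narrowed the CSO tables to Antonina and Karol between 1970 and 2017. For anyone who would rather script this than filter by hand, it can be sketched in Python. This is a hypothetical illustration, not the method used for this project. The column headers Name, Year and Count and the function names filter_names and parse_count are assumptions, as is treating any non-numeric count as a suppressed value, because the real CSO export may be laid out differently.

```python
import csv
from typing import Iterable, Optional

# Hypothetical column headers: the actual CSO export may use different labels.
NAME_COL, YEAR_COL, COUNT_COL = "Name", "Year", "Count"


def parse_count(cell: str) -> Optional[int]:
    """Return the count as an int, or None for a blank or suppressed cell.

    Assumption: suppressed values appear as non-numeric text.
    """
    cell = cell.strip()
    return int(cell) if cell.isdigit() else None


def filter_names(lines: Iterable[str], names: set[str], start: int, end: int) -> list[dict]:
    """Keep rows whose name is in `names` and whose year lies in [start, end].

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    kept = []
    for row in csv.DictReader(lines):
        year = int(row[YEAR_COL])
        if row[NAME_COL] in names and start <= year <= end:
            kept.append({"name": row[NAME_COL], "year": year,
                         "count": parse_count(row[COUNT_COL])})
    return kept


# Example usage (hypothetical file name):
# with open("boys_names.csv", newline="", encoding="utf-8") as f:
#     karol = filter_names(f, {"Karol"}, 1970, 2017)
```

Keeping suppressed cells as None rather than dropping them means that years below the CSO threshold remain visible as gaps in the output.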

Information on the number of Polish people in Ireland over the same period proved more elusive (concrete data for the years before 2000 could not be located), so the decision was made to base the analysis on the allocation of PPS (Personal Public Service) numbers in the State. Data.gov.ie acts as a portal to the Department of Employment Affairs and Social Protection page. That page publishes data on PPS number allocations to non-nationals, with a combined table covering 2000 – 2009 and individual tables for each year thereafter. The figures for Polish nationals had to be extracted from these tables and combined into one complete Excel sheet. To create a streamlined comparison with the name data, and in the interest of full transparency, the earlier years from 1970 up to 2000 were also included and entered as null, as seen in Fig. 3.

Fig. 3: Allocation of PPS numbers to Polish people, 1970 – 2017.
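The step of combining the 2000 – 2009 table with the single-year tables, and then padding the earlier years as null, can be sketched in Python. This is a minimal sketch under stated assumptions. Each source is taken to be already reduced to a mapping from year to the number of PPS numbers allocated to Polish nationals, and the function names merge_sources, pad_years and to_csv are hypothetical. Missing years are written as empty cells rather than zeros, which is one way to keep "no data" distinct from "no PPS numbers issued".

```python
import csv
import io
from typing import Optional


def merge_sources(*sources: dict[int, int]) -> dict[int, int]:
    """Combine year -> count mappings, refusing to silently overwrite a conflicting year."""
    merged: dict[int, int] = {}
    for source in sources:
        for year, count in source.items():
            if year in merged and merged[year] != count:
                raise ValueError(f"conflicting counts for {year}")
            merged[year] = count
    return merged


def pad_years(counts: dict[int, int], start: int, end: int) -> list[tuple[int, Optional[int]]]:
    """Return one (year, count) pair per year in [start, end]; years with no data get None."""
    return [(year, counts.get(year)) for year in range(start, end + 1)]


def to_csv(series: list[tuple[int, Optional[int]]]) -> str:
    """Render the series as CSV text, leaving missing years as empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Year", "PPS numbers allocated"])
    for year, count in series:
        writer.writerow([year, "" if count is None else count])
    return buf.getvalue()
```

A call such as `to_csv(pad_years(merge_sources(table_2000_2009, table_2010, ...), 1970, 2017))` would produce one row per year. Here the table names are placeholders for the extracted figures.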

Fig. 1 shows no registrations of the name Antonina in 1979. This is because, in the years before 2007, the name falls under what the CSO deems a “limit of discretion/uncertainty”: fewer than three babies were given the name each year, so the CSO omitted it for confidentiality reasons. Antonina first appears in the chart in 2007, with 8 girls registered that year, and the name peaks in popularity in 2017 at 19.

Fig. 2 shows more variation. The name Karol spikes significantly in popularity in 1979, stays popular for several years and then wanes, before re-emerging in 2005 in a steadier form.

Fig. 3 shows the effect of Poland’s EU membership on the number of Polish people in Ireland. Poland joined the EU in 2004, and Ireland granted Polish workers full access to its labour market at that time. PPS number allocations to Polish nationals peaked in 2006 at 93,787 and remained relatively high until the recession in 2008.

Line graphs are used in Figs. 4 and 5 because they show the shape of trends over time. Between 2004 and 2017, both names clearly rise in popularity. A similar rise appears in Fig. 5, which visualises the arrival of Polish people in Ireland. The line graphs bring out a striking similarity between the curves, which points strongly to a connection between the two trends.

Fig. 4: Comparison of Antonina and Karol from 1970 – 2017

Fig. 5: Allocation of PPS numbers to Polish people in Ireland 1970 – 2017
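The similarity between the curves is judged visually above. One way to put a number on it is the Pearson correlation coefficient, computed over the years in which both series have a value. This is offered as a hedged sketch and was not part of the original analysis. The function name pearson_r is hypothetical. Skipping years where either value is missing (None) is an assumption that suits the null-padded PPS series. A high r means only that the series move together linearly. It is not a significance test, and on its own it says nothing about causation.

```python
import math
from typing import Optional, Sequence


def pearson_r(xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]) -> float:
    """Pearson correlation of two year-aligned series.

    Hypothetical helper: years where either value is None are skipped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two years with values in both series")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Unnormalised sums of products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)
```

For series without gaps, Python 3.10 and later also provide `statistics.correlation(xs, ys)`, which computes the same Pearson coefficient.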

An interesting outlier is evident in the earlier years of Fig. 2 and Fig. 4 and warrants further attention. An examination of events in Ireland at that time points to a link between the 1979 visit of the Polish Pope John Paul II (born Karol Józef Wojtyła) and the peak in the name Karol. The Papal visit had a huge impact on the people of Ireland. This is reflected in the choice hundreds of Irish families made to give their baby boys what was then an uncommon name, and the data and corresponding visualisation support this concretely. The name Karol remained popular for several years after 1979. As the visit faded to some extent from public consciousness, usage of the name dropped sharply in 1984.

Fig. 2 and Fig. 4 also reveal a subtler increase in the popularity of the name Karol within its second rise, after the arrival of Polish people in Ireland. In 2011, the name shows a noticeable increase, an outlier that might well have gone undetected without the visualisation. Cultural events of the time point to a specific influence: Pope John Paul II was formally beatified in 2011, and this appears to have influenced baby-naming choices that year. The spike may be too minor to support a solid conclusion, but it would be negligent not to highlight the finding, given its relevance to this project.

Data visualisation expert Edward Tufte famously critiqued the adage “Correlation does not imply causation.” The peak in the popularity of the name Karol in the years around the Papal visit is highly evident. The year of the beatification also saw an increase in registrations of the name. Taken together with the Pope’s Polish heritage and the presence of a Polish population in Ireland, this highlights a further, subtler correlation. In this instance, it suggests a causal link between certain baby-naming practices and Papal influence in Ireland.

Edward Tufte, Correlation/Causation, print on canvas, via edwardtufte.com

Finally, to further examine the influence of the 1979 visit on baby names in Ireland, a supplementary investigation into two additional names was conducted.

Fig. 6: Babies named John/Paul from 1970 – 2017 in Ireland.

The line graph in Fig. 6 shows the popularity of the names John/Paul. Its spikes directly mirror the arc of the name Karol across the same years in Fig. 4, indicating, once again, the influence exerted by the Papal visit. Tufte states: “Empirically observed covariation (correlation) is necessary but not a sufficient condition for causality. Correlation is not causation, but it sure is a hint.”

This data story has charted the popularity of specific names in Ireland over time and placed them in the context of economic and cultural shifts and events. In doing so, it has brought to light specific influences on baby-naming practices. It also shows how large datasets can be mined and rendered to produce engaging narratives and visualisations for a wider audience.

References

“Baby Names of Ireland.” Central Statistics Office, www.cso.ie/en/interactivezone/visualisationtools/babynamesofireland/. Accessed 15 Feb. 2018.

Brule, Joshua. “A causation coefficient and taxonomy of correlation/causation relationships.” Semantic Scholar, 5 Aug. 2017, pdfs.semanticscholar.org/7cdf/8ae48c7191130b8c19b17ec1af4a9a0a9e9c.pdf. Accessed 18 Mar. 2018.

“Design Principles.” Data Depiction, datadepiction.wordpress.com/design/. Accessed 17 Mar. 2018.

Edward Tufte, www.edwardtufte.com/tufte/. Accessed 10 Mar. 2018.

“Enrich your story with charts, in seconds.” Datawrapper, www.datawrapper.de/. Accessed 14 Mar. 2018.

“Statistics on Personal Public Service Numbers Issued.” Department of Employment Affairs and Social Protection, 22 Mar. 2017, www.welfare.ie/en/Pages/Personal-Public-Service-Number-Statistics-on-Numbers-Issued.aspx. Accessed 16 Feb. 2018.

The Cost of Free

Almost twenty years ago, in The Control Revolution, Andrew Shapiro outlined two potential paths that the Internet could take. The first was a positive tale of “increased individual freedom.” The second was more cautionary, warning of institutions harnessing the power of the network and exerting their influence over us as consumers.

Aleks Krotoski
By Paul Downey via Wikimedia Commons

In The Cost of Free, an episode of The Virtual Revolution broadcast on BBC Two, Aleks Krotoski reports on the development of the World Wide Web over the previous twenty years and mirrors this cautionary tone. Users access vast, incalculable amounts of information every day, and most of us take this great ‘commodity’ for granted, spending countless hours on Google, Facebook and Twitter. Krotoski argues that there is a heavy price to pay for these interactions. Douglas Rushkoff, author of Life Inc., states: “The product on line is not the content. The product on line is you.”

When Tim Berners-Lee invented the World Wide Web in 1989, he saw it as an open forum without boundaries, where information could be shared freely. Stephen Fry, in Krotoski’s The Cost of Free, furthers this notion, saying: “It seemed like a new democracy, of people coming together.”

Stephen Fry
By Marco Raaphorst from The Hague via Wikimedia Commons

However, in 1994, the United States Congress lifted the injunction on web commerce, and change came about rapidly, in ways that were to affect us all deeply. That free content available to us on tap? We receive it in return for our willingness to sign away our personal data. All those minutiae that may seem to have little value in the moment are, in fact, priceless. The constant surveillance we are under is the price we pay for the ‘free’ services we use almost every day. What is being traded is our personal information.

AdWords is the model Google implemented to let advertisers target and filter their audience. Google have become the most powerful company in the world simply by using our search preferences to refine their advertising models.

Wikimedia Commons

What Google deems us to be interested in is what we find in our searches. A barrier has been erected to the discovery of new things. Krotoski proposes that this system denies us the very ‘serendipity’ that the web originally offered. As the algorithm gets to ‘know’ us better, our options are cut off, marginalised and confined to the direction Google wants us to take. Eric Schmidt, then CEO of Google, attempts to put a jaunty, positive spin on the process, describing it with a neat turn of phrase: “It’s not a broadcast mechanism. It’s a narrowcast mechanism.”

Eric Schmidt
By Guillaume Paumier via Wikimedia Commons

It could be argued that, ultimately, targeted advertising will lead to the de-personalisation and homogenisation of audiences and consumers. This has implications for how we will identify ourselves in the future. We must also consider the vast reserve of information being stored indefinitely: where it is held, who has access to it, and how it could potentially be used.

References

The Cost of Free. The Virtual Revolution. Dir. Dan Kendall. BBC, 2010.

Lessig, Lawrence. The Future of Ideas. Random House, 2001.