Commonplacing the public domain: Reading the classics socially on the Kindle

Amazon leads the market in ebooks with the Kindle brand, which encompasses a range of dedicated e-reader devices and a large ebook store. Kindle users are able to share the experience of reading ebooks purchased from Amazon by selecting passages of text for upload to the Kindle Popular Highlights website. In this article, I propose that the Kindle Popular Highlights database contains evidence that readers are re-appropriating commonplacing – the act of selecting important passages from a text and recording them in a separate location for later re-use – while reading public domain titles on the Kindle. An analysis of keyness in a corpus of 34,044 shared highlights from public domain titles suggests that readers focus on words relating to philosophy and values to draw an understanding of contemporary society from these classic works. This form of highlighting takes precedence over understanding and sharing key narrative moments. An examination of the top ten most popular authors in the corpus, and case studies of Jane Austen’s Pride and Prejudice and William Shakespeare’s Hamlet, demonstrate variation in highlighting practice as readers are choosing to shorten famous commonplaces in order to change their context for an audience that extends beyond the original reader. Through this analysis, I propose that Kindle users’ highlighting patterns are shaped by the behaviour of other readers and reflect a shared understanding of an audience beyond the initial highlighter.


Introduction
The rise of digital media has led to the re-examination of historical literacy practices, including the act of commonplacing. 1 The tradition of commonplacing refers to the historical practice of selecting important parts of a text to remember and reflect upon for their significance beyond their original textual context (see Blair, 1992; Lesser and Stallybrass, 2008; Throsby, 2012). In this article, I use keyword analysis of the Kindle Popular Highlights website (Amazon Inc., 2014) together with close analysis of variations between similar highlights to investigate how readers of public domain ebooks use shared reading functions to carry out a contemporary form of commonplacing. This analysis suggests that Kindle users' shared highlights that focus on shared wisdom or pivotal narrative moments are guided by the affordances of the platform and an awareness of an audience much larger than the individual reader.

Reading together online
Digital social reading has been framed as a break from traditional forms of reading, and previous research has analysed several important sites for interaction such as the GoodReads website (Nakamura, 2013). One of the key sites for such interaction is the website maintained by leading online retailer, Amazon (see Allington, 2016: Section 1 for further discussion). The Amazon website enables readers to post reviews of books, and these have been the subject of a number of academic studies focusing on contrasts between 'popular' and 'highbrow' forms of literary culture (Gutjahr, 2002; Steiner, 2008; Allington, 2016). As well as selling physical books, Amazon is the world's leading ebook retailer, thanks to its Kindle range of tablet computers and dedicated ereaders, which are designed to direct sales traffic towards Amazon's online store. Like the Amazon website, the Kindle provides a social infrastructure facilitating interactions between Amazon customers. Cameron (2012: 86) posits a link between the Kindle's shared highlight feature, which offers the reader the chance to share a section of an ebook with other readers of the same title, and the aide-memoire of marginalia in print culture. Barnett (2014) argues that shared highlights extend beyond an aide-memoire for a single reader, as Kindle Popular Highlights 'suggest[s] the promise of facilitating deeper reading conducted across social networks, of motivation for reading for reluctant readers, for productive readings in educational settings, and for guided readings through the involvement of teachers of literature and authors themselves'.
While this seems optimistic, Cameron's (2012: 89) study of highlighting in a single electronic edition of The Adventures of Sherlock Holmes suggests that a pre-existing pattern of frequently highlighted passages can drive future behaviour as the 'hive mind' converges on already popular quotations focusing on 'love, method, and culture'. While 'method' is a core theme in Sherlock Holmes which might not extend to other titles, 'love' and 'culture' may feature as commonplaces in the Kindle Popular Highlights. In this article, I will use the corpus linguistic approach of identifying keyness and collocates in order to test whether Cameron's findings extend beyond Sherlock Holmes.

Accessing Kindle Popular Highlights
In order to analyse large-scale datasets, researchers require access to data that is often proprietary, and Kindle Popular Highlights represented an uncharacteristic degree of openness from Amazon. Users were able to view what their fellow readers had highlighted on titles they had not purchased, which provided a useful marketing tool for popular titles on the Kindle. Amazon made this data available as a public resource from early 2011 to mid-2014, when it obfuscated access to the database through removing a central list of the most highlighted passages, requiring users to instead crawl through four million individual records to collate the same data. 2 While the dataset was in circulation, it provided a cache of data that offered an insight into how various parts of the Kindle's infrastructure were used. For example, while Amazon boasts a catalogue of several million ebooks, it is unclear which titles users are purchasing, let alone reading. Kindle Popular Highlights cannot map the reading habits of all users, but it reveals which ebooks a subset of Amazon customers have purchased and subsections that a proportion of those readers found interesting.
The dataset cannot be used to form hypotheses about small-scale reading practices as it only offers a macroscopic view of highlighting culture, such as the number of people who have highlighted a particular passage. There is no ability to break down the demographics of this mass readership or see how any one reader has highlighted his or her copy of the book. However, the dataset still provides a large body of evidence with regard to textual fragments that Kindle users wished to record and share. In response to these methodological limitations, I have been careful to focus primarily on the textual evidence without trying to reconstruct the intention of readers.
I will use two approaches to the Popular Highlights culture: first, a comparison of the keywords in public domain highlights against both the Kindle Popular Highlights dataset and a selection of Project Gutenberg texts to represent texts in the public domain; and second, a close analysis of highlighting patterns in the works of the ten most popular authors and repetition across unique editions of the same titles to reveal the interests of readers. The combination of large and micro-scale analysis is instrumental in understanding new cultures of literacy. Analysis of screen reading offers methodological frameworks for understanding literacy, but these must be considered through an understanding of its historical precedents.

The historical context of commonplacing
Historical context can offer illuminating parallels between print and digital culture.
The Kindle Popular Highlights infrastructure facilitates a behaviour analogous to the historical act of commonplacing, which Blair (1992: 541) describes as 'select[ing] passages of interest for the rhetorical turns of phrase, the dialectical arguments, or the factual information they contain'. These commonplaces were either noted in the margins of the book being read, or formalised into manuscript books to collate a single reader's activity. Valenza (2009: 220) demonstrates how commonplacing helped transform narratives into wisdom, since the act of copying important passages from the text to the margins or a separate notebook allows readers to focus on parts of the text that impart particular truths or values instead of teasing out the narrative as a whole. This practice was formalised with the rise of the anthology, which functioned as an extended form of commonplacing pre-approved by the publisher and 'trained readers to pace themselves through an unmanageable bulk of print by knowing when to skip and where to linger' (Price, 2000: 4). The commonplace book encourages readers to view texts as collections of aphorisms that can be manipulated in various ways through focusing on decontextualized expressions of wisdom rather than on narrative (McGill, 2007: 357).
In this framework, the commonplace book can be seen as the database to complement the narrative of the original text (Manovich 2001: 231). The collation of commonplaces into a single cross-referenced repository in book form provides a clear precursor to the popular highlights database.
Authors such as John Milton and Francis Bacon maintained commonplace books that served as a record of their composition process. These manuscripts were later preserved through publication to allow readers to track the development of these famous authors' works. While this emphasises the individuality of each reader, Kindle Popular Highlights transforms the highlighting behaviour of vast numbers of readers into an amorphous whole, where it is difficult to distinguish the individual from the group, creating a new type of commonplace that focuses on convergence rather than individuality. The Kindle's highlighting infrastructure is complex. Highlights can be used to bookmark an important part of the text or can be viewed within a personalised commonplace book that contains the accumulated highlights of the user's reading.

Shared highlights on the Kindle
Readers can also choose to publicly share their highlights, which are then amalgamated into the complete shared highlights dataset. The most popular passages have been shared over ten thousand times, but the majority have received fewer than a dozen highlights. Any highlight will be stored in Amazon's database, but will not be visible for readers of the ebook or database unless it is one of the top ten most popular highlights within the title. Each publicly shared quotation is the tip of an iceberg, as the bulk of shared highlights may be spread thinly throughout the ebook, particularly in the most popular titles where the threshold for inclusion is several thousand.
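The visibility rule described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the shared highlights are available as (title, passage) pairs, one pair per sharing reader; the function name and the data layout are illustrative and do not describe Amazon's actual system.

```python
from collections import Counter, defaultdict


def visible_highlights(shared, top_n=10):
    """Return the passages that would be publicly visible for each title.

    `shared` is an iterable of (title, passage) pairs, one per reader who
    shared that passage (a hypothetical layout). Only the `top_n` most
    frequently shared passages per title are kept, mirroring the top-ten
    rule described in the text.
    """
    counts = defaultdict(Counter)
    for title, passage in shared:
        counts[title][passage] += 1
    # most_common() sorts passages by descending share count.
    return {title: c.most_common(top_n) for title, c in counts.items()}


# Twelve invented passages, where passage i has been shared i + 1 times.
shared = [("Hamlet", f"passage {i}") for i in range(12) for _ in range(i + 1)]
visible = visible_highlights(shared)
```

In this toy example the two least shared passages remain stored but unseen, which is the 'iceberg' effect: the public list reveals only the convergent tip of the highlighting activity.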
Unfortunately, this means it is impossible to know the full extent of highlighting culture on the Kindle, so researchers must instead focus on the homogeneous reading practices that are publicly visible. The feature has enjoyed great popularity, as users have generated a million unique highlights from over one hundred thousand titles. Shared highlights then appear in two locations: (1) in the ebook, although users must purchase the title to see this; and (2) a publicly maintained database, which will be the focus of this paper. The only way to add new quotations to the system is through those ebooks. Once shared, highlights are rendered distinct from the books, and as the database does not maintain a constant shape, they can be viewed in multiple contexts. In this way, the highlight has morphed from a stable entity in a single manifestation of a book to a fluid and promiscuous data point, which is disassembled and reassembled at will. It is important to note that Kindle Popular Highlights is opt-in – not all users will necessarily choose to either see or interact with these highlights – and this represents a further departure from the traditional approach to highlight culture as the potential audience becomes much larger. Jackson (2002: 96-97) proposes that the creation of marginalia is a 'four-way transaction' involving 'text, reader, target audience, unknown future reader'. Kindle users who choose to share their highlights extend the 'target audience' and 'unknown future reader' into a large potential audience which may discover the shared highlight through reading the book or exploring the separate popular highlights website.

The importance of public domain ebooks
The Kindle Popular Highlights database contains a heterogeneous selection of texts, since readers are naturally interested in a broad variety of texts. In order to mitigate this disorder, I focus on a subset of ebooks that have had a particular role in the development of ebook culture: public domain texts. The economics of public domain ebooks, which are often sold cheaply or released free of charge, have ensured a robust market, although users may not necessarily have read them. Focusing on these texts creates a more homogeneous and manageable dataset, but it also provides several secondary benefits. Public domain books have a pre-formed reception due to their survival and circulation as digital editions. While the deluge of new ebooks being published can be difficult to sift through, public domain ebooks have undergone a form of consecration, which has an ancillary effect on contemporary readers' interactions with them, as users may find their reading shaped by the wealth of material based upon the book's reception prior to their interaction with the text. This can lead to a convergence of thought and reaction to texts that are familiar, which is the perfect counterpart to the hive mind culture of Kindle Popular Highlights.
Unfortunately, the precise limits of the public domain are ambiguous, because recent copyright legislation has left the status of many texts unclear (see Spoo, 2013: Epilogue). Consequently, prospective publishers have an uneasy notion of what counts as public domain, particularly as the ebook marketplace has become global and therefore has to abide by the most stringent copyright legislation across the world. Since this project is not concerned with solving the complex issues of copyright, works that are treated as if they are public domain are included due to their consecration and integration into this aspect of ebook culture.
Due to the fuzziness of the public domain in Kindle Popular Highlights and the poor quality of Amazon's metadata for public domain ebooks, the project works with a sample of 34,044 highlights – a corpus of over one million words – from public domain authors rather than the complete database.
Through this context of the Kindle's popular highlights culture, I will assess how users are appropriating this feature as an aspect of both individual and social literacy.
Analysis of the public domain works in Kindle Popular Highlights reveals a correlation between shared highlights culture and commonplacing practices. Linguistic analysis of the dataset can facilitate a deeper understanding of digital social reading practices through historical analogies, and more importantly, an analysis of important topics that crop up throughout shared quotations. In particular, Kindle Popular Highlights offers a chance to analyse comparative elements of the same texts across multiple editions, which is difficult to achieve outside of the public domain due to the strictures of copyright for post-1923 publications leading to a single ebook edition for each title.

What do readers of public domain ebooks highlight?
Having identified public domain highlights as a potentially useful subset of Kindle Popular Highlights, I will now analyse whether they reveal any new insight into sharing practices on the Kindle. The public domain highlights reveal the diverse range of genres that interest ebook users. These extend beyond 'the classics' of literature and books that appear in print, as readers are offered a greater selection of public domain texts through the long tail (Anderson, 2009) of the large-scale digitisation projects that have shaped ebook culture in the last decade. Philosophical treatises, fiction, and the Bible are the most popular genres in the dataset. The digitisation of public domain books has led enterprising readers to dig into the 'great unread', the deluge of fiction from the Victorian era that accompanied the mechanisation of the printing press (Cohen, 2002).
Pulp fiction from older eras has found a natural home with the Kindle as a result of the economics of the public domain, where it is possible to create a free, or cheap, copy of a text which users can purchase through the Kindle Store without the requirement to exit Amazon's ecosystem. Project Gutenberg, HathiTrust, and other non-profit digitisation projects have been instrumental in shaping the titles available on the Kindle store, as several publishing projects have emerged to create Kindle ebooks automatically from new titles available from these archives. The following analysis focuses on those texts that have been published automatically in multiple editions in order to sample how different groups of readers semi-independently choose the same elements to highlight. These repeated elements represent the commonplace, as they indicate parts of the text are deemed important by multiple groups of Kindle readers.

Locating the commonplace
It is essential to identify what might count as a commonplace in the public domain highlights. The aphorism, which is a section of text that 'must stand by itself (unconnected), be brief, and treat a moral topic' (Morson, 2003: 409), is an important literary device connected to the commonplace, as it separates knowledge from narrative. Some of the most memorable aphorisms use humour to effectively transmit knowledge. While aphorisms are one of the most common types of commonplace as the genre encourages re-use outside of the source text, they remain only one genre.
Given the popularity of highlighting these bite-sized chunks of text, such as the opening to Pride and Prejudice, it is likely that readers are mindful of an audience outside of the book who might peruse the Kindle Popular Highlights list and discover the passage. This secondary readership of shared highlights might see the quotation on Twitter or the Kindle Popular Highlights website and need to understand the relevance of the quotation without the added context of the ebook. Aphorisms, by their nature, can be treated in isolation, as they make sense without any surrounding context. This makes them ideal shared highlights for readers who might only see the quotations through an ancillary website. Both the choice of highlights and the way that readers choose to present them can offer an insight into how the Kindle has shaped reading practices in the early twenty-first century.

Identifying keywords
In order to understand how users are responding to the public domain titles through the popular highlights system, I used AntConc to identify keywords in the public domain Kindle Popular Highlights. Keyness assesses which words occur significantly more frequently in a particular corpus compared to a reference corpus (Baker 2006: 121-149). In corpus linguistics, it is conventional to treat a word with a log-likelihood of [...] English language-only selection of public domain titles. After removing foreign and arcane words, as well as function words, to leave only contemporary English lexical words (Baker, 2005: 54), it is possible to focus on the most pertinent keywords for commonplace culture in the public domain Kindle highlights dataset. This focus provides evidence of the semantic content of the highlights rather than readers' potential interest in syntactic features and style more generally. While the literary style may interest readers, commonplaces turn out to focus mostly on the knowledge that can be extracted from a work's semantic value.
[...] 'Nature', while 'human nature' and its variants appear in 25% of all instances of 'nature'. In the remaining 1012 instances of 'nature' in the corpus, 57% collocate with prepositions – including 'of' (713 instances) and 'by' (165 instances) – indicating an interest in nature as essence. Out of 1119 instances of 'soul', the word collocates with 'man' or 'his' in 114 instances, which can refer to a generic everyman as well as a specific character. There is also a smaller cluster of collocates that refer to the soul and self-improvement, including 'nourish' (10 instances), 'save' (10 instances), 'mutiny' (13 instances), 'peace' (18 instances), and 'cure' (20 instances). The patterns of use for the two keywords demonstrate that these words are used in a context where they can be extracted from the text for reflection.
As practised through the Kindle, the contemporary act of commonplacing therefore seems to focus on the same chunks of knowledge that were targeted in its print culture counterpart.
Table 3 shows that values and philosophical topics once more emerge as common themes. While these words have been drawn from texts initially published prior to the twentieth century, it is clear that the concepts remain relevant to contemporary society.
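The keyness comparison described in this section can be sketched as follows. This is a minimal illustration of the standard two-corpus log-likelihood statistic, not a reproduction of AntConc's implementation or of the settings used in this study. The function names and the toy token lists are hypothetical, and the significance cut-off is deliberately left as a parameter that the caller must supply.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness score for one word across two corpora."""
    total = freq_target + freq_ref
    # Expected frequencies if the word were spread evenly by corpus size.
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keywords(target_tokens, ref_tokens, min_score):
    """List (word, score) pairs for words over-represented in the target."""
    target, ref = Counter(target_tokens), Counter(ref_tokens)
    n_target, n_ref = len(target_tokens), len(ref_tokens)
    scored = []
    for word, f_target in target.items():
        f_ref = ref[word]
        # Positive keyness only: relative frequency is higher in the target.
        if f_target / n_target > f_ref / n_ref:
            score = log_likelihood(f_target, n_target, f_ref, n_ref)
            if score >= min_score:
                scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy corpora, purely for illustration.
target_tokens = ["soul"] * 5 + ["the"] * 5
ref_tokens = ["the"] * 10
result = keywords(target_tokens, ref_tokens, min_score=0)
```

Function words such as 'the' would in practice be removed with a stop list before or after scoring, which corresponds to the filtering step described above.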

Popular authors in public domain highlights
These words coalesce into themes that remain a concern today, such as love, religion, mortality and humankind's integration into the world. These commonplace themes confirm Cameron's assertion that 'love' is an important part of highlighting culture, but her other two categories of 'method' and 'culture' are replaced by a broader philosophical approach to the texts. Shared highlights are used as a way to glean wisdom through revisiting classic texts. However, this preference of reading public domain texts for their universal truths is not the only way readers are highlighting the text, as many are still interested in the plot.

Highlighting pivotal moments in the narrative
Beyond the commonplace, there is a second genre of annotation that is pervasive in shared highlights culture: pivotal narrative moments. In this genre, readers use the shared highlights function to note the affective climaxes of a narrative rather than any perceived wisdom in the highlighted passage. Pride and Prejudice provides one of the clearest examples: 'In vain I have struggled. It will not do. My feelings will not be repressed. You must allow me to tell you how ardently I admire and love you' (Austen, 2007: loc. 2205). While this is not aphoristic and needs further context to be understood, it provides other readers with a teaser which reveals key narrative information without being a spoiler, as there is a notable absence of naming actors. This form of in medias res highlighting, or the selection of text that takes place in the middle of the action, is replete with unidentifiable actors and decontextualised dialogue. As in medias res highlights take place in the middle of the action, they are identifiable through typographic patterns, such as a highlight which starts in the middle of a sentence or quotation. These patterns demonstrate that users are slightly more likely to highlight in medias res segments of public domain titles than those still in copyright. For example, highlighted passages in the public domain titles are marginally more likely to start with a lowercase letter (see Table 4). Likewise, quotations, both complete and incomplete, are more likely to appear in public domain works, particularly at the start of a highlight. In other words, in medias res highlighting is as important a part of the kind of social reading studied here as commonplacing and shows the range of literacy practices at work on the Kindle. The early technical infrastructure of Kindle Popular Highlights encouraged in medias res highlighting involving the wider context, as the webpage for each title contained each quotation in its context.
Figure 2 shows how the early interface of the Popular Highlights website encouraged in medias res highlights through displaying the shared highlight in its immediate context from the original ebook. In medias res narrative highlights are more common in fiction and poetry than non-fiction. Many of these highlights involve speech as dialogue is a common method of delivering exposition, and five percent of all Kindle Popular Highlights contain at least one quotation mark. 3 Figure 2 demonstrates that commonplacing and in medias res highlights are not mutually exclusive as aphorisms can be integrated into narrative developments.
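The typographic patterns used in this section to identify in medias res highlights can be approximated with a short sketch. The feature names are hypothetical, and one simplification is an assumption of the sketch rather than of the study: only double quotation marks are counted, because single quotation marks cannot be told apart from apostrophes without further parsing.

```python
# Double quotation marks only (straight and curly); an assumption of this sketch.
OPENING_QUOTES = ('"', "\u201c")
QUOTE_MARKS = ('"', "\u201c", "\u201d")


def typographic_features(highlight):
    """Flag surface patterns that suggest an in medias res highlight."""
    text = highlight.strip()
    return {
        # Starts mid-sentence: the first character is a lowercase letter.
        "starts_lowercase": bool(text) and text[0].islower(),
        # Opens with dialogue or quoted material.
        "starts_with_quote": text.startswith(OPENING_QUOTES),
        # Contains at least one quotation mark anywhere in the highlight.
        "contains_quote": any(mark in text for mark in QUOTE_MARKS),
    }


def share(highlights, feature):
    """Proportion of highlights that show the given feature."""
    highlights = list(highlights)
    if not highlights:
        return 0.0
    flagged = sum(typographic_features(h)[feature] for h in highlights)
    return flagged / len(highlights)


sample = [
    "single man in possession of a good fortune, must be in want of a wife.",
    "\u201cIn vain I have struggled.",
    "It will not do.",
]
lowercase_share = share(sample, "starts_lowercase")
```

Running `share` separately over the public domain and in-copyright highlights would give the kind of proportional comparison discussed in this section, although such surface features can only suggest, not confirm, that a highlight begins in the middle of the action.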

Analysis of patterns in Jane Austen's Pride and Prejudice
The clearest evidence for commonplace culture appears in titles with multiple editions. In this framework, it is possible to look at patterns and the development of repetitive commonplaces. 'It is a truth universally acknowledged…' remains intact, other than on two occasions, when the latter half of the aphorism – 'single man in possession of a good fortune, must be in want of a wife.' – is all that remains. To some extent, this may be a technological problem – the user may not necessarily know exactly how to highlight the section they want – but equally, without a chain of influence of other readers, users converge towards the heart of their interests. Figure 3 shows a more complex example of variable highlighting, as users have altered the quotation's length.
These highlighting practices occur across at least two editions, which demonstrates a convergence towards particular elements of the soliloquy. The shared highlighting patterns across editions may represent a shared aesthetic appreciation of elements of the soliloquy, although this is difficult to deduce without further knowledge of individual highlighters. At least three readers must highlight exactly the same passage without knowledge of the others' highlights for it to be classified as a popular highlight, leading to a degree of unseen convergence. These readers' common understanding of the importance of elements of the speech extends its historical role as a commonplace, as readers choose to separate elements of the speech into chunks of knowledge rather than focus on its significance as a complete passage.
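The comparison of highlights across editions can be sketched in two steps: finding passages that recur in more than one edition, and finding shortened variants of a longer highlight. This is an illustrative sketch only. It assumes, hypothetically, that the records are available as (edition, highlight text) pairs, and the edition labels are invented.

```python
from collections import defaultdict


def normalise(text):
    """Collapse whitespace so that identical passages compare as equal."""
    return " ".join(text.split())


def converging_highlights(records, min_editions=2):
    """Map each passage to the editions in which it was highlighted,
    keeping only passages found in at least `min_editions` editions."""
    editions = defaultdict(set)
    for edition_id, text in records:
        editions[normalise(text)].add(edition_id)
    return {t: eds for t, eds in editions.items() if len(eds) >= min_editions}


def shortened_variants(highlights):
    """Return (short, long) pairs where one highlight sits inside another."""
    texts = sorted({normalise(h) for h in highlights}, key=len)
    return [
        (short, long_)
        for i, short in enumerate(texts)
        for long_ in texts[i + 1:]
        if short in long_
    ]


full = ("It is a truth universally acknowledged, that a single man in "
        "possession of a good fortune, must be in want of a wife.")
short = "single man in possession of a good fortune, must be in want of a wife."
records = [("edition A", full), ("edition B", full), ("edition A", short)]

converged = converging_highlights(records)
variants = shortened_variants(text for _, text in records)
```

Exact matching mirrors the requirement that readers select exactly the same passage, while the substring test captures the shortened forms discussed above. It would not detect highlights that merely overlap without one containing the other.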

Conclusion
In this article, I have argued that the shared highlights of public domain books on the Kindle reveal that readers are re-appropriating the historical literacy practice of commonplacing. These public domain ebooks represent some of the most frequently shared titles as readers are mining the classics for wisdom on mortality, religion and humanity's place within the universe. Highlighting important narrative moments (a feature that is more popular in titles published post-1923) is combined with an interest in extracting universals and expressions of values from the text. Further to this, my analysis showed that readers undertake an act of curation when forming their highlights, as they may share a quotation that starts in the middle of a sentence to turn the highlight into an imperative. Examples from Pride and Prejudice and Hamlet demonstrate that readers edit and curate the content of their highlights to emphasise themes and messages that may interest individual readers. Since each popular highlight initially requires three users to independently select exactly the same passage, and this can happen across multiple editions of the text, users may be focusing on specific aspects of these public domain works. Once the highlight is shared more than three times, users are more likely to follow pre-existing trends that modify the pre-existing commonplace.
Through focusing on just the public domain titles in the Kindle Popular Highlights database, I have demonstrated nuances in the literacy practices of ebook readers that cannot simply be explained as the result of aide-memoire or of the re-sharing of already popular highlights. The advent of large-scale datasets of contemporary reading habits presents exciting new frontiers – and pitfalls (Davis, 2015) – in the quantitative and qualitative study of reading. This article presents one specific framework for such research. In so doing, it begins to show how readers make use of the affordances provided by commercial reading technologies: in this case, by hunting for wisdom in pre-1923 texts.
Notes
1 See Barton (1994) for a discussion of literacy practices.
2 No official reason was given for restricting access to the database, but it was likely due to the value of the data in an age where publishers are beginning to become more interested in the reading habits of their customers (Kobo 2014). To share data freely would be to give up the possibility of selling it.