Special Issue: Histories of Data and the Database

AHP readers may be interested in a special issue, “Histories of Data and the Database,” recently published in Historical Studies in the Natural Sciences. Full details below.

“Introduction: Scrutinizing the Data World,” by Soraya de Chadarevian and Theodore M. Porter. No abstract.

“Data as Word,” by Daniel Rosenberg. Abstract:

The history of what we today call “data” extends to the ancient world, yet our contemporary terminology of “data” is modern. This article examines the history and significance of the term “data.” It argues that a historiography of data that is self-conscious about the historicity of its own categories can illuminate the specific materiality of data, distinct from the things in the world it claims to represent.

“Datafication and Spatial Visualization in Nineteenth-Century Census Statistics,” by Christine von Oertzen. Abstract:

This essay argues that the explosion of visual graphics in nineteenth-century population statistics was closely linked to a shift in statistical epistemologies and practices of data collection. Taking German census statistics as a case in point, I illuminate concepts and practices that referred to data as a category of the here and now, enabling spatial representations of current phenomena. I argue that seeing and abstracting the world as data opened new avenues not only for producing tables with multiple variables, but also for forging such refined results into graphical visualizations of data. These in turn made empirical relationships in the social order evident and thus modifiable through intervention and reform.

“Data in Time: Statistics, Natural History, and the Visualization of Temporal Data,” by David Sepkoski. Abstract:

One of the best arguments for approaching the history of information processing and handling in the human and natural sciences as a “history of data” is that it focuses our attention on relationships, convergences, and contingent historical developments that can be obscured following more traditional areas of focus on individual disciplines or technologies. This essay explores one such case of convergence in nineteenth-century data history between empirical natural history (paleontology and botany), bureaucratic statistics (cameralism), and contemporary historiography, arguing that the establishment of visual conventions around the presentation of temporal patterns in data involved interactions between ostensibly distinct knowledge traditions.

“Observations, Narrative, and Data in Nineteenth-Century Asylum Medicine,” by Theodore M. Porter. Abstract:

French asylum doctor Ludger Lunier’s effort to measure the causal force of war and revolution in the production of insanity involved reasoning from data in an unfamiliar form. Lunier built up what we can call a medical database from an accumulation of about four hundred compact case narratives, some of them based on his direct experience. Although the conclusions he sought were purely quantitative ones, he returned repeatedly to these elemental accounts of the genesis of madness.

“Making and Unmaking Populations,” by Staffan Müller-Wille. Abstract:

Statistics derives its power from classifying data and comparing the resulting distributions. In this paper, I will use two historical examples to highlight the importance of such data practices for statistical reasoning. The two examples I will explore are Franz Boas’s anthropometric studies of Native American populations in the early 1890s, which laid the foundation for his later critique of the race concept, and Wilhelm Johannsen’s experiments in barley breeding, which he carried out for the Carlsberg Laboratory around the same time and which prepared the ground for his later distinction between genotype and phenotype. Both examples will show that the manipulation of data depended on complex classificatory practices: the distinction and articulation of “tribes,” “races,” and “family lines” in the case of Boas, and the selection and construction of “populations” and “pure lines” in the case of Johannsen. They also reveal a fundamental difference between data practices in the human and the life sciences: whereas the latter are relatively free to construct populations in the laboratory, the field, or on paper, the former have to rely on social categories shaped by historical accident and the self-perception of the subjects under study.

“Me and My Data,” by Sarah E. Igo. Abstract:

This article examines a recent, unexamined turn in the history of personal data in the last half century: the era when it was re-envisioned as a possession of the individual whom it described or from whom it was obtained. Data—whether scientific, commercial, or bureaucratic—had often been treated as confidential or protected, but it had not typically been conceived in terms of individual ownership. But starting in the later 1960s, more and more people in the industrialized West questioned whether they or the authorities who collected or maintained their data properly had claim to that information. This question was sparked as much by political and economic developments as it was by scientific and technological ones. Citizens’ move to shore up their proprietary claims would prompt new regulations around access, control, and consent that continue to undergird contemporary ideas about personal data. A product of social movements and civil rights reforms as well as market thinking, this bid for authority over one’s “own” information would however reveal its limitations by the turn of the twenty-first century, particularly in the context of a big data economy.

“The National Data Center and the Rise of the Data Double,” by Dan Bouk. Abstract:

A mid-1960s proposal to create a National Data Center has long been recognized as a turning point in the history of privacy and surveillance. This article shows that the story of the center also demonstrates how bureaucrats and researchers interested in managing the American economy came to value personal data stored as “data doubles,” especially the cards and files generated to represent individuals within the Social Security bureaucracy. The article argues that the United States welfare state, modeled after corporate life insurance, created vast databanks of data doubles that later became attractive to economic researchers and government planners. This story can be understood as helping to usher in our present age of personal data, one in which data doubles have become not only commodities, but the basis for a new capitalism.

“An Episode in the History of PreCrime,” by Rebecca Lemov. Abstract:

This article traces the rise of “predictive” attitudes to crime prevention. After a brief summary of the current spread of predictive policing based on person-centered and place-centered mathematical models, an episode in the scientific study of future crime is examined. At UCLA between 1969 and 1973, a well-funded “violence center” occasioned great hopes that the quotient of human “dangerousness”—potential violence against other humans—could be quantified and thereby controlled. At the core of the center, under the direction of interrogation expert and psychiatrist Louis Jolyon West, was a project to gather unprecedented amounts of behavioral data and centrally store it to identify emergent crime. Protesters correctly seized on the violence center as a potential site of racially targeted experimentation in psychosurgery and an example of iatrogenic science. Yet the eventual spectacular failure of the center belies an ultimate success: its data-driven vision itself predicted the Philip K. Dick–style PreCrime policing now emerging. The UCLA violence center thus offers an alternative genealogy to predictive policing.

“Things and Data in Recent Biology,” by Soraya de Chadarevian. Abstract:

There is much talk about data-driven and in silico biology, but how exactly does it work? This essay reflects on the relation of data practices to the biological things from which they are abstracted. Looking at concrete examples of computer use in biology, the essay asks: How are biological things turned into data? What organizes and limits the combination, querying, and re-use of data? And how does the work on data link back to the organismic or biological world? Considering the life cycle of data, the essay suggests that data remain linked to the biological material and the concrete context from which they are extracted and to which they always refer back. Consequently, the transition to data science is never complete.

“Open-Access Genomic Databases: A Profit-Making Tool?,” by Emmanuel Didier. Abstract:

A database organizes information, but since information is produced by actors, it also coordinates the different actors involved with data. Here, focusing on the newly created ClinVar, a genomic clinical variant database, we will see how it helps the government, academia, and industry (represented mainly by the company Illumina) find their positions relative to one another.

“How We Became Instrumentalists (Again): Data Positivism since World War II,” by Matthew L. Jones. Abstract:

In the last two decades, a highly instrumentalist form of statistical and machine learning has achieved an extraordinary success as the computational heart of the phenomenon glossed as “predictive analytics,” “data mining,” or “data science.” This instrumentalist culture of prediction emerged from subfields within applied statistics, artificial intelligence, and database management. This essay looks at representative developments within computational statistics and pattern recognition from the 1950s onward, in the United States and beyond, central to the explosion of algorithms, techniques, and epistemic values that ultimately came together in the data sciences of today.

About Jacy Young

Jacy Young recently completed a Social Sciences and Humanities Research Council (SSHRC) of Canada Postdoctoral Fellowship at the University of Surrey in the UK. She earned her doctorate in the History and Theory of Psychology at York University in 2014.