Charles Watkinson ("An Institutional Response to the Challenges of Digital Scholarship in Archaeology at the American School of Classical Studies at Athens", "The 'drill down' dilemma. Why can't we link archaeological publication to the underlying data?", "Only Panthers Share Archaeological Data") and Sebastian Heath ("Drilling Down (and Up)") have been engaged in an entertaining discussion about the future of digital repositories for archaeological data. The discussion has been vigorous and interesting for me and my collaborators at the Pyla-Koutsopetria Archaeological Project (PKAP), as we are beginning to look toward making our data available online. In this spirit, we have posted papers, our annual reports, video (Emerging Cypriot, Survey on Cyprus), maps backed by real data, and even some of our poster presentations. We are even beginning to fool about with Omeka to develop an online "museum". But none of this, at least to my mind, really counts as "data" in an archaeological sense.
While folks like Charles and Sebastian have a firm grasp of many of the technical and theoretical aspects of making data available, most small to mid-sized projects supported by small to mid-sized universities and directed by folks like me with small to mid-sized brains struggle to get their heads around the real, practical issues of making data available to colleagues elsewhere on the web. (In Charles Watkinson's terms, I would be a baby armadillo or raccoon, though I prefer squirrel, as compared to the Grey Panther types like Jack Davis and Ian Hodder.)
In particular, I struggle to envision what my end user will do with our data. This is not rooted in a fear of someone doing something untoward or threatening with our data, or in an unwillingness to share. Rather, I am not sure how to present our data in a way that would be effective and useful to an outside user, given the technical expertise at our disposal. On the one hand, for our project, the metaphor of "drilling down" from the solid surface of interpretation to the data below (if I understand the metaphor) represents a reasonable extension of a sound rhetorical tactic for any archaeological argument. Most well-published sites and surveys include some kind of catalogue of finds that allows at least some down-drilling from interpretation and analysis to the actual artifacts that provide evidence for a project's conclusions. In this sense, presenting data to allow for drilling down is not new to archaeological argument or presentation; it would simply be enhanced by electronic forms of data.
This, however, does not necessarily coincide precisely with how we envision our archaeological data being useful. PKAP is a small-scale, highly intensive pedestrian survey. Despite the claims of "artifact-level survey", a single sherd from a project like ours rarely has much "intrinsic" meaning. Of course, if we say that a particular artifact is African Red Slip 105 and an expert looks at it and says that it is a local Hellenistic cooking pot, then we have a problem. More commonly, however, a single artifact gains meaning from its spatial distribution, its frequency, and its relationships to other artifacts. When we think about our data in this way, the metaphor of drilling down from interpretation to a stable artifactual foundation is less helpful (although I can see how it could be extended to include the context of the artifact). When we at PKAP think about data sharing, we primarily think about presenting data in such a way that it can be recontextualized and recombined to form the basis for as-yet-unanticipated interpretations.
Perhaps this line of discussion is a red herring: while not a technical whiz like Sebastian, I know enough about database structure to know that drilling down, regrouping, and comparing data should all be possible within a well-conceived data structure. Nevertheless, it remains difficult, for example, to export data from the PRAP Pottery Database in tabular form for analysis (although I suspect that this is possible). Other data sets that are available as tables (for example, some available through the Archaeological Data Service) lack the metadata necessary to understand what the tables actually represent, or seem too raw and irregular to trust entirely for analysis.
When we have thought about publishing our data, we've conceived of it almost exclusively in a raw format, which would make it more difficult for an inexperienced user to drill down through but perhaps more useful to a professional who seeks to do quantitative analysis. Again, I recognize that these models for data distribution are not mutually exclusive (Open Context, for example, seems to allow queries to be exported in tabular form), but for a project like ours, which is approaching its final field season and is looking to share its data in the short term, there are some very basic questions whose answers would facilitate our progress toward that goal:
(1) On a budget and with present technology, how should a project go about making its data available? The ASCSA has invested a major grant in preparing the complex data sets of the Agora and Corinth Excavations. PRAP and other major archaeological projects have well-designed interfaces prepared by highly skilled experts. At the same time, there are services like Open Context. How do we understand and sort through these options for making data available? What are the salient issues? Is this something that individual projects should do, or will data distribution ultimately reside with outside services? Will data distribution become as idiosyncratic and fragmented as, say, academic publishing?
(2) Is drilling down still the best way to conceptualize the presentation of our data? That is to say, how raw can our data be? Do we need to contextualize our data within interpretative narratives that can then form the "surface" from which "excavation" can take place? Or do we present our "raw" data with appropriate metadata in the hope that it can serve as the raw material for independent comparison and analysis?
(3) A more practical concern is stable storage. As recent discussions of digital infrastructure have made clear, just getting long-term, stable server space at a mid-sized university is a struggle, and maintaining it is a long-term investment (just like maintaining an apotheke for physical artifacts). What is the relationship between a long-term commitment to data storage (on a particular server) and the reference stability that is necessary if we want to cite a particular artifact from a particular data set in our academic papers?
Is our end user someone who simply wants to download a data set with appropriate metadata and manipulate it with their own software, according to their own whims? Or does our end user want an interactive interface? Or both?
Will it increasingly be the obligation of a project to have well-thought-out answers to these questions before it begins collecting data? I can't shake the feeling that my inability to answer them, or even properly understand the debate, reflects a kind of archaeological irresponsibility on my part!
This is confusing business.