As the NSA stories reveal, it really is all about the metadata now (if you updated “Our Bodies, Ourselves” for 2014 it would have to be “Our Metadata, Ourselves” or, more accurately, “Our Selves, Google’s Metadata”).
One specific species of metadata that has always interested me, a library geek from day one, is citation, and there is an interesting buzz of activity around making citations open, particularly in the sciences. This is one tentacle of a long-running battle in scholarly publishing between the enormous power of big publishers and the desire of individual scholars and scientists to make content open and available. It may come as a surprise that not only are the journals and articles themselves often tightly controlled, but the metadata (who is cited, how often, and by which other authors) is also typically proprietary. This means that doing broad analysis of this metadata, for instance mapping scholarly knowledge as reflected in the written record, is pretty hard. One example is measuring impact or influence: oft-cited papers (even if they are cited only to dismiss the arguments contained therein) raise their authors’ academic rank. But that, like the lit review, which these tools also support, is only the beginning of what this “meta” level of citation can yield.
A recent issue of Nature had news of a couple of interesting tidbits on this front. The first is the Open Citations Corpus, which is trying to establish a framework for accessing citations from publishers and other sources. (For now it’s focused on biomedical content, but its backers have ambitions to scale it to every discipline, including the arts and humanities.) They are already providing the tools to do this, and Nature is following suit.
A related (and even more interesting) effort is figshare: a way for scientists to publish and share data without the intermediaries of legacy publishing channels. From their web site:
Who/what/when/where/why is figshare?
figshare allows researchers to publish all of their data in a citable, searchable and sharable manner. All data is persistently stored online under the most liberal Creative Commons license, waiving copyright where possible. This allows scientists to access and share the information from anywhere in the world with minimal friction.
The site offers upload times of mere seconds for all file formats, providing a citable, searchable endpoint for researchers. figshare offers unlimited storage space for data that is made publicly available on the site, and 1GB of free storage space for users looking for a secure, private area to store their research. Users of the site maintain full control over the management of their research whilst benefiting from global access, version control and secure backups in the cloud.
figshare was started by a frustrated Imperial College PhD student as a way to disseminate all research outputs and not just static images through traditional academic publishing. It is now supported by Digital Science, a Macmillan Publishers company.
It’s based on Creative Commons licensing, and I’m struck that it acknowledges from the get-go that modern scholarship and research should provide a way to capture and publish more than just text and figures. We live in a multimedia world, and we use those tools to think and build knowledge. Yet we can’t keep track of the assets created with these tools very well. (If there really is any kind of functional scholarly apparatus for multimedia in disciplines other than those that study a particular format, I haven’t heard about it.) When scientists use an asset in film format, say, in a scholarly publication, how do they catalog, cite, and store it? How does it become available in the literature? How do datasets get “published,” for that matter? Is it just, “That data file I sent to Sudhir last month. Where did it go after that? Do I still have it? Did the data change?”
figshare is one of the efforts to answer these sorts of questions. And it suggests that a science of data, covering its management, its analysis, and its place in knowledge, is emerging (“International Journal of Metadata Studies,” anybody?). Were this library school dropout more ambitious, I’d try to get it going. For now I’ll wish it all well, and blog about it now and then. (Along with the observation that “figshare” is a tad twee for a title, as is the all-lowercase styling.) Like Thomas Pynchon, the Internet does love its funny names.

Funny, I only recently heard about figshare, as it is owned by the same company I work for.
In Australia, the Australian National Data Service provides data curation services like figshare. The Research Whisperer published three articles that champion the ideas of sharing, citing and ‘tattooing’ data.