A Few Reasonable Words

A commonplace book

Tag: metadata

Posted on April 15, 2016

“That Movie with the Sliding Doors and Time Travel … You Know…Maybe in Polish”

Book store staff, librarians, and movie store clerks (back when there still were movie stores!) all have stories of perplexing requests from patrons, generally when there is a very specific title in mind, but the question is a complete puzzle, with a title described in such vague or idiosyncratic ways as to require mind-reading more than reference desk skills.

A few tidbits from LibraryThing:

“I need that book that’s called Shakespeare, but it’s spelled with a ‘Ch’ and the author starts with M…”

Fortunately, I was in my groove that day, and it only took me a few seconds to figure out that the patron wanted the book Chesapeake, written by James Michener.

••••••••

My friend, as a young lass, once ran up to a librarian, very excited, and yelled out, “Do you have ‘The Cat Who Shat?'”

••••••••

I worked in a record store all through my (many) undergraduate years and we would get many crazy folks in the store on a regular basis. The most common silly request overall was: “Can you help me find this song? I don’t know what it’s called or who sings it, but it’s about love.”

Aren’t they all?

As a classical music person, I’ve encountered my share of these mysteries: somebody raving about hearing “Faust’s Requiem.” (I figured it was some outrageous East German world premiere, but it was just the Fauré one.)

When I was responsible for an information service on opera (which is a fairly nutty thing to start with), I reliably got calls about “that opera where she dies at the end.” I checked my impulse to say, “If she doesn’t die at the end, it’s probably just Rodgers and Hammerstein.”

The automated, as in robot-served, stacks at NC State’s Hunt Library. A lot of metadata within and without. Maybe someday we’ll be able to tell a robot, “I want that blue book, about a whale named Ahab…”

But on reflection, all these misfires disclose that the brain is an interesting thing, and though no doubt we all share much common neural architecture, we certainly don’t seem to keep track of and remember things or their associations in the same way. (In my case, I would charitably call this “neural metadata diversity of recall,” but I suppose it could just be that I’m ‘getting on,’ as the Brits would have it.) There is, after all, a specific target for all these descriptions; it’s just that the getting there is fraught.

For myself, I do okay remembering music facts (I started early in this domain, and, since I listen to and play music every day and write about it almost as much, it remains relatively fresh), but boy do I get foggy about movie details. And I am notorious for groping to remember titles of films I have seen or want to see: “The one about the farmer who drives to someplace in the upper Midwest to collect his life insurance payout, starring the guy who was in The Great Gatsby.”

“Train doors close and open on two different paths for the story.”

“Brad Pitt turns into a baby. Unbearably long.”

Turns out, there’s an AI-powered web site that makes this a snap. “What Is My Movie?” assesses deep content using technology from cognitive science to suss out those elusive links in your query and unravel the mystery of which movie you were looking for on Netflix.

From their website:

Whatismymovie.com is a descriptive movie search engine that was originally created as a showcase for Arctic15, Helsinki in 5/2015. Its purpose is to present a new way of searching video content, using movies and TV as the chosen approach. Descriptive movie search is based on our research on what is called “Deep Content”. Deep Content is everything we can see and hear in a video, but cannot mechanically analyze – until now. Deep Content includes transcripts, audio, visual patterns and basically any form of data feed that describes the video content itself. After analyzing the deeper levels of the video, we automatically convert it into advanced metadata. This metadata is then processed by the beating heart of our engine: a cognitive machine learning system that understands natural language queries and matches it with our metadata.

It’s not foolproof. It got Sliding Doors and the (completely intolerable) Benjamin Button from the examples above, but didn’t realize I was thinking about Nebraska for the first. (Although it did suggest Borat!)

Posted on December 6, 2013

The Age of Metadata

As the NSA stories reveal, it really is all about the metadata now (if you updated “Our Bodies, Ourselves” for 2014, it would have to be “Our Metadata, Ourselves” or, more accurately, “Our Selves, Google’s Metadata”).

One specific species of metadata that has always interested me (a library geek from day one) is citation, and there is an interesting buzz of activity around making citations open, particularly in science domains. This is a tentacle of a long-running battle in scholarly publishing about the enormous power of big publishers vs. individual scholars and scientists, and the latter’s desire to make content open and available. It may come as a surprise that not only are the journals and articles themselves often tightly controlled, but the metadata (who is cited, how often, and by which other authors) is also typically proprietary. This means that doing broad analysis of this metadata, for instance mapping scholarly knowledge as reflected in the written record, is pretty hard. One example is measuring impact or influence: oft-cited papers (even if it is to dismiss the arguments contained therein) raise your academic rank. But that, like the lit review, which these tools also support, is only the beginning of what this “meta” level of citation can yield.

A recent issue of Nature had news of a couple of interesting tidbits about this: first the Open Citations Corpus, which is trying to establish a framework for accessing citations from publishers and other sources. (Now it’s focused on biomedical content, but they have ambitions to scale to encompass every discipline, including arts and humanities.) They are already providing the tools to do this, and Nature is following suit.

A related (and even more interesting) effort is figshare: a way for scientists to publish and share data without the intermediaries of legacy publishing channels. From their web site:

Who/what/when/where/why is figshare?

figshare allows researchers to publish all of their data in a citable, searchable and sharable manner. All data is persistently stored online under the most liberal Creative Commons license, waiving copyright where possible. This allows scientists to access and share the information from anywhere in the world with minimal friction.

The site offers upload times of mere seconds for all file formats, providing a citable, searchable endpoint for researchers. figshare offers unlimited storage space for data that is made publicly available on the site, and 1GB of free storage space for users looking for a secure, private area to store their research. Users of the site maintain full control over the management of their research whilst benefiting from global access, version control and secure backups in the cloud.

figshare was started by a frustrated Imperial College PhD student as a way to disseminate all research outputs and not just static images through traditional academic publishing. It is now supported by Digital Science, a Macmillan Publishers company.

It’s based on Creative Commons licensing, and I’m struck that it acknowledges from the get-go that modern scholarship and research should provide a way to capture and publish more than just text and figures. We live in a multimedia world, and use those tools to think and build knowledge. Yet we can’t keep track of the assets that are created with these tools very well. (If there really is any kind of functional scholarly apparatus for multimedia when used in disciplines other than those about a particular format, I haven’t heard about it.) When scientists use an asset that is in film format, say, in a scholarly publication, how do they catalog, cite, and store it? How does it become available in the literature? How do datasets get “published,” for that matter? Is it just, “that data file I sent to Sudhir last month. Where did that go after that? Do I still have it? Did the data change?”

figshare is one of the efforts to answer these sorts of questions. And it sort of shows that there is a science of data and its management, analysis, and place in knowledge emerging (“International Journal of Metadata Studies,” anybody?). Were this library school dropout more ambitious, I’d try to get it going. For now I’ll wish it all well, and blog about it now and then. (Along with the observation that “figshare” is a tad twee for a title, as is the all lower case.) Like Thomas Pynchon, the Internet does love its funny names.

Metadata 101, from the 1930s, when the Catalog Card was the ne plus ultra of metadata. Found on the Flickr stream of http://ccdl.libraries.claremont.edu/, Claremont Colleges Digital Library, a fun browse.
