Sunday, April 14, 2013

Big Data and the Future of Collections Management

[A presentation I gave for Collections Management class. You can follow the clicks in this embedded Prezi to simulate the experience.]

"Big data" is a big buzzword in business and technology circles. If someone had asked me a year ago to define big data, I would have talked about credit card and credit score companies. I would have talked about Google harvesting email content. I would have talked about social networking graphs.


But this sharp increase in data collection is just the first step. The soul of big data is in its use. [click] And the magic of big data is that its use is [click] not predetermined.

To see what I mean, let's take a minute to think about the scientific method. [click] Remember this from grade school?
  • Form a hypothesis.
  • Design an experiment.
  • Then: Collect data.
  • Analyze data.
  • Draw a conclusion.
Big data allows a different approach. [click] Data collection happens first. And here's the key: this data is more or less complete. There's no need to design an experiment to generate some data that's relevant to a specific hypothesis; we just work from all the data! This is a data-driven approach to finding patterns, including patterns we never would have hypothesized.

For example [click], when Walmart's analysts searched their sales history for interesting patterns, they found a connection between [click] looming hurricanes and the sale of [click] flashlights! Ok, that's not too surprising. They also found a strong correlation between hurricanes and [click] pop tarts! Who knew? Even individual pop tart purchasers may not have perceived they're part of a pattern; a wider perspective was required. So Walmart did the obvious thing: they waited for a hurricane and shipped truckloads of extra pop tarts to select stores. They sold like hotcakes! Or should I say, like pop tarts before a hurricane? [click]
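
[An aside for readers who like to see the idea in code. This is a minimal Python sketch of a hypothesis-free pattern search, not Walmart's actual method: it compares every product's weekly sales against a single signal and ranks the products by correlation strength. The sales figures, product names, and the function names `pearson` and `rank_by_correlation` are all made up for illustration, and Pearson correlation is just one simple way to score a pattern.]

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0  # a constant series cannot be correlated with anything
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def rank_by_correlation(signal, sales_by_product):
    """Score every product against the signal; strongest pattern first."""
    scores = {
        name: pearson(signal, sales)
        for name, sales in sales_by_product.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: 1 = hurricane warning that week, 0 = no warning.
hurricane_warning = [0, 0, 1, 0, 1, 0, 0, 1]

# Hypothetical weekly unit sales for three products over the same 8 weeks.
weekly_sales = {
    "flashlights": [10, 12, 40, 11, 38, 9, 10, 45],
    "pop_tarts": [50, 52, 95, 49, 90, 51, 50, 99],
    "garden_hoses": [20, 25, 18, 30, 22, 19, 27, 21],
}

# No hypothesis about which product matters: every product gets checked.
for product, score in rank_by_correlation(hurricane_warning, weekly_sales):
    print(f"{product}: {score:+.2f}")
```

[The point of the sketch is that nobody had to guess "pop tarts" in advance. The ranking surfaces whatever tracks the signal, and a human decides afterward whether the pattern is worth acting on.]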

The authors of the book on which this talk is based wrote:
"Big data refers to things one can do at a large scale that cannot be done at a smaller one." p. 6
Let's see what another retail giant has accomplished with large-scale data. [click]

Target sometimes advertises by sending 'targeted' coupons to individual customers. It's like personalized recommendations. Of course, the better the match between coupons and customer needs, the higher the chance that people will get in their cars, drive to Target, and buy things!

Here's the creepy part. Target's analysts wanted to know if they could identify pregnant customers. So they started with customers who had registered for baby showers and searched for patterns in their purchase histories. It turns out that customers who purchase cotton balls and unscented lotion are more likely to be pregnant, especially if this is followed up by certain vitamins and minerals or any of over twenty other pregnancy-correlated items. In fact, this progression of purchases can even produce a projected due date! There's even a story about a father who came into Target upset because his teenage daughter had received coupons for baby cribs. Target knew before he did!

If big data is sounding powerful and a little scary, you've got the right idea. [click]

Now we're ready to talk about big data in the context of public library collections management. [click]

What's the difference between a library and a book store? One difference is that book stores are ultimately about making money, while libraries are ultimately about serving their patrons. As we saw with Target, big data can be used to trade away privacy for profit. It seems inevitable that retail stores will use big data in more and more invasive ways [click]. If this happens, all libraries need to do is maintain their reputation for privacy and their value will grow. It would even make sense for collection development policies to mention a preference for materials of confidential interest.[click]

On the other hand, libraries are very well situated to take advantage of big data techniques. Unlike Walmart or Kmart transactions, every checkout is tied to a loyalty card... I mean a library card. I can't tell you what patterns a team of big data analysts would reveal in library data. But when we find our equivalent to pop tarts or unscented lotion, we might order more or fewer of certain materials, rearrange items, or set up displays at more effective times. [click] [click]

Obviously, there's some tension between maintaining privacy and using library data to its fullest. We could add a line to due date phone calls: "This is Lincoln Public Libraries. We are calling to inform you that you have an item due on Thursday... and you might also enjoy Surprise Child: Finding Hope in Unexpected Pregnancy!" Yes, that might scare people away. Thankfully, libraries don't need to rely on their own data to take a big data approach. [click]

We can use public data. Even without big data analysis, individual collection managers can (and should!) follow best seller lists, social networking trends, and top news stories. Big data analysis goes deeper. It might be possible to predict the next big things before they make their way to the top. Libraries could be ready to meet demands for the next 50 Shades, not lag weeks behind retail stores. If a historically-themed movie is coming out, it would make sense to review materials on that subject, but only if public interest really is picking up; big data might be able to tell the difference. [click]

In summary, big data is powerful and a little scary. It's not something for the average librarian to use directly, but I believe it is everyone's responsibility to steer the profession between the extremes of neglecting and overusing this technology. We need to adapt to big data, but we also need to adapt big data to our professional ethics.

Thank you. [click]


  1. I have to wonder what's included in the fine print for loyalty cards. Is this use of big data for targeted coupons something they disclose to patrons? I don't know that I would object to targeted coupons or improving store content based on big data in this manner. The push for privacy seems to have been aimed at personal information, but I'm not sure this would count as personal information. What does count, though?

    What's interesting to me, too, is the rate at which our technology is allowing for procedures we haven't had time to discuss in terms of ethics. The law-making process in the U.S. seems so far behind the curve that any corrective action would come well after the worst of the damage is already done. How do we legislate what we don't even know is coming?

    A neat presentation, though library usage of big data would seem pretty limited.

    I'm finally caught up, too!
