Thursday, December 27, 2012

What Is The Semantic Web?

The Semantic Web is, or is hoped to be, the next revolution in the way the Internet is used, just as the World Wide Web was a revolution in the way the Internet was used. To get some perspective, we need to look back at history.

Before the Internet, computers existed as standalone machines, possibly with multiple monitor/keyboard terminals spread around a building. For long distance connections, wired circuits (think modems) had to be brought up and then maintained throughout a session. Local networks existed, but each network vendor had its own incompatible system. There wasn't a standard way of communicating across networks.

The Internet began as a U.S. Department of Defense project to connect research universities. By the end of 1969, networks at four universities were connected to each other. In 1983, the communication standard of this inter-network ("between"-network) was changed to the TCP/IP protocol suite, which is still the basis of Internet communication today. With an IP address (e.g. 203.0.113.100) and a port (e.g. 25), a computer in California can connect to the email program on a computer in Germany and leave a message for a user there. Or, slightly more user friendly, a kid growing up in rural North Dakota could use the telnet application to connect to a domain name (genesis.cs.chalmers.se) along with a port (3011) to play a text adventure game running on a university server in Sweden. (They've changed the address a bit since I was in high school.)

There was useful and fun stuff going on before the World Wide Web, but it was hard to discover new resources. It's hard to believe now, but it was common in those days to learn about Internet sites by reading about them in books. Paper books! Sure, there were Gopher servers with manually-maintained hierarchical categories of Internet resources, but these directories didn't keep up very well and the resources didn't usually link to each other.

The World Wide Web began as an internal project at CERN, the particle physics research center on the border of Switzerland and France. Researchers needed a better way to organize their information in a busy environment with lots of job turnover, so Tim Berners-Lee proposed a solution for CERN intentionally designed to work on a global scale as well. He wrote:
"a 'web' of notes with links (like references) between them is far more useful than a fixed hierarchical system. When describing a complex system, many people resort to diagrams with circles and arrows. Circles and arrows leave one free to describe the interrelationships between things in a way that tables, for example, do not. The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything." (source)
He was serious about the "anything" part, but we'll get back to that. As implemented, the circles came to represent documents and the arrows became references to other documents. Web pages linking to other web pages! The notion of document interlinking had been around for decades, but the World Wide Web turned the idea into practical, worldwide reality.

Linked documents sounds a little boring, but programmers have found ways to make web "documents" very interactive. Many other Internet applications have migrated into the web browser. Gopher was replaced by Yahoo (before Yahoo became a tabloid). Home users are more likely to use web mail than a standalone mail client. Twitter and Facebook have largely replaced IRC and other instant messaging clients. Web services (and web mail, unfortunately) are used to transfer files instead of FTP. It's a good thing that applications like Skype and BitTorrent exist, or people might forget there's a difference between the Internet and the World Wide Web!

What's next?

Many great things happened after we started linking documents; what if we try linking finer-grained pieces of data in usable ways? That's the idea behind the Semantic Web.

Think of it this way: the World Wide Web allowed organizations and individuals to put their relatively static documents "out there" for the world to see. But what about database generated content like library catalogs, or online store pricing, or current weather conditions? Web crawlers might be able to retrieve and usefully interpret some of this data, but that usually requires special per-site programming that breaks if the API or web formatting changes.

Getting Across Town, The Semantic Way

Here is an example of a web-published bus route:

http://lincoln.ne.gov/city/pworks/startran/routemap/weekday/route41.htm

An experienced bus rider can read this page and figure out how to plan a trip. A computer program would need help understanding how to parse all of this visually-structured data into precisely labeled information that it can reason about. Quick, what time does the last southbound bus leave "North Walmart" on Thursdays? It's not a trivial process to give that answer, even after we visually interpret the numbers as times in columns that correspond to bus stop locations on the map below. An even harder question might be: "I'm at arbitrary location X and want to reach location Y; what bus route gives me the shortest total walking distance?" In this case, a human on the right website might still have to manually look through all bus route pages, narrow it down to a couple of likely shortest routes, then spend more time comparing the tradeoff between walking farther to the first bus stop or walking farther from the last bus stop.

What would be really neat is a way for bus services and street map services to publish their data on the web in a computer-friendly form that allows third party web apps to combine all of this information and calculate answers to such questions. Even better: a universal format so mash-ups from unexpected combinations of data sources are easier to make. I'm thinking of a music app that checks your GPS position and your destination so it can create a playlist that ends within thirty seconds before your final stop. Or an emergency flight plan app that cross references ticket pricing options with weather predictions. Or a recipe web site that lets you mark missing ingredients and shows their pricing from the five closest stores. Or a personalized book recommendation site that filters by currently available titles in local public libraries. Or imagine searching the web for information on a brand-name drug, and the top results use the drug's generic name without mentioning the brand-name.

Many of these things are possible without semantic web technology; they just require more work to set up and don't tend to be very reusable. For example, Google Transit can help with bus route planning, if a city has formatted their data specifically for this Google web app and joined the transit partner program. But what if a new business wants to reuse this information in a creative way? What if Google cancels the Transit service? It would be preferable to have an open standard for open data.

Linked Data

What's the plan, then? Open existing relational databases to the public? Not exactly. The World Wide Web Consortium is pushing for another database model that's a more natural fit for the web: a graph-style data model. From the Wikipedia article:
"Compared with relational databases, graph databases are often faster for associative data sets, and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets as they do not typically require expensive join operations. As they depend less on a rigid schema, they are more suitable to manage ad-hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements."
In other words, graph databases are less efficient but more flexible (see also The Death of the Relational Database). For people who aren't math majors or computer programmers, "graph database" may sound like "graphical database." But what's meant is graph theory: a bunch of nodes and connections between nodes, usually visualized as circles and lines. A directed graph adds direction to those lines, so you get circles and arrows. Recall what Tim Berners-Lee wrote in his original proposal for the World Wide Web: "The system we need is like a diagram of circles and arrows, where circles and arrows can stand for anything." The World Wide Web is made of connections like this:
(http://en.wikipedia.org/wiki/Cat) --links to--> (http://www.catpert.com/)
Each URL (Uniform Resource Locator) is a circle and web links are the arrows. If you can imagine all URLs and all arrows between them as a gigantic diagram, you're visualizing the World Wide Web as one big directed graph.

Now imagine that the circles can stand for anything, not just web documents. Imagine that the arrows can stand for any relationship, not just navigation links.
(rain gauge #2,388) --detected rain depth--> (3 cm)
(rain gauge #2,388) --time since last emptied--> (60 min)
(rain gauge #2,388) --location--> (Millennium Stadium)
(Cardiff) --contains--> (Millennium Stadium)
A web app that has access to this information can now answer the question, "How much has it rained in Cardiff in the last hour?" "An average of 3 cm, as reported by 1 rain gauge." Or with more gauges it might be, "An average of 2.95 cm, as reported by 15 rain gauges." These (something) --related somehow--> (something) snippets of information, called triples, can combine together into complex graphs of data. And, like web pages, this can happen across servers. The rain depth information could be on one server that only knows the gauge is in Millennium Stadium, while another server knows that Millennium Stadium is in Cardiff. In fact, it makes sense to reference a separate server with lots of geographical knowledge rather than trying to maintain geographical info on a specialized rain gauge server. If the geography server is updated, the rain server automatically and instantly benefits! This is an example of the synergy that can happen with linked data.
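To make the rain gauge example a bit more concrete, here is a minimal Python sketch of how a web app might combine triples from two servers. The tuple layout, the two "server" lists, and the helper names are my own assumptions for illustration, not any real triplestore API, and the sketch skips the "in the last hour" check to stay short.

```python
# Each fact is a (subject, predicate, object) triple, like the arrows above.
# The two lists stand in for two separate servers (an assumption for illustration).
rain_server = [
    ("rain gauge #2,388", "detected rain depth", 3.0),      # centimeters
    ("rain gauge #2,388", "time since last emptied", 60),   # minutes
    ("rain gauge #2,388", "location", "Millennium Stadium"),
]
geo_server = [
    ("Cardiff", "contains", "Millennium Stadium"),
]


def objects(triples, subject, predicate):
    """Everything a subject points at through a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    """Everything that points at an object through a given predicate."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def average_rain_depth(city, triples):
    """Average depth over all gauges located in places the city contains."""
    depths = []
    for place in objects(triples, city, "contains"):
        for gauge in subjects(triples, "location", place):
            depths.extend(objects(triples, gauge, "detected rain depth"))
    if not depths:
        return None, 0
    return sum(depths) / len(depths), len(depths)


# Neither server can answer the Cardiff question alone; the merged graph can.
combined = rain_server + geo_server
average, gauge_count = average_rain_depth("Cardiff", combined)
print(f"An average of {average:g} cm, as reported by {gauge_count} rain gauge(s).")
```

Notice that the rain server never mentions Cardiff. The answer only appears once the geography triples are merged in, which is the whole point of linking data across servers.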

Wait, Where Are These Factoids?

Regular web links are in web pages and point to other web pages; we're used to that by now. But where are these triples located? They can be embedded into web page code in the form of RDFa. Graph databases called triplestores can also be put on the Internet and directly queried, much as a SQL database could be if it weren't hidden behind an intermediary website. In either case, typical Internet users won't "see" the Semantic Web directly as they see the World Wide Web's documents and links. The Semantic Web exists as a programming-oriented sibling or add-on to the World Wide Web, not as a replacement. Applications use the Semantic Web to enhance traditional web services.

What Makes the Semantic Web "Semantic"?

In philosophy, linguistics, and computer science, semantics has to do with meaning in contrast to syntax (which has to do with structure or format). Remember ad-libs?
The [adjective] outlaw [transitive past tense verb] a [common noun].
So long as these blanks are filled in with the specified parts of speech, the resulting sentence will be syntactically correct; it will have the right format for an English sentence. For example:
The lonely outlaw whistled a tune.
The law-abiding outlaw drank a mortgage.
The second sentence may have proper syntax, but it's nonsense. Because of their meaning, certain words and phrases don't go well together, at least not in a literal sense. Something else to consider:
This isn't a dog, it's a doberman pinscher.
Again, nothing wrong with the syntax, but a doberman pinscher is a type of dog. Another case:
There were witch trials in Salem.
The truth of this sentence depends (in part) on which Salem is meant. It's a true claim when referring to Salem, Massachusetts. It's false for Salem, Iowa...and many other Salems. In standalone databases, ambiguities and mis-matched concepts like these aren't much of a problem. A database created for a certain purpose in a certain context has implicit restrictions on the meaning of its data. A Massachusetts newspaper database and an Iowa newspaper database are going to mean something different by just plain "Salem." What happens if we try to publish all of these databases on the web and expect the data to mesh well together? Chaos, unintentional humor, and a general lack of usefulness!

For this reason, the Semantic Web has to be about more than just publishing everyone's data as (subject) --predicate--> (object) triples. Here's a flawed set of triples:
(witch trials) --took place in--> (Salem)
(Tom) --born in--> (Salem)
Was Tom born in the same city that the witch trials took place in? We can't tell because we don't know if the two "Salem"s are the same, or which "Tom" is meant. To solve this problem, URIs (Uniform Resource Identifiers) are used, roughly like this:
(http://dbpedia.org/resource/Category:Salem_witch_trials)
--(http://sw.opencyc.org/2008/06/10/concept/en/eventOccursAt)-->
(http://dbpedia.org/resource/Salem,_Massachusetts)

(http://dbpedia.org/page/Thomas_Poulter)
--(http://dbpedia.org/ontology/birthPlace)-->
(http://dbpedia.org/resource/Salem,_Iowa)
In this case, the "Tom" in question was born in a different Salem. If the URIs had matched up, it would have been possible to draw a new conclusion along the lines of (Tom) --born where occurred--> (Salem witch trials). Why call these URIs rather than URLs? Because they don't necessarily correspond to a visitable web page, although it's considered best practice to make such a page available when possible. A URI can identify a resource (or a concept!) without necessarily providing a location.
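Here is a small Python sketch of why the URIs matter, using the same triples as above. The function name and the plain string comparison are assumptions of mine for illustration; a real application would more likely query a triplestore.

```python
DBR = "http://dbpedia.org/resource/"
OCCURS_AT = "http://sw.opencyc.org/2008/06/10/concept/en/eventOccursAt"
BIRTH_PLACE = "http://dbpedia.org/ontology/birthPlace"

# The flawed, label-only triples.
label_triples = [
    ("witch trials", "took place in", "Salem"),
    ("Tom", "born in", "Salem"),
]

# The same claims expressed with URIs, copied from the example above.
uri_triples = [
    (DBR + "Category:Salem_witch_trials", OCCURS_AT, DBR + "Salem,_Massachusetts"),
    ("http://dbpedia.org/page/Thomas_Poulter", BIRTH_PLACE, DBR + "Salem,_Iowa"),
]


def born_where_occurred(triples, birth_predicate, event_predicate):
    """Pair each person with each event located at that person's birthplace."""
    births = [(s, o) for s, p, o in triples if p == birth_predicate]
    events = [(s, o) for s, p, o in triples if p == event_predicate]
    return [
        (person, event)
        for person, birthplace in births
        for event, location in events
        if birthplace == location
    ]


# Bare labels produce a match we have no right to believe.
label_matches = born_where_occurred(label_triples, "born in", "took place in")
# URIs keep the two Salems apart, so no conclusion is drawn.
uri_matches = born_where_occurred(uri_triples, BIRTH_PLACE, OCCURS_AT)
print(label_matches)
print(uri_matches)
```

With bare labels the program happily pairs Tom with the witch trials. With URIs it finds nothing to pair, which is the correct result for this Tom.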

Did you notice that the URIs above come from both dbpedia.org and opencyc.org? There isn't a single, authorized web domain for the URIs used in linked data. Different organizations can contribute to the pool of URIs. What if two organizations use different URIs for the same thing? There's a triple for that!
(http://dbpedia.org/resource/Salem,_Massachusetts)
--(http://www.w3.org/2002/07/owl#sameAs)-->
(http://sw.cyc.com/concept/Mx4rvViiFpwpEbGdrcN5Y29ycA)
What about mismatches between URIs for "doberman pinscher" and "dog"? As you might guess by now, a predicate (i.e. middle URI) can be used to say that a doberman is a type of dog. Then, hopefully, any computer program trying to decide if a given specimen is a dog won't stop at finding out that it's a "doberman pinscher"; it will check to see if doberman pinschers are dogs.
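As a rough sketch of that kind of checking, the Python below follows "type of" and "same as" arrows until it either reaches the class it's looking for or runs out of arrows. The three predicate URIs are real W3C ones (rdfs:subClassOf is one common way to say "is a type of"), but every example.org and example.net URI is made up for illustration, and so is the is_a helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical URIs, invented for this example.
REX = "http://example.org/animals/rex"
DOBERMAN = "http://example.org/breeds/DobermanPinscher"
DOG = "http://example.org/species/Dog"
DOG_ELSEWHERE = "http://example.net/vocab/Canine"
CAT = "http://example.org/species/Cat"

triples = [
    (REX, RDF_TYPE, DOBERMAN),          # Rex is a doberman pinscher
    (DOBERMAN, SUBCLASS_OF, DOG),       # dobermans are a type of dog
    (DOG, SAME_AS, DOG_ELSEWHERE),      # two organizations, one concept
]


def is_a(triples, individual, target_class):
    """True if the individual's type leads to target_class via subclass or sameAs links."""
    to_visit = [o for s, p, o in triples if s == individual and p == RDF_TYPE]
    seen = set()
    while to_visit:
        current = to_visit.pop()
        if current == target_class:
            return True
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in triples:
            if s == current and p in (SUBCLASS_OF, SAME_AS):
                to_visit.append(o)
            elif o == current and p == SAME_AS:
                # sameAs works in both directions
                to_visit.append(s)
    return False


print(is_a(triples, REX, DOG))            # reached through the subclass arrow
print(is_a(triples, REX, DOG_ELSEWHERE))  # reached through subclass, then sameAs
print(is_a(triples, REX, CAT))            # no path, so the answer is no
```

Real reasoners do far more than this, but the basic move is the same: don't stop at the first label you find, keep following the arrows.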

To answer the original question, what makes the Semantic Web "semantic"? All of this background work done by ontologists to separate and combine concepts and to specify the relationships among them. The Semantic Web isn't just about breaking data out of individual databases, but about publishing data in terms of these shared vocabularies and relationship schemes. For data to be useful (and reusable) in a giant, global database, the information that was implicit in the context and structure of local databases has to become explicit. The triple format does this for structure. Ontology work does this for meaning.

When Will "Semantic Web" Be a Household Name?

It probably won't ever be a term everyone knows. The semantic revolution is happening behind the scenes among scientific, business, and cultural heritage groups. If things go well, the Semantic Web will increasingly influence the average person's experience with traditional web sites and services. Even if today's technical implementation of the Semantic Web remains niche, I have no doubt that some of its motivating ideas will reappear in future technologies.


Wednesday, December 19, 2012

Favorites in 2012

Some things I liked in 2012 (that weren't necessarily released in 2012).


Novels


John Scalzi's series-starter Old Man's War opens with:

"I did two things on my seventy-fifth birthday. I visited my wife's grave. Then I joined the army."

I was hooked immediately and on every. single. page. after that! This is a sci-fi military thriller from the current president of the Science Fiction & Fantasy Writers of America. Hey, I'd vote for him too.

Best read without any spoilers beyond that first line.



Ted Chiang's collection Stories of Your Life and Others is a perfect blend of religion, math, linguistics, and transhumanism. It begins with a retelling of the Tower of Babel and ends with a what-if about turning off the human ability to instantly judge facial beauty.

Chiang surely has one of the best publications-to-awards ratios in the business, possibly because he writes fiction as a side gig. At three stories in, he jumped up there with Borges on my favorite short form authors list.




Yes, I read a lot of science fiction this year! The Diamond Age: or A Young Lady's Illustrated Primer by Neal Stephenson is officially the book I most want to see adapted into a movie, so long as it has a big budget and doesn't hold back on the adult content. When I saw Sucker Punch later, I thought, "This director would be perfect for The Diamond Age if he isn't allowed to write the script!"

This story has everything: the end of material shortages, society after nation-states, cyborg gangsters, nanomachine smog, robot horses, and the best introduction to computer science a girl could ever have.


The Name of the Rose by Umberto Eco does its damnedest to drive off readers in the prologue and first few chapters. I know because I suffered through that trial, and because there was a note in the back about Eco's friends complaining about it and him saying it was on purpose. So tough!

But get through the bits about medieval monks squabbling over seemingly irrelevant theological points and what you'll find is a gripping murder mystery set in and around one of the most anti ALA-approved libraries in the history of mankind. I've actually seen it referenced in a professional paper as an example of what not to do. (Seriously, though, skip the prologue.)


Films (besides Drive, obviously)



Never Let Me Go is science fiction in the minimal sense that Gattaca is science fiction. The technological advance is not a whiz-bang amazing thing in the foreground, but the background reason the protagonists are struggling. Without a doubt the best child-actors-to-adult-actors transition I've seen, and both phases of the story are equally strong. As a bonus: this is the first time I've liked Keira Knightley in anything.



Heaven (2002) is a gorgeously-shot story about life after the unforgivable. Set in Italy and frequently switching between Italian and English, this police drama has a lot of the dreamy quality of the director's earlier film Run Lola Run.



I almost stopped watching Castaway On The Moon about a third of the way through because I felt the silliness of being stranded on an island in the middle of the Han River in the middle of Seoul wasn't enough to fill the rest of the movie. I was right and so very wrong. A major twist happened that took this from a ha-ha three stars to five SOUTH KOREA IS BEST KOREA stars. No, I won't give you a hint. If you don't mind a few poop jokes along the way, trust me on this film.



If you didn't know Cold Weather was an amateur detective film, you probably wouldn't guess it for quite a while. Best cinematography of the year and the most realistic characters. The polar opposite of something like Brick, as these characters aren't especially clever or talented. That's an understatement; they're flat-out not clever and not talented. They're awkward, complex twenty-somethings like everyone I knew at that age. And I've never felt so tense worrying about characters in a mystery film.



The Adventures of Buckaroo Banzai Across the 8th Dimension is the wacky, gloriously exaggerated 80s movie that can only be as low profile as it is due to an alien conspiracy. Banzai is a neurosurgeon. No, he's a test car driver! No, he's a physicist! No, he's a band leader! No, he's a personal advisor to the President! He's all of these things and the leader of a paramilitary organization trying to keep Earth out of the crossfire of an interplanetary conflict. He's a very busy and very cool man.



Beyond The Black Rainbow is all style and little substance, but the style... THE STYLE! Heavy synth. Abrupt changes in lighting and color filters and contrast. This is more like a planetarium light show than a movie. What little plot there is reminded me vaguely of the Portal video game series and required me to piece together the back-story out of small touches and sparse flashbacks. I loved it, but I completely understand why so many people detest the film. I had to take a couple of breaks to finish it.


Music

Just listen...




Interactive Fiction



I didn't like the comic version of The Walking Dead. The TV show is decent. But for my money, the video game series is the best-written, best-acted incarnation. It's in the adventure game genre, which means the focus is on dialogue choices, examining and using objects to solve puzzles, and some action game-y elements sprinkled here and there.

If you're familiar with the TV version, this is a parallel story starting from the day of the infection with cameos by a couple of characters you know. I've heard of a gamer's mother who doesn't like zombie violence or approve of all the swearing, but who got hooked by the story about protecting a little girl through the apocalypse and played the whole way through herself. No surprise, considering how this game cleaned up at the Spike TV video game awards, including one just for the girl's voice actress. I give it the Most Intense Emotional Rollercoaster award.

Sunday, December 16, 2012

Lingo: Authority Control

http://www.flickr.com/photos/kamikazestoat/425526222
Some Delicious tags:


These are user-submitted tags to help other users find webpages on a given topic.

Suppose I just found some interesting Legend of Zelda alt art and want to link it on Delicious. Which tag do I use? legendofzelda is popular, but so is zelda. If I want everyone to see my link, I had better use both! Maybe this is good enough, but since there will still be people browsing through the other tags listed above, should I use all of them? How do I know I've even found them all? What if someone starts using the tag zeldaseries next week?

Hey, maybe someone should clean up this mess by designating an official tag for the Legend of Zelda video game series. We could call this the authorized tag. Here is a great three-part plan:
  1. Decide on authorized tags for every distinct topic on Delicious.
  2. Make sure that all current and future Delicious links use the authorized tags.
  3. Enjoy finding all links related to a topic under one tag (and nothing unrelated)!
In Library Science lingo, steps one and two are called authority work: the behind-the-scenes work that needs to be done to have neatly organized access points to resources. Access points can be titles, names, or topics.
  • The Legend of Zelda: The Wind Waker (Video Game)  -- a title
  • Miyamoto, Shigeru, 1952-  -- a name
  • Sailing  -- a topic
or a little older:
  • Dracula (Novel)  -- a title
  • Stoker, Bram, 1847-1912  -- a name
  • Vampires  -- a topic
A close synonym to authority work is authority control. I prefer to think of authority control as the goal of authority work. In other words, we do authority work to achieve a state of authority control (as in step three above). But it's more common to combine the concepts:
"Authority control is the process of bringing together all of the forms of name that apply to a single name; all the variant titles that apply to a single work; and relating all the synonyms, related terms, broader terms, and narrower terms that apply to a single subject heading." — Arlene Taylor, The Organization of Information (3rd edition), p. 44
It's not the most intuitive terminology. "Access point control" or "name deduplication" or "not having a pile of inconsistent labels" would all be better.

A Professionals Only Club?

Delicious is not likely to change its tagging system. Authority control has great benefits, but it takes a lot of extra time and effort. Delicious is fantastic for what it offers: quick-and-easy bookmark tagging and decent (if flawed) bookmark discovery.

Does this mean authority control is only in reach for professional librarians? Nope! I can think of a major Web 2.0 site that lets users participate in a kind of authority work: Wikipedia.

http://commons.wikimedia.org/wiki/File:Pommes-1.jpg
Quick! What are these called:

...pommes, chips, French fries? ...pommes frites, slap chips, Belgian fries?

Imagine separate Wikipedia articles for these variations and many more. Not desirable, to say the least. Wikipedia handles this situation by letting users decide on a single article title (e.g. French Fries) and creating redirects for alternate titles.
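A redirect table like that is easy to sketch in Python. This is only an illustration of the idea, not how Wikipedia actually stores redirects; the normalize and resolve helpers are names I made up, and the variant list is just the one from above.

```python
# The single authorized title and the variant names that should lead to it.
AUTHORIZED = "French Fries"
VARIANTS = [
    "pommes", "chips", "French fries",
    "pommes frites", "slap chips", "Belgian fries",
]


def normalize(label):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(label.lower().split())


# Every variant (and the authorized title itself) points at one heading.
redirects = {normalize(variant): AUTHORIZED for variant in VARIANTS}
redirects[normalize(AUTHORIZED)] = AUTHORIZED


def resolve(label):
    """Return the authorized title for a label, or None if nobody has filed it yet."""
    return redirects.get(normalize(label))


print(resolve("Slap  Chips"))
print(resolve("BELGIAN FRIES"))
print(resolve("zeldaseries"))
```

The None case is where the human effort goes: someone has to notice the new variant and decide which authorized title it belongs under.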

Why does this work for Wikipedia but not for Delicious? Primarily because of the number of volunteer editors willing to do this kind of behind-the-scenes work for articles. Trying to keep Delicious links organized would be much more maddening with much less payoff.

Controlled Vocabulary Resources

Not every library or website needs to come up with its own authorized titles, names, or subjects. Here are some (more or less) publicly available lists that can at least serve as a starting point:

Library of Congress Subject Headings. A very broad and inclusive set of subject terms. Academic libraries tend to re-use these for their collections. Example: Ships. Smaller libraries often use the Sears List of Subject Headings instead.

Library of Congress Name Authority File. Example: Rice, Anne, 1941-. Also see Getty's Union List of Artist Names. Example: Mondrian, Piet (Dutch painter, 1872-1944).

Library of Congress' Thesaurus for Graphic Materials. Check the three "Browse By" links on the left. Example: Nitrate negatives. Getty's Art & Architecture Thesaurus. Example: Googie.

Individuals might prefer to use vocabularies like these rather than come up with their own blog tags, image tags, or music tags. You can look beyond the library and archives scene too. If I had a music review blog, I would probably use AllMusic's genre name hierarchy. Example: Americana. Right now this doesn't do a lot of good on one blog, but the growth of Semantic Web technologies may mean better use of authorized vocabulary on the public web in the future. Or the SEO leeches might just mess that up too. Either way, you can always visit your library and take advantage of the authority control someone worked so hard to set up there!

Friday, December 7, 2012

Nicomachean Ethics (Pt. 2)

[Series introduction and table of contents here.]

Book I, Chapter 4

In my comments on Chapter 2, I described Aristotle's "grand goal" as the political art. That wasn't quite right. What he was saying back then and reiterates here in Chapter 4 is that the highest of goods is the same as whatever the political art's goal is. He sees politics as the most encompassing activity in human life, so its goal would be the most encompassing goal. And what is the goal of the political art? Happiness.

All human activities are subordinate to politics and politics is aimed at happiness. Got it. Aristotle doesn't feel the need to argue for the answer of "happiness" because he takes it as universally accepted by both "the many" and "the refined." (Yes, he's just a tad elitist.) He does note that "the many" give a variety of explanations for what constitutes happiness, e.g. health, wealth, pleasure, etc.
"Certain others, in addition, used to suppose that the good is something else, by itself, apart from these many good things, which is also the cause of their all being good."
"Certain others" being Plato and friends, obviously. It's interesting how Aristotle puts some distance between himself and this view. Before he elaborates, however, he goes off on another tangent about arguing from principles vs. arguing to principles. Why does he do this? I think it's because he wants to excuse himself from starting with Plato's principles. He actually names Plato as someone who understood these two different directions of argument. He's tip-toeing around his audience's reverence for his own former teacher. Aristotle is firmly on the side of arguing to principles, which might sound bad until you realize he's trying to be more of a scientist than an ideologue; he wants to use induction to discover what the true principles are from "things known to us" rather than "things known simply."
"Perhaps it is necessary for us, at least, to begin from the things known to us."
See, he's not being arrogant by going his own way from Plato. He's being extra humble.

Book I, Chapter 5

There are three "especially prominent" ways of life:

The life of enjoyment. This is what "the many" choose to pursue, though some rulers do as well. Aristotle calls this "the life of fattened cattle." These people think happiness and pleasure are the same.

The political life. The "refined and active" live the political life by pursuing honor...or maybe virtue. Aristotle considers the possibility that honor is more of a reaction people have when they encounter a person with virtue, which would make virtue the primary goal. He's not quite happy with this result, however, since there are many cases where the exercise of virtue and happiness seem at odds.
"For it seems to be possible for someone to possess virtue even while asleep or while being inactive throughout life and, in addition to these, while suffering badly and undergoing the greatest misfortune. But no one would deem happy somebody living in this way, unless he were defending a thesis."
Funny! But I have to wonder if Aristotle is being overly dismissive of the possibility of being fulfilled and happy despite great suffering, because a person is so overwhelmingly interested in what they're accomplishing.

The contemplative life. A footnote here says that Aristotle doesn't get around to explaining the contemplative life until Book X, Chapters 6-8. I've already seen how easily distracted he is, but this has to be some kind of record! Is "sophistication" a Greek word meaning "disorganized"?

Book I, Chapter 6

Aristotle argues that good can't be a Platonic form (see the "Certain others..." block quote above) because, roughly:
  • For something to have a Platonic form, its expressions must pertain to a "common idea."
  • Good can pertain to both what something is and its relations to other things.
  • What something is is an essential property.
  • How something relates to other things is an accidental property.
  • A common idea can't be both essential and accidental.
  • Therefore good can't be a Platonic form.
He goes on to list other difficulties in understanding good as a single idea. But then he admits that maybe we can divide instances of good into "things good in themselves" and things that "are advantageous" so we can consider whether the multiplicities of good might only be a problem for the latter category (what philosophers today call "instrumental good"). Perhaps there is a single idea common to all things good in themselves. For example, what if the idea of good itself is the only thing that is good in itself? Aristotle calls this "pointless."

In order to avoid pointlessness, it must be the case that all instances of things that are good in themselves outwardly manifest good in a common way, "just as the definition of whiteness is the same in the case of snow and in that of white lead." Aristotle believes that "honor, prudence, and pleasure" are good in themselves because people pursue these things for their own sake (even if they also pursue them in an instrumental sense). He doesn't see how the good of honor and the good of pleasure, for example, manifest in a common way, so good can't be a Platonic form even if we set aside instrumental goodness.

Now Aristotle has a problem. Why the heck do we call all of these disparate things "good" if they don't share a common idea?
"For they are not like things that share the same name by chance. Is it by dint of their stemming from one thing or because they all contribute to one thing? Or is it more that they are such by analogy?"
He doesn't have a ready answer. Instead, he points back at the Platonists and accuses them of having problems explaining how totally abstract forms and concrete human action interact with each other. Reminds me of physicalists in philosophy of mind who defend themselves by pointing out issues with Cartesian dualism.

I wonder what Aristotle would have made of Paul Ziff's book, Semantic Analysis. It seems to me that Ziff answered the question by discovering that things are never good in themselves and it's the other category that can fold neatly into a single idea.


Quotes from: Bartlett, R.C. & Collins, S.D. (2011). Aristotle's nicomachean ethics: A new translation. Chicago: The University of Chicago Press.

Saturday, December 1, 2012

Monthly Picks

On the first day of each month, I will be posting about papers I've found interesting in Philosophy or Library & Information Science. I'll try to make sure at least one is accessible to everyone.

Davis, J.K. (forthcoming). An alternative to relativism. Philosophical Topics (Special Issue on Moral Disagreement).
[link] freely accessible
...

In other news, "Lingo: Locutionary, Illocutionary, and Perlocutionary Acts" has just passed "What Is Moral Realism?" as this blog's all-time most popular post.

Sunday, November 11, 2012

Nicomachean Ethics (Pt. 1)


Time for a good old-fashioned blogmentary! In this series, I'm going all the way back to ancient Greek moral philosophy. Most of my previous readings in ethics have been more-or-less contemporary, with a side of Hume, Kant, and Mill. While I'm not a fan of confusing philosophy with history of philosophy, this Aristotle fellow keeps popping up in current, actively-defended philosophy. He's resilient! I decided it's high time to get acquainted with Aristotle's ethics beyond the popular quotes I've encountered elsewhere.

So you understand where I'm coming from, I have a very goal-oriented view of morality. Descriptively, morality arises from deeply-held human values. Normatively, moral truth arises from a fitting application of decisions or policies to the way the world works. This means I have a decidedly practical rather than mystical view of morality. In the not-so-helpful language of metaethics, "cognitivism," "success theory," "anti-realism," and "hybrid expressivism" should put you in the right neighborhood.

I will be using Robert C. Bartlett and Susan D. Collins' new (2011) translation, as pictured above. They pursued formal equivalence—as opposed to dynamic equivalence—to provide readers with a less filtered experience of Aristotle's wording. Think NASB instead of NIV or CEV, if you're familiar with Bible translations (and their acronyms!). I have no set plan on how much to write per original text or even if I'll comment on the whole thing. So long as I find the material interesting and worth discussing, I will. Finally, I encourage you to pick up a paperback copy for yourself. The Kindle edition has a typo in the first sentence and takes away from the excellent footnotes on nearly every page.

Book I, Chapter 1
"Every art and every inquiry, and similarly every action as well as choice, is held to aim at some good. Hence people have nobly declared that the good is that at which all things aim."
Quite an opening line. The first sentence calls out for elaboration. Given an art, inquiry, action, or choice, what is the good being targeted? The second sentence is, intriguingly, hedged. Aristotle isn't flat-out saying all things aim at "the good." He's putting a common view on the table and expressing some sympathy for the people who take that view. It's one thing to say all things aim at "some good"; another to say all things aim at the same good. Even if they do, is this common good so abstract that we can only call it "the good"?

Aristotle immediately raises a difficulty with this noble declaration: how can all things aim at the same good when there are different types of things aimed at? As he puts it, "there appears to be a certain difference among the ends." Some ends are direct. The end of shipbuilding is the production of a ship. Other ends are indirect. The end of building warships isn't just the production of a warship, but of winning a war.

When one end is pursued as a means to a more encompassing end, Aristotle calls the encompassing end "naturally better" and "more choice-worthy." I'm less sure. Take bread-making, for example. The immediate end is the production of a loaf of bread. A further end is to alleviate hunger. Does this necessarily mean the work of alleviating hunger is better than the action of baking bread? Bread isn't the only way to take care of hunger; opening a can of beans could do the job. A person might value bread-making in itself, over and above its use as a hunger banisher. In other words, bread-making might have both instrumental and final value. (Or instrumental and intrinsic value, if you're not hip to Korsgaard.)

I'm wary about pushing all value for one activity into its encompassing activity because it can lead pretty quickly to single-value ethics such as Mill's grand goal of aggregate happiness or Rand's grand goal of extending one's own lifespan. While we may value such broad ends and engage in many activities that promote them, I think it's a mistake—an error in judging human psychology—to empty all other values into such pools. The error is especially clear in Ayn Rand's case: we need to live to experience life, but what makes our lives worth living is more than just the time spent.

Book I, Chapter 2
"If, therefore, there is some end of our actions that we wish for on account of itself, the rest being things we wish for on account of this end, and if we do not choose all things on account of something else—for in this way the process will go on infinitely such that the longing involved is empty and pointless—clearly this would be the good, that is, the best."
Freshman programmers who don't understand the need for a base case in recursive functions should be ashamed of themselves. The ancient Greeks knew this stuff! (They also put your middle school geometry skills to shame.) Anyway, I still think Aristotle is wrong to ignore the possibility of multiple ends in the "on account of itself" category. But since he thunders on past that, what is his grand goal? ...the political art. Huh? I didn't see that coming, but it does make sense of this edition's beautiful cover art.

Aristotle lists activities such as economics, warfare, and rhetoric which can all be understood as supporting politics. Today we might say that all things are done for the good of society.
"[T]he good of the individual by himself is certainly desirable enough, but that of a nation and of cities is nobler and more divine."
Why not say that the good of nations and cities is subordinate to the good it produces for individuals? It will be interesting to see how Aristotle handles situations where what's good for the state is very bad for some individuals. Or when what's good for individuals is irrelevant to what's good or bad for society.

Book I, Chapter 3

This chapter argues for approaching political science in a rough, rather than an unduly precise, manner.
"The noble things and the just things, which the political art examines, admit of much dispute and variability, such that they are held to exist by law alone and not by nature. And even the good things admit of some such variability on account of the harm that befalls many people as a result of them: it has happened that some have been destroyed on account of their wealth, others on account of their courage."
Oh what a relief! He admits there are problems when civic good or other virtues are pushed to the extremes without considering their effects. Maybe he was familiar with Greek tragedies? This should have prompted some reflection on his part. If your great all-encompassing good can have bad effects, isn't this a flashing clue that you have the wrong fundamental good...or at least not the only fundamental good?

After some snappy characterizations of mathematicians and youngsters, Aristotle praises an attitude of patience when learning. He says his teachings are pointless for people who just follow their passions unreflectively, but of great benefit to people who "fashion their longings in accord with reason and act accordingly." This makes me ask myself, "When was the last time I allowed learning to shape my actions, and not just to justify them?" Honestly, not long ago, considering I participated in the political art just this week and made a different choice than I did four years ago.

Thursday, November 8, 2012

What's Going On?

I'm closing down a tech notes blog I started in May 2008 and moving the current top four posts here. Together, they've gotten over 12k hits from Google searches. Why do this? I have barely used the tech blog in the last year and I don't like sites that are stagnating, decreasing property values, etc.

Besides, I've already increased this blog's official tagline to include librarianship. Might as well throw in technology for a full mix of the issues I write about. Note to self: write a fascinating post on the philosophy of library technology.

If you're wanting more general philosophy, don't worry! I plan on blogging through a book again soon. Just trying to decide between a few candidates (and finish final papers for Fall semester).

Ubuntu Server Text Mode

[Originally posted June 5, 2012]

Ubuntu Server supposedly lacks a GUI, but it still uses FrameBuffer by default. I was seeing the following error on a picky Dell monitor when booting to a fresh install of Ubuntu 12.04 LTS:
Out of range signal.
Cannot display this video mode,
change computer display input to 1600x1200 @60Hz

Disabling FrameBuffer

  • Boot off the installation CD and use rescue mode.
  • Open the file:  /etc/modprobe.d/blacklist-framebuffer.conf
  • Append the line:  blacklist vga16fb
  • Save the file and exit, then run:  update-initramfs -u
  • Reboot
Credit to this guy for showing me the fix.

Text Mode GRUB

I didn't like my video signal going out of range during the GRUB stage, so...
  • Open the file:  /etc/default/grub
  • Uncomment the line:  GRUB_TERMINAL=console
  • Save the file and exit, then run:  update-grub
  • Reboot

Outlook Certificate Warning With Exchange 2007 or 2010

[Originally posted January 10, 2011]  

After installing a third party certificate in Exchange 2007 or Exchange 2010 (for Outlook Web Access and similar services), some Outlook clients may suddenly start complaining:

"The name of the security certificate is invalid or does not match the name of the site."

Here's the relevant Microsoft article. If you have trouble understanding it on the first read, I'll paraphrase!

The Problem

Exchange '07 and '10 automatically generate a self-signed certificate with the fully qualified internal name of the mail server. Outlook 2007 (and possibly Outlook 2010) clients connect to Exchange using — by default — the server's internal name. When the name the client uses and the certificate match, no problem! There's also no problem for Outlook 2003 clients because they don't bother with the certificate.

But what if you replace the Exchange certificate with one that references the external name of the server? 'mail.contoso.com' instead of 'mail-srv.contoso.local', for example? Well, you get the error above!

Expensive Fix

If the new certificate supports Subject Alternative Names, you could include the internal name as one of the alternates. This internal name will be externally viewable to anyone who likes to read certificate details, if you care about that.

The Usual Fix...

The other way to make the warning go away is to instruct internal Outlook clients to look for the mail server under its external name (e.g. 'mail.contoso.com') and make sure internal DNS resolves to the internal IP of the mail server.

...And Its Downside

You'll need to run "split DNS." Create a forward lookup zone on the internal DNS server for the external domain name. LAN clients which try to reach anything that ends in '.contoso.com' will receive their answers from the internal DNS server. Be careful! If you forget to add, for example, 'www.contoso.com' to the internal version, LAN clients may lose access to the company website.
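
One way to catch that mistake before LAN users do is to compare the host names published externally with the ones defined in the internal copy of the zone. Here is a minimal Python sketch, assuming both lists of names have been exported by hand; the function names and the sample records are hypothetical and not part of any DNS tool:

```python
def normalize(name):
    """Lower-case a host name and drop any trailing dot."""
    return name.strip().lower().rstrip(".")


def missing_from_internal(external_names, internal_names):
    """Return external host names that have no record in the internal zone."""
    internal = {normalize(n) for n in internal_names}
    external = {normalize(n) for n in external_names}
    return sorted(external - internal)


# Hypothetical sample data: the internal zone only knows about the mail server.
external = ["mail.contoso.com", "www.contoso.com", "ftp.contoso.com"]
internal = ["mail.contoso.com"]

for name in missing_from_internal(external, internal):
    print("LAN clients will not be able to resolve:", name)
```

Anything the sketch reports needs a matching record in the internal forward lookup zone.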

Check Current Values

To be on the safe side, make a record of the relevant Exchange settings before changing them. This process will also help familiarize you with what's going on in the next step. Open Exchange Management Shell. Type the following queries, then note the information on the lines specified:

> get-clientaccessserver | fl

Note the value for 'AutoDiscoverServiceInternalUri'

> get-webservicesvirtualdirectory | fl

Note the value for 'InternalURL'

> get-oabvirtualdirectory | fl

Note the value for 'InternalURL'

(Exchange 2007 only)
> get-umvirtualdirectory | fl

Note the value for 'InternalURL'

Hopefully, the values are all the same for these!

Change To the External Name

Assuming...
Internal name is 'mail-srv.contoso.local' and
External name is 'mail.contoso.com'.

> Set-ClientAccessServer -Identity mail-srv.contoso.local -AutodiscoverServiceInternalUri https://mail.contoso.com/autodiscover/autodiscover.xml

> Set-WebServicesVirtualDirectory -Identity "mail-srv.contoso.local\EWS (Default Web Site)" -InternalUrl https://mail.contoso.com/ews/exchange.asmx

> Set-OABVirtualDirectory -Identity "mail-srv.contoso.local\oab (Default Web Site)" -InternalUrl https://mail.contoso.com/oab

(Exchange 2007 only)
> Set-UMVirtualDirectory -Identity "mail-srv.contoso.local\unifiedmessaging (Default Web Site)" -InternalUrl https://mail.contoso.com/unifiedmessaging/service.asmx

Then either reboot the server, or open IIS, browse to application pools, and recycle 'MSExchangeAutodiscoverAppPool'.
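
If several servers need the same treatment, the commands above can be produced from the two names instead of being edited by hand. This is only a rough Python sketch that builds the command text for pasting into Exchange Management Shell. The helper name `exchange_url_commands` is hypothetical, and the virtual directory names and URL paths are assumed to match the defaults shown above:

```python
def exchange_url_commands(internal_name, external_name, include_um=False):
    """Build the Set-* commands shown above as plain strings."""
    base = f"https://{external_name}"
    commands = [
        f"Set-ClientAccessServer -Identity {internal_name} "
        f"-AutodiscoverServiceInternalUri {base}/autodiscover/autodiscover.xml",
        f'Set-WebServicesVirtualDirectory -Identity "{internal_name}\\EWS (Default Web Site)" '
        f"-InternalUrl {base}/ews/exchange.asmx",
        f'Set-OABVirtualDirectory -Identity "{internal_name}\\oab (Default Web Site)" '
        f"-InternalUrl {base}/oab",
    ]
    if include_um:  # Exchange 2007 only, as noted above
        commands.append(
            f'Set-UMVirtualDirectory -Identity "{internal_name}\\unifiedmessaging (Default Web Site)" '
            f"-InternalUrl {base}/unifiedmessaging/service.asmx"
        )
    return commands


# Same sample names as above.
for command in exchange_url_commands("mail-srv.contoso.local", "mail.contoso.com"):
    print(command)
```

The script only prints text; nothing changes on the server until the commands are run in the shell.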

Shrew Soft VPN Client with Juniper/Netscreen IPSEC

[Originally posted July 30, 2010]

Shrew Soft's VPN client is free and remarkably cross-platform. I needed it for Windows 7 notebooks. While there's already a nice write-up on how to configure a preshared key with XAuth scheme, my particular situation called for separate preshared keys for each user and no XAuth. So that's the (relatively!) simple setup I'll be documenting here.

A bit of history: Juniper Networks purchased Netscreen in '04. The Netscreen brand continued to be used on Firewall/VPN devices for several years following that (which is when I earned technical certification on them), but these are now simply Juniper "Secure Services Gateway[s]." I'll call the device the "firewall" to stay neutral. Screenshots are from a NS5GT; details may vary slightly.


Sample Parameters

Obviously, these won't actually work. The 'X's stand for unspecified numerical values.

192.168.1.0 /24 — Business LAN
10.X.X.X — Firewall public IP
roadwarrior — User name
corporation.inc — Business URL
1234567895 — roadwarrior's preshared key


Routing

Routing on the Netscreen should already be set up unless this is the first VPN configured on the firewall. Something along these lines should work:

untrust-vr entry
IP/Netmask — 192.168.1.0 /24
Gateway — trust-vr
Interface — -

trust-vr entry
IP/Netmask — 192.168.1.0 /24
Gateway — 0.0.0.0
Interface — ethernet1

And if there isn't already a name for the LAN subnet, add it to Objects->Addresses->List->Trust->New.

Address Name — corporation.inc LAN
IP/Netmask — 192.168.1.0 /24
Zone — Trust


User Setup

Objects->Users->Local->New

User Name — roadwarrior
Status — Enable
IKE User — Checked
IKE ID Type — Auto
IKE Identity — roadwarrior@corporation.inc



Phase 1 Setup

VPNs->AutoKey Advanced->Gateway->New

Gateway Name — roadwarrior P1
Security Level — Standard
Remote Gateway Type — Dialup User
User — roadwarrior
Preshared Key — 1234567895
Use As Seed — Unchecked
Outgoing Interface — ethernet3


Click Advanced.

Mode (Initiator) — Aggressive
Enable NAT Traversal — Checked
UDP Checksum — Unchecked
Keepalive Frequency — 20
[Authentication Section] — None


Click Return, then Ok.


Phase 2 Setup

VPNs->AutoKey IKE->New

VPN Name — roadwarrior P2
Security Level — Custom
Remote Gateway — roadwarrior P1


Click Advanced.

Security Level — Custom
Phase 2 Proposals:
* nopfs-esp-3des-md5
* nopfs-esp-3des-sha
* nopfs-esp-aes128-md5
* nopfs-esp-aes128-sha
Replay Protection — Checked
...the rest of the settings on this page shouldn't need changing from default:
Transport Mode — Unchecked
Bind to — None
Proxy-ID — Unchecked
Local (and Remote) IP/Netmask — 0.0.0.0 / [blank]
Service — Any
VPN Group — None
VPN Monitor — Unchecked
Source Interface — Default
Destination IP — 0.0.0.0
Optimized — Unchecked
Rekey — Unchecked


Click Return, then Ok.


Policy Setup

Policies.

From: Untrust
To: Trust
Click New.

Source Address — Dial-Up VPN
Destination Address — corporation.inc LAN
Service — Any
Action — Tunnel
Tunnel [VPN] — roadwarrior P2
Tunnel [L2TP] — None


Click Ok.


Shrew Soft Access Manager — General Tab

Host Name or IP Address — 10.X.X.X (True value at Network->Interfaces->edit[ethernet3]->IP Address)
Port — 500
Auto Configuration — disabled
Address Method — Use an existing adapter and current address


Shrew Soft Access Manager — Client Tab

NAT Traversal — enable
NAT Traversal Port — 4500
Keep-alive Packet rate — 15
IKE Fragmentation — enable
Maximum Packet size — 540

Enable Dead Peer Detection — Checked
Enable ISAKMP Failure Notifications — Checked


Shrew Soft Access Manager — Name Resolution Tab

All unchecked. Of course this sort of thing can be set up if you prefer. I'm using it for a simple case which does not need DNS.


Shrew Soft Access Manager — Authentication Tab

Authentication — Mutual PSK

Local Identity subtab
Identification Type — User Fully Qualified Domain Name
UFQDN String — roadwarrior@corporation.inc


Remote Identity subtab
Identification Type — IP Address
Address String — [blank]
Use a discovered remote host address — Checked


Credentials subtab
Preshared Key — 1234567895


Shrew Soft Access Manager — Phase 1 Tab

Exchange Type — aggressive
DH Exchange — group 2
Cipher Algorithm — auto
Hash Algorithm — auto
Key Life Time limit — 86400
Key Life Data limit — 0
Enable Check Point Compatible Vendor ID — Unchecked


Shrew Soft Access Manager — Phase 2 Tab

Transform Algorithm — auto
HMAC Algorithm — auto
PFS Exchange — disabled
Compress Algorithm — disabled
Key Life Time limit — 3600
Key Life Data limit — 0


Shrew Soft Access Manager — Policy Tab

Maintain Persistent Security Associations — Unchecked
Obtain Topology Automatically or Tunnel All — Unchecked

Click Add.
Type — Include
Address — 192.168.1.0
Netmask — 255.255.255.0


Click Ok, then Save.


...now try connecting. When it fails the first time, check the log entries on the firewall. When those are unclear, see the blog post immediately prior to this one on detailed VPN troubleshooting.

Basic Setup for Wyse ThinOS + Windows Terminal Server

[Originally posted February 23, 2010]

Consider this a quick start guide for a particular scenario: you want multiple Wyse ThinOS terminals to automatically log into a Windows Terminal Server with terminal-specific user accounts.


In this example, the user accounts "Front Desk" and "Utilities Console" are already configured on the Terminal Server (or its domain). Here's what needs to happen when one of the thin clients is powered on:
  1. Client looks for DHCP services and configures basic network parameters. (Client IP can be dynamic.)
  2. Client checks DHCP option 161 and finds the static IP address of the FTP server.
  3. Client logs into the FTP server anonymously and runs /wyse/wnos/wnos.ini which contains the settings for all Wyse ThinOS clients.
  4. wnos.ini includes a line which causes the client to look for /wyse/wnos/inc/[MAC].ini where "MAC" is its own MAC address. This contains client specific settings, e.g. "Front Desk" credentials. Either wnos.ini or [MAC].ini will instruct the client to connect to the Terminal Server.
Note: The Terminal Server, DHCP server, and FTP server may all be the same host or three separate hosts. Or a 2 / 1 split. It just doesn't matter.

Terminal Server Setup

Make sure the user profiles are set up correctly on the Terminal Server by using any RDP client.

DHCP Setup

Check the scope options on the DHCP server. For Windows 2003 Server, this will be under [Server]->Scope->Scope Options->Configure Options->General tab->Available options. Option 161 is not defined by default, so it will probably not be on this list.

To define a new DHCP option in Windows 2003 Server, right click on [Server] and select Set Predefined Options. Click Add.

Name: Wyse FTP Server
Data Type: String
Code: 161
Description: FTP Server for Wyse ThinOS Clients

(Only the Code value is vital.)

DHCP services may need a restart. Go back to the scope options, enable the newly defined option, and enter the IP address of the FTP server.

FTP Setup

Use any familiar FTP server. The following just needs to work:

> ftp [FTP server]
> Name: anonymous
> Password: anonymous
> cd wyse
> cd wnos
> ascii
> get wnos.ini
> cd inc
> get [MAC].ini

Both wnos.ini and [MAC].ini are going to be plaintext configs. Feel free to make test versions with any content to make sure the FTP is working right.

Example Network Values

User: Front Desk
Pass: easyPass8
MAC: 0123456789AB

User: Utilities Console
Pass: easyPass4
MAC: 1023456789CC

Domain: toasterco.local

FTP IP: 192.168.1.40 (not used in the configs below, to avoid paradox)

Terminal Server IP: 192.168.1.50
Terminal Server Name: Legion-srv

Example wnos.ini

AutoLoad=0
AutoPower=yes
SignOn=no

include=$mac.ini

connect=rdp \
icon=default \
description= "Legion-srv" \
host=192.168.1.50 \
Fullscreen=yes \
Reconnect=yes \
Autoconnect=yes

Example 0123456789AB.ini

connect=rdp \
description= "Legion-srv" \
host=192.168.1.50 \
icon=default \
username="Front Desk" \
password=easyPass8 \
domainname=toasterco.local \
Fullscreen=yes \
Reconnect=yes \
Autoconnect=yes

Exit=all

Example 1023456789CC.ini

connect=rdp \
description= "Legion-srv" \
host=192.168.1.50 \
icon=default \
username="Utilities Console" \
password=easyPass4 \
domainname=toasterco.local \
Fullscreen=yes \
Reconnect=yes \
Autoconnect=yes

Exit=all
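
The two per-terminal files differ only in user name and password, so with more than a handful of terminals it may be easier to generate them from a small table. Here is a minimal Python sketch, assuming the option names and layout used in the examples above; `terminal_ini` and the `terminals` table are hypothetical helpers, not anything supplied by Wyse:

```python
def terminal_ini(username, password, domain, host, description):
    """Render one per-terminal config in the same shape as the examples above."""
    lines = [
        "connect=rdp",
        f'description= "{description}"',
        f"host={host}",
        "icon=default",
        f'username="{username}"',
        f"password={password}",
        f"domainname={domain}",
        "Fullscreen=yes",
        "Reconnect=yes",
        "Autoconnect=yes",
    ]
    # Every option except the last ends with " \" to continue the connect line.
    return " \\\n".join(lines) + "\n\nExit=all\n"


# MAC address -> (user name, password), taken from the example values above.
terminals = {
    "0123456789AB": ("Front Desk", "easyPass8"),
    "1023456789CC": ("Utilities Console", "easyPass4"),
}

for mac, (user, password) in terminals.items():
    print(f"--- {mac}.ini ---")
    print(terminal_ini(user, password, "toasterco.local", "192.168.1.50", "Legion-srv"))
```

The sketch only prints the file contents; saving each one under its [MAC].ini name on the FTP server is left as a manual step.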

Final Comments

The line "include=$mac.ini" in wnos.ini will cause execution to jump to the individual config file if the MAC match is successful. The line "Exit=all" at the end of an individual config will stop execution. Otherwise, it would return to the general config file and individual settings would be overwritten.

Wyse Support has plenty of reference documentation covering these config file options and many more. Don't even have to log into the support site to access this material. Yay for that.

Thursday, November 1, 2012

Monthly Picks

On the first day of each month, I will be posting about papers I've found interesting in Philosophy or Library & Information Science. I'll try to make sure at least one is accessible to everyone.

Adriaans, P. (Oct 2012). Information. Stanford Encyclopedia of Philosophy.
[link] freely accessible

International Federation of Library Associations and Institutions (Aug 2012). IFLA Code of Ethics for Librarians and other Information Workers.
[link] freely accessible

Saturday, October 13, 2012

Fantastic Fiction's Fading Heritage

"It will be a terrible waste if the stories from the pulp era vanish because of this issue." (Science Fiction and Fantasy Writers of America, Inc. [SFWA], 2005, p. 9)
Because of the way copyright law is set up in the United States, it can be difficult or impossible to locate copyright owners for protected works going all the way back to the 1920s. Without a way to ask permission to reprint these "orphan works," they tend to fade out of culture and sometimes out of physical existence. Science fiction and fantasy literature grew into their modern forms in the 20s through 50s, but many of these genre-developing works are unpublishable orphans. No one is reading them or receiving royalties from their sale.

This paper will look at how copyright law created the so-called "orphan works problem" and how the Science Fiction and Fantasy Writers of America responded to the U.S. Copyright Office's call for comments on the situation.

Peer Pressure

In 1886, most of the major European powers signed an international copyright agreement in Berne, Switzerland. The Berne Convention for the Protection of Literary and Artistic Works (or simply the "Berne Convention") required its members to respect the rights of other member nations' authors as if they were domestic authors:
"Authors shall enjoy, in respect of works for which they are protected under this Convention, in countries of the Union other than the country of origin, the rights which their respective laws do now or may hereafter grant to their nationals, as well as the rights specially granted by this Convention." (Berne Convention for the Protection of Literary and Artistic Works [Berne Convention], 1979, art. 5)
The Convention disallowed any sort of requirement that authors register their works or stamp them with an official declaration before being protected by copyright:
"The enjoyment and the exercise of these rights shall not be subject to any formality; such enjoyment and such exercise shall be independent of the existence of protection in the country of origin of the work." (Berne Convention, 1979, art. 5)
A little over 100 years later, the U.S. finally signed on when Congress passed the Berne Convention Implementation Act of 1988. Why wait so long? One major issue was the "no formalities" clause quoted above. The U.S. copyright term was also far shorter than the Convention's minimum of 50 years after the death of the author (Berne Convention, 1979, art. 7). In 1886, U.S. copyright worked like this (Peters, 1850, pp. 436-439):
  • 28 years of copyright, from the time the title of the work was properly registered.
  • Plus a 14 year extension, if re-registered within six months of the original expiration date.
  • So long as the correct notices are given in the book and in a newspaper...
  • ...and a copy is put on deposit with the government.
Immediate adoption of the Berne Convention would have been an abrupt change in both duration and scope of copyright protection. In the meantime, the U.S. did sign the Buenos Aires Convention of 1910, which provided mutual copyright protection in much of North, Central, and South America and did allow formalities. To accommodate the U.S. (and other nations refusing the Berne Convention), a compromise was created in the form of the 1952 Universal Copyright Convention, which was widely accepted by the United States, Latin America, and Berne Convention members. By the 1980s, U.S. copyright worked like this:
  • Protection for the life of the author, plus 50 years after death.
  • Registration "is not a condition of copyright protection." (Copyright Act of 1976, Sec. 408, 1976)
  • Registration may still be required before suing infringers.
It was no longer a big leap to achieve conformity with Berne Convention standards. In 1989, the United States officially joined the Berne Convention.

The Trouble With "No Formalities"

For most of American history, copyright formalities put a substantial burden on authors, with several opportunities to slip up and lose protection:
"Given the complexity of these formalities, the cost of compliance was not trivial, and the consequences of noncompliance were severe. Failure to comply would result in copyright failing to arise (registration), being unenforceable (notice, deposit), or being subject to early termination, with entry of the work into the public domain (renewal)." (Sprigman, 2004, p. 493)
To a certain extent, the Berne Convention's push to remove formalities made sense as a way to more reliably protect authors' rights. It also fit with a popular European view that copyright is a kind of moral right which comes into existence the moment a work is put into a fixed form. Legal copyright would therefore serve to recognize and enforce a pre-existing moral copyright. Contrast this with the U.S. Constitution's utilitarian (goal oriented), positive (created by law) characterization of copyright: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries" (art. I, § 8, cl 8). This allowed the U.S. government much more leeway on crafting law to promote these specified public goods. Requiring registration was a way to ensure that some official information was recorded about each copyrighted work; requiring renewal was a way to ensure neglected works would enter the public domain more quickly...or at least that the official information would be updated. The details of compliance were arguably too burdensome, but the removal of formalities has led to other problems.

Despite continued growth in writing and publishing, now-voluntary copyright registration has leveled off (Sprigman, 2004, p. 496):


And now-voluntary renewals are on their way to extinction (Sprigman, 2004, p. 498):


This means a smaller and smaller proportion of the kinds of works that were traditionally registered are being registered. And of these, an even smaller proportion are being renewed. By comparison:

Old Way
  • Many works never under copyright because their creators did not consider them worth the trouble of registering.
  • Registration records exist for copyrighted works.
  • Renewal records exist for works under extended copyright.
New Way
  • All works under automatic copyright, including poems in notebooks, blog posts, personal song recordings, dance routine descriptions, etc.
  • Registration records might not exist for copyrighted works.
  • Renewal records probably don't exist for works under extended copyright.
What's the problem with this? The chance of relatively recent works becoming "orphaned" has greatly increased. A work is orphaned when locating its copyright owner becomes prohibitively difficult or outright impossible. Publishers can't reprint it. Creators can't seek permission to use it or adapt it into new works. And, of course, authors and their heirs miss out on potential income. When authors cannot be located, everyone loses.

Amazing Stories and Weird Tales

In a sense, there are two orphan works problems. The removal of formality requirements in the late 1970s (as preparation for joining the Berne Convention) has caused a problem with contacting the owners of unregistered or unrenewed works. But there was already a problem with official information falling out of date. A novel published in 1923 and renewed in 1950 is still under copyright until at least 2018. The name of the person who renewed it 62 years ago might not be sufficient to discover who owns the copyright in 2012.
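
The arithmetic behind that example, as a rough Python sketch. It assumes a flat 95-year term counted from the year of publication for a renewed work of this era; the function names are hypothetical and the sketch ignores every other wrinkle of copyright duration:

```python
TERM_YEARS = 95  # assumed term, counted from publication, for a renewed 1923 work


def last_protected_year(publication_year, term_years=TERM_YEARS):
    """Final calendar year in which the work is still under copyright."""
    return publication_year + term_years


def record_age(record_year, current_year):
    """How old the most recent official record is."""
    return current_year - record_year


print(last_protected_year(1923))  # 2018
print(record_age(1950, 2012))     # 62
```

The second number is the one that matters for orphan works: the longer the term runs past the last official record, the staler the contact information becomes.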

Think of these as the "no official records" and the "outdated official records" orphan works problems. One area of literature strongly affected by these problems is modern fantastic fiction, here defined as the science fiction and fantasy genres. A little history:

Science fiction and fantasy both got their start in the age of universal public domain (i.e. before 1923). Jules Verne, H.G. Wells, and Edgar Rice Burroughs were especially effective pioneers of science fiction from the 1860s through the 1910s. Fantasy fiction goes back to folklore, but it began its transformation into modern fantasy from the 1850s through the 1910s in the works of George MacDonald, Lewis Carroll, and L. Frank Baum.

Interest in these genres greatly expanded in the 1920s with the rise of pulp magazines offering monthly short stories on the cheap. Weird Tales began publishing fantasy and horror stories in March 1923. Amazing Stories began its run of science fiction stories in April 1926. Other pulp magazines hopped on the bandwagon and public interest in these genres continued to grow, spurred on by the publication of now-classic novels like Brave New World (1932), The Hobbit (1937), The Sword in the Stone (1938), 1984 (1949), The Lion, The Witch, and the Wardrobe (1950), and Foundation (1951). These novels and certain pulp stories like those of H.P. Lovecraft have been nearly continuously republished, but copyright owners for many lesser-known works published from the 1920s to the 1990s are difficult or impossible to locate today.
"There are scores of dead writers whose work is gone and forgotten because there is no one able to take responsibility for the rights. I bought a story from the estate of Richard McKenna a few years ago. The woman from whom I acquired the rights was his aged sister-in-law or someone like that. If that woman doesn't pass the rights on to someone else and let anyone know about it, Richard McKenna's work will not be reprinted for what, another 30 years? Do you really think anyone will remember who he is then? They barely remember him now.
Gerald Kersh is another example. I spent two years trying to track down rights to no avail. Someone who is a Kersh aficionado tried for two years before me. I finally was able to publish a couple of short stories by him via quasi legal means that protect my company from litigation. Kersh was a terrific writer and his stories deserve to be read.
That's why there is a problem." (SFWA, 2005, p. 9) [with minor corrections]



Pulp stories in the 20s and 30s. McKenna and Kersh in the 60s. The "outdated official records" problem is smudging out the fine lines of fantastic fiction's development, leaving only the thickest strokes. This would have been a problem even without the lifting of copyright formalities. Today, the "no official records" policy is compounding the issue:
"Since works are given copyright protection the moment they are written, there is no ready way to find authors to seek their permission to republish material, and the penalties for infringement are high, there is a lot of material that cannot be republished because the authors are essentially unlocatable. That is, the cost to locate them, if they can even be located, is often too high to justify the use of the work. Factoring in the 95 years / Life+70 years duration of copyright, a large amount of work is likely to be unrepublishable for over a hundred years and possibly lost altogether." (SFWA, 2005, p. 1)
In 2056 — the same distance into the future as the publication of Gerald Kersh's Nightshade & Damnations in the past — an editor may want to include a short story from 2012 and have even less hope than the publisher quoted above because the story was never officially registered.

Fantastic Fixes

On January 26, 2005, the U.S. Copyright Office put a notice in the Federal Register, asking for "written comments from all interested parties" on the topic of orphan works.
"The issue is whether orphan works are being needlessly removed from public access and their dissemination inhibited. If no one claims the copyright in a work, it appears likely that the public benefit of having access to the work would outweigh whatever copyright interest there might be." (Orphan Works, 2005)
The Copyright Office received over 700 initial responses from individuals and organizations! One of the "interested parties" was the Science Fiction and Fantasy Writers of America. The SFWA (as it's abbreviated) put out its own call for comments. Some of the resulting anecdotes are cited above. After lively internal debate, SFWA's formed-for-the-occasion Orphan Copyright Committee agreed on a set of seven proposals "felt to comprise a feasible solution to the problem and a dramatic improvement over the current situation" (SFWA, 2005, p. 2).

These proposals can be roughly organized into three themes: modernizing and simplifying the registration process (#1, #3, #5, #6), creating a legal path to using orphan works (#2, #3, #4), and issuing guidance on "succession of copyright interests" (#7). To simplify even further, the proposals seek to make orphaning less likely to occur, and to open the remaining orphan works for responsible use.

SFWA's main recommendation for improving registration is the establishment of an Author Information Directory. This would be an online database that offers free or nearly free account setup for authors to enter information about their works and keep their contact information up to date. Authors could be encouraged to include at least the first 100 words of their works and would have the option to add notarized forms or digital signatures to verify their identity. From the point of view of authors, the directory would serve the dual function of providing more opportunities for royalties and of eliminating the chance of their works being used under the new rules for orphan works.
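
To make the directory idea a little more concrete, here is a minimal Python sketch of how a lookup by opening words might work. Nothing in it comes from SFWA's proposal beyond the ingredients already mentioned (contact details, the first 100 words, optional identity verification). The class names, fields, and matching rule are all hypothetical, and a real system would need far more robust matching.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PUNCTUATION = ".,;:!?\"'()[]"


def normalize_opening(text: str, limit: int = 100) -> List[str]:
    """Lowercase the first `limit` words and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w][:limit]


@dataclass
class WorkRecord:
    """One work entered by its author (hypothetical fields)."""
    title: str
    author: str
    contact: str            # kept up to date by the author
    opening: str            # roughly the first 100 words of the work
    verified: bool = False  # e.g. notarized form or digital signature on file


@dataclass
class AuthorDirectory:
    """In-memory stand-in for the proposed online database."""
    records: List[WorkRecord] = field(default_factory=list)

    def register(self, record: WorkRecord) -> None:
        self.records.append(record)

    def update_contact(self, author: str, contact: str) -> int:
        """Update contact details on every record for this author."""
        updated = 0
        for record in self.records:
            if record.author == author:
                record.contact = contact
                updated += 1
        return updated

    def find_by_opening(self, text: str) -> Optional[WorkRecord]:
        """Return the first record whose stored opening matches the start of `text`."""
        query = normalize_opening(text)
        for record in self.records:
            key = normalize_opening(record.opening)
            if key and query[:len(key)] == key:
                return record
        return None
```

A publisher holding a copy of a story could pass its opening paragraph to `find_by_opening` and, on a match, read the author's current contact information from the returned record. Matching on normalized words rather than exact text is one way to tolerate small differences in punctuation and capitalization between editions.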

What new rules? After conducting a search according to guidelines drawn up by the Copyright Office, followed by a multi-month posting of public notice, publishers could pay into an escrow fund at a common rate for similar works. Such works could then be published for a limited time without fear of lawsuit. Authors who later come forward would simply be able to claim the funds already set aside for this purpose. Publishers who don't follow these guidelines would be fully at risk of current legal remedies for copyright violation.

Congressional (In)action

After taking comments from SFWA and hundreds of other groups, the U.S. Copyright Office issued a Report on Orphan Works to summarize concerns and give its own proposed solutions. The Copyright Office rejected calls for any kind of new database, worried that it would be too "burdensome" at this time, but recommended revisiting the question in ten years. Also rejected were the calls for specific search guidelines (libraries and archives opposed them), an escrow system (too complex), or a public notice requirement (publishers were against it). The Copyright Office did recommend legislative changes to limit legal remedies to "reasonable compensation" when copyright infringers are able to prove they had conducted a thorough search.
"The term 'reasonable compensation' is intended to represent the amount the user would have paid to the owner had they engaged in negotiations before the infringing use commenced." (U.S. Copyright Office, 2006, p. 116)
This compensation would not apply to non-commercial users, who would only be required to cease infringement activities immediately (U.S. Copyright Office, 2006, p. 13). The report ended with recommended legislative language.

From 2006 to 2008, several bills made their way through the House and Senate, based on the Copyright Office's report. The most successful bill was the Shawn Bentley Orphan Works Act of 2008, which passed unanimously in the Senate. A similar bill, the Orphan Works Act of 2008, stalled out in the House.

The Senate bill echoed the Copyright Office's recommendations about limiting legal remedies to "reasonable compensation," and waiving even this compensation if the infringement was (1) non-commercial, (2) "primarily educational, religious, or charitable in nature," and (3) stopped on receipt of a valid claim of infringement. Also following recommendations, the bill required evidence of a "qualifying search" before infringing, plus clear attribution while infringing. The Senate bill added a requirement that a new symbol for orphan works be created and used to label such publications (S. 2913 § 2).
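
The tiers in that summary can be sketched as a small decision function. This is only an illustration of the paragraph above, not of the statutory text: the `Use` fields and the `remedy` function are hypothetical names, and the fallback to ordinary remedies when the search or attribution requirement is missed is my own assumption about how the limitation would work.

```python
from dataclasses import dataclass


@dataclass
class Use:
    """Facts about one use of a suspected orphan work (hypothetical fields)."""
    qualifying_search: bool        # documented search before using the work
    attributed: bool               # clear attribution while using it
    commercial: bool
    educational_religious_or_charitable: bool
    stopped_on_valid_claim: bool   # ceased on receipt of a valid infringement claim


def remedy(use: Use) -> str:
    """Classify the remedy an owner could seek, per the summary above."""
    # Assumption: without the search and attribution, the limitation does not apply.
    if not (use.qualifying_search and use.attributed):
        return "ordinary remedies"
    # All three waiver conditions must hold together.
    if (not use.commercial
            and use.educational_religious_or_charitable
            and use.stopped_on_valid_claim):
        return "no compensation"
    return "reasonable compensation"
```

For example, a library that searched, attributed, used the work non-commercially for education, and stopped when the owner appeared would land in the "no compensation" tier, while a commercial publisher meeting the search and attribution requirements would owe "reasonable compensation."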

The House bill's most controversial difference was the requirement of a "notice of use archive": a database where users of orphan works must document the work they are using, what steps they took to locate the copyright owner, how the work will be used, and contact information for the user (H.R. 5889 § 2). Prominent library groups opposed this requirement on the grounds that it would be too burdensome on large organizations wanting to use many orphan works (Adler, 2008). Some artists opposed the archive because they believed it would be too friendly to large organizations wanting to use many orphan works! There appears to have been a significant amount of misinformation going around in artistic communities at the time (Huttler, 2008).

The whole issue has been effectively shelved by Congress since 2008.

Attack of the Powerpoints

In April 2012, the Berkeley School of Law held an orphan works symposium. Among the ideas floated during these talks was Jennifer Urban's suggestion that existing Fair Use law might be applicable to orphan works (2012). One of the four factors of Fair Use analysis concerns the "nature" of the copyrighted work, but what this means, exactly, is not spelled out in federal law. Urban cited cases where availability played some role in Fair Use decisions and argued for expanding this line of thinking to explicitly cover orphan works.

Lydia Loren advocated a change in metaphor: rather than continue using the term "orphan works," labeling them as "hostage works" would emphasize the way these are "works that are held hostage by the complexity of our copyright system. By its duration, by its lack of formalities, and then of course, coupled with the absentee owner" (2012, 2 min). Under this metaphor, users might be seen as hostage-liberators rather than orphan-exploiters. Loren also showed a troubling graph from a talk by Paul Heald (2012, 12 min 45 sec):


The main lesson to draw from this graph is that books in the public domain from before 1923 are still very popular. Same goes for recent books under copyright. It's that dip from the 20s through the end of the century that shows a severe under-representation of what was written in those decades. New works do have novelty going for them; public domain works tend to have low prices going for them, thanks to both the lack of royalties and the presence of competition. So while a moderate dip is only to be expected for older, copyrighted works, it's very likely that the orphan works problem has aggravated the situation.

Notice where the bulk of science fiction and fantasy's genre development occurred on the graph above. For fantastic fiction and all the other fading stories created in that gap, orphan works legislation would open exciting new opportunities for rediscovery and appreciation.

My Two Cents

This paper has focused on written works, but copyright law also applies to music, dance, visual arts, architecture, etc. Creators in these areas aren't necessarily going to be well-served by orphan works legislation that focuses on texts. Today's technology is completely up to the task of storing and matching text, but still very much in development for finding re-used melodies, dance steps, or even photographic remixes. It might be smart to push for text-specific orphan works legislation first, as a kind of pilot program. When the creative world doesn't come to an end and information technology has improved, other types of content could be added.

The biggest flaw in orphan works legislation hasn't been the legislation itself, but misunderstandings, misrepresentations, and outright scaremongering. What's needed are multiple promotional campaigns by libraries and artists' groups (like SFWA). Specific examples of unrepublishable works would be most effective because they would raise awareness and increase interest in what the public is missing. What if a copyright owner appears because of these campaigns? There would be an opportunity to show the benefits of reconnecting owners with interested publishers! If the owner allows it, the book could even be marketed as a "rescued orphan." Everyone wins.

It's important to keep in mind that no orphan works legislation is going to be perfect; it just needs to meet the realistic goal of being a strong improvement over the current situation. Laws can always be amended later to more perfectly reflect contemporary values and technology. It just takes that first daring step to try something new.


References

Adler, P. S. (May 1, 2008). RE: S. 2913 [letter to Senators Leahy and Hatch on behalf of the Library Copyright Alliance]. Retrieved from http://www.sla.org/pdfs/publicpolicy/LCA050108DarkArchive.pdf

Berne Convention for the Protection of Literary and Artistic Works (1979, revised from 1886). Retrieved from http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html

Copyright Act of 1976, Pub. L. No. 94-553. 90 Stat. 2541 (1976). Retrieved from http://en.wikisource.org/wiki/Copyright_Act_of_1976#.C2.A7_408._Copyright_registration_in_general

Heald, P. (March 16, 2012). Do bad things happen when works fall into the public domain: The market for audiobooks. [Seminar video]. Retrieved from http://www.youtube.com/watch?feature=player_detailpage&v=-DpfZcftI00#t=765s

Huttler, A. (April 28, 2008). Orphan Works Act of 2008. [Web log post]. Retrieved from http://www.fracturedatlas.org/site/blog/2008/04/28/orphan-works-act-of-2008/

Loren, L. (April 12, 2012). Abandoning the orphans: An open access approach to hostage works. [Audio presentation]. Retrieved from http://media.law.berkeley.edu/qtmedia/BCLT/bclt_20120412-symposium/day1/Loren.m4a

Orphan Works, 70 Fed. Reg. 3739 (2005). Retrieved from http://www.copyright.gov/fedreg/2005/70fr3739.html

Orphan Works Act of 2008, H.R. 5889, 110th Cong. (2008). Retrieved from http://thomas.loc.gov/cgi-bin/bdquery/z?d110:h.r.05889:

Peters, R. (1850). The Public Statutes at Large of the United States of America, From the Organization of the Government in 1789, to March 3, 1845 (Vol. 4). Boston: Charles C. Little and James Brown.

Science Fiction and Fantasy Writers of America, Inc. (March 23, 2005). RE: Orphan Works Study (70 FR 3739). Retrieved from http://www.copyright.gov/orphan/comments/OW0607-SFFWA.pdf

Shawn Bentley Orphan Works Act of 2008, S. 2913, 110th Cong. (2008). Retrieved from http://thomas.loc.gov/cgi-bin/bdquery/z?d110:s.02913:

Sprigman, C.J. (2004). Reform(aliz)ing copyright. Stanford Law Review, 57, 485-568. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=578502

Urban, J. (April 12, 2012). Orphan works and mass digitization: Obstacles and opportunities. [PDF presentation]. Retrieved from http://www.law.berkeley.edu/files/Urban.pdf

U.S. Copyright Office. (2006). Report on Orphan Works. Retrieved from http://www.copyright.gov/orphan/orphan-report.pdf