Seek and Ye Shall Find (Maybe) – Wired 1996

Sometimes it takes a look back to look forward. With all the current talk about folksonomies, this piece seemed due for a re-read.

…Created in 1994 by Jerry Yang and David Filo, two disaffected electrical engineering and computer science grad students from Stanford University, Yahoo! lists more than 200,000 Web sites under 20,000 different categories. Sites that track pollution, for example, are listed under Society and Culture:Environment and Nature:Pollution. These categories form what the people at Yahoo! a bit pretentiously refer to as their ontology – a taxonomy of everything. Their ordering of the Web is precise enough – and intuitive enough – that almost 800,000 people a day use Yahoo! to search for everything from Web-controlled Christmas trees to research on paleontology. In almost every way you can measure, Yahoo! has successfully exerted order on the chaotic Web.

…But how much longer can its hold last?…It’s a concern that Jerry Yang, the less publicity-shy of the two founders, has been thinking a lot about lately. …As he told me, leaning back and raising his arms in an exaggerated shrug, “I like tough problems. The harder to solve, the better. And organizing the Web is probably the hardest information science problem out there.”

That may be, but Yahoo!’s technology, at least, is relatively straightforward. Yahoo! works like this: First, the URLs of new Web sites are collected. Most of these come by email from people who want their sites listed, and some come from Yahoo!’s spider – a simple program that scans the Web, crawling from link to link in search of new sites. Then, one of twenty human classifiers at Yahoo! looks the Web site over and determines how to categorize it.
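To make the workflow described above concrete, here is a minimal sketch of it. Everything in it is an assumption for illustration, not Yahoo!'s actual code: the toy link graph LINKS, the function names, and the use of a breadth-first walk for the spider are all hypothetical, and the human classifier is represented by a plain callable.

```python
from collections import deque

# Hypothetical toy link graph standing in for the Web: page -> pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

def crawl(start, links):
    """Follow links breadth-first from start; return pages in discovery order."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

def collect_new_sites(submitted, crawled, already_listed):
    """Merge emailed submissions and crawled pages, skipping duplicates and listed sites."""
    pending = []
    for url in list(submitted) + list(crawled):
        if url not in already_listed and url not in pending:
            pending.append(url)
    return pending

def classify(pending, choose_category):
    """Record a classifier's decision for each site; choose_category stands in for a person."""
    return {url: choose_category(url) for url in pending}
```

The first two steps are mechanical. All of the judgment sits in whatever gets passed as choose_category, which is the part the article goes on to call the hard one.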

Really, the only hard part – the only part that your average high-school geek couldn’t do – is developing the classification scheme. The ontology.

…To solve this problem, Yang and Filo hired Srinija Srinivasan as their “Ontological Yahoo!” Another former Stanford student, Srinivasan is unfailingly helpful, quick to answer any question in her relaxed California accent. Perhaps that’s why Newsweek claimed she was trained in library science when including her among the 50 people who matter most on the Internet.

…A few months ago, Srinivasan told me, she was adding categories and making changes to the ontology almost every day. Now major adjustments are becoming much more infrequent. She pointed to this as support for Yang’s assertion that “at some point, our scheme will become relatively stable. We will have captured the breadth of human knowledge.”

…a story he and Srinivasan told me about recent events at Yahoo! left me convinced I would have to look elsewhere for the answer.

The story began when the Messianic Jewish Alliance of America submitted its Web page to Yahoo! A classifier quickly reviewed the site – which contains everything from Stars of David to articles about Israel, not to mention the word “Jewish” in its name – and placed it under Society and Culture:Religion:Judaism.

But here’s where things got tricky. True, MJAA members are born of Jewish mothers and are hence, by definition, Jews. But they also believe that Jesus Christ is the messiah. In the eyes of most Jews, that makes the MJAA a bunch of heretics. Or at least Christians.

So when a few vocal and Net-savvy Jews saw the MJAA listed under Judaism, they let loose a salvo of email demanding that Yahoo! remove MJAA’s listing. A bit taken aback by the protesters’ virulence (“threats of boycotts,” Yang said with amazement), Yahoo! yielded and reclassified MJAA under Christianity with a cross-reference from Judaism. Of course, this caused the MJAA to protest that they were now being incorrectly labeled. After a modern-day Solomonic compromise, the MJAA and a few similar groups can now be found listed under Society and Culture:Religion:Christianity:Messianic Judaism – which is linked by a cross-reference from Judaism.
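The compromise is easy to picture as a data structure: each site lives under exactly one colon-separated category path, and a cross-reference is just a pointer from one category to another. The sketch below is an assumption for illustration only; the Directory class and its methods are hypothetical names, and nothing in the article says how Yahoo! stored its ontology.

```python
class Directory:
    """A toy category directory keyed by colon-separated paths."""

    def __init__(self):
        self.sites = {}       # category path -> site names listed directly there
        self.cross_refs = {}  # category path -> other category paths it points to

    def add_site(self, path, site):
        self.sites.setdefault(path, []).append(site)

    def add_cross_ref(self, from_path, to_path):
        self.cross_refs.setdefault(from_path, []).append(to_path)

    def browse(self, path):
        """Return (sites listed directly under path, cross-referenced paths)."""
        return self.sites.get(path, []), self.cross_refs.get(path, [])

    @staticmethod
    def parent(path):
        """Return the parent category path, or None for a top-level category."""
        head, sep, _ = path.rpartition(":")
        return head if sep else None


JUDAISM = "Society and Culture:Religion:Judaism"
MESSIANIC = "Society and Culture:Religion:Christianity:Messianic Judaism"

directory = Directory()
directory.add_site(MESSIANIC, "Messianic Jewish Alliance of America")
directory.add_cross_ref(JUDAISM, MESSIANIC)
```

Browsing Judaism in this sketch shows no MJAA listing of its own, only a pointer to the category where the listing actually lives. The structure records where the site was put, not why, which is the editorial decision the story is about.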

Yang looked at me sheepishly when telling this story. After all, he believes in truth, justice, and the Internet way. Hell, he even gave me a mini-sermon that morning about how the Net is egalitarian – the little guy can publish just as easily as the big guy. Yet, he knows the MJAA was pushed around because it didn’t have mainstream Judaism’s clout.

But the MJAA story is interesting not just for exposing the realpolitik of classification. It’s proof that no ontology is objective – all have their own biases and proclivities. Yang was quick to admit this: in fact, he referred to Yahoo!’s ontology as the company’s editorial. “Organizing the Web is sometimes like being a newspaper editor and inciting riots,” he said with a touch of exasperation. “If we put hate crimes in a higher level of the topic hierarchy, well, it’s our editorial right to do so, but it’s also a very heavy responsibility.”

Yahoo!’s success, Yang argued, is evidence that point of view and knowledge classification are not incompatible. Just as we learn to automatically compensate for right-wing bias while reading The Wall Street Journal’s editorial page, we can also learn to adjust for the perspective that Yahoo! embodies. …The real problem, Yang and Srinivasan agreed, is making sure that Yahoo!’s point of view remains consistent even as the company expands to keep up with the growth of the Web.

After all, Yahoo!’s point of view comes from having the same 20 people classifying every site, and by having those people crammed together in the same building where they are constantly engaged in a discussion of what belongs where. Lose that closeness and the biases will start to become more diffuse. Yang admitted as much, saying, “It’s hard to expand Yahoo!, because you end up with too many points of view.” Instead of the Journal’s editorial page, you end up with something like CNN, where prejudices are masked by a pretense of objectivity. For Yahoo!, that translates to a category scheme where users have a hard time guessing where they’ll find what they’re looking for.

…In my mind, Yang identified the problem with Yahoo! when he noted that “it is much more of a social-engineering problem than a library or computer science problem.” By relying on human intelligence to organize the Web, Yahoo! falls victim to subjectivity.

Wired: Seek and Ye Shall Find (Maybe): May 1996