Monday, November 1, 2010

With Help from You, New Search Engine Slashes Through Spam | Epicenter | Wired.com

Blekko CEO Rich Skrenta explains how the company monitors its growing cluster of search servers from its Redwood City, California, office. Credit: Ryan Singel/Wired.com

Ignore for a minute the awkward name: Blekko.

Instead, behold: as of late Sunday afternoon, there’s a new search engine in town with a fresh approach to weeding out the ever-proliferating junk and spam sites polluting search results.

It’s no mean feat: it has taken three years, some $25 million in venture capital, and a gamble that there are enough people who care about good search to make its model of curated results work.

And Blekko does work, thanks to a little thing called slashtags.

Basically, slashtags tell Blekko to limit your search to a human-curated category of websites, in effect a custom search. Say you want to find good resources for learning about arrays in PHP. Type “arrays /php.” Need a good pumpkin pie recipe? Yup, you guessed it: append the /recipes slashtag.

What happens is that an editor or set of editors decides which sites return good results in that particular category, and Blekko searches only those sites when you include a slashtag in your query.
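As a rough illustration of the idea, and not a description of how Blekko actually implements it, the following Python sketch splits a query into search terms and slashtags, then keeps only results whose host appears in a curated list. The site lists, host names, and function names here are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical curated site lists. In the article's description, editors
# maintain these; the hosts below are placeholders, not Blekko's real data.
SLASHTAGS = {
    "php": {"docs.php.example", "tutorials.php.example"},
    "recipes": {"kitchen.example"},
}


def parse_query(query):
    """Split a query such as 'arrays /php' into ('arrays', ['php'])."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:].lower())
        else:
            terms.append(token)
    return " ".join(terms), tags


def allowed_hosts(tags):
    """Return the union of curated hosts for the tags, or None if there are none."""
    hosts = set()
    for tag in tags:
        hosts |= SLASHTAGS.get(tag, set())
    return hosts or None


def filter_results(urls, hosts):
    """Keep only URLs on a curated host; with no restriction, keep everything."""
    if hosts is None:
        return list(urls)
    return [url for url in urls if urlparse(url).hostname in hosts]
```

With these assumed lists, the query “arrays /php” yields the terms “arrays” and the tag “php”, and any result that is not hosted on one of the two placeholder PHP sites is dropped.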

Why is such a thing necessary?

Well, according to CEO and co-founder Rich Skrenta, it’s because the web is filling up with spam and low-rent webpages from content farms like Demand Media. He says the web now has 100 billion URLs, most created by bots.

“You need to bring large scale human curation and combine it with algorithmic techniques to bring the quality back,” Skrenta said. “If you have the set of the top 150 health sites, you know what, you really can answer nearly any health question, and you know what, you really don’t want to be searching outside of that set.”

“You don’t want to search the World Wide Web for health information,” Skrenta said. “It is scary.”

So will we all have to learn a new search language?

Blekko hopes to solve that as well.

After three months of an invite-only testing period with 8,000 volunteers, Blekko now has more than 3,000 slashtags including things such as /glutenfree, /liberal, /conservative and /colleges. The site will suggest some of these as you start typing in a query.

But for seven categories of searches, Blekko will automatically opt users in: health, colleges, recipes, lyrics, autos, hotels, and personal finance.

“Take a search like ‘cash-back credit card’,” Skrenta said. “These are train wrecks of spam on Google and Bing, but we can detect you did a personal finance query and opt you into a high quality curated set of results.”

Of course, there are some trade-offs.

Say you are searching for a recipe for a Cambodian dish such as “Green Mango Salad.” You’ll find some mainstream results in /recipes, but you will miss the obscure, and possibly more authentic, ones that a more global search would find. Oddly, you’ll have better luck if you are searching for an obscure recipe like “Red Curry Cambogee,” for which there are no results under /recipes, so Blekko automatically runs a global search, finding you a recipe outside the curation.
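The fallback behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the index is a hypothetical list of (host, title) pairs, matching is a plain substring test, and the function name is invented for illustration. It says nothing about how Blekko's engine works internally.

```python
def search_with_fallback(terms, index, curated_hosts):
    """Search curated hosts first; fall back to the whole index if that is empty.

    index is a hypothetical list of (host, title) pairs standing in for a
    real search index. Returns the matches plus a label for the scope used.
    """
    needle = terms.lower()
    matches = [(host, title) for host, title in index if needle in title.lower()]
    curated = [match for match in matches if match[0] in curated_hosts]
    if curated:
        return curated, "curated"
    # Nothing in the curated set: run the global search instead.
    return matches, "global"
```

In this model, a dish covered by a curated site returns only curated results, even if other sites also have it, while a dish found only elsewhere comes back from the global search.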

So won’t this model degenerate into infighting over what gets included and rejected? What if spammers try to set up a category to find a way to sneak their sites into a trusted slashtag?

Skrenta, a Silicon Valley veteran whose last venture was the local news aggregator and discussion site Topix.net, has an answer for that as well.

His team has been there before.

They built the Open Directory, now known as DMOZ.org, the largest human categorization project in the world. The Open Directory, now owned by AOL and open-sourced, categorizes websites and is used by sites across the web, including Google, to rank and provide data on websites. At its peak, it had 88,000 volunteer editors and still has over a million categories.

At AOL, DMOZ had exactly one paid community guide, who relied on a trusted tier of editors to oversee their topical fiefdoms, much as Wikipedia has created a working, if fractious, community of trusted editors.

Skrenta thinks that crowdsourcing model can work for slashtags. Even at launch, Blekko has a way for users not only to create their own slashtags but also to work cooperatively, adding to both global slashtags run by Blekko and those created by other users.

The key for Blekko is likely to be twofold: one, attract enough dedicated users to curate categories, and reduce infighting to keep them involved; and two, make slashtags largely invisible and automatic for the large majority of searchers, who just want to find information without having to navigate a fight between the moderators of the /techblogs and /techbloggers slashtags in order to find reviews of the newest tablet computer.

The core idea of applying human intelligence to aid searches has a long and rich history on the web.

Yahoo rose to prominence on its curated directory. Google took over the search mantle after figuring out that links created by humans could be treated as votes to indicate which pages were the best (the fabled PageRank algorithm). Ask.com’s Teoma team figured out that if users tended to click on, say, the third result, that information could be used to boost that page in future results, an insight quickly adopted by all the major search engines.
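To make the "links as votes" idea concrete, here is a minimal textbook-style sketch of PageRank by power iteration in Python. It is a simplified illustration and not Google's production algorithm; the damping factor of 0.85, the iteration count, and the tiny link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict mapping each page to its outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

For a graph such as `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page "c" collects votes from both "a" and "b" and ends up ranked highest, while "b", which nothing links to, ranks lowest.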

As for the name, well, Blekko was just an early name, culled from a URL Skrenta owned for a server he had in college. The company went through an arduous process of hiring a naming firm and consumer testing, only to decide that the expensive, tested name was boring and not memorable.

Blekko, ugly as it may be, somehow sticks in people’s heads.

Despite the name, Blekko may well be onto something.

Skrenta is optimistic, given the enthusiasm and return rates of Blekko’s early testers.

And even if it doesn’t work out, he and his team built a search engine, a project that starts with copying most of the web onto a cluster of servers and figuring out how to make sense of it.

That’s /cool and /audacious, no matter if Blekko goes the way of the now-dead “Google Killer” Cuil, gets snapped up by Bing, or turns into BLKO on the NASDAQ.

Follow us for disruptive tech news: Ryan Singel and Epicenter on Twitter.
