Sunday, June 14, 2009

World Future News: Digital Domain - Hey, Just a Minute (or Why Google Isn’t Twitter) By RANDALL STROSS

TECHNOLOGY blogs have wondered whether Google is a lumbering giant in this Twitter moment, unable to handle streams of tweets that were broadcast just seconds earlier.

Google moves faster than some of its critics think. But even if it didn't, the more important question is this: Do we really want Google's search engine to swallow Twitter's output as fast as it comes, without filtering, analyzing and ranking by authority?

http://www.psychologytoday.com/files/u45/twitter-hashclouds.jpg

"Real-time search begets real-time spam," writes Danny Sullivan, the editor in chief of the Web site Search Engine Land.

Anyone who signs up to follow a particular Twitterer receives tweets instantaneously, as they are dispatched (when the system is functioning). Filtering is not an issue in such cases: The 1.77 million followers of Britney Spears presumably look forward to receiving every morsel of information broadcast from her account.

But if one wants to search Twitter for tweets about a topic — say, about Ms. Spears, but encompassing anyone's tweet that happens to mention her — Twitter's data fill an ocean in which it's hard to find specific fish.

Twitter's search page says, "See what's happening — right now." But Twitter's database was not originally designed to be searched like Google's was. Last year, in fact, Twitter bought another start-up, Summize, to provide it with search functionality.

Even so, search performance on Twitter is sluggish compared with the live tweet stream. Mr. Sullivan notes that Twitter's search service does not consistently deliver real-time results: 20 or more minutes often pass before a given tweet appears in search results. At Google only hundredths of a second are needed to check its index when a search phrase is submitted. But to prepare, the company re-surveys the wide Web to update that index on a schedule that the company does not divulge. Some Web sites, like those of news organizations, are checked very often. Others await their turn in a rotating schedule of visits by Google's crawler, the software that collects copies of Web pages.
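The rotating schedule described above can be pictured as a priority queue in which each site carries its own revisit interval. The sketch below is only an illustration under stated assumptions: Google does not divulge its schedule, so the site names and intervals are invented, and `crawl_order` is a hypothetical helper, not anything Google has described.

```python
import heapq

def crawl_order(revisit_seconds, horizon):
    """Return the (time, site) visits made up to `horizon` seconds, soonest first.

    `revisit_seconds` maps a site name to an assumed revisit interval.
    """
    # Every site is due for a first visit at time 0.
    queue = [(0, site) for site in sorted(revisit_seconds)]
    heapq.heapify(queue)
    visits = []
    while queue and queue[0][0] <= horizon:
        due, site = heapq.heappop(queue)
        visits.append((due, site))
        # Reschedule the site one interval later.
        heapq.heappush(queue, (due + revisit_seconds[site], site))
    return visits

# Hypothetical intervals: a news site checked hourly, a quiet site daily.
schedule = crawl_order({"news.example": 3600, "quiet.example": 86400}, horizon=7200)
```

Under these made-up numbers the news site is visited three times in the first two hours and the quiet site once, which is the kind of asymmetry the column describes.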

Peter Norvig, director of research at Google, says that Larry Page, one of Google's co-founders, has consistently pushed the company's engineers to index the most active Web pages faster. When the frequency was increased to hourly, Mr. Page insisted that the interval be referred to as "3,600 seconds" to emphasize that it would be reduced further, which it was.

Google checks news feeds constantly but does not so easily pull in tweets. At a press event in London last month, Mr. Page was asked to comment on any plans that Google had to search Twitter in real time. After praising Twitter for doing a "great job" in showing information to users in real time, Mr. Page said he had long been pushing his search teams to index every second. "They sort of laugh at me and go, 'It's O.K. if it's a few minutes old,'" he said. "And I'm like, 'No, no, it needs to be every second.'"

A number of search start-ups have appeared recently that differentiate their offerings from older search engines' by playing up their specialized focus on the real-time Web. For example, OneRiot, based in Boulder, Colo., covers Twitter among other social media, but it has an intriguing means of reducing Twitter spam: it does not index the text in tweets — it plucks only the links, reasoning that the videos, news stories and blog posts that are being shared are what others will be most interested in.

OneRiot follows the link, checks for spam by comparing the content of the page with the content of the tweet, and then uses its own algorithms to figure out where the link should go in its always-changing index of "hot" items.
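The steps just described (pluck the links, compare the linked page with the tweet, then rank) can be sketched roughly as follows. This is a toy illustration, not OneRiot's actual method: the word-overlap score, the `min_overlap` threshold, ranking by share count, and the `fetch_page` callback are all assumptions made for the example.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z0-9']+")

def extract_links(tweet_text):
    """Return the URLs in a tweet, with trailing punctuation trimmed."""
    return [u.rstrip(".,;:!?)\"'") for u in URL_RE.findall(tweet_text)]

def words(text):
    """Lower-cased set of the words in a piece of text."""
    return set(WORD_RE.findall(text.lower()))

def overlap(tweet_text, page_text):
    """Fraction of the tweet's non-URL words that also appear on the page."""
    tweet_words = words(URL_RE.sub(" ", tweet_text))
    if not tweet_words:
        return 0.0
    return len(tweet_words & words(page_text)) / len(tweet_words)

def hot_links(tweets, fetch_page, min_overlap=0.3):
    """Count shares of each link whose page plausibly matches the tweet.

    `fetch_page` is a caller-supplied function mapping a URL to page text.
    """
    shares = Counter()
    for tweet in tweets:
        for url in extract_links(tweet):
            # Assumed spam check: skip links whose page shares too few words with the tweet.
            if overlap(tweet, fetch_page(url)) >= min_overlap:
                shares[url] += 1
    return shares.most_common()
```

A real service would presumably weigh many more signals than raw share counts, but the shape is the same: the tweet text is used only to vet the link, and only the link enters the index.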

Strictly speaking, this is not real-time processing. But checking links before adding them to the index seems to be time well spent.

Tobias Peggs, general manager at OneRiot, said his company could process, check and index a link within 37 seconds. When asked why he bothered to measure the seconds if it took 20 or more minutes just to receive searchable tweets from Twitter, he explained that the delays at Twitter's search site did not affect his company's search service, which receives the data stream at the same time Twitter's own search engine does. Because one venture capital firm, Spark Capital, has invested in both OneRiot and Twitter, OneRiot has "access to Twitter data that other third parties don't," Mr. Peggs said.

GOOGLE crawls Twitter's Web site — the frequency is not disclosed — to collect the same links included in tweets that OneRiot collects, and these may show up in Google's search results. If Google were to negotiate direct access to the tweet stream that OneRiot enjoys, it presumably could move just as fast and match OneRiot's lists, like "most shared today" or "today's hottest videos."

Google's almost-real-time search provides much higher-quality results than does literal real-time search. When speaking about the need to index the Web "every second," Mr. Page acknowledged the usefulness of taking a wee bit of time to analyze the gathered information.

"If you really want up-to-the-second information, it's not going to be as good as if you're willing to wait a couple of minutes," he said. "I'm not sure everybody needs to be seeing this stuff every second."

Randall Stross is an author based in Silicon Valley and a professor of business at San Jose State University. E-mail: stross@nytimes.com.
