» Archive for the 'Search' Category

In the briefing room: Yakabod’s Yakabox

Wednesday, August 12th, 2009 by Cody Burke

When one strips away all the marketing hype, technical terminology, and buzzwords from knowledge sharing and collaboration products, the real measure of a tool is simple: does it help get work done?  The future of the knowledge workers’ workspace is the Collaborative Business Environment (CBE) but, until our vision is addressed and realized by vendors in this space, it is incumbent upon companies to find tools that support the CBE’s basic principles, namely to provide a single work environment for knowledge workers, reduce friction in knowledge sharing, and embed community into the workspace.

It is easy to lose sight of the fundamental question an organization should be asking when deploying a knowledge sharing and collaboration tool, that is: “how will this tool help my company get work done?”  This often happens because products and tools are segmented into arbitrary and confusing market segments (just look at the variety of acronyms in the content management market: CM, ECM, WCM, and DM, among others).

A breath of fresh air in this space is Yakabod; the company offers a product, the Yakabox, that promises to be an end-to-end platform that gets work done.  This offering is a hardware appliance incorporating enterprise search, content management, collaboration, and social networking functionality.  A hosted version is also available.  Yakabod’s value proposition is to keep things simple by placing those four applications in one place, aiding in knowledge sharing, collaboration, and the ability to find what one is looking for.

The user interface is very clean and straightforward, and features an activity feed-like stream of items that are relevant to the user, as well as user profiles and favorites that are content-based, such as documents, teams, blogs, or any other item in the system.  What is presented in the activity feed can be fine tuned via a “Matter Meter”, which can be adjusted to show items of varying degrees of importance.  A busy knowledge worker, for example, could set the meter to only show items of high priority.  Yakabod’s enterprise search works in a similar way: the system learns a user’s preferences and adjusts search results accordingly based on relevance to the user.  The results are drawn from structured and unstructured data sources, including online repositories, wikis, social tools, and existing legacy systems.

To make deployment easier, the Yakabox integrates with existing sources such as Microsoft SharePoint and Office, shared drives, and electronic repositories.

Security is a strong point for the Yakabox.  The company has its roots in providing collaboration and knowledge sharing tools to the U.S. Intelligence Community, and the Yakabox meets Department of Defense PL3 security standards.

One promising aspect about Yakabod’s philosophy as a company is the recognition that knowledge sharing and collaboration applications such as enterprise search, content management, collaboration, and social networking are interconnected and interdependent.  Put simply, when these normally disparate elements are combined, the sum is greater than the parts.  The Yakabox may be in some respects closer to the Collaborative Business Environment than many other offerings currently on the market: it provides a single, overarching environment for knowledge workers, reduces friction in knowledge sharing through tight integration, and embeds collaboration tools into all areas of knowledge work via social networking functionality.

Cody Burke is a senior analyst at Basex.

In the briefing room: SAS Content Categorization

Thursday, July 30th, 2009 by Jonathan Spira

Looking for something?  If it’s enterprise content, you probably won’t find it.

Locating content and information in the enterprise is a considerable challenge, one that not only hampers organizational productivity but also throttles individual knowledge worker efficiency and effectiveness.  Workers typically use search tools to find content and this is where their struggle begins.

There are two key problems with search technology today: 1.) such systems provide “results,” not answers, and 2.) they do not support natural language queries.  In addition, typical search tools do not always understand relationships and context: Java could refer to a type of coffee, an island in Indonesia, or a programming language.  Typing “Java” into the Google search engine returned results only relating to Java as a programming language for the first three pages.

Thanks to the various flaws common to most search tools, 50% of all searches fail.  The good news is that those failures are obvious and recognized by the person doing the search.  The bad news is that 50% of the searches people believe succeeded actually failed in some way, but this was not readily apparent to the person doing the search.  As a result, that person uses information that may be out of date, second best, or simply incorrect.  (We call this the 50/50 Rule of Search.)

The problems with search contribute greatly to the problems of Information Overload in the enterprise.

According to research conducted by Basex in 2006 and 2007, knowledge workers spend 15% of the work day searching for content.  This figure is far higher than it needs to be, and represents the time knowledge workers waste as a result of poor search tools, bad search techniques on the part of knowledge workers, and a lack of effective taxonomies.

In an age of Information Overload, where we create more content in a day than the entire population of the planet could consume in a month, more effective tools are needed.  One approach towards improving search is better and more effective categorization.  We recently had a look at SAS Content Categorization, one promising product in this space.  Content Categorization helps to categorize information so that search engines can present relevant results faster by having the user navigate through topics/facets related to the user’s query.

SAS acquired Teragram, a natural language processing and advanced linguistic technology company, in March 2008.  After integrating Teragram as a division, SAS launched Content Categorization in February 2009.

The offering enables the creation of taxonomies and category rules to parse and analyze content and create metadata that can trigger business processes.  Taxonomies and category rules are created via the TK240, a desktop tool for administration and taxonomy management that is a component of SAS Content Categorization.  Once a taxonomy is created, high level categories are selected, followed by narrower ones.  There is no limit as to how granular the categories can go, allowing for users to drill down on topics.  The system also includes prebuilt taxonomies for specific industries such as news organizations, publishers, and libraries.

Whoever is doing the setup – and SAS Content Categorization is designed for use by non-technical users – can develop category rules from within the TK240 as well.  The rules may consist of multiple keywords, based on the percentage appearing in a document, as well as weighted keywords that give more value to certain words than others.  Additionally, it is possible to apply Boolean operators, so, for example, to meet the rule Java and programming must appear in the same sentence, while Java and coffee appearing in the same sentence would not meet the rule.  Rules can be created for extremely specific situations, such as the presence of URLs, grammatical instances, or the presence of suffixes (Inc., Corp., AG., etc.).
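To make the rule types above more concrete, here is a minimal sketch in Python.  It is not the TK240's rule syntax, which was not shown to us in a form we can reproduce here; the function names, the naive sentence splitter, and the sample weights are all assumptions made purely for illustration.

```python
import re

def split_sentences(text):
    """Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words_in(text):
    """Lower-cased set of alphabetic words found in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_same_sentence_rule(text, required):
    """Boolean AND rule: True if at least one sentence contains every required word."""
    required = {w.lower() for w in required}
    return any(required <= words_in(s) for s in split_sentences(text))

def weighted_score(text, weights):
    """Weighted keyword rule: sum the weights of keywords that occur in the text."""
    present = words_in(text)
    return sum(weight for keyword, weight in weights.items() if keyword in present)

# Hypothetical sample documents.
doc_a = "Java is a popular programming language. It runs on many devices."
doc_b = "I drank Java coffee this morning. Later I did some programming."

rule = ["Java", "programming"]
a_matches = matches_same_sentence_rule(doc_a, rule)  # both words share a sentence
b_matches = matches_same_sentence_rule(doc_b, rule)  # the words are in different sentences
a_score = weighted_score(doc_a, {"java": 2.0, "programming": 1.5, "coffee": -1.0})
```

In this sketch the first document meets the rule and the second does not, even though both contain the two words; that sentence-level distinction is what separates a category rule from a plain keyword match.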

The system is also equipped with options for setting role-based permissions to allow users to read/write, and enable multiple users to collaborate on developing taxonomies.  This allows multiple taxonomists to have secure access to projects, with individual levels of read/write access to category rules and concept definitions.

SAS Content Categorization can be an effective weapon against Information Overload by allowing the creation of complex automated systems to categorize content, increasing the likelihood of the knowledge workers being able to find what they are looking for in a timely manner.  In addition, increasing the relevance of search results by using taxonomies to provide context raises the value of content that is found, decreasing the likelihood of knowledge workers moving forward with second-best or faulty information.

Companies looking to take decisive measures to lower Information Overload should carefully review their current search tools and, where appropriate, give serious consideration to SAS Content Categorization.

Jonathan B. Spira is CEO and Chief Analyst at Basex.
Cody Burke is a senior analyst at Basex.

Google Breaks Microsoft Search?

Wednesday, June 17th, 2009 by David Goldes

The fun never ends in the Microsoft v. Google wars. Chris Vander Mey, Google’s senior product manager for Google Apps, acknowledged that certain programs, such as Windows Desktop Search, that work directly with the Outlook data file “don’t currently work well” with Google Apps Sync for Microsoft Outlook.

According to a post by Vander Mey on Google’s enterprise blog, Windows Desktop Search will not properly index Google Apps Sync data files. In order to prevent indexing from running indefinitely, the Google Apps Sync installer disables it.

Microsoft’s Outlook product manager, Dev Balasubramanian, writing in an official Microsoft blog, said Apps Sync includes a “serious bug/flaw” which disables Outlook’s ability to search data such as e-mail and contacts.  He also provided a fix which involves editing the Windows registry, something many users may not wish to do.

Balasubramanian further stated that the problem impacts Outlook’s search capabilities, not just Windows Desktop Search because Outlook search “relies upon the indexing performed by Windows Desktop Search.”  Google contends that Outlook search will work even if Windows Desktop Search does not.

David M. Goldes is president and senior analyst at Basex.

In the Briefing Room: BA-Insight Longitude

Thursday, June 11th, 2009 by Cody Burke

Without question, search is the Achilles heel of knowledge work.  It is almost universally acknowledged that 50% of all searches fail.  The dirty little secret in search – and one that we uncovered through research we conducted in 2007 – is that 50% of the searches that knowledge workers believe to have been successful also fail in some manner (i.e. outdated information, second-best information, or content that is just outright incorrect).

Obviously, failed search is a major issue and large contributor to information overload.  Part of the problem is not the search technology per se, but the selection of the source that provides the results.  If a search only looks through unstructured data, it ignores the valuable information that exists as structured data.  Search tools need to look at all information sources in order to return not only complete results, but to rank results for disparate data sources accordingly.

For reasons that are inexplicable to this writer, many companies have not chosen to deploy search tools that examine every nook and cranny of a company’s information assets.  A few smart companies are deploying search tools that do look in every knowledge repository.  Exercising due diligence in searching can avoid failures that result from searching in a partial source set.

BA-Insight is a company that is attempting to even the odds through Longitude, its search enhancement for Microsoft SharePoint and Microsoft Search Server, as well as connectors to ERM, CRM, and EMC platforms.  The premise is simple; by expanding the sources through which a search is conducted, as well as improving the user interface, search results will have more value, be found in less time, and be easier to utilize once found.

The Longitude search product enhances SharePoint Server and Microsoft Search Server by presenting results in page previews, eliminating the need to download the document.  The preview is presented in a split screen, with the search results on one half, and the preview panel on the other.  When a document is selected, the preview opens to the relevant page, not just the beginning of the document.  This saves time in two ways: first, the time it would take to open what might be an undesirable document, and second, the time it would take to find the relevant text in the document by scrolling through it manually.  Longitude also supports collaborative work by making functionality such as e-mailing documents, editing, and adding tags and bookmarks, available from within the preview panel.

Longitude supports federated search through multiple repositories of both unstructured and structured data via connectors for Lotus Notes, Documentum, Exchange, Microsoft Dynamics CRM, and Symantec Enterprise Vault, among others.  Content is assigned to metadata automatically as users search and find content, and search is guided through Parametric Navigation that takes the metadata into account to search using complex queries.

Knowledge workers spend on average 15% of the day searching.  We know that 75% of those searches fail when we account for the two types of failure previously mentioned.  Clearly the odds of finding what one is looking for are against the searcher.  Most tools in a company don’t search in enough places, and because of technology sprawl, knowledge workers are just as likely to have stored critical information in a vat that is not touched by the search system as one that is.  Tools such as Longitude go a long way towards evening the odds for the knowledge worker.
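For readers who want to see where the 75% figure comes from, the arithmetic can be sketched in a few lines of Python.  The function name is ours and purely illustrative; the two 50% inputs are the figures cited above.

```python
def overall_failure_rate(obvious_failure, hidden_failure):
    """Combine the two kinds of failure described in the text.

    obvious_failure: share of searches that fail outright.
    hidden_failure:  share of the remaining, apparently successful
                     searches that also fail in some way.
    """
    apparently_successful = 1 - obvious_failure
    return obvious_failure + apparently_successful * hidden_failure

# 50% fail outright, and 50% of the remaining half fail without the searcher noticing.
rate = overall_failure_rate(0.5, 0.5)
print(f"{rate:.0%} of searches fail")  # prints "75% of searches fail"
```

Put another way, only one search in four delivers what the knowledge worker was actually looking for.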

Cody Burke is a senior analyst at Basex.

Wolfram Alpha – Better Search?

Friday, May 15th, 2009 by David Goldes

Wolfram Research announced the launch of the Wolfram Alpha computation engine.  The new tool is intended to provide specific and precise factual answers to questions rather than present a list of Web sites which may or may not contain the correct answer.  A key problem in search technology today is that such systems provide “results” instead of answers.  50% of all searches fail as a result; we have found that 50% of searches people believe succeeded also failed in some other way, by not providing the most up-to-date, accurate, or correct information although the individual conducting the search believes the answer is correct at the time.

Back in March, founder Stephen Wolfram wrote on his Web site that, “[F]ifty years ago, when computers were young, people assumed that they’d quickly be able to handle all these kinds of things … and that one would be able to ask a computer any factual question and have it compute the answer.”  We all know that’s not how things turned out simply by going to Google, currently the most popular online search tool, and entering a search query.

Wolfram Alpha, according to Wolfram himself, understands questions that users input and then calculates answers based on its extensive mathematical and scientific engine.

The system is scheduled to go live later today at www.wolframalpha.com.  We’ll find out then whether Wolfram has found a way to build a better search tool.

David M. Goldes is the president of Basex.

In the Briefing Room: Kosmix

Thursday, April 30th, 2009 by Cody Burke

Knowledge workers have traditionally had a love/hate relationship with search technologies.  The vast amounts of information that we must sift through to find data that is relevant to us in a given situation make search tools a necessity.  We love being able to quickly find the current price of a product and the location of stores selling it or the most recent article on a key competitor.  The flip side is that our search tools actually fail us most of the time; 50 percent of search queries fail outright (these we are aware of), and of the 50 percent that we believe succeed, a further 50 percent of those fail us in some way that we may not even realize.

Although we have largely resigned ourselves to a world of Google searches that return results instead of answers, there is no shortage of those who are laboring to reimagine search and attempt to address some of its fundamental flaws.

One such company, Kosmix, is taking a slightly unorthodox approach: they are not even attempting to fix search.  Instead, Kosmix is targeting the way in which we browse topics, and leaving the navigation aspect of search (finding a specific Web site) to Google and its ilk.  By separating discovery and research from traditional search, Kosmix is attempting to divide and conquer the search problem by zeroing in on a key weakness of results-based searching, namely the presentation of the contextual information that surrounds a topic.

Kosmix’ core product is its eponymously-named Web site, currently in beta, which allows users to browse content by topic.  The content is pulled from around the Web and presented in modules; a search for netbooks, for example, yields a definition from Wikipedia, images from Google and Flickr, related question and answer threads from Yahoo Answers, reviews and guides from eHow, video content from Truveo and Blinkx, Google blog search results, content from tech-related Web sites, relevant Facebook groups, shopping options from eBay and Amazon, and a summary of related items such as specific brands of netbooks and related topics that can be drilled down.

For comparison, a Google search for netbooks resulted in 35,700,000 results, with the only organization of the links being small subsets for news and shopping.

The content that Kosmix presents may not please everyone; automated editorial choices are made as to where to pull content from on a query-by-query basis based on what is available, the value of a site, and the relevance of articles.  For example, the system takes a query then determines what video site’s content is best suited, based on relevancy and ratings on the site.  Kosmix acknowledges that the aggregated content is not always a perfect fit but is working to improve the system in order to deliver better results as it moves forward with the product.

Kosmix is a useful tool for research and discovery around a specific topic, and does a good job of presenting content in a manageable manner, from a broad variety of sources.  Leaving navigation to the established search companies is a wise move for Kosmix, as is demonstrating that there is a better way to find content online than Google searches that return results lacking in context, and more often than not, lacking the information we were looking for in the first place.

Cody Burke is a senior analyst at Basex.

The Googlification of Search

Thursday, March 19th, 2009 by Jonathan Spira

Google’s clean home page, combined with the simple search box, has made it easy to look up something online.  Indeed, using Google may just be too easy.

Google uses keyword search.  The concept sounds simple.  Type a few words into a search box and out come the answers.  Unfortunately, it isn’t that simple and it doesn’t really work that way.

Search is a 50-50 proposition.  Perhaps 50% of the time, you will get what appear to be meaningful results from such a search.  The other 50% of the time, you will get rubbish. If you’re lucky that is.

Why does this only work sometimes?  This is because there are two types of searchers, or more accurately, two types of searches.  One is keyword search, the second is category, or taxonomy, search.

It is possible to get incredibly precise search results with keyword search.  Indeed, there is no question that keyword search is a powerful search function.  Being able to enter any word, term, or phrase allows for great precision in some situations – and can result in an inability to find useful information in many others.

However, the use of a taxonomy, or categories, in search, allows the knowledge worker to follow a path that will both provide guidance and limit the number of extraneous search results returned.  Using a taxonomy can improve search recall and precision due to the following factors:

1.)    In keyword search, users simply do not construct their search terms to garner the best results.
2.)    Users also do not use enough keywords to narrow down the search.
3.)    Google’s search results reflect Google’s view of the importance of a Web page as determined by the company’s PageRank technology, which looks at the number of high-quality Web sites that link to a particular page.  This doesn’t necessarily mean that the first pages in the search results have the best content but only that they are the most popular.
4.)    Web site owners can manipulate Google and other search engine results through search engine optimization (SEO).  There is an entire industry built around this service and the use of SEO can dramatically impact the positioning of a Web site on the results page.
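The third point is easier to appreciate with a toy example.  What follows is the simplified, textbook form of link-based ranking, not Google's production system, whose details are not public; the four-page link graph, the damping value, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outbound links.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: pages a, b, and d all link to c; c links only to a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
top_page = max(ranks, key=ranks.get)
```

Nothing in the calculation looks at what a page actually says.  Page c comes out on top solely because three pages link to it, which is the sense in which the first results are the most popular rather than necessarily the best.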

Unfortunately, in part thanks to Google’s ubiquity as well as its perceived ease of use, the concept of search to most people seems to equal keyword search.  As more and more Web sites and publications (the New York Times being one prominent example) move to a Google search platform, the ability to find relevant information may be compromised.

In the case of the New York Times, much of the functionality previously available disappeared when the Times deployed Google Custom Search.  Only those visitors who know to click on “advanced search” can specify a date range and whether they want to search by relevancy, newest first, or oldest first, although even the “advanced” search experience is still lacking compared to the Times’ earlier system.  Thanks to the Googlification of search, however, most visitors only access the search box, and their ability to find the answers they are seeking is hobbled by the system’s limitations.

Jonathan B. Spira is the CEO and Chief Analyst at Basex.

Google Glitch: Human Error the Culprit

Sunday, February 1st, 2009 by Jonathan Spira

The Google "warning" Saturday morning

A glitch in the Google search service caused the company to warn users – including me early Saturday morning – that every Web site listed in the results could cause harm to their computer.

While doing a search on Google at that time (yes, my work-life balance has been decimated), I noticed something funny about Google’s results.  Every result included a disclaimer that “[T]his site may harm your computer.”  Fearing a virus or other malware (although I couldn’t see how it could possibly have this effect), I tried several other computers including a Mac running Safari.  All searches, regardless of topic, computer, and browser, returned similar warnings.  In addition, although they were present and highlighted in green, the links to the actual Web sites were not clickable.

The problem seemed to last for about an hour.

Google later acknowledged on its blog that all searches during that time period produced links with the same warning message.

The warning was not limited to English

“What happened?” Google explained in the blog. “Very simply, human error.”  Unbeknownst to most of us, Google does maintain a list of sites that install malware on visitors’ computers in the background.

The list of sites is periodically updated and Google released an update Saturday morning.  This is where the human error comes in.  A Google employee included the URL of “/” in the list and “/” is part of all URLs.  Google caught this problem fairly quickly; according to the company, the maximum duration of the problem for a given user was ca. 40 minutes.  It seemed to impact me a bit longer than that but then the problem disappeared.
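Why a single “/” would flag the entire Web is easy to demonstrate.  Google has not described how its list is matched against URLs, so the sketch below simply assumes a substring test; the function name, the list entries, and the sample URLs are hypothetical.

```python
def is_flagged(url, patterns):
    """Assumed matching logic: a URL is flagged if any list entry appears within it."""
    return any(pattern in url for pattern in patterns)

# Hypothetical sample URLs and list entry.
urls = [
    "http://www.example.com/",
    "http://news.example.org/story",
    "http://malware.example/download/installer",
]
blocklist = ["malware.example/download"]

# With a sane list, only the one bad site is flagged.
flagged_before = [u for u in urls if is_flagged(u, blocklist)]

# Add the stray "/" entry and every URL matches, since every URL contains a slash.
flagged_after = [u for u in urls if is_flagged(u, blocklist + ["/"])]
```

Under that assumption, one extra entry is enough to turn a list that flags one site into a list that flags all of them.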

Fortunately, I made several screen captures of the error for posterity.

Google does have a reputation for an extremely reliable service although errors do creep in from time to time.  Last month, a glitch in Google Maps sent drivers travelling within Staten Island on a 283-kilometer detour to Schenectady.

Jonathan B. Spira is the CEO and Chief Analyst at Basex.

Clicking on a link led to this page on Saturday.

Finding the Needle in the Corporate Haystack: Dow Jones and Generate join forces

Friday, April 25th, 2008 by Cody Burke

Searching for information is a time consuming task, one that often results in disappointment and overwhelming quantities of information.  A search may return the correct answer; however, wading through the results and separating the relevant from the irrelevant is no small task for the knowledge worker.  Ultimately this is the consequence of searches returning correct results, but not necessarily correct answers.

With an eye towards resolving this problem, Dow Jones announced last week that it had acquired Generate, a business intelligence company, and would be forming a new Business and Relationship Intelligence unit within the Enterprise Media Group.  The Generate platform works by crawling millions of Web sites and extracting data on four million companies and over six million executives.  This, when combined with so-called trigger events, such as mergers, executive changes, venture funding, and partnerships, provides precise reports that allow companies to detect changes in the competitive landscape, identify prospects, and nourish their own networks.  Extraction is complemented by relationship mapping technology, showing the best possible path to approach an executive, anticipate shifts in the market, and make the most out of personal and corporate connections.  While busy executives may not think to update their Xing or LinkedIn profile for weeks after a change, if at all, a system utilizing Generate technology would pick up on a change almost instantaneously and ensure that those who need to be aware of the change are.

The savings in time and headaches for the knowledge worker – and by extension in productivity for the enterprise – from such a system should not be underestimated.  The pairing of Dow Jones and Generate is indicative of the massive importance of better searching technology to the knowledge worker.  The potential gains in the fight against Information Overload are no less exciting – as searching technology improves, those overwhelming piles of results will shrink, and the information we were looking for all along will float to the top.

Cody Burke is a senior analyst at Basex.

Searching for Search

Friday, November 30th, 2007 by Jonathan Spira

The search market is a very fragmented place.  And it’s been very active recently as well.  To make it simple, you can divide the search market into two groups.  In the entry-level group you will find products such as the Google Search Appliance and Google Mini, Oracle Secure Enterprise Search, and Microsoft’s MOSS (Microsoft Office SharePoint Server 2007).  However, these offerings pale in comparison to offerings from search companies such as Autonomy, Endeca Technologies, Fast Search and Transfer (FAST), Isys, Recommind, and Vivisimo.

Now Microsoft is focusing more closely on the enterprise search market with two new offerings, Microsoft Search Server 2008 and Search Server Express 2008, which are based on SharePoint technology.  The express edition is free but it is restricted to a single installation.  It does however include connectors to content stored in EMC Documentum and IBM FileNet platforms and provides support for federated search capabilities based on the OpenSearch standard.  The commercial edition will be priced (not yet announced) to be competitive, according to Microsoft.  Translation: it won’t cost as much as the offerings from companies focused solely on the search market.

Companies including EMC, Cognos, HP, Business Objects, SAS, and OpenText have already announced plans to support federation with Search Server and Search Server Express.  Federated search connectors will become available in the first half of 2008 when the final search offering is released.  (Federated search is the simultaneous search of multiple databases and data stores.  Search results from multiple platforms will be consolidated in one search results page, making the results more contextual and therefore more accurate.)
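The idea behind federated search can be sketched in a few lines of Python.  This is not how Search Server or any vendor's connector is implemented; the three in-memory "repositories", their contents, and the function names are hypothetical stand-ins for real data stores.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical repositories standing in for real platforms.
SOURCES = {
    "documents": ["Q3 budget report", "Search server rollout plan"],
    "wiki": ["Enterprise search FAQ", "Holiday schedule"],
    "mail": ["Re: search connector licensing"],
}

def search_source(name, query):
    """Search one repository; return (source, item) pairs that contain the query."""
    needle = query.lower()
    return [(name, item) for item in SOURCES[name] if needle in item.lower()]

def federated_search(query):
    """Query every repository at the same time and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged

hits = federated_search("search")
for source, item in hits:
    print(f"[{source}] {item}")
```

Each hit carries the name of the repository it came from, so a single results page can show the knowledge worker where a given item lives.  A real deployment would also have to rank results across sources, which this sketch does not attempt.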

Just to make things more interesting (or confusing, take your pick), there is also a variety of free search tools, including IBM Omnifind Yahoo Edition.

The above does not even begin to take into consideration the area of desktop search, where Google has tremendous mindshare but where there are also excellent offerings from smaller companies, such as Copernic and X1.  That market is far from standing still: just last week, Copernic announced a corporate edition for its desktop search offering, free of advertising but with a license fee.

Jonathan B. Spira is CEO and Chief Analyst at Basex.

