Archive for the 'Search' Category

In the briefing room: Comintelli Knowledge XChanger

Thursday, June 24th, 2010 by Cody Burke

The battle to find the right piece of content at the right moment is a never-ending quest for the knowledge worker.

Calling all cars...

While most companies have organized their various internal content stores and many have contracted for authoritative external content from sources such as Factiva and LexisNexis, this is only half the battle.

All of this progress notwithstanding, a knowledge worker often has to search through multiple systems to find exactly what he is looking for.  Frequently, he may not end up with the best and most up-to-date content because the individual searches produced results different from those an aggregated search would have presented.

Comintelli, a Swedish company founded in 1999, addresses this challenge with its Knowledge XChanger offering.  The solution aggregates content from both internal and external sources and then classifies, organizes, and presents relevant items to knowledge workers.  The content is packaged and delivered to work groups in a role-based and customized format so that only the most relevant information is presented.  Additionally, users select topics and enter search terms to further drill down on an area and refine the result set.

Knowledge XChanger allows knowledge workers to publish information through an easy-to-use browser-based interface or via e-mail.  In addition, the system supports commenting, voting, and chat around content.

Users can personalize how they receive information by using automatic e-mail alerts and/or via a customized start page.

When the user does perform a search, he is tapping into content that has been drawn from vetted and authoritative sources, which could include internal sites or select external sources such as news sites as well as from content providers such as Factiva.

A particularly valuable feature in Knowledge XChanger is the ability to find experts on a given topic.  The system uses Knowledge Points, a customizable feature that assigns points to users based on activities, to determine expertise.  For instance, a user may receive points for every time he reads an article, searches on a term, or comments on content.  Users can search for individuals who have expertise in a given area.
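To make the idea of points-based expertise concrete, here is a minimal sketch in Python.  It is not Comintelli's implementation: the point values, the event format, and the function names are all our own assumptions, chosen only to show how activity points could be totaled per topic and used to rank likely experts.

```python
from collections import defaultdict

# Hypothetical point values per activity. The article says only that the
# feature is customizable, so these numbers are assumptions.
POINTS = {"read": 1, "search": 2, "comment": 3}


def expertise_scores(events):
    """Sum points per (user, topic) from (user, activity, topic) events."""
    scores = defaultdict(int)
    for user, activity, topic in events:
        scores[(user, topic)] += POINTS.get(activity, 0)
    return scores


def find_experts(events, topic, top_n=3):
    """Return up to top_n (user, points) pairs for a topic, highest first."""
    scores = expertise_scores(events)
    ranked = sorted(
        ((user, pts) for (user, t), pts in scores.items() if t == topic),
        key=lambda pair: (-pair[1], pair[0]),  # points descending, then name
    )
    return ranked[:top_n]


# Made-up activity log for illustration only.
events = [
    ("ana", "read", "pricing"),
    ("ana", "comment", "pricing"),
    ("ben", "search", "pricing"),
    ("ben", "read", "logistics"),
]

print(find_experts(events, "pricing"))  # [('ana', 4), ('ben', 2)]
```

Changing the values in `POINTS` changes who surfaces as an expert, which is presumably why the vendor makes the feature customizable.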

Tools such as Knowledge XChanger are key components on the road to the development of true Collaborative Business Environments.  In addition, by aggregating and delivering timely and relevant role-based content to the knowledge worker, the system tackles several aspects of Information Overload relating to search and information management.

Finally, by supporting expertise location through the system’s ability to associate individuals in an organization with topics in which they have knowledge and interest, Comintelli has taken a big step toward improving knowledge sharing and collaboration: it connects knowledge workers to each other and jump-starts the collaboration process.

Cody Burke is a senior analyst at Basex.

Searching for Needles in Haystacks: How our brain sabotages our searches

Thursday, January 28th, 2010 by Cody Burke

In a recent study funded by the U.S. Department of Homeland Security (DHS) and reported in LiveScience, researchers found that subjects’ expectations of finding something had a direct effect on their success rates for finding the items in question.

Found it yet?

In the study, subjects looked at X-ray scans of checked baggage and tried to identify the presence of guns and knives.  In the first trial, a gun or knife was present in 50% of the bags, and subjects only missed the weapons 7% of the time.  In the second trial, the guns and knives were in only 2% of the bags, and the subjects missed the weapons 30% of the time.  In short, when something is harder to find, our accuracy in identifying it drops significantly.

This is a trick our brain is playing on us as it becomes bored when we do not find what we are looking for and stops paying attention, meaning we then miss things when they do appear.

While the implications for airline security are obvious and somewhat chilling, the implications for the enterprise are also worth examining.  Knowledge workers spend ca. 15% of their day searching for content.  Applying the lessons learned in the DHS study, we can assume that if a search query returns fewer correct results in relation to incorrect results, the knowledge worker’s accuracy in picking out the relevant items will decline.

Conversely, just as in the DHS study, if the correct to incorrect ratio is better, meaning there is a higher number of correct results, then the knowledge worker is much more likely to find more of them.

For knowledge-based organizations and providers of software to these groups, the lessons from this study are clear: search tools must be improved to provide better ratios of relevant, useful results.  Today’s search tools focus on returning large sets of results and the answers to a search query may very well lie somewhere within these.  However, the low signal-to-noise ratio virtually ensures low accuracy even if one were to comb through every last result.

Search results need to be highly contextual and limited in volume to ensure accuracy and provide a favorable ratio of correct to incorrect results.  This keeps the knowledge worker engaged and not feeling that he is looking for a needle in a haystack; this, in turn, increases the probability of identifying the needed content.
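The ratio of correct to total results is what search specialists call precision.  The short Python sketch below compares a broad result set with a focused one; the result counts are invented for illustration and do not come from the DHS study or from our own research.

```python
def precision(relevant_returned, total_returned):
    """Fraction of the returned results that are actually relevant."""
    if total_returned == 0:
        return 0.0
    return relevant_returned / total_returned


# Invented numbers: a broad query versus a focused, contextual one.
broad = precision(relevant_returned=20, total_returned=1000)
focused = precision(relevant_returned=15, total_returned=25)

print(f"broad: {broad:.0%}, focused: {focused:.0%}")  # broad: 2%, focused: 60%
```

In this made-up case the broad query contains more relevant items in absolute terms, yet only one result in fifty is useful, which is the situation in which attention is most likely to wander.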

Cody Burke is a senior analyst at Basex.

Search: How to Find What You Are Looking For (or 5 Tips for Better Search)

Thursday, December 17th, 2009 by Jonathan Spira

50% of all searches fail in a manner that the person doing the search recognizes as a failure. 

What is it that you are looking for, my dear?

A far more significant problem is that 50% of the searches believed to have succeeded failed, but the person doing the search simply doesn’t realize it.  As a result, that person uses information that is at best out of date but more often incorrect or just not the right data.  When the “bad” information is then used in a document or communication, there is a cascading effect that further propagates the incorrect information.

In an age where Information Overload costs the U.S. economy ca. $900 billion per annum, finding the right information has become far more critical.

To increase the odds that you will find what you are looking for, we’ve prepared five simple search tips that should result in better and more accurate results, regardless of where you are searching.

1.)    Boolean logic
Search engines typically use a form with a search box into which one types the search query.  To control the search results, use Boolean logic by typing AND or OR.  Many search engines including Google default to AND when processing search queries with two or more words.  To exclude words, use NOT (java NOT coffee, java -coffee).  For increased relevance, use NEAR (restaurants NEAR midtown Manhattan).

2.)    Options
Most search engines include options (on Google, these are found by clicking on Advanced Search).  Use options to narrow down the field you are searching.  Examples include file format (.ppt, .doc, .pdf, etc.) or Web site (basex.com).

3.)    Search tools
When it comes to search, one size does not fit all.  Use a variety of search tools beyond Google.  Try search visualization tools such as Cluuz and KartOO on the Web and KVisu for behind the firewall.

4.)    Meta search engines
A meta search engine runs several searches simultaneously.  Tools that may be helpful include Clusty and Dogpile.

5.)    Archived (out-of-date) materials or nonexistent Web sites
The Wayback Machine on the Internet Archive is useful for both older versions of Web pages and sites that have disappeared over time.
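For readers who want to see what tips 1 and 2 do mechanically, here is a minimal Python sketch.  The `matches` function is a toy filter of our own that mimics AND, OR, and NOT on whole words; it is not how any real search engine is implemented.  The `build_query` helper assumes Google's `site:` and `filetype:` query operators, which correspond to the Web site and file format fields on the Advanced Search page.

```python
def matches(text, all_of=(), any_of=(), none_of=()):
    """Return True if text satisfies the AND / OR / NOT term groups."""
    words = set(text.lower().split())
    if not all(term in words for term in all_of):      # AND: every term
        return False
    if any_of and not any(term in words for term in any_of):  # OR: at least one
        return False
    return not any(term in words for term in none_of)  # NOT: none present


def build_query(terms, site=None, filetype=None):
    """Join search terms and optional site/filetype restrictions."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# Toy document set for illustration.
docs = [
    "java programming tutorial",
    "java coffee beans from indonesia",
    "python programming tutorial",
]

# Equivalent of the query: java NOT coffee
print([d for d in docs if matches(d, all_of=["java"], none_of=["coffee"])])
# ['java programming tutorial']

print(build_query(["information", "overload"], site="basex.com", filetype="pdf"))
# information overload site:basex.com filetype:pdf
```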

Jonathan B. Spira is CEO and Chief Analyst at Basex.

In the briefing room: Simplexo

Thursday, November 19th, 2009 by Cody Burke

Knowledge workers spend a good part of their day in search of information; therefore it is no surprise that having the correct search tools is of paramount importance.

Simplexo search simultaneously addresses structured and unstructured data.

The limitations that exist in many search tools, combined with poor search techniques, lead to the frequent use of outdated information, the recreation of content that exists but cannot be found, and the waste of significant amounts of time.

Failed searches are a very visible symptom of Information Overload.  It is generally acknowledged that 50% of searches fail outright but few realize that 50% of the searches that people believe to have succeeded actually failed too, in that they presented stale, incorrect, or simply second-best information.  This last figure is far more insidious because the knowledge workers are unaware of the searches’ failure and blithely proceed to use the incorrect information in their work.

Training and proper search techniques can make a huge difference in improving search results but equally important are the tools the knowledge worker uses.  In most cases, information is stored in separate silos, and search tools need to be able to reach across those boundaries.  A search query that reaches across multiple information stores at once provides far more complete and relevant results than multiple separate searches.

Simplexo is one company that is addressing these issues with its Simplexo Enterprise search offering.  The company accesses the native indexing capabilities of databases and existing software such as SharePoint and Outlook to index unstructured data.  Indexing takes place in real-time and runs continuously, which helps ensure that the most current information is presented in search results.  Data is de-duplicated to remove extra copies of information, reducing the overall amount of content that must be indexed.

Simplexo uses a dual-index approach that looks at both structured data in real-time and unstructured data during processor idle time.  The system examines and retrieves unstructured data from sources such as Web pages, e-mail, text files, PDF files, and Open Office files, as well as data from structured sources such as databases, business applications, CRM applications, and information portals.  The wide net that Simplexo casts when searching has the potential to improve knowledge worker efficiency.  The ability to look in multiple silos and repositories with a single search query is extremely helpful in ensuring that information that is buried in a far-flung repository or inbox folder is taken into consideration.

In addition to indexing, Simplexo supports native integration into platforms such as browsers, Office, Outlook, AutoCAD, and Lotus.  This integration is key to enabling the knowledge worker to remain in one work environment.  Simplexo also supports mobile devices via Simplexo Mobile for iPhones and Windows Mobile 6 devices, with plans to add BlackBerry support in the future.

Enterprise search solutions such as Simplexo take a realistic view of the challenges faced by knowledge workers when searching for information.  They recognize that relevant information can reside in any repository, can be structured or unstructured, and needs continuous indexing to remain up-to-date.

Cody Burke is a senior analyst at Basex.

Information Overload – It Isn’t Just Too Much E-mail

Thursday, August 20th, 2009 by Jonathan Spira

One might assume that pinpointing the sources of Information Overload is relatively black and white, i.e. it’s just too much e-mail.  In reality, nothing could be further from the truth.

The problem of Information Overload is multifaceted and impacts each and every organization whether top executives and managers are aware of it or not.  In addition to e-mail, Information Overload stems from the proliferation of content, growing use of social networking tools, unnecessary interruptions in the workplace, failed searches, new technologies that compete for the worker’s attention, and improved and ubiquitous connectivity (making workers available anytime regardless of their location).  Information Overload is harmful to employees in a variety of ways as it lowers comprehension and concentration levels and adversely impacts work-life balance.  Since almost no one is immune from the effects of this problem, when one looks at it from an organizational point-of-view, hundreds of thousands of hours are lost at a typical organization, representing as much as 25% of the work day.

So what else besides e-mail overload is at issue here?  Here’s a quick rundown.

- Content
We have created billions of pictures, documents, videos, podcasts, blog posts, and tweets, yet if these remain unmanaged it will be impossible for anyone to make sense out of any of this content because we have no mechanism to separate the important from the mundane.  Going forward, we face a monumental paradox.  On the one hand, we have to ensure that what is important is somehow preserved.  If we don’t preserve it, we are doing a disservice to generations to come; they won’t be able to learn from our mistakes as well as from the great breakthroughs and discoveries that have occurred.  On the other hand, we are creating so much information that may or may not be important, that we routinely keep everything.  If we continue along this path, which we will most certainly do, there is no question that we will require far superior filtering tools to manage that information.

- Social Networking
For better or worse, millions of people use a variety of social networking tools to inform their friends – and the world at large – about their activities, thoughts, and observations, ranging down to the mundane and the absurd.  Not only are people busily engaged in creating such content but each individual’s output may ultimately be received by dozens if not thousands of friends, acquaintances, or curious bystanders.  Just do the math.

- Interruptions
We’ve covered this topic many times (http://www.basexblog.com/?s=unnecessary+interruptions) but our prime target is unnecessary interruptions and the recovery time (the time it takes the worker to get back to where he was) each interruption causes, typically 10-20 times the duration of the interruption itself.  It only takes a few such interruptions for a knowledge worker to lose an hour of his day.

- Searches
50% of all searches fail and we know about the failure.  What isn’t generally recognized is something that comes out of our research, namely that 50% of the searches you think succeeded failed, but the person doing the search didn’t realize it.  As a result, that person uses information that is perhaps out of date or incorrect or just not the right data.  This has a cascading effect that further propagates the incorrect information.

- New technologies
We crave shiny new technology toys, those devices that beep and flash for our attention, as well as shiny new software.  Each noise they emit takes us away from other work and propels us further down Distraction Road.  It’s a wonder we get any work done at all.  Even tools that have become part of the knowledge workers’ standard toolkit can be misused.  Examples here include e-mail (overuse of the reply-to-all function, gratuitous thank you notes, etc.) and instant messaging (sending an instant message to someone to see if he has received an e-mail).

Jonathan B. Spira is CEO and Chief Analyst at Basex.

In the briefing room: Yakabod’s Yakabox

Wednesday, August 12th, 2009 by Cody Burke

When one strips away all the marketing hype, technical terminology, and buzzwords from knowledge sharing and collaboration products, the real measure of a tool is simple: does it help get work done?  The future of the knowledge workers’ workspace is the Collaborative Business Environment (CBE) but, until our vision is addressed and realized by vendors in this space, it is incumbent upon companies to find tools that support the CBE’s basic principles, namely to provide a single work environment for knowledge workers, reduce friction in knowledge sharing, and embed community into the workspace.

It is easy to lose sight of the fundamental question an organization should be asking when deploying a knowledge sharing and collaboration tool: “How will this tool help my company get work done?”  This often happens because products and tools are segmented into arbitrary and confusing market segments (just look at the variation in TLAs in the content management market: CM, ECM, WCM, and DM, among others).

A breath of fresh air in this space is Yakabod; the company offers a product, the Yakabox, that promises to be an end-to-end platform that gets work done.  This offering is a hardware appliance incorporating enterprise search, content management, collaboration, and social networking functionality.  A hosted version is also available.  Yakabod’s value proposition is to keep things simple by placing those four applications in one place, aiding in knowledge sharing, collaboration, and the ability to find what one is looking for.

The user interface is very clean and straightforward, and features an activity feed-like stream of items that are relevant to the user, as well as user profiles and favorites that are content-based, such as documents, teams, blogs, or any other item in the system.  What is presented in the activity feed can be fine-tuned via a “Matter Meter”, which can be adjusted to show items of varying degrees of importance.  A busy knowledge worker, for example, could set the meter to show only items of high priority.  Yakabod’s enterprise search works in a similar way: the system learns a user’s preferences and adjusts search results accordingly based on relevance to the user.  The results are drawn from structured and unstructured data sources, including online repositories, wikis, social tools, and existing legacy systems.

To make deployment easier, the Yakabox integrates with existing sources such as Microsoft SharePoint and Office, shared drives, and electronic repositories.

Security is a strong point for the Yakabox.  The company has its roots in providing collaboration and knowledge sharing tools to the U.S. Intelligence Community, and the Yakabox meets Department of Defense PL3 security standards.

One promising aspect about Yakabod’s philosophy as a company is the recognition that knowledge sharing and collaboration applications such as enterprise search, content management, collaboration, and social networking are interconnected and interdependent.  Put simply, when these normally disparate elements are combined, the sum is greater than the parts.  The Yakabox may be in some respects closer to the Collaborative Business Environment than many other offerings currently on the market: it provides a single, overarching environment for knowledge workers, reduces friction in knowledge sharing through tight integration, and embeds collaboration tools into all areas of knowledge work via social networking functionality.

Cody Burke is a senior analyst at Basex.

In the briefing room: SAS Content Categorization

Thursday, July 30th, 2009 by Jonathan Spira

Looking for something?  If it’s enterprise content, you probably won’t find it.

Locating content and information in the enterprise is a considerable challenge, one that not only hampers organizational productivity but also throttles individual knowledge worker efficiency and effectiveness.  Workers typically use search tools to find content and this is where their struggle begins.

There are two key problems with search technology today: 1.) such systems provide “results,” not answers, and 2.) they do not support natural language queries.  In addition, typical search tools do not always understand relationships and context: Java could refer to a type of coffee, an island in Indonesia, or a programming language.  Typing “Java” into the Google search engine returned results only relating to Java as a programming language for the first three pages.

Thanks to the various flaws common to most search tools, 50% of all searches fail.  The good news is that those failures are obvious and recognized by the person doing the search.  The bad news is that 50% of the searches people believe succeeded actually failed in some way, but this was not readily apparent to the person doing the search.  As a result, that person uses information that may be out of date, not the best response for what he was looking for, or is simply incorrect.  (We call this the 50/50 Rule of Search.)

The problems with search contribute greatly to the problems of Information Overload in the enterprise.

According to research conducted by Basex in 2006 and 2007, knowledge workers spend 15% of the work day searching for content.  This figure is far higher than it needs to be, and represents the time knowledge workers waste as a result of poor search tools, bad search techniques on the part of knowledge workers, and a lack of effective taxonomies.

In an age of Information Overload, where we create more content in a day than the entire population of the planet could consume in a month, more effective tools are needed.  One approach towards improving search is better and more effective categorization.  We recently had a look at SAS Content Categorization, one promising product in this space.  Content Categorization helps to categorize information so that search engines can present relevant results faster by having the user navigate through topics/facets related to the user’s query.

SAS acquired Teragram, a natural language processing and advanced linguistic technology company, in March 2008.  After integrating Teragram as a division, SAS launched Content Categorization in February 2009.

The offering enables the creation of taxonomies and category rules to parse and analyze content and create metadata that can trigger business processes.  Taxonomies and category rules are created via the TK240, a desktop tool for administration and taxonomy management that is a component of SAS Content Categorization.  Once a taxonomy is created, high level categories are selected, followed by narrower ones.  There is no limit as to how granular the categories can go, allowing for users to drill down on topics.  The system also includes prebuilt taxonomies for specific industries such as news organizations, publishers, and libraries.

Whoever is doing the setup – and SAS Content Categorization is designed for use by non-technical users – can develop category rules from within the TK240 as well.  The rules may consist of multiple keywords, based on the percentage appearing in a document, as well as weighted keywords that give more value to certain words than others.  Additionally, it is possible to apply Boolean operators.  For example, a rule might require that “Java” and “programming” appear in the same sentence; a document in which only “Java” and “coffee” appear in the same sentence would then not meet the rule.  Rules can be created for extremely specific situations, such as the presence of URLs, grammatical instances, or the presence of suffixes (Inc., Corp., AG, etc.).
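The two kinds of rule described above, a same-sentence Boolean rule and weighted keywords, can be sketched in a few lines of Python.  This is our own illustration and not the TK240's rule syntax, which we do not reproduce here; the function names, the sentence splitter, and the keyword weights are all assumptions.

```python
import re


def sentences(text):
    """Split text into sentences on ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def same_sentence(text, term_a, term_b):
    """True if both terms occur as whole words in at least one sentence."""
    for sentence in sentences(text):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if term_a in words and term_b in words:
            return True
    return False


def weighted_score(text, weights):
    """Add up the weight of every keyword occurrence in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(weights.get(word, 0) for word in words)


# Made-up documents for illustration.
doc_code = "Java is a popular programming language. It runs on many devices."
doc_cafe = "We ordered Java coffee. The programming talk started later."

print(same_sentence(doc_code, "java", "programming"))  # True
print(same_sentence(doc_cafe, "java", "programming"))  # False

# Hypothetical weights: "compiler" counts three times as much as "java".
print(weighted_score("Java compiler and Java runtime", {"java": 1, "compiler": 3}))  # 5
```

Note that the second document mentions both “Java” and “programming” but never in the same sentence, so a plain keyword match would categorize it incorrectly while the same-sentence rule does not.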

The system is also equipped with options for setting role-based permissions to allow users to read/write, and enable multiple users to collaborate on developing taxonomies.  This allows multiple taxonomists to have secure access to projects, with individual levels of read/write access to category rules and concept definitions.

SAS Content Categorization can be an effective weapon against Information Overload by allowing the creation of complex automated systems to categorize content, increasing the likelihood of the knowledge workers being able to find what they are looking for in a timely manner.  In addition, increasing the relevance of search results by using taxonomies to provide context raises the value of content that is found, decreasing the likelihood of knowledge workers moving forward with second-best or faulty information.

Companies looking to take decisive measures to lower Information Overload should carefully review their current search tools and, where appropriate, give serious consideration to SAS Content Categorization.

Jonathan B. Spira is CEO and Chief Analyst at Basex.
Cody Burke is a senior analyst at Basex.

Google Breaks Microsoft Search?

Wednesday, June 17th, 2009 by David Goldes

The fun never ends in the Microsoft v. Google wars.  Chris Vander Mey, Google’s senior product manager for Google Apps, acknowledged that certain programs that work directly with the Outlook data file, such as Windows Desktop Search, “don’t currently work well” with Google Apps Sync for Microsoft Outlook.

According to a post by Vander Mey on Google’s enterprise blog, Windows Desktop Search will not properly index Google Apps Sync data files.  In order to prevent indexing from running indefinitely, the Google Apps Sync installer disables it.

Microsoft’s Outlook product manager, Dev Balasubramanian, writing in an official Microsoft blog, said Apps Sync includes a “serious bug/flaw” which disables Outlook’s ability to search data such as e-mail and contacts.   He also provided a fix which involves editing the Windows registry, something many users may not wish to do.

Balasubramanian further stated that the problem impacts Outlook’s search capabilities, not just Windows Desktop Search, because Outlook search “relies upon the indexing performed by Windows Desktop Search.”  Google contends that Outlook search will work even if Windows Desktop Search does not.

David M. Goldes is president and senior analyst at Basex.

In the Briefing Room: BA-Insight Longitude

Thursday, June 11th, 2009 by Cody Burke

Without question, search is the Achilles heel of knowledge work.  It is almost universally acknowledged that 50% of all searches fail.  The dirty little secret in search – and one that we uncovered through research we conducted in 2007 – is that 50% of the searches that knowledge workers believe to have been successful also fail in some manner (i.e. outdated information, second-best information, or content that is just outright incorrect).

Obviously, failed search is a major issue and a large contributor to Information Overload.  Part of the problem is not the search technology per se, but the selection of the source that provides the results.  If a search only looks through unstructured data, it ignores the valuable information that exists as structured data.  Search tools need to look at all information sources in order not only to return complete results but also to rank results from disparate data sources accordingly.

For reasons that are inexplicable to this writer, many companies have not chosen to deploy search tools that examine every nook and cranny of a company’s information assets.  A few smart companies are deploying search tools that do look in every knowledge repository.  Exercising due diligence in searching can avoid failures that result from searching in a partial source set.

BA-Insight is a company that is attempting to even the odds through Longitude, its search enhancement for Microsoft SharePoint and Microsoft Search Server, as well as connectors to ERM, CRM, and EMC platforms.  The premise is simple; by expanding the sources through which a search is conducted, as well as improving the user interface, search results will have more value, be found in less time, and be easier to utilize once found.

The Longitude search product enhances SharePoint Server and Microsoft Search Server by presenting results in page previews, eliminating the need to download the document.  The preview is presented in a split screen, with the search results on one half and the preview panel on the other.  When a document is selected, the preview opens to the relevant page, not just the beginning of the document.  This saves time in two ways: one, the time it would take to open what might be an undesirable document, and two, the time it would take to find the relevant text in the document by scrolling through it manually.  Longitude also supports collaborative work by making functionality such as e-mailing documents, editing, and adding tags and bookmarks available from within the preview panel.

Longitude supports federated search through multiple repositories of both unstructured and structured data via connectors for Lotus Notes, Documentum, Exchange, Microsoft Dynamics CRM, and Symantec Enterprise Vault, among others.  Metadata is assigned to content automatically as users search and find content, and search is guided through Parametric Navigation, which takes the metadata into account to search using complex queries.

Knowledge workers spend on average 15% of the day searching.  We know that 75% of those searches fail when we account for the two types of failure previously mentioned: the 50% that fail outright plus half of the remaining 50% that only appear to succeed.  Clearly the odds of finding what one is looking for are against the searcher.  Most tools in a company don’t search in enough places, and because of technology sprawl, knowledge workers are just as likely to have stored critical information in a vat that is not touched by the search system as one that is.  Tools such as Longitude go a long way towards evening the odds for the knowledge worker.
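The 75% figure follows from combining the two failure rates cited at the top of this post.  The few lines of Python below spell out the arithmetic; the function name is our own, and the only inputs are the two 50% figures.

```python
def overall_failure_rate(obvious_failure, hidden_failure):
    """Recognized failures plus unrecognized failures among the rest."""
    apparent_successes = 1 - obvious_failure
    return obvious_failure + apparent_successes * hidden_failure


# 50% fail outright; 50% of the apparent successes also fail.
rate = overall_failure_rate(0.5, 0.5)
print(f"{rate:.0%}")  # 75%
```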

Cody Burke is a senior analyst at Basex.

Wolfram Alpha – Better Search?

Friday, May 15th, 2009 by David Goldes

Wolfram Research announced the launch of the Wolfram Alpha computation engine.  The new tool is intended to provide specific and precise factual answers to questions rather than present a list of Web sites which may or may not contain the correct answer.  A key problem in search technology today is that such systems provide “results” instead of answers.  50% of all searches fail as a result; we have found that 50% of searches people believe succeeded also failed in some other way, by not providing the most up-to-date, accurate, or correct information although the individual conducting the search believes the answer is correct at the time.

Back in March, founder Stephen Wolfram wrote on his Web site that, “[F]ifty years ago, when computers were young, people assumed that they’d quickly be able to handle all these kinds of things … and that one would be able to ask a computer any factual question and have it compute the answer.”  We all know that’s not how things turned out simply by going to Google, currently the most popular online search tool, and entering a search query.

Wolfram Alpha, according to Wolfram himself, understands questions that users input and then calculates answers based on its extensive mathematical and scientific engine.

The system is scheduled to go live later today at www.wolframalpha.com.  We’ll find out then whether Wolfram has found a way to build a better search tool.

David M. Goldes is the president of Basex.

