» Archive for the 'Information Management' Category

Information, Information, Everywhere… But Not A Lot Of Good It Does

Thursday, September 29th, 2011 by Jonathan Spira

But where is the RIGHT information?

This essay is being written after numerous and somewhat frustrating encounters with the latest information technology.  One would think that we’ve reached a point where systems and computers should work flawlessly but that is less and less the case every day.

On the one hand, the Information Revolution of the late 20th century has resulted in an anywhere, anytime information society that has become accustomed to boundless gobs of information on demand.

On the other hand, no one has said that the stuff works.

From a technical standpoint, the advent of true ubiquitous computing (or at least, state-of-the-art ca. 2011) has markedly changed our attitude towards and interactions with information.  Our constant exposure to information leads us to have the expectation that it will be shared across systems, accurately and quickly.  If Facebook and Google can keep track of everything we are reading, sharing, and writing while we surf the Web, surely everyone else can too, right?

Unfortunately, information does not always get to where it needs to be.  I’ll use my recent experience with an airline as an example.   Airlines are known to be leaders in IT; American Airlines introduced the first ever computer reservation system, Sabre, in 1960.  At the time it was one of the largest and most successful mainframe deployments ever.

Today, despite tremendous advances in technology over the course of 50 years, information often fails us.  Calls to customer service representatives at call centers asking the same or similar questions yield widely disparate answers, despite the fact that the agent is being guided by the system.

My own experiences in the past week relating to several different issues with an airline, including an error that was apparently computer generated as well as misinformation that was repeated by several agents almost verbatim, show me that we have a long way to go.

It won’t surprise you to learn that fixing these problems took multiple phone calls and e-mail messages and wasted hours of time both on my part and on the part of the call center agents.

We used to say that computers don’t make mistakes, but rather that the people who write the programs do.  I think that this notion has become somewhat quaint, if not obsolete.  While we are far from enjoying true artificial intelligence where machines actually think and respond on their own, we are at a point where autonomic or self-healing systems do evolve on their own, and sometimes seem to add in mistakes just to keep things interesting.

We want the right information on demand, without delay, without error.   As we add in more information, more systems, and more ways of getting information, what we end up with is something very different.


Jonathan B. Spira is CEO and Chief Analyst at Basex and author of Overload! How Too Much Information Is Hazardous To Your Organization.

Cloud Burst – Dark Skies Ahead for Cloud Computing?

Wednesday, April 27th, 2011 by Jonathan Spira

Last week, Amazon’s side business of selling online computing resources suffered a major failure and businesses of all sizes were significantly impacted.

Time to come in from the storm?

In some cases, a company’s internal systems, as well as its Web sites, went down for several days and, even as this is written, some sites are still being affected.

Amazon, which entered the cloud computing business five years ago, has been a leader in a field that has become popular as more and more organizations look to move computing from their own data centers onto the Internet.

With cloud computing, companies are essentially purchasing raw computing power and storage.  They do not need to invest in computers or operating systems; that part is handled by Amazon and its competitors.

Two major trends have evolved in this effort.  One is a utility computing model à la Amazon; the other looks to sell companies cloud technology that the company owns and manages (the so-called private cloud).

In recent years, more and more data storage and processing have been off-loaded into the cloud without the appropriate precautions being taken to ensure data accessibility and redundancy.

Amazon has data centers in North America, Europe, and Asia, and the problems seem to have centered around a major data center in Northern Virginia.  As of last Thursday, dozens of companies ranging from Quora, a question-and-answer site, to Foursquare, a social networking site, reported downed Web sites, service interruptions, and an inability to access data stored in Amazon’s cloud.

Some companies using Amazon’s servers were unaffected because they designed their systems to leverage Amazon’s redundant cloud architecture, which means that a malfunction in one data center would not render a system or Web site inaccessible.  Less sophisticated companies typically have neither the budget nor the know-how to do this and therefore paid the price in the form of downed systems and sites.
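To make the idea of designing for redundancy concrete, here is a minimal sketch in Python of a client that tries one data center after another until one answers.  It is an illustration only: the region labels and the simulated_fetch function are hypothetical stand-ins, not Amazon identifiers or any real API, and a production design would also need to replicate data across locations.

```python
from typing import Callable, List, Optional


def fetch_with_failover(regions: List[str],
                        fetch: Callable[[str], str]) -> Optional[str]:
    """Try each region in order and return the first successful response.

    Returns None if every region fails.
    """
    for region in regions:
        try:
            return fetch(region)
        except ConnectionError:
            # This region is unreachable; move on to the next one.
            continue
    return None


def simulated_fetch(region: str) -> str:
    """Hypothetical stand-in for a network call; 'us-east' is treated as down."""
    if region == "us-east":
        raise ConnectionError("region unavailable")
    return f"served from {region}"


# Illustrative labels only, not actual provider region names.
REGIONS = ["us-east", "us-west", "eu-west"]

if __name__ == "__main__":
    print(fetch_with_failover(REGIONS, simulated_fetch))
```

A system that knows about only one location has nothing to fall back on, which is roughly the position the affected sites found themselves in.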

As of Saturday, April 23, (effectively, day 3 of the outage), Amazon’s status page continued to show problems in the Northern Virginia data center including “Instance connectivity, latency and error rates” in the Amazon Elastic Compute Cloud and “Database instance connectivity and latency issues” in the Amazon Relational Database service.

Visitors to the Web site of BigDoor, a software company, saw the following message on Saturday: “We’re still experiencing issues due to the current AWS outage.  Our publisher account site and API are recovering now, but apparently AWS thinks our corporate site is too awesome for you to see right now.”

At 4:09 EDT on the 25th, Amazon posted the following: “We have completed our remaining recovery efforts and though we’ve recovered nearly all of the stuck volumes, we’ve determined that a small number of volumes (0.07% of the volumes in our US-East Region) will not be fully recoverable. We’re in the process of contacting these customers.”

Even without such outages, moving your organization to a cloud computing-based architecture still entails risks.  Unlike typical scenarios where in-house IT staff control access to your sensitive information, a cloud computing provider may, by design or otherwise, allow privileged users access to this information.  Data is typically not segregated but stored alongside data from other companies.  In addition, a cloud computing provider could conceivably go out of business, suffer an outage, or be taken over, and all of these could impact data accessibility.

Amazon’s outage serves to reinforce that cloud computing is not immune from risk and is not the magic bullet that companies that offer the service would like their customers to believe.  Rather, it is in many respects no different than any other distributed system and such systems are nothing new.  Indeed, distributed systems have been around for decades and IT professionals should have learnt enough about fault tolerance and security by now.

Backing up and replicating data and applications across multiple sites should be old hat by now, but the Amazon incident proves this not to be the case.  Until we begin to think about cloud computing as being no different than any other type of computing, we will continue to experience major systems failures such as Amazon’s.

Jonathan B. Spira is CEO and Chief Analyst at Basex.

Man v. Machine: Who Analyzes Information Faster?

Thursday, January 20th, 2011 by Cody Burke

Who do YOU think will win?

That information functions as currency is no surprise to anyone in this day and age.  What may be surprising are the lengths that people go to collect the raw data, turn it into actionable nuggets of information, and disseminate it.

Two recent articles in the New York Times call attention to the way that information has become meaningful and critical in both the business world and the political world.

Both articles, “Computers That Trade on the News” on December 22, 2010 and “Where News Is Power, a Fight to Be Well-Armed,” on January 16, 2011, detail the extent to which information is valued in politics and trading.

The subjects of these articles are also a study in contrasts.  As junior political aides parse the news and pull out the most useful and relevant insights and ammunition for the day’s politicking, the computers of Wall Street are humming away, using advanced algorithms and linguistic-based software to extract meaning from the wealth of unstructured data available online.

Politics has always been a bit old fashioned, but the divide between how Wall Street and Washington are utilizing information has never been starker.  In Washington, the day for junior aides and staffers begins early in the morning as they read the news, monitor blogs and social networks, and condense their findings into memos for their superiors.  These information gatherers put in a few hours of data collection and selection, then report to an office for the rest of their day that can stretch well into the evening.

The staffers are valued for their ability to ferret out tidbits of information online that have value.  Describing a media monitor under his supervision, Dan Pfeiffer, White House communications director, stated that, “For such a young guy, Andrew has a great ability to sniff out stories that need to be handled with dispatch. During our biggest fights, from health care to the Supreme Court confirmations, Andrew repeatedly spotted potential problems in the farthest reaches of the Internet before anyone else. That information was essential to our success.”

On Wall Street, a very different picture is playing out.  Traders are using software tools that analyze the massive amount of online unstructured data to extract sentiment around companies, analyze it, and then trade on it, often without human assistance.  The software analyzes words, sentence structure, and emoticons from blogs, company Web sites, editorials, news articles, and social software tools such as Twitter.  If the software detects anything that reflects positively or negatively on a company or a section of the market, then it can trigger automated trading.

According to Aite Group, a financial services consulting company, around 35% of quantitative trading firms are currently exploring the use of unstructured data in automated trading.  The trend is not likely to stop there, given the competitive advantage that even a few seconds can give a firm in high frequency trading.
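
As a rough illustration of how software might turn words and emoticons into a trading signal, here is a toy sketch in Python.  Everything in it is an assumption made for the example: the word lists, the threshold, and the function names are hypothetical, and real trading systems rely on far more sophisticated linguistic models than simple word counting.

```python
import re

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"beats", "growth", "record", "upgrade", ":)"}
NEGATIVE = {"recall", "lawsuit", "loss", "downgrade", ":("}


def sentiment_score(text: str) -> int:
    """Count positive tokens minus negative tokens in a piece of text."""
    # Match simple emoticons such as :) and :( as well as plain words.
    tokens = re.findall(r"[:;][()]|[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def trade_signal(text: str, threshold: int = 2) -> str:
    """Map a sentiment score to 'buy', 'sell', or 'hold'."""
    score = sentiment_score(text)
    if score >= threshold:
        return "buy"
    if score <= -threshold:
        return "sell"
    return "hold"


if __name__ == "__main__":
    print(trade_signal("Acme beats forecasts with record growth :)"))
    print(trade_signal("Acme faces lawsuit after product recall"))
    print(trade_signal("Acme opens new office"))
```

The speed advantage comes from the fact that a loop like this can score thousands of headlines in the time it takes a person to read one.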

Could Andrew Bates, our talented political aide, sniff out information faster and more accurately than the algorithms of Wall Street?  Short of a face-to-face showdown, it is hard to say.  One thing is for sure though; he would probably appreciate the extra hour of sleep.

Cody Burke is a senior analyst at Basex.

The Note Taker’s Dilemma

Thursday, December 9th, 2010 by Cody Burke

Mankind has been taking notes using paper and pencil since paper and pencil were invented.  We do it to capture ephemeral information that would otherwise escape us, and to aid our fallible human brains, which often do not recall things clearly or at the precise moment we need them to.

Note taking on paper is easy, but limited in features.  There is no Google-style search, no way to ensure that notes are automatically stored in a filing system, and no way to instantly make exact copies of a note or piece of information.

Two solutions on the market today offer similar approaches to the problem of capturing and storing ephemeral information.  OneNote 2010, part of Microsoft’s Office 2010 suite, is the latest iteration of the company’s solution that debuted in 2003.  Evernote, which launched in 2008, offers both free and paid versions of its software, which provide a similar feature set.   On a basic level, both solutions allow users to use a desktop client, a browser-based application, and mobile applications for smartphones to take notes, store links and documents, digitize handwritten notes, upload pictures and make them text searchable, record audio, share notebooks with other users, synchronize notes and notebooks between devices, and search through stored information.

OneNote is tightly linked to other Microsoft offerings such as Outlook and Word and, just like those offerings, features advanced editing capabilities, templates, and more structured file organization.  However, when it comes to mobile phone support, OneNote is limited to Windows Phone 7, Microsoft’s recently launched mobile platform.  It is possible to access the OneNote Web App via a smartphone browser, but that isn’t an optimum solution, as the user loses what I consider to be the best feature of the mobile application, which is the ability to take text searchable pictures of notes or business cards from a smartphone.

Evernote, on the other hand, lacks the tight integration with office productivity tools such as Microsoft Office and doesn’t offer the advanced editing tools and templates that a power user would typically require.  It is, however, available on multiple platforms including Apple iOS, Palm webOS, Android, BlackBerry OS, and Windows Mobile. (There is, however, no Evernote for Windows Phone 7 mobile application at this time.)  A key feature is Evernote’s Trunk, an online application and hardware store that allows download and purchase of applications and hardware that integrate with Evernote.

The dilemma for the knowledge worker then is integration with everyday office tools versus greater platform support.  For the Microsoft Office-centric knowledge worker, OneNote has a clear advantage, but for the user who depends largely on his smartphone and who probably doesn’t have a Windows Phone 7 device, Evernote is the more attractive option.

Cody Burke is a senior analyst at Basex.


Thursday, December 2nd, 2010 by Cody Burke

Where exactly do I click "accept"?

Information Overload rears its ugly head in many respects, including in relatively mundane matters that are part of daily life.  Take banking, for example.  You agree to extensive terms and conditions (that you probably won’t read) before you can transfer from your checking account to your savings account.  Have you purchased something from eBay recently?  You are agreeing to their extensive rules and policies as well.

Google’s terms of service are only 20 sections but by using Google you agreed to the “use of your data in accordance with Google’s privacy policies”, which is a completely separate document with 35 different privacy practices that vary based on the Google product you are using.  Not only that, but the 35 policies may be updated from time to time “without any notice.”

Then there is the End User License Agreement, or EULA, which is required for everything from buying an e-book (more on this shortly) to using a hosted e-mail service such as Gmail.  These agreements operate on the principle of using information to obfuscate.  A large amount of information is presented to the user that covers all legal issues inherent in a transaction, but is so lengthy, complex, and possibly even vague that the user simply gives up and clicks the “Accept terms and conditions” button or equivalent.

For the vast majority of cases, accepting the EULA causes no further problems, and the users will likely forget about the agreement they just entered into and move on with their busy lives.  The problem is that most people don’t read or think about the EULA, and what it may mean.

If you walk into a book store, or even buy a book online from Amazon, you exchange money for a physical object, in this case a book. You are then free to do whatever you want with the book: read it wherever you want, loan it to a friend, or resell it at a secondhand bookstore.  You could even end up reselling it via Amazon’s own Web site.  However, if you go to Amazon.com and purchase an eBook for the Kindle e-reader, the transaction is significantly different.  If you dig far enough into the license agreement, you will find this: “Unless otherwise specified, Digital Content is licensed, not sold, to you by the Content Provider.”  This should come as no surprise as digital content, similar to software, is typically licensed as opposed to sold.

This is not, however, a distinction without a difference.  While the difference between licensing and purchasing an item may not seem important to most consumers, it does matter in our eBook transaction.  In accordance with the EULA, the Kindle book can only be viewed on the Kindle e-reader or on a supported device using Kindle software.  A Kindle book cannot be resold.  A Kindle book can be loaned, but only under a restrictive set of rules, namely that each book can only be lent once for a 14-day period and the content provider has the right to opt out of that service.

This distinction may seem trivial, and the vast majority of book buyers and average consumers will not notice or even care about it until something happens that makes them realize they have agreed to a very restrictive EULA as opposed to having outright ownership.  The amount of information that is pushed at us on a daily basis obscures the important details that are contained in these agreements that we routinely enter into every day.

In the case of software downloads, EULAs can contain clauses allowing the company to do a variety of things, including (in at least one case) download invasive spyware to a user’s computer.  Despite these risks, the vast majority of EULAs go unread.

Indeed, to illustrate this point, PC Pitstop, a PC diagnostics company, inserted a clause into one of its own EULAs that promised a monetary reward if the user sent an e-mail to the address provided in the EULA.  Four months and 3,000 downloads later, one solitary individual finally wrote in and claimed his $1,000 prize.

Cody Burke is a senior analyst at Basex.

Information Overload Bots in the Market?

Thursday, August 12th, 2010 by Cody Burke

Warning: Information Overload bots at work?

Information Overload is often thought of as an annoyance and a productivity killer – but there are far more nefarious aspects to it, such as spreading disinformation that misleads competitors, which can in turn disrupt markets.

Alexis Madrigal, writing in the Atlantic, reported that Jeffrey Donovan, a software engineer at Nanex, a data services firm, uncovered the activity of trading bots in electronic stock exchanges that send thousands of orders a second. The orders have buy and sell prices that are not near to market prices, meaning they would not ever be part of a real trade. The activities of these bots are not noticeable unless viewed on an extremely small time scale, in this case just milliseconds.

Donovan noticed the strange activity when looking for causes of the May 6 “Flash Crash” in the Dow, when it plunged nearly 1,000 points over a few minutes. He managed to plot the activity, and found that distinct patterns emerged. The patterns show that the bots are making these extremely quick and non-serious orders (meaning with no intention of buying or selling anything) at almost all times.

Donovan’s theory, although not universally supported, is that rival trading companies could use the bots to introduce noise into the market, and use the delay caused by the noise to gain a millisecond advantage in trading, which, in high frequency trading, may be significant.

There are other theories to explain the activity, including that the bots are actually real trading algorithms being tested, that they are a financial radar that is probing the market, and even that they are an emergent AI of sorts. Ultimately, no one really seems to know what the bots are doing or whom they belong to.

Regardless of the motives of these mysterious bots, they raise the specter of using Information Overload as a weapon, whereby you distract your competitor with information noise.

It is possible that the introduction of these orders on a timescale small enough to avoid detection may have played a part in the May 6 crash, and may play a role in future crashes. However, Madrigal notes in his article that, although Donovan believes that this kind of bot activity may have been a factor on May 6, he also stresses that there were many other variables, making assigning blame next to impossible.

Can Information Overload be used as a disruptive tactic? The answer is, of course it can. As we introduce more and more information into our lives, the potential to spread misinformation and to game the systems that filter and manage that information grows exponentially.

Cody Burke is a senior analyst at Basex.

Information Creation: To What End?

Thursday, July 15th, 2010 by Jonathan Spira

It’s hard to avoid information. Not only do we live in a world full of it, making it nearly impossible to escape, but for some perverse reason, we actually like it.

Is it too much yet?

Indeed, we like it so much that we continuously create more of it and have even designed machines to do this for us as well.  In addition, we frequently compile information into metrics and ratios that describe other information.

A recent survey by a computer company showed that 90% of information was only looked at once after it was created.  The current Basex survey on how knowledge workers work already tells us that 50% of us spend one to two hours of our days creating information – and 15% spend more than three hours.

Is this figure simply too high and are we in fact simply creating more information, not for its value but purely for the sake of making the pile bigger?

As we go about our day, it might be wise to cast a critical eye on our work that results in the creation of more information and ask ourselves some hard questions.  One, what is the practical purpose of the information that we are creating, and two, is it important enough to justify burdening others with it?

A quote generally attributed to Albert Einstein notes that “[N]ot everything that can be counted counts, and not everything that counts can be counted.”

Perhaps it would do us all good to think about why we are creating so much information, and whether perhaps we could get by with a bit less of it.

Jonathan B. Spira is CEO and Chief Analyst at Basex.

When Too Much Knowledge Becomes a Dangerous Thing

Thursday, June 17th, 2010 by Jonathan Spira

Socrates was relentless in his pursuit of knowledge and truth and this eventually led to his death.  In The Apology, Plato writes that Socrates believed that the public discussion of important issues was necessary for a life to be of value.  “The unexamined life is not worth living.”

Danger, Professor Robinson?

In the olden days, before the Web, someone wishing to leak secret government documents would adopt a code name (think “Deep Throat” of the Watergate era) and covertly contact a journalist.  The reporter would then publish the information if, in the view of the reporter, editor, and publisher, it did not cross certain lines, such as placing the lives of covert CIA agents in danger.

Enter WikiLeaks.

WikiLeaks, founded in 2006, describes itself as “a multi-jurisdictional public service designed to protect whistleblowers, journalists and activists who have sensitive materials to communicate to the public.”

The site was founded to support “principled leaking of information.”  A classic example of an individual following this line of reasoning, namely that leaking classified information is necessary for the greater good, is that of Daniel Ellsberg, who leaked the Pentagon Papers, thereby exposing the U.S. government’s attempts to deceive the U.S. public about the Vietnam War.  The decision by the New York Times to publish the Pentagon Papers is credited with shortening the war and saving thousands of lives.

Time magazine wrote that WikiLeaks, located in Sweden, where laws protect anonymity, “… could become as important a journalistic tool as the Freedom of Information Act.”

On the other hand, the U.S. government considers WikiLeaks to be a potential threat to security.  In a document eventually published on the WikiLeaks site, the Army Counterintelligence Center wrote that WikiLeaks “represents a potential force protection, counterintelligence, operational security (OPSEC), and information security (INFOSEC) threat to the US Army.”  The document also states that “the identification, exposure, termination of employment, criminal prosecution, legal action against current or former insiders, leakers, or whistleblowers could potentially damage or destroy this center of gravity and deter others considering similar actions from using the WikiLeaks.org web site.”

Ten days ago, Wired magazine reported that U.S. officials had arrested Spc. Bradley Manning, a 22-year-old army intelligence analyst who reportedly leaked hundreds of thousands of classified documents and records as well as classified U.S. combat videos to WikiLeaks.

Although WikiLeaks’ confidentiality has never been breached, Manning reportedly bragged about his exploits, resulting in his apprehension.

According to Wired, Manning took credit for leaking the classified video of a helicopter air strike in Baghdad that also claimed the lives of several civilian bystanders.  The previously-referenced Army Counterintelligence Center document also reportedly came from Manning.

The case of Manning is perhaps the tip of the iceberg.  Several million people in the U.S. hold security clearances and, while their motives may vary from clear (e.g. trying to end a war as in the case of Ellsberg) to unclear (e.g. Manning), the genie is clearly out of the bottle.

Socrates, a social and moral critic, preferred dying for his beliefs to recanting them.  Indeed, Plato referred to Socrates as the “gadfly” of the state.  The motives of today’s leakers may not be as virtuous as Socrates’ but today’s technology virtually ensures that a secret may not remain a secret for very long.

Jonathan B. Spira is CEO and Chief Analyst at Basex.

The Document Jungle

Tuesday, February 16th, 2010 by Jonathan Spira

The world of the knowledge worker is document centric.  As a group, knowledge workers spend significant time creating, managing, reviewing, and editing documents.

Danger lurks in the document jungle

[For the purposes of this discussion, we define a document as written communication created using word processing software, a typical example of which is Microsoft Word.]

A recent Basex survey of 300 knowledge workers revealed (not surprisingly) that 95% of them create and review documents on a regular basis.

The prevalence of word processing tools and e-mail has made it easy, some would say too easy, to send documents anywhere and everywhere for input from colleagues, business partners, customers, and suppliers.

A mere twenty years ago, document review was very different.  Fewer documents were being generated overall so there were fewer to review.  The review process was paper based, documents were typically stored in file cabinets, and, since making corrections and revisions often meant retyping a document, people only made important corrections and tried to get it right the first time around.

Today, the typical knowledge worker creates one to two documents a day, each one to two pages long.  He also receives three to five documents that are between three and five pages long for review each week.

Why the disparity in size and quantity between documents created and documents received?  People who create longer documents also create more of them and are more likely to send them out for review.  In addition, 22% of documents are not sent to anyone for review and a similar number are sent to only one colleague.

What happens when a document comes back to its creator is also interesting, since most documents come back with multiple edits, changes, and comments.

Despite the tools available both within word processing software and externally, the typical knowledge worker uses a fairly inefficient process to review documents.  Sixty percent of knowledge workers say they e-mail the documents as attachments to several reviewers at once, and 46% report that they then compare edits and comments manually once they have received them back from reviewers.
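
By way of contrast with manual comparison, here is a minimal sketch of how a few lines of Python could list what a reviewer changed, using the standard library’s difflib module.  The sample text and the reviewer_changes function are hypothetical and only show the principle; they are not how any particular word processor or review tool works.

```python
import difflib
from typing import List


def reviewer_changes(original: str, revised: str) -> List[str]:
    """Return only the lines a reviewer removed ('- ') or added ('+ ')."""
    diff = difflib.ndiff(original.splitlines(), revised.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    # Hypothetical two-line document and one reviewer's revision of it.
    original = "Sales rose 5%.\nCosts were flat."
    revised = "Sales rose 6%.\nCosts were flat."
    for change in reviewer_changes(original, revised):
        print(change)
```

Running the same comparison against each reviewer’s copy would surface every change, which is the step respondents say they perform by hand.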

As a result, almost 40% of knowledge workers say they miss edits and comments in the documents they get back from review.  Fewer than half of the knowledge workers surveyed say they get documents back in a timely fashion.  Another 25% of knowledge workers say they intentionally leave people out of the review process for fear of slowing it down.

All of these inefficiencies come with a significant cost to the bottom line.  Errors in documents that are overlooked can result in lost sales and lower profits.  The multiple hours a typical knowledge worker spends each week trying to manage the review process could be put to far better use.

The future for document review and revision is far from dismal.  Software companies ranging from start-ups to industry giants are tackling the problem.  Nordic River, a version management company based in Sweden, offers TextFlow, a browser-based tool that generates marked-up review copies of a document based on changes and comments made in individual versions of a document.   Microsoft, in the forthcoming Office 2010 suite, will introduce Co-authoring, a set of tools that allows for multiple users to edit a document at the same time.

Jonathan B. Spira is the CEO and Chief Analyst at Basex.

Intelligence Gathering Meets Information Overload

Thursday, January 14th, 2010 by Cody Burke

In 2007, the Air Force collected ca. 24 years’ worth of continuous video from Predator and Reaper unmanned drones in the skies over Afghanistan and Iraq, a fact first reported by the New York Times in the last few days.

Shall we drone on?

All video collected is watched live by a team of intelligence analysts, so this translates into ca. 24 years of analyst team time being used in one year.

The amount of data (and the amount of time spent watching the videos) will only grow as more advanced drones are deployed that can record video in ten (with future plans for up to 65) directions at once instead of the one direction that is currently supported.
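
To put these figures in perspective, the arithmetic can be sketched in a few lines of Python.  The calculation rests on two simplifying assumptions that are not in the reporting: that total drone flight time stays the same as in 2007, and that each camera direction is a separate feed that must be watched in real time.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def footage_hours(years_of_video: float) -> float:
    """Convert years of continuous video into hours."""
    return years_of_video * HOURS_PER_YEAR


def analyst_years(years_of_video: float, directions: int = 1) -> float:
    """Years of real-time viewing needed if each direction is its own feed.

    Assumes flight hours are unchanged and every feed is watched live.
    """
    return years_of_video * directions


if __name__ == "__main__":
    print(footage_hours(24))      # hours of video collected in 2007
    print(analyst_years(24, 10))  # viewing time with ten directions
    print(analyst_years(24, 65))  # viewing time with 65 directions
```

Under those assumptions, 24 years of video is 210,240 hours, and the same flight time recorded in ten directions would require 240 years of viewing, or 1,560 years at 65 directions.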

The use of  UAVs (unmanned aerial vehicles) is not only an expanding phenomenon in the military but also domestically as the U.S. Customs and Border Protection agency and local police forces begin to use these tools.  The advantages are clear: pilots are not in danger, intelligence gathering capabilities are improved, and the ability to conduct strikes in remote areas is enhanced.

There are of course myriad issues that the use of UAVs for military operations presents, ranging from humanitarian arguments that drone missile strikes are more likely to result in civilian casualties to political considerations about where they can operate, as seen in recent disagreements with Pakistan.

Complicating and contributing to these issues is the huge problem of how to deal with the flood of information that drones are returning to the analysts.

Mistakes such as falsely identifying threats can lead to unnecessary and potentially tragic civilian casualties, which could then inflame international public opinion and impair the ability of the military to operate effectively.  Likewise, missing a real threat because of Information Overload could also lead to fatalities.

The use of these tools will only increase, making it critical that we develop systems to organize, parse, tag, and act upon the data that is collected in an effective manner.  The Air Force in particular is working on this problem, but will have to move quickly to stay ahead of the mountain of incoming data.

Cody Burke is a senior analyst at Basex.