This blog covers the practical techniques, trials and tribulations associated with the transformation of IT systems from legacy technologies to systems using SOA and modern open systems. It also includes the occasional interlude with rants about technology in general.
Friday, November 5, 2010
Data, Taxonomies, and the Road to Wisdom Revisited
While early computing was referred to as “Data Processing”, the term “Information Systems” became prevalent with the increased sophistication of functionality. This makes sense. After all, there has always been a platonic goal to have computers process information just as humans do, except much, much faster. As originally framed, this goal was known as AI (Artificial Intelligence) and, despite some early successes with heuristic algorithms and neural networks, AI research eventually reached major roadblocks. Ultimately, AI’s most touted commercial achievement was the codification of narrow domains of expertise under the guise of “Expert Systems”. Expert Systems went through a hyped-up phase back in the Eighties only to fade away with the realization that the logic needed to replicate how humans process and organize knowledge depends on contextual, subjective, and often inexpressible decision-making rules. In other words, we humans process knowledge in a manner that is often inaccurate, biased or even intuitive. Still, the subjectivity of our knowledge has served us well along our evolutionary path and is more than enough to help us deal with quotidian existential needs, even if this knowledge is not always precise. (Who cares if a tiger wasn’t actually hidden in the brush? Your ancestor taking off upon the rustling of leaves was only being sensible!) Recently, implementing “fuzzy” and flexible computer logic has yielded more effective AI applications, particularly in systems applying Bayes’ Theorem, which derives the probability of a hypothesis from its prior probability and the conditional probability of the observed evidence. Modern Machine Learning algorithms applying this and other algorithmic variations usually return reliable results to problems dealing with pattern and language recognition. However, given the probabilistic nature of the base algorithms, results are sometimes wrong. To err is not only human. Today’s computers can also err.
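To make the Bayes’ Theorem point concrete, here is a minimal sketch in Python that revisits the tiger in the brush. Every number in it is invented for illustration, and the function and parameter names are my own; it is not taken from any particular Machine Learning library.

```python
def posterior(prior: float, likelihood: float, false_alarm_rate: float) -> float:
    """Bayes' Theorem for a yes/no hypothesis H given evidence E.

    prior            = P(H)        e.g. a tiger is in the brush
    likelihood       = P(E | H)    leaves rustle when a tiger is there
    false_alarm_rate = P(E | ~H)   leaves rustle with no tiger around
    """
    evidence = likelihood * prior + false_alarm_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical figures: tigers are rare (1%), they almost always rustle
# the leaves (95%), but the wind rustles them too (10% of the time).
p_tiger = posterior(prior=0.01, likelihood=0.95, false_alarm_rate=0.10)
print(f"P(tiger | rustling) = {p_tiger:.3f}")
```

With these made-up inputs the posterior comes out just under 9%. Running away is still the sensible choice, yet a system that shouts “tiger!” at every rustle will be wrong most of the time, which is the kind of fallibility discussed here.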
Given that Bayes’ rule and other algorithms provide results that are not always correct, we may well conclude that there is a universal law stating that intelligence implies fallibility. If we are ever going to rely on these systems for life-and-death scenarios, we will need to incorporate some form of control feedback in the way they reach their results. Perhaps in humans, “Wisdom” is that control. But how then do we attain Wisdom? By now, you may have noticed that I am using the term “Information” in its most generic sense. Information can often be “misinformation”. Yet, misinformation and even lack of information are also forms of information (a dog that failed to bark was the clue that helped Sherlock Holmes solve a crime). When implementing information systems, it helps to classify the type of information we are dealing with. There is raw information, and there is wisdom-based information. The progression from one to the other involves a series of steps that must be methodically climbed: Data, Content, Knowledge, Understanding and ultimately Wisdom.
Data is primarily raw figures and “facts”; by nature it is voluminous and difficult to deal with and so is best stored and communicated in a mechanical way. Data can be wrong. The old GIGO adage (Garbage In/Garbage Out) captures what ought to be the highest priority in the automation of data: ensuring that the data inputted into the system is correct. Do not become confused by the term “Big Data”, by the way. Big Data actually refers to the Knowledge step on the ladder as it deals with the acquisition of knowledge via so-called Data Science analytics.
Content is data that has been collated, ordered and classified. That is, Content is Data plus its Taxonomy. Taxonomy is the categorization or classification of entities within a domain (the actual structure of the domain is defined by its Ontology). Consider the following taxonomies used to describe the animal kingdom. Linnaean taxonomy:
Kingdom: Animals, Plants, Single Cells, etc.
Phylum: For Animals: Chordata, Nematoda (roundworms), etc.
Class: Mammalia, Amphibia, Aves . . .
. . . et cetera
In “The Analytical Language of John Wilkins,” Jorge Luis Borges, the famed Argentinean writer who belongs to the ontological set of writers who deserved to win the Nobel Prize but didn’t, describes ‘a certain Chinese Encyclopedia,’ the Celestial Emporium of Benevolent Knowledge, in which animals are classified under this unique taxonomy:
those that belong to the Emperor
those that are trained
those included in the present classification
those that tremble as if they were mad
those drawn with a very fine camelhair brush
those that have just broken a flower vase
those that from a long way off look like flies
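The idea that Content is Data plus its Taxonomy can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the animal names, the two lookup tables and the to_content helper are all hypothetical, and the Borges assignments are tongue-in-cheek guesses rather than anything found in his essay.

```python
from collections import defaultdict

# Raw data: bare names with no structure (hypothetical sample).
DATA = ["tiger", "frog", "sparrow", "house cat"]

# Two hypothetical taxonomies over the same data.
LINNAEAN = {
    "tiger": "Mammalia",
    "house cat": "Mammalia",
    "frog": "Amphibia",
    "sparrow": "Aves",
}
BORGES = {
    "tiger": "those that belong to the Emperor",
    "house cat": "those that have just broken a flower vase",
    "sparrow": "those that from a long way off look like flies",
    # "frog" is deliberately left out to show an unclassified item.
}


def to_content(data, taxonomy):
    """Group raw items under the category the taxonomy assigns to them."""
    content = defaultdict(list)
    for item in data:
        content[taxonomy.get(item, "unclassified")].append(item)
    return dict(content)


linnaean_content = to_content(DATA, LINNAEAN)
borges_content = to_content(DATA, BORGES)

for category, members in linnaean_content.items():
    print(f"{category}: {members}")
```

The same four data items yield two very different bodies of content. The choice of taxonomy, not the data, decides what can be found later.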
Knowledge is what is produced when the information is placed in context and the resultant significance of relationships within the data is realized. The addition of contextual information requires some element of human input, so the progression to this stage will most likely not be possible through the use of computers alone. To see the difference between Content and Knowledge, I suggest you try this exercise: Go to google.com and enter “IBM Apple”. You will get content listing all the sites in which IBM and Apple are discussed. Now, go to wolframalpha.com and enter “IBM Apple”. You will get a digested and structured response comparing these two companies. The former is content; the latter is beginning to look a lot like knowledge. Production and discovery of knowledge is at the core of many start-ups’ business plans today. The emerging field of Data Science is leveraging big data sets to mine data in ways that produce knowledge. Organizations such as Gallup or Nate Silver’s FiveThirtyEight exist to mine data and content and produce knowledge on a variety of topics. Voting trends, consumer preferences, etc. are examples of mined knowledge. Business Intelligence, associated Data Mining technologies, and the more recent Internet-driven “Collective Intelligence” applications are examples of current trends in the automation of knowledge acquisition. We are in the midst of moving from the Age of Content to the Age of Knowledge.
Understanding is interpreting the significance of relationships between two or more sets of knowledge and deriving prime causes and effects from these relationships. While Gallup may unearth the knowledge that 33% of voters are likely to vote for a particular candidate, understanding why they lean that way is something that information systems can only hint at. Understanding remains an endeavor only humans are adept at. No matter what you may hear from the “hypesters” (not to be confused with the hipsters!), understanding cannot yet be performed by computers. As much as it might appear to be the case, the Siri and Google Talk systems lack an understanding of your commands. “Understanding” is how consultants and advisors make a living. Companies such as Gartner or writers of popular science and “How To” books are in the business of providing distilled understanding. Of course, if you happen to watch regular Sunday morning political discussion programs showcasing pundits and politicians in topical debates, you know that the “understanding” you get from these guests often can be biased and even wrong. Enter wisdom . . .
Wisdom is the ability to choose between correct and faulty understanding. This is the feedback loop I referred to at the beginning of this article. The fact is, understanding can be the result of wrongly extracted knowledge, which may come from bad source data (outright misinformation), or content improperly formed with inappropriate taxonomies. For example, a taxonomy that classifies human beings according to race or some other categorization of “otherness” often leads to xenophobia, homophobia or racism. Wisdom represents the highest level of value in the information progression. Wisdom is not always objective or static. It can be subjective, and it is certainly dependent on the cultural environment or transitory circumstances. This is why it is unlikely that we will ever be able to codify “hard-coded” wisdom within computers, and why the belief that future computers may act as judges in the affairs of men is dubious at best. Wisdom can be applied toward either material or spiritual benefits. Yes, Wisdom can be applied for profit and business advantage. However, the fact that something is applied with wisdom does not determine whether it is right or wrong. Beyond wisdom we enter into the realm of morality and philosophy. Even this last point is open to debate. Some have an “understanding” that moral relativism is wrong, but some of us don’t think so. But I digress. . . Whether future software will be capable of Understanding (much less Wisdom) is open to debate. There is much we still do not know about how we humans think and about the nature of our cognitive processes. Humans mastered flying only after they stopped trying to replicate the way birds take to the air. Aviation accomplishes flying even better than birds by leveraging the underlying laws of nature; something birds do, only differently.
This is why I believe that multi-million-dollar projects such as the European “Human Brain” project and the American-sponsored Brain Activity Map Project (BRAIN), which try to map the neurons in the human brain much as the human genome was successfully sequenced, have the makings of fools’ errands. Recycling an old saying: “It’s the software, stupid.” If the much-predicted Singularity is to happen, it will probably require computer systems that “think” very differently from the way we humans do. And that “thinking” will be software based. Even then, I cannot conceive of truly automated wisdom (aka “Strong AI”) without first solving the question of what “consciousness” is. We are a couple of Einsteins away from figuring that one out. But I digress again . . . Whether strong AI is feasible or whether the Singularity will occur are problems best left for the next generation. As you stand securely atop the Content stage, remember that nothing is stopping you from moving up the next step on the road to wisdom: the Knowledge stage. Time to dive more into that Data Science stuff!