"Every term is sacred, Every term unique. So we give them facets To find them when we seek."
A flight engineer calculates the amount of fuel his airplane will need to travel between Los Angeles and London. At 400 pounds per 200 miles, he orders 4,000 pounds of fuel for the 2,000-mile one-way trip. Somewhere over Winnipeg, Manitoba, the Fuel Low Indicator starts to sound and Flight 465 to Gatwick radios in the bad news: wrong London.
There are 51 locations named 'Los Angeles' on the planet and 14 'Londons' including one in Kentucky, and while the scenario played out on the flight deck of the ill-fated flight to England is unlikely, it is similar to the one played out every day in search engines across the digital frontier.
The Flat World of Digital Assets
'To err is human; to really screw up you need a computer.' This pithy bit of wisdom began appearing in data processing shops back in the eighties when it became increasingly obvious that the hardware destined to change our lives was going to require a lot of attention. Things haven't changed much.
Fact: computers are stupid. Fast, but stupid. You would be fast too, if all you had to worry about were two dimensions. Yes or no, on or off, black or white, up or down, left or right... life would be pretty simple. Think of an endless game of 20 questions. In this two-dimensional or 'flat' digital geometry, the term 'Los Angeles' is undifferentiated. So, when you ask a Google, Autonomy, Bing or FAST 'Have you got a Los Angeles?', you get 'yes', but 234 million times - and that doesn't begin to count the 16 values of 'Los Angeles' in Mexico.
This flat 'digital geometry' shows up everywhere: in folder structures, naming conventions, data models, taxonomies, indexes and especially search. If you doubt this, try putting a file with the same name in the same folder; try splitting a single person into two jobs; or try searching for 'Los Angeles, Mexico' in Google. Humans mitigate this lack of depth by adding semantics to the mix - something we are collectively very good at. However, as the increasing cost of search proves every day, human linguistic capability is expensive, and the cost of doing nothing is becoming ever more prohibitive.
We are collectively arriving at a point where something has to give in the digital world. Luckily there are a number of historical precedents for overcoming the increasing digital entropy in which we find ourselves. To paraphrase Albert Einstein: The kind of thinking that got us out of other messes is exactly what we need to extricate ourselves from this one. In other words, we need to apply an existing perspective to this new problem.
A Short History of Big Ideas
In his remarkable book The Fifth Language: Learning a Living in the Computer Age, Robert Logan traces the evolution of human communication through five distinct phases. He makes a compelling case for the idea that human speech, writing, mathematics, science and computing represent ever more efficient levels in the acquisition, processing and exchange of information.
Each evolutionary pattern of communication has three distinct phases:
1. The introduction of a new way of compressing the old patterns.
2. The adaptation and extension of the new pattern into new forms.
3. The over-use of the new pattern to the point where it becomes either perfectly balanced (low entropy) or chaotic (high entropy), in which case a new compression phase is needed.
Examples of perfectly balanced patterns of thought include the periodic table of elements, the coordinate system of navigation, and the Linnaean system for the classification of living things.
One of the characteristics of a good theory is its economy: with very few moving parts, a good theory takes less energy to communicate and can even predict new stuff that does not currently exist. Another is its usefulness: the acid test for the adoption of a new pattern of communication from a scientific view is that it 'works'. A new theory that explains more phenomena than the previous theory is more valuable. Finally, a good framework instills confidence through increased precision and accuracy.
On the flipside, as we have seen, the geometric foundation supporting the current landfill of digital assets is in serious need of help. The rest of this article outlines a new approach to data that ties information to the sacred discipline of geometry.
A Condensed Geometry for Linked Data
An alternative way of looking at the problems associated with search is one of identity. None of the search algorithms in use so far can differentiate identical points of data. There are ways to mitigate this using context and other statistical techniques, but the 'stupid' computer prevails. Put another way, it takes at least three additional points of related information to confidently identify a single piece of data. In a typical search scenario, these three pieces are supplied by the smart, albeit expensive, user.
In flat Euclidean geometry the 'dimensionality' of an entity is determined by how many coordinates it takes to locate a point on or in that thing. An analogous metric in identity management is how many attributes must be added to uniquely identify an entity. Figure 1 visually describes this.
Figure 1: Geometry of Data Dimensions
Starting from the left: a point has zero dimensions, which means that it identifies itself, so the string (point) 'Los Angeles' can exist all by itself. Without more information (coordinates) we, like computers, can't differentiate one 'Los Angeles' from another, so there can only be one.
A line AB can place a single point P along its length, so is said to have one dimension. This is equivalent to having a line called 'USA' and placing 'Los Angeles' somewhere along it. Likewise, a line called 'Mexico' can have a single value called 'Los Angeles' on it because the inclusion of the line name makes it a unique occurrence. This is the same as placing two identically named (but different) 'ReadMe.doc' files in separate folders.
Similarly, a map or flat surface can use two coordinates as differentiators. Now we can have Los Angeles (USA, California), Los Angeles (Mexico, Oaxaca) and Los Angeles (Mexico, Baja) and so on.
This usually requires a table or an XML file or some other two-dimensional construct to keep track of all the instances.
Finally, a three-dimensional shape or solid can be static as a Domain, or dynamic as a Domain through time. Figure 1 shows a Domain as a cube, which of course implies that there are X, Y, and Z coordinates, consistent with the three pieces of data needed to uniquely identify all of the points.
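The progression from point to line to surface can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of Q6: the dictionary names `point`, `line` and `surface` and the `lookup` helper are hypothetical, and the three locations are simply the examples used above.

```python
# Illustrative sketch only. The (country, region) pairs are the examples
# from the text; all names here are hypothetical.
LOCATIONS = [("USA", "California"), ("Mexico", "Oaxaca"), ("Mexico", "Baja")]
NAME = "Los Angeles"

# Zero dimensions: the bare string is the whole identity, so each new
# 'Los Angeles' overwrites the previous one.
point = {}
for country, region in LOCATIONS:
    point[NAME] = (country, region)

# One dimension: the country 'line' separates USA from Mexico, but the
# two Mexican entries still collide.
line = {}
for country, region in LOCATIONS:
    line[(country, NAME)] = region

# Two dimensions: country and region together keep all three apart.
surface = {}
for country, region in LOCATIONS:
    surface[(country, region, NAME)] = True


def lookup(name, country=None, region=None):
    """Return every key in `surface` that matches the coordinates supplied."""
    return [
        key for key in surface
        if key[2] == name
        and (country is None or key[0] == country)
        and (region is None or key[1] == region)
    ]


print(len(point), len(line), len(surface))
print(lookup(NAME, country="Mexico"))
```

The three dictionaries end up holding one, two and three entries respectively. Asking only for 'Los Angeles' in Mexico still returns two candidates; supplying the region as well narrows it to one.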
Quantum Semantics
The new geometry of information takes the compact three-dimensional 'shape' and pushes it to the left on the existing continuum of dimensionality, giving each data 'quantum' at least three semantic coordinates with which to identify it. The flatland of data is replaced by a more realistic information landscape.
Figure 2: A Compressed Geometry of Information
Instead of starting at zero, all data points have a minimum of three intersects.
This new landscape opens up a number of new possibilities and techniques with respect to information management, but there is one more topic needed to round out our understanding.
In order to establish the semantics of each data point, the model constrains the choice of classes that can be used as facets. The 'cube' we are talking about is made up of six independent, or in data modeling lingo 'orthogonal', facets. Each facet is designed to answer one of six types of questions: Who, What, Where, Why, How and When. The model, called Q6, further divides these facets into a total of 19 segments. Each segment represents a class of natural language strings or tokens as outlined in Table 1:
Table 1: Q6 Facets
Who: Organization, Person
What: Digital Asset, Logical Asset, Physical Asset
Where: Absolute Location, Named Location, Relative Location
Why: Discipline, Role, Use
How: Activity, Event, Process, Task
When: Cycle, Point, Span, Status
The concept of faceted classification is important to the new geometry, because the first coordinate for any piece of information in a Q6 environment relates to which segment the value rests on. The second and third required coordinates intersect with any one of the remaining facets, forming at a minimum a cubic data form with at least three coordinates.
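One reading of this rule can be sketched as a small check, offered as an assumption-laden illustration rather than the Q6 specification. The grouping of the 19 segments under the six facets is inferred from Table 1, and `is_located` and the sample values are hypothetical. The sketch takes 'remaining facets' to mean facets other than the one holding the value's own segment.

```python
# Illustrative sketch only. Segment-to-facet grouping is inferred from
# Table 1; the function name and sample values are hypothetical.
Q6_FACETS = {
    "Who": ["Organization", "Person"],
    "What": ["Digital Asset", "Logical Asset", "Physical Asset"],
    "Where": ["Absolute Location", "Named Location", "Relative Location"],
    "Why": ["Discipline", "Role", "Use"],
    "How": ["Activity", "Event", "Process", "Task"],
    "When": ["Cycle", "Point", "Span", "Status"],
}

# Reverse map: which facet does each segment belong to?
SEGMENT_TO_FACET = {
    segment: facet
    for facet, segments in Q6_FACETS.items()
    for segment in segments
}


def is_located(home_segment, intersects):
    """Check that a value has its three minimum coordinates.

    home_segment -- the segment the value itself rests on (first coordinate)
    intersects   -- list of (segment, value) pairs supplying further coordinates

    Assumption: only intersects from facets other than the home facet count
    toward the second and third coordinates.
    """
    home_facet = SEGMENT_TO_FACET[home_segment]
    others = [
        segment for segment, _value in intersects
        if SEGMENT_TO_FACET[segment] != home_facet
    ]
    return len(others) >= 2


# A hypothetical document, placed by who owns it and when it was created.
print(is_located("Digital Asset", [("Person", "A. Author"), ("Point", "2010-03-01")]))
# The same document with only one intersect is still ambiguous.
print(is_located("Digital Asset", [("Person", "A. Author")]))
```

Under these assumptions, a value counts as located only when it carries its own segment plus at least two intersects from other facets.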
This technique of combining compression of old patterns and triangulation to precisely locate points in three-dimensional space has some strong historical precedents and should deliver a comparatively rich solution set for navigating digital space.
Coordinates and Classification
One of the important consequences of the Q6 approach is that all previously flat information can be given coordinates in much the same way as fixed assets use latitude, longitude and elevation to locate them on the globe.
Another logical extension of quantum semantics is that any three-dimensional object (or its semantically enriched two-dimensional representation) can act as a potential interface for acquiring, transforming, storing and navigating information. Geo-locating content on maps is beginning to gain some traction, and with Q6 as a foundation there are many other possibilities for embedding digital assets in previously inaccessible places.
Another discipline that lends itself almost intuitively to the semantics of Q6 is content classification. Most records managers employ a functional classification taxonomy as a best practice. With the Q6 compression and quantum semantic techniques, each digital asset can add two or more facets so that any record can be accessed along multiple semantic lines. In a similar vein, previously flat metadata and keywords can be faceted through Q6 to create rich and locatable values in their own right.
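Access 'along multiple semantic lines' can be pictured as a simple inverted index keyed on facet values. The sketch below is a hypothetical illustration, not an implementation of Q6: the `FacetIndex` class, the record identifiers and the facet values are all invented for the example, and only the segment names come from Table 1.

```python
from collections import defaultdict


class FacetIndex:
    """Hypothetical inverted index: (segment, value) -> set of record ids."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, record_id, facets):
        """Register a record under every (segment, value) pair it carries."""
        for segment, value in facets.items():
            self._index[(segment, value)].add(record_id)

    def find(self, criteria):
        """Return ids of records matching every (segment, value) pair given."""
        matches = [self._index.get(pair, set()) for pair in criteria.items()]
        return set.intersection(*matches) if matches else set()


index = FacetIndex()
# Invented records: each carries a functional classification (Process)
# plus two further facets.
index.add("rec-1", {"Process": "Invoicing", "Organization": "Finance", "Span": "2010"})
index.add("rec-2", {"Process": "Invoicing", "Organization": "Sales", "Span": "2010"})

print(index.find({"Process": "Invoicing"}))
print(index.find({"Process": "Invoicing", "Organization": "Sales"}))
```

The first query follows the functional line alone and finds both records; adding the Organization facet narrows the result to one. Any facet can serve as the starting point, which is the practical difference from a single functional hierarchy.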
The Impact for Information Management
Q6 and Quantum Semantics change the focus of information management - in all its forms - from complex applications to the fundamentals underlying all digital pursuits. In doing so, a new perspective begins to assert itself. Where before the landscape of business information appeared to be flat, made up of two-dimensional hierarchies or relational tables reflecting the way computers manage information, the new view is now much more representative of the way enterprises want to see themselves: as multi-dimensional and adaptive yet stable and predictable.
John O'Gorman bills himself as an information integration specialist and the inventor of the language- and technology-agnostic Q6 architecture for information management. He lives in the foothills of the Rocky Mountains thirty-five miles south and west of Calgary, Canada with his family, a cat named Tigger and a dog named Panda. He can be reached at jogorman@tiberon-ia.com