11. Darwin Conjectures

The Darwin Conjectures of the Darwin Ontology are depicted in the figure above. Established Knowledge K, hosted in the Web space (black region), interacts with billions of semantic pieces of People’s Knowledge K’ connected to the Cyber space (green region). Most current interaction consists of unidirectional broadcasting from K to K’.
K’ appears chaotic, whereas K appears structured, although “flat”, with only one level. People are only enabled to query K. Data and Intelligence flow only one way, unidirectionally from K to K’.
Under a new paradigm that uses the Internet at its full capacity, win-win K-K’ matchmaking scenarios could be attained, optimizing the overall cognitive offer and the learning process accordingly. Through “e-membranes” (yellow), AI-driven algorithmic agents may map K into as many levels as inherently exist. Once K is mapped, its content is fully accessible, understood, and directly retrievable by K’ users, ideally in a single click; in turn, K’ becomes semantically coherent as seen from the K side, and a similar AI algorithm may then proceed to map it. Once both sides are structured and mutually understood, information, intelligence, and even wisdom may flow freely from K to K’ and conversely from K’ to K. The teaching-learning paradox is then alive in a sort of thermodynamic equilibrium.

It is time to present the Darwin Conjectures. Our older set of conjectures can be found on our prototypes website at http://www.intag.org under the title "Towards a New Approach to Knowledge Management - A set of Conjectures" (Juan Chamero, CEO Intag, December 3rd 2003, updated March 2006). Most of the old conjectures proved to be valid. However, some of them were more a matter of common sense, and when building our second prototype, which maps The Art in the World, new ones appeared to be necessary. The new set is summarized as follows.

Conjecture 0. Knowledge tends to be structured like a topological tree


The figure above depicts the upper levels of the Art in the World tree. This tree, which has 7,570 nodes, was recently (October 2007) unveiled by Darwin agents. We could write thousands of pages trying to justify the validity of this Conjecture, but it would be like trying to justify why we humans are as we are. However, Darwin algorithms always test this "trend" by checking, for each node, the uniqueness of its ancestry and the practical non-existence of "forbidden" collateral interconnections.
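The tree test mentioned above can be sketched as follows. This is a minimal illustration, not the actual Darwin algorithm: it assumes the tree is available as a hypothetical list of (parent, child) edges, and it treats any node with more than one parent, or whose single-parent chain does not lead back to the root, as a violation. The sample nodes reuse the Art branch discussed later in this section plus a hypothetical "music" node; they are not taken from the 7,570-node tree.

```python
from collections import defaultdict


def check_tree(edges, root):
    """Check that every node has a unique ancestry leading back to `root`.

    `edges` is an iterable of (parent, child) pairs. Returns (is_tree, report),
    where `report` maps each node to a short diagnosis. A node with more than
    one parent is flagged as having collateral links.
    """
    parents = defaultdict(set)
    nodes = {root}
    for parent, child in edges:
        parents[child].add(parent)
        nodes.update((parent, child))

    report = {}
    for node in nodes:
        if node == root:
            report[node] = "root" if not parents[node] else "root has a parent"
            continue
        if len(parents[node]) != 1:
            report[node] = f"{len(parents[node])} parents (collateral links)"
            continue
        # Follow the single-parent chain upward; revisiting a node means a cycle.
        seen, current, unique = {node}, node, True
        while current != root:
            ups = parents[current]
            if len(ups) != 1:
                unique = False
                break
            (current,) = ups
            if current in seen:
                unique = False
                break
            seen.add(current)
        report[node] = "unique ancestry" if unique else "no unique path to root"

    is_tree = all(v in ("root", "unique ancestry") for v in report.values())
    return is_tree, report


# Hypothetical sample edges (illustrative only).
edges = [
    ("art", "the arts"),
    ("the arts", "performing arts"),
    ("performing arts", "theater"),
    ("theater", "opera"),
    ("the arts", "music"),
]
print(check_tree(edges, "art")[0])                         # True
print(check_tree(edges + [("music", "opera")], "art")[0])  # False: opera gets two parents
```

A single forbidden collateral link is enough to break the tree property, which is why the check reports per node rather than only a global yes/no.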

Conjecture 1. General man-machine interaction could be imagined as a continuous dialog, and dynamic equilibrium, between two sides: the Established Knowledge side hosted in the K Realm versus the People’s Knowledge side hosted in the K’ Realm.

This conjecture comes readily to mind from observations of elementary human behavior. The "natural intelligence" in the K' Realm naturally tends to broadcast itself to the K side as much as possible, if permitted and enabled! The established "frozen" intelligence hosted in the K Realm (sorry, perhaps the best we humans have, but really a "fossil" between updates) is intentionally broadcast to the K' side as much as possible, while the people's intelligence brewing on the K' side tries to be known and, ultimately, assimilated! Fortunately, the Internet is a formidable medium that ENABLES this type of two-way communication. That is the crucial difference from other media such as newspapers, radio and TV.

Conjecture 2. Through the subtle interface between K and K’, only two kinds of semantic particles flow in and out relative to each side: “Established Concepts” (from K to K’) and “People’s Concepts” (from K’ to K).

These particles are separated and somehow embedded within literary strings on the K side, and separated by the communication instances on the K’ side that are necessary to make the dialog meaningful.

Concepts are semantic chains of the form [s0, s1, s2, ..., sn, k], where the links s0 to sn are the (n+1) subject names that, along a given knowledge tree, define the concept referred to internally at level n as keyword k. Sorry, this is a little hard to follow, so let's look at an example. When an Art website sends the keyword "Rigoletto" to the other side, it presupposes that this word is interpreted as Rigoletto (k, the semantic head link) within the Opera genre (s4), within Theater (s3), within Performing Arts (s2), within The Arts (s1), and finally within Art (s0) as the root (the semantic tail link). So in a communication between two machines the message would instead be ["art" "the arts" "performing arts" "theater" "opera" "Rigoletto"]!
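A semantic chain can be represented as a small data structure. The sketch below assumes a hypothetical `Concept` class holding the subject links s0..sn and the keyword k; it shows how the one-word human message "Rigoletto" expands into the full machine message and how that message can be parsed back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Semantic chain [s0, s1, ..., sn, k] (hypothetical representation)."""

    subjects: tuple[str, ...]  # s0 (root, semantic tail link) ... sn
    keyword: str               # k (semantic head link)

    @property
    def level(self) -> int:
        """n, the index of the last subject link sn."""
        return len(self.subjects) - 1

    def to_message(self) -> list[str]:
        """Full machine-to-machine message: every subject link plus the keyword."""
        return [*self.subjects, self.keyword]

    @classmethod
    def from_message(cls, message: list[str]) -> "Concept":
        """Rebuild a Concept from a message; the last element is the keyword."""
        if len(message) < 2:
            raise ValueError("a chain needs at least one subject and a keyword")
        return cls(tuple(message[:-1]), message[-1])


rigoletto = Concept(("art", "the arts", "performing arts", "theater", "opera"), "Rigoletto")
print(rigoletto.to_message())  # ['art', 'the arts', 'performing arts', 'theater', 'opera', 'Rigoletto']
print(rigoletto.level)         # 4, so the Opera genre is s4
```

A human only needs `rigoletto.keyword`; a machine receiving only that string would have to reconstruct the subject links itself, which is exactly the ambiguity the full chain removes.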

A human should understand that the simple one-word message Rigoletto refers to the well-known opera. Concerning K' concepts, things are more complicated because of subjectivity. The "tails" of individual semantic chains (the subject links leading to k) are often fuzzy and hard to make explicit. An individual issues k as a piece of reasoning, or part of one, without telling the other side to which subject this k may belong.

Conjecture 3. Documents and messages, the elementary objects of Realms K and K’, are constituted by only two kinds of particles: “Common Words and Expressions” and “Concepts”.

This was the first Darwin milestone. It is a strong conjecture that, prima facie, clashes with the common-sense beliefs we hold about messages. Perhaps the clue lies in Information Theory: the less frequent a fact is and the less expected an event is, the more information it carries. Initially we had fun reading bibliographies from which everything considered "literary" had previously been removed. Surprisingly, the meaning remained pretty much the same! Of course, our brains played some tricks by unconsciously filling the holes. Then we discovered that "well written" documents split better into the two complementary semantic classes than "badly written" ones. To avoid subjectivity we were forced to define precisely what "well written" means, and we arrived at the conclusion that Web authorities were in fact WWDs, Well Written Documents.
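The Information Theory intuition can be made concrete with self-information, I(w) = -log2 p(w), measured in bits. The sketch below is an assumed, simplified heuristic, not Darwin's actual procedure: it estimates p(w) from a hypothetical reference frequency table and labels tokens at or above a hypothetical bit threshold as candidate concepts, and the rest as common words.

```python
import math
from collections import Counter


def self_information(word, counts):
    """Bits of information carried by `word`: -log2 p(word).

    p is estimated from a reference frequency table; unseen words are
    treated as if seen once, so rare words get the highest values.
    """
    total = sum(counts.values())
    return -math.log2(counts.get(word, 1) / total)


def split_document(text, counts, threshold_bits=5.0):
    """Split tokens into (common_words, concepts) by self-information."""
    common, concepts = [], []
    for token in text.lower().split():
        target = concepts if self_information(token, counts) >= threshold_bits else common
        target.append(token)
    return common, concepts


# Hypothetical reference counts (illustrative, not real Web statistics).
reference = Counter({"the": 50, "of": 40, "a": 35, "is": 30, "an": 25,
                     "by": 20, "opera": 2, "verdi": 1, "rigoletto": 1})
common, concepts = split_document("Rigoletto is an opera by Verdi", reference)
print(common)    # ['is', 'an', 'by']
print(concepts)  # ['rigoletto', 'opera', 'verdi']
```

Giving unseen words the maximum score is a deliberate simplification; with a toy table like this one, any word missing from the reference would be labeled a concept.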

Conjecture 4. This digital dialog may also be imagined as performed through e-membranes, resembling bio-membranes with endoderm, mesoderm and ectoderm, where the inflow and outflow traffic of semantic particles and instances could be “seen” without perturbing either the K or the K’ Realm actors.

Darwin took its network strategy from this Conjecture: Darwin stands for Distributed Agents to Retrieve the Web Intelligence, deployed as a network of e-membranes. It is important to emphasize this point. The Darwin philosophy rests on the free behavior of K-K' actors. The traffic is never interrupted, because inferences and conclusions emanate naturally from the traffic, freely and spontaneously. Darwin unveils the intelligence of collectives by making inferences about the Main Behavior and Traits of Groups, never by spying on people! The Darwin architecture is based on the belief in the universal validity of a sort of Uncertainty Principle in Communications: people and agents on both sides of a man-machine interface, consciously or unconsciously, react against any type of third-party intervention, altering the validity of their messages from the moment of the intervention onwards.
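The passive, non-intervening role of an e-membrane can be sketched as a pass-through observer. Everything here is a hypothetical illustration: the `EMembrane` class, the (user_id, group, concepts) message shape and the group labels are assumptions. What it shows is that messages are forwarded exactly as received while only group-level concept counts are kept, and individual identifiers are never stored.

```python
from collections import Counter
from collections.abc import Iterable, Iterator


class EMembrane:
    """Hypothetical passive observer placed between the K and K' sides."""

    def __init__(self) -> None:
        self.group_concepts: dict[str, Counter] = {}  # group label -> concept counts

    def observe(self, stream: Iterable[tuple]) -> Iterator[tuple]:
        """Yield every (user_id, group, concepts) message unchanged."""
        for message in stream:
            _user_id, group, concepts = message  # user_id is deliberately ignored
            self.group_concepts.setdefault(group, Counter()).update(concepts)
            yield message  # forwarded as-is: the traffic is never interrupted


membrane = EMembrane()
traffic = [
    ("u1", "opera fans", ["rigoletto", "verdi"]),
    ("u2", "opera fans", ["rigoletto"]),
    ("u3", "sculpture fans", ["bernini"]),
]
delivered = list(membrane.observe(traffic))
print(delivered == traffic)                                   # True
print(membrane.group_concepts["opera fans"].most_common(1))   # [('rigoletto', 2)]
```

Using a generator keeps the observer inline with the traffic without buffering or altering it, which mirrors the "seen without perturbing" idea of Conjecture 4.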

Conjecture 5. Documents on the K side tend to separate into “disciplines” of the Established Human Knowledge.

It is not easy to determine the total number of branches that the Established Human Knowledge, as_it_is, has on the Web. The criterion used in our first prototype was a classic one, with philosophy, mathematics, history, sciences, art, society, religion, and technology as the eight top levels; these proved to be coherent but actually disproportionate in their openings, especially science, technology and society. One example of this disproportion is Art, where new forms of art appear by the hundreds, outgrowing the classic ones. Darwin agents have detected more than 160 disciplines by sampling billions of queries via conventional search engines. However, the first two levels of Human Knowledge are not easy to shape: at least, we could not find a coherent correspondence between the approximately 160 existing disciplines and the eight suggested for the first level. This true ontology task is still missing. Perhaps there exist two levels between the Knowledge root and the 160 disciplines, say something in the order of 1:8:40:160.

Conjecture 6. Subjects are those specific concepts associated with the nodes of their respective discipline trees, as the “paths” that lead to them from their roots. Concepts are the same for all languages. For each node there exists one and only one subject.

6.1. Once the subject is known, new and somewhat derived pseudo-concepts that belong to it appear: its keywords.

Subjects are slippery concepts, hard for agents to detect. Darwin agents need some human guidance before they can accomplish this task autonomously. Agents suggest lists of potential subject names that best fit collections of authorities. Each collection needs to be associated with one and only one of the suggested names, and in case of doubt it is the human who decides. Each collection points to a specific subject whose “best name”, as_it_is on the Web, should be defined. The problem is that it is common to find situations where the most popular names are falsely promoted due to deviations and/or bad, and sometimes near-illegal, behavior by some website owners and administrators.
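The division of labor between agents and humans can be sketched as follows. The scores, the `margin` parameter, and the rule that a clear winner is assigned automatically while close calls or already-used names go to a human reviewer are all assumptions for illustration; the text above only states that agents suggest names and that a human decides in case of doubt.

```python
def assign_subject_names(suggestions, margin=0.2):
    """Assign one best name per collection of authorities.

    `suggestions` maps a collection id to {candidate_name: score}, where a
    higher score means the agents consider the name a better fit (hypothetical).
    Returns (assigned, needs_human): collections whose best candidate is not
    clearly ahead, or whose best name is already taken, go to a human.
    """
    assigned, needs_human = {}, {}
    used_names = set()
    for collection, scores in suggestions.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        if not ranked:
            needs_human[collection] = []
            continue
        best = ranked[0]
        clear_winner = len(ranked) == 1 or scores[best] - scores[ranked[1]] >= margin
        if clear_winner and best not in used_names:
            assigned[collection] = best
            used_names.add(best)  # one name per collection, one collection per name
        else:
            needs_human[collection] = ranked  # candidates in order, for human review
    return assigned, needs_human


suggestions = {
    "collection-1": {"opera": 0.9, "music drama": 0.5},
    "collection-2": {"sculpture": 0.55, "statuary": 0.5},
    "collection-3": {"opera": 0.95, "lyric theater": 0.3},
}
assigned, needs_human = assign_subject_names(suggestions)
print(assigned)     # {'collection-1': 'opera'}
print(needs_human)  # {'collection-2': ['sculpture', 'statuary'], 'collection-3': ['opera', 'lyric theater']}
```

Note that a purely popularity-based score is exactly what falsely promoted names would exploit, so in practice the scoring itself would need safeguards beyond this sketch.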

Conjecture 7. For each subject there exists at any given moment, within huge universal reservoirs like the Web, a set of authorities dealing with it with a well-defined authoritativeness.

Discipline and subject authoritativeness influence agent setup, tune-up and training. Apart from the WWD criteria, for each discipline and for some of its sub-trees we need to define templates that complement the WWD criteria concerning document authoritativeness.

Conjecture 8. From authority sets we have developed a sort of industrial process to extract their dominant keyword sets, determining the following correspondence: authorities : subjects : keyword sets.

Discipline trees will have their nodes associated with their names, their authorities and their keyword sets, defining as a whole the Web Thesaurus.
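One Web Thesaurus entry, linking a subject node to its authorities and keyword set, could be sketched as below. The `ThesaurusNode` structure, the placeholder authority identifiers, the stop-word list and the ranking rule (words mentioned by the most authorities win, ties broken alphabetically) are assumptions standing in for the industrial extraction process, which is not described here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ThesaurusNode:
    """Hypothetical Web Thesaurus entry: subject path, authorities, keywords."""

    path: tuple[str, ...]        # subject as the path from the discipline root
    authorities: dict[str, str]  # authority id -> document text
    keywords: list[str] = field(default_factory=list)


def dominant_keywords(documents, common_words, top_k=5):
    """Rank non-common words by how many authority documents mention them."""
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(text.lower().split()) - common_words)
    # Highest document frequency first; alphabetical order breaks ties.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


node = ThesaurusNode(
    path=("art", "the arts", "performing arts", "theater", "opera"),
    authorities={
        "authority-1": "Rigoletto by Verdi",
        "authority-2": "the aria in Rigoletto",
        "authority-3": "Verdi and the libretto of Rigoletto",
    },
)
node.keywords = dominant_keywords(node.authorities.values(),
                                  {"the", "and", "in", "a", "of", "by"}, top_k=3)
web_thesaurus = {node.path: node}  # subject path -> entry
print(node.keywords)  # ['rigoletto', 'verdi', 'aria']
```

Under the same assumptions, the same structure could hold subjects-K’, authorities-K’ and keywords-K’ for a People’s Thesaurus.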

Conjecture 9. A similar Thesaurus could be defined and unveiled in the K’ Realm as the People’s Thesaurus.

Similarly to subjects-K, authorities-K and keywords-K, one could define subjects-K’, authorities-K’ and keywords-K’.

Conjecture 10. Once the K and K’ sides are known as_they_are on the Web, actors on each side are enabled to know as much as possible about the other side.

This event will accelerate the human learning process. K and K’ could then be considered fully mapped, and this mapping may be continuously updated and perfected over time.