20. Darwin Applications scope

By Juan Chamero





    Darwin algorithms can serve a wide range of applications, mainly those derived from IR (Information Retrieval) and IIR, that is, IR empowered with AI (Artificial Intelligence) tools. Content Generation is a leading application today, as we are supposedly living in the Content Era. The current state of the art is restricted to aids that facilitate Content Generation performed by humans, such as Specialized Editors and Compilers with access to Tutorials, Guides, Quotations, Citations, References, Similar, Main Thematic Trends, Thesauruses, Glossaries and Dictionaries.
    If you imagine a Control Panel from which you may access all these resources, and you are an expert writer as well, your intellectual production will be empowered many times over. However, in order to transfer this talent to agents it would be necessary to execute and save all possible sequences of actions and their corresponding outcomes, with every step clearly understood and explained in the deepest detail. Believe me, this is still a cyclopean task! For an example of the steps and micro-steps of this talent, please go back to our analysis of K-side dissection, where we present examples of querying Search Engines, one of the simplest tasks on the above mentioned Control Panel. As we will see, agents could perform any computable task as long as it is perfectly and clearly understood and explained.
    Darwin may behave in some instances like an efficient form of Differential Data Mining, meaning that once you have unveiled the inherent and mostly hidden intelligence of a huge data reservoir there is no further need for “mining”. Darwin agents only need to process the last shell of aggregated data! Darwin processes thus evolve “de facto” by themselves.
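
    The idea of processing only the last shell of aggregated data can be illustrated with a minimal sketch. It is not Darwin's actual implementation: the class DifferentialIndex, its method process_shell and the sample documents are hypothetical names chosen for illustration, and plain word counts stand in for whatever “intelligence” the real agents extract.

```python
from collections import Counter


class DifferentialIndex:
    """Hypothetical index that is updated from new documents only."""

    def __init__(self):
        self.term_counts = Counter()  # accumulated word counts for the reservoir
        self.seen_ids = set()         # identifiers of documents already processed

    def process_shell(self, shell):
        """Process a shell (mapping doc_id -> text); return how many documents were new."""
        new_docs = 0
        for doc_id, text in shell.items():
            if doc_id in self.seen_ids:
                continue  # already mined, no need to process it again
            self.seen_ids.add(doc_id)
            self.term_counts.update(text.lower().split())
            new_docs += 1
        return new_docs


index = DifferentialIndex()
first = index.process_shell({"d1": "oil painting", "d2": "oil sketch"})
# The second shell repeats d2; only d3 is actually processed.
second = index.process_shell({"d2": "oil sketch", "d3": "bronze sculpture"})
```

    In this sketch the cost of each update depends on the size of the new shell, not on the size of the reservoir already processed.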


 
    The other types of applications are those that can be performed by anthropic algorithms, in which Agents (A) perform their tasks under light supervision, sometimes complemented by Humans (H). These algorithms are designed so that once agents can perform a given human task efficiently, they replace the humans. One Darwin example was keyword detection in our first prototype: it was initially performed by university students until the discrimination talent was defined as computable. The first 18,000 keywords were unveiled by students in three months, while the remaining 36,000 were unveiled by agents, with better quality, in a couple of hours. You may see a Darwin prototype of e-procurement on our Website www.procurebot.com
    Let’s analyze Darwin’s scope within both realms of its ontology: the Established Knowledge Realm K and the People’s Realm K’.

Established Knowledge Realm
 
    We may find examples of this side in all types of data reservoirs, either concentrated or distributed. Within the first type we have Catalogs, and within the second the Web. However, Search Engines provide databases to facilitate the search process. Darwin Ontology is well suited to work in any of these scenarios. Its acronym, which stands for Distributed Agents to Retrieve the Web Intelligence, tells us about its versatility.

Cataloging and Catalogs Harmonization
    Its first aim is to facilitate the semantic ordering of the established side by building natural hierarchical indexes. Given, for instance, a Catalog of Materials, Darwin may check its consistency and harmony, suggesting changes to improve both. Given a list of needed materials, Darwin agents may be commissioned to go to permitted databases and data reservoirs and extract those Catalog trees and subtrees that match the needs, in order to build feasible Catalog models.
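
    The extraction of matching Catalog subtrees can be sketched as a recursive pruning of a tree. This is only an illustration under stated assumptions: the catalog is represented as nested dictionaries, the function name extract_subtree and the sample materials are hypothetical, and matching is by exact name, which is far simpler than any real semantic matching.

```python
def extract_subtree(catalog, needed):
    """Keep only the branches that lead to at least one needed item.

    `catalog` maps a node name to a dict of its children (empty dict for a leaf).
    A matched internal node keeps only its matched descendants.
    """
    kept = {}
    for name, children in catalog.items():
        sub = extract_subtree(children, needed)
        if name in needed or sub:
            kept[name] = sub
    return kept


# Hypothetical Catalog of Materials.
catalog = {
    "Metals": {
        "Ferrous": {"Steel": {}, "Cast iron": {}},
        "Non-ferrous": {"Copper": {}},
    },
    "Polymers": {"PVC": {}, "Nylon": {}},
}

model = extract_subtree(catalog, {"Steel", "PVC"})
```

    The result keeps the path Metals, Ferrous, Steel and the path Polymers, PVC, and drops every branch that contains none of the needed materials.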

Information Offer versus Demand Optimization
    Given a well defined information “offer” database, it could be cloned into as many front ends as there are markets, in order to study the different Offer versus Demand matchmaking interfaces and so: a) gain a better knowledge of the true demand and b) adjust its global and distributed offer accordingly.

Potential Demand
 
    The potential demand could also be tracked in the absence of a particular offer. A Darwin algorithm could be set up and adjusted to study demand without offer. For example, in Cyberspace almost one thousand million users from all over the world are continuously trying, every day, to learn, to teach and to broadcast their ideas and opinions. We may tune in to this information brewing and focus on a small window of the demand spectrum in a “listening” attitude. We may then also browse the whole Web space to account for the available offer that could satisfy this potential demand. Non Satisfied Demand could then be the next Darwin step.
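
    As a rough illustration of the step from potential demand to Non Satisfied Demand, the sketch below counts observed demand topics and removes those covered by the available offer. The function unsatisfied_demand and the sample topics are hypothetical, and topics are compared as normalized strings, which is an assumption made only to keep the example short.

```python
from collections import Counter


def unsatisfied_demand(queries, offered_topics):
    """Return (topic, count) pairs for demanded topics that no offer covers, most frequent first."""
    demand = Counter(q.strip().lower() for q in queries)
    offered = {t.strip().lower() for t in offered_topics}
    return [(topic, n) for topic, n in demand.most_common() if topic not in offered]


# Hypothetical window of observed demand and the offer found on the Web.
queries = ["fresco restoration", "oil painting", "fresco restoration", "encaustic wax"]
offer = ["Oil Painting", "Watercolor"]

gaps = unsatisfied_demand(queries, offer)
```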

SSSE
    The WWW (World Wide Web) is a network of routers, servers and databases created to satisfy the World Information Demand. Search Engines facilitate this complex task by concentrating in their databases meaningful summaries of the documents dispersed across the Web. It is supposed that there are about 20 billion documents located in this net, and the number grows at a terrific pace. Of this data asset, only about 10 billion documents are actually “classified” by the most powerful search engines. This classification is rather primitive, mostly by “words”, with all documents sharing a single semantic level of thematic specificity. To make the search process easier and more efficient, the obvious solution would be to classify documents thematically. The Darwin algorithm may perform this task, raising the thematic diversity from “zero ground” to as many as thirteen levels. Darwin may use the same information that search engines have, while providing users with a sort of “semantic glasses” ® to “see” the Web better, ordered in up to thirteen levels instead of a fuzzy and noisy zero ground.

People's Knowledge Realm

People’s Behavior Patterns Inferences

 
    The simplest scenario is that of users trying to learn as much as possible from a given Information Offer, as is the case with people using search engines. In the near future search engines will make sound inferences about actual and potential Information Demand, as long as people interact with their services freely. With SSSEs, users will be encouraged or induced to query via concepts (semantic chains in our approach), and as concepts match specific subjects it is relatively easy to make probabilistic inferences about what users are trying to find out.
    Users will query via “established concepts” or via their own “people concepts”. People concepts play a role similar to that of established concepts, and should match their corresponding people’s subjects. These subjects are strongly related to “people’s information needs”, so “people’s information needs” are to some extent equivalent to “established subjects”. People query established information sources trying to satisfy their information needs. These needs are also a matter of analysis within Darwin Ontology, which states as a strong Conjecture that people tend to specialize in Major Activities and presupposes that these major activities, like disciplines on the established side, structure themselves as trees. Querying chains, as semantic chains, tend to map probabilistically onto these trees. The most frequent querying chains suggest significant activities, perhaps a Major Activity. It is supposed that people who belong to a given Major Activity will, in the long run, use a similar kit of querying chains when querying search engines.
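
    The conjecture that querying chains map probabilistically onto activity trees can be sketched as a simple frequency estimate. The activity trees, the table ACTIVITY_CHAINS and the function activity_probabilities are hypothetical, and exact chain matching is an assumption; the sketch only shows how the share of matching chains could hint at a Major Activity.

```python
from collections import Counter

# Hypothetical activity trees, each given as the set of root-to-node chains it contains.
ACTIVITY_CHAINS = {
    "painting": {
        ("art", "painting"),
        ("art", "painting", "oil"),
        ("art", "painting", "fresco"),
    },
    "sculpture": {
        ("art", "sculpture"),
        ("art", "sculpture", "bronze"),
    },
}


def activity_probabilities(query_chains):
    """Share of a user's matched querying chains that falls on each activity tree."""
    hits = Counter()
    for chain in query_chains:
        for activity, chains in ACTIVITY_CHAINS.items():
            if tuple(chain) in chains:
                hits[activity] += 1
    total = sum(hits.values())
    # Chains that match no tree are ignored; with no match at all the result is empty.
    return {activity: n / total for activity, n in hits.items()} if total else {}


user_queries = [
    ("art", "painting", "oil"),
    ("art", "painting", "fresco"),
    ("art", "sculpture", "bronze"),
    ("travel", "florence"),
]
probs = activity_probabilities(user_queries)
```

    For this user two of the three matched chains fall on the painting tree, which would make painting the most likely Major Activity under these assumptions.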
    Darwin algorithms are anthropic; that is, some steps are necessarily performed by humans until the retrieving talent is precisely defined and then transferred to agents. For example, the Darwin algorithm that works on the K-side has about 80 steps, most of them (75) performed by agents. Darwin algorithms that will work on the K’-side in the near future will initially have steps that must be performed by humans, because it is a realm less well known than K.


Some Darwin byproducts



    Eventually a Darwin algorithm may be placed as the smart interface of an SDI (Selective Distribution of Information) system, as in the figure above. Its main outcome, the Hot information, is securely and selectively distributed in due time and form over the Organization hierarchy, creating de facto an Organization Strategic Infoduct.

    Darwin byproducts are many and fruitful. Along the distilling process mentioned above, as in the oil industry, byproducts appear from intermediate steps. For example, to build the Art Map it was necessary to process about 500,000 documents, many of them authorities. The pure textual sequences needed to unveil keywords form a corpus, a huge and linguistically structured set of texts (in English in this case) that could be used for statistical linguistic analysis, tests of the validity of linguistic rules, concordance analysis, etc. To give an idea of its importance, consider the BYU Corpus of American English: it has more than 360 million words in nearly 150,000 texts, including 20 million words each year from 1990-2007. The “Art Corpus” that could be delivered by Darwin may have about 1,000 million words corresponding to 500,000 texts.
    Another important byproduct would be K’-side versus K-side rotation. If we suppose that the K-side hosts a meaningful sample of actual Human Knowledge, on the K’-side we may imagine actual People Knowledge flowing, and we may ask ourselves: how many times must the K’-side “rotate” to change the K-side content? More broadly, how can we learn more about the life cycles of both realms? As an exercise, some speculations follow.

    People brewing in K’: 500,000 daily on average, with an average bandwidth of 100 KB each, which gives us a flow of 5x10**10 Bytes per day. As we do not have any metric yet, we may dare a figure for the number of days needed to change K significantly, for example 100,000 days. We do not know the relative “volatility” of the disciplines of human knowledge, some extremely volatile, like news, fashion and computing, and others extremely stable and inert to change, like religion. To take this inertia into account we have to estimate a redundancy factor, which would stand for the cycling insistence needed to effectively generate a change. If we bid for a redundancy of 10, we would finally need an accumulated traffic of 5x10**10 x 100,000 x 10 = 5x10**16 Bytes in order to change K completely.
    Now let’s get a reasonable estimate of the K-side size for this problem. Out of all Web documents we should only take into consideration authorities, which we estimate at 1,000,000,000, with 2,000 words each on average; taking 5 characters per word and one byte per character, this gives us a Web volume of 10**13 Bytes. Finally, we are going to need 5x10**16 / 10**13 = 5,000 rotations of K’ over K in order to change K completely. Of course this calculation is only a speculation to test orders of magnitude, because real rotation has to be measured.
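
    The rotation estimate above can be reproduced with a few lines of arithmetic. The variable names are ours, and every figure is one of the speculative values quoted in the text; 1 KB is taken as 1,000 bytes, as the round figures in the text imply.

```python
# K'-side flow: speculative figures quoted in the text.
people_per_day = 500_000
bytes_per_person = 100 * 1000   # 100 KB each, with 1 KB taken as 1,000 bytes
days_to_change_k = 100_000      # assumed number of days to change K significantly
redundancy = 10                 # assumed redundancy factor for the inertia of K

daily_flow = people_per_day * bytes_per_person                 # bytes per day
traffic_needed = daily_flow * days_to_change_k * redundancy    # bytes to change K

# K-side size: authorities only.
authorities = 1_000_000_000
words_per_doc = 2_000
bytes_per_word = 5              # 5 characters per word, one byte per character
k_size = authorities * words_per_doc * bytes_per_word          # bytes

# Number of times the K' flow must cover the size of K.
rotations = traffic_needed // k_size

print(daily_flow, traffic_needed, k_size, rotations)
```

    Changing any of the assumed inputs, such as the redundancy factor or the number of authorities, changes the number of rotations proportionally, which is why the result should be read only as an order of magnitude.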