12. Art Map Prototype - I

By Juan Chamero

   In this series I am going to describe in detail how a discipline as important as Art is mapped by Darwin at the highest possible resolution, by extracting dispersed information and intelligence from the Web Ocean. As we have seen so far, the data and intrinsic intelligence of a discipline can be mapped onto (or adjusted to) topological trees. You may download a sample of the tree skeleton at the Darwin proto-site. This semantic skeleton, “filled” with its associated art data and intelligence, constitutes the Art Thesaurus. The same procedure applied to all disciplines of Human Knowledge constitutes the Web Thesaurus. Any conventional Search Engine that has a Web Thesaurus could offer its users, thru an intelligent Search Wizard, an ideal YGWYN-IOOC (You Get What You Need – In only One Click) interface. We say that search engines able to offer this type of retrieval service are in fact SSSE, Semantic Super Search Engines.
  
Conversely, any computer with a built-in Web Thesaurus could “see” the Web as ordered thru conventional search engines. These Darwin Thesauruses can be programmed to update continuously and to improve by themselves.

 
    In the figure above you may see a piece of a tree depicted in its four upper levels. The "node paths" in this example were automatically generated by agents browsing the tree from root to leaves and from right to left. In the example the root is "Art", which opens in six branches. "The Arts" is the third from the right, so the agent codes this node as 0.3.

Then "The Arts" opens in five branches. "Classic Arts" is the second from the right, and accordingly the agent codes this node as 0.3.2, and so on. In this Art Map the first level opens in 17 branches, as you may check, and the code numbering system is the same as in this example.
    At right you may see all node information. For each node you may navigate the Web thru any Search Engine or over a pool of them. The basic map semantic skeleton fields are: node name, node ID, node path, ancestry, redundancy, and virtuality. Semantic chains are not shown here but correspond to the chains of node names from root to nodes.
   In the example the agent, thru Google, finds for this node an intrinsic popularity of 28,900,000 and a semantic popularity of 1,680,000, as you may easily check. Take into account that Google tends to "change its humor" from second to second, so you may find significant differences in popularity depending on the IP you are browsing from, the country and language where that IP is hosted, the preferences you have chosen, the time, and many other factors.

Note: Intrinsic popularity is the popularity the search engine assigns to the node name alone (“comedy of situation” in the example that follows), that is, to the tail of its semantic chain. Semantic popularity is the popularity the search engine assigns to the whole chain with its links AND-ed.
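The path coding and the semantic chains described above can be sketched in a few lines of Python. This is only a minimal illustration, not Darwin's actual agent code: the Node class, its fields and the helper names are assumptions, the children are stored in the order in which the agent numbers them (the text counts branches from the right), and every branch name other than "Art", "The Arts" and "Classic Arts" is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    virtual: bool = False            # virtuality, assumed here to be a simple flag
    path: str = ""                   # node path code, e.g. "0.3.2"
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def assign_paths(root: Node) -> None:
    """Code the root as "0" and each child as parent code + "." + 1-based position."""
    root.path = "0"
    stack = [root]
    while stack:
        node = stack.pop()
        for position, child in enumerate(node.children, start=1):
            child.path = f"{node.path}.{position}"
            stack.append(child)


def semantic_chain(node: Node) -> List[str]:
    """Chain of node names from the root down to the given node."""
    names = []
    current: Optional[Node] = node
    while current is not None:
        names.append(current.name)
        current = current.parent
    return list(reversed(names))


# "Art" opens in six branches, "The Arts" being the third one;
# "The Arts" opens in five branches, "Classic Arts" being the second one.
art = Node("Art")
for i in range(1, 7):
    art.add(Node("The Arts" if i == 3 else f"branch {i}"))
the_arts = art.children[2]
for i in range(1, 6):
    the_arts.add(Node("Classic Arts" if i == 2 else f"sub-branch {i}"))
classic = the_arts.children[1]

assign_paths(art)
print(classic.path)             # 0.3.2
print(semantic_chain(classic))  # ['Art', 'The Arts', 'Classic Arts']
```

The remaining skeleton fields (node ID, ancestry, redundancy) are left out because the text does not define their format.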

Tree harmonization and consistency tests
Virtual nodes
    Below you may see a skeleton detail of a node neighborhood depicting a type of virtual opening. The green virtual node was artificially added, without altering the semantics of the discipline, to focus this part of the tree and to order it better. There are several types of virtuality, and this one, type 1, is among the most common. It replaces the six green fuzzy branches in the tree as it was unveiled by Darwin agents. This is one of the most difficult tasks of Tree Harmonization. This task, which was performed by humans in our first prototype, is now being transferred to agents.

Note: Virtual nodes tend to harmonize the tree and warn humans about potential semantic holes not yet covered or considered by authorities and the general public.
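As a rough illustration of this kind of virtual opening, the sketch below moves a group of sibling branches under a newly created virtual node. It assumes that the replaced branches become children of the virtual node, which the text does not state explicitly. The Node class, the function name and all node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    virtual: bool = False
    children: List["Node"] = field(default_factory=list)


def group_under_virtual(parent: Node, names: List[str], virtual_name: str) -> Node:
    """Replace the named children of `parent` with one virtual node that holds them."""
    moved = [child for child in parent.children if child.name in names]
    virtual = Node(virtual_name, virtual=True, children=moved)
    new_children = []
    placed = False
    for child in parent.children:
        if child.name in names:
            # The virtual node takes the position of the first moved branch.
            if not placed:
                new_children.append(virtual)
                placed = True
        else:
            new_children.append(child)
    parent.children = new_children
    return virtual


# Hypothetical neighborhood: two regular branches and six "fuzzy" ones.
fuzzy_names = [f"fuzzy {i}" for i in range(1, 7)]
parent = Node("parent", children=[Node("A")] + [Node(n) for n in fuzzy_names] + [Node("B")])

virtual = group_under_virtual(parent, fuzzy_names, "virtual group")
print([child.name for child in parent.children])  # ['A', 'virtual group', 'B']
print(len(virtual.children))                      # 6
```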



Tree Consistency

Below is depicted an 11-link semantic chain that has Art as its root and the subject Rigoletto as its node-subject tail.

Note: Let’s remember that subjects are special keywords, the main topics of authoritative documents, in this case pages dealing with the opera Rigoletto and nothing but the opera Rigoletto (as far as possible, in a probabilistic sense). Let’s also remember that several types of keyword sets are associated with each node-subject.
 


This semantic chain could be seen as follows:

1. [art (0) 1,800,000,000,
2. The arts (0.1) 47,600,000,
3. Performing arts (0.1.2) 62,500,000,
4. Main performing arts (0.1.2.2) NULL,
5. Theater (0.1.2.2.2) 287,000,000 theater - 13,000,000 theatre,
6. Genres (0.1.2.2.2.2) NULL,
7. Opera (0.1.2.2.2.2.14) 21,900,000 opera - 882,000 the opera,
8. History (0.1.2.2.2.2.14.1) 1,580,000,000 history - 620,000 opera history,
9. Italian opera (0.1.2.2.2.2.14.1.6) 85,200,
10. Bel canto movement (0.1.2.2.2.2.14.1.6.10) 42 bel-canto movement - 180,000 bel-canto,
11. Rigoletto (0.1.2.2.2.2.14.1.6.10.5) 2,700,000 rigoletto - 101,000 rigoletto bel-canto ]

Where: path codes and intrinsic popularity are attached to each link-node-subject (in red). NULL marks nodes considered virtual, which are hidden when intrinsic popularity queries are issued. The semantic head and tail are in blue bold.

Darwin agents proceed to check all semantic chains. For each node, agents have similar names at hand. Let’s suppose that the first version of the Art tree assigned the name theater to node 0.1.2.2.2, leaving theatre as its second option. When browsing semantic chains, agents may find popularity breakdown inconsistencies like the one depicted in link 5 above, where theater (287,000,000) is more popular than its ancestor performing arts (62,500,000). Humans could surely give more than one reason to explain it! Darwin agents simply suggest the second option. The same criterion was applied to link 7, where “the opera” replaces opera.

When similar names are not enough to “smooth” a severe discontinuity, agents try to solve it by going upwards to the nearest non-virtual ancestor. See the severe discontinuity in link 8. As agents had provisionally changed opera to “the opera”, they suggest “the opera history” instead of history as a better name. This name is in turn changed to “opera history”, a better English construction to imply “the history of opera” or “the history of the opera”.
In link 10 agents find that “bel-canto movement” constrains the search too much and suggest their second option: bel-canto.

Finally, agents unveil another severe discontinuity in tail link 11. As in link 8, they suggest “rigoletto bel-canto” as the adequate name. Perhaps humans could consider this suggestion a little fuzzy. Why not “Rigoletto opera”, which surely fits better? Why not the more precise “Rigoletto Italian opera” instead?

We have to take into account that it is humans who finally approve what agents do and suggest. Perhaps they would decide to eliminate link 10, a change that appears coherent even to someone who is not an art expert.

The suggested popularity downgrading for this chain follows:

1,800,000,000 => 47,600,000 (indefinition) => 62,500,000 => 13,000,000 => 882,000 => 620,000 => 85,200 => 180,000 => 101,000
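A quick way to see where this sequence still fails to decrease is to compare each value with the one before it. The sketch below is only an illustrative check, not the agents' actual procedure. The function name is an assumption, the figure for art is taken from link 1 of the chain, and the virtual links 4 and 6 are left out because they carry no popularity.

```python
# Suggested names and popularities, in chain order (virtual links 4 and 6 omitted).
suggested = [
    ("art", 1_800_000_000),
    ("the arts", 47_600_000),
    ("performing arts", 62_500_000),
    ("theatre", 13_000_000),
    ("the opera", 882_000),
    ("opera history", 620_000),
    ("Italian opera", 85_200),
    ("bel-canto", 180_000),
    ("rigoletto bel-canto", 101_000),
]


def find_increases(chain):
    """Return (ancestor, descendant) name pairs where popularity grows instead of shrinking."""
    return [
        (prev_name, name)
        for (prev_name, prev_pop), (name, pop) in zip(chain, chain[1:])
        if pop > prev_pop
    ]


print(find_increases(suggested))
# [('the arts', 'performing arts'), ('Italian opera', 'bel-canto')]
```

The first pair is the breakdown around “the arts”. The second shows that bel-canto remains more popular than Italian opera even after the suggested renaming.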

However, agents still have some incoherences to resolve:

2. The Arts breakdown suggests a subject in formation;
5. Check theatre versus theater deep discontinuity;
7. Check possible inconsistency of opera keyword instead of suggested “the opera”;
8. Check possible inconsistency of history instead of suggested “opera history”;
10. Check possible inconsistency of “bel-canto movement” instead of suggested “bel-canto”;
11. Check strong inconsistency of “rigoletto” instead of suggested “rigoletto bel-canto”.

Semantic popularity
Once we have all semantic chains checked, we may proceed to compute the semantic popularity of chains.

[Art, the arts, performing arts, main performing arts, theatre, genres, the opera, history, Italian opera, bel-canto, rigoletto bel-canto]

All semantic chains are now checked for consistency as potentially successful queries. One of the main inconsistencies in this sense is redundancy. Virtual nodes (highlighted in green) are eliminated de facto, and so are the terms highlighted in fuchsia as redundant. Then the query to compute the semantic popularity of the rigoletto node will be:

QUERY full path : [Art, “performing arts”, theatre, “the opera”, history, Italian, bel-canto, rigoletto]: 462

This is a full path query centered on art, as perfectly structured semantically as if it were strongly influenced by authorities and, above all, ruled by their authoritativeness! If you as users are focused on subjects more than on the structure of the discipline to which they belong, Darwin may obtain a broader reach for each node with a head-tail query built from the pair [head – tail], an approach used in our first prototype.

QUERY  head-tail : [art rigoletto]: 576,000

This approach could be good enough as a first approximation to build rustic Web Thesauruses. However, we need to filter somehow the Top references delivered by search engines. In this example only three of the Top 10 were specific. On the contrary, 10 out of the Top 10 were specific and meaningful with the full path query!
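Both query forms can be sketched as simple string builders. The code below is a minimal sketch under several assumptions: the elimination marks are entered by hand (None stands for a node dropped as virtual or redundant, since the green and fuchsia highlighting is not available here), multi-word terms are quoted as phrases, and space-separated terms are taken to be AND-ed by the search engine. The function names are hypothetical, and no search engine is actually called.

```python
from typing import List, Optional, Tuple

# (chain name, term kept in the query); None means the link is eliminated.
chain: List[Tuple[str, Optional[str]]] = [
    ("Art", "Art"),
    ("the arts", None),                    # assumed dropped as redundant
    ("performing arts", "performing arts"),
    ("main performing arts", None),        # virtual
    ("theatre", "theatre"),
    ("genres", None),                      # virtual
    ("the opera", "the opera"),
    ("history", "history"),
    ("Italian opera", "Italian"),          # "opera" already covered upstream
    ("bel-canto", "bel-canto"),
    ("rigoletto bel-canto", "rigoletto"),  # "bel-canto" already covered upstream
]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def full_path_query(terms: List[str]) -> str:
    """All kept terms, from head to tail."""
    return " ".join(quote(term) for term in terms)


def head_tail_query(terms: List[str]) -> str:
    """Only the head and the tail of the chain."""
    return f"{quote(terms[0])} {quote(terms[-1])}".lower()


kept = [term for _, term in chain if term is not None]
print(full_path_query(kept))
# Art "performing arts" theatre "the opera" history Italian bel-canto rigoletto
print(head_tail_query(kept))
# art rigoletto
```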