10. A reflection stop


By Juan Chamero

 The figure above means “ping-pong”, designating both the game and the table. The first ideogram stands for the onomatopoeic sound “ping”, the middle one for its associated sound “pong”, and the third for “spherical ball”. That’s perfect!

Take this post as an oasis, a place to rest a little along the arid description that tries to explain the Darwin Ontology. I’m a “systemologist”, a neologism for an expert in systems architecture and systems behavior analysis. But I’m also a Zen master, and Zen is a discipline that tries to complement our rational thinking with intuition, the other way of seeing and thinking. Zen people tend to see “non-rational” analogies between systems that belong to apparently very different realms. In the Japanese Do-in, for instance, disciples are trained to see analogies between the macro and the micro, because one of its main conjectures is “the macro is in the micro and the micro is in the macro”. This amounts to saying that, in their deep essence, the laws that rule celestial bodies are the same (analogous, in this Zen sense) as the ones that rule molecules, atoms and elementary particles.
Scientifically, this training is useful as long as you are well aware that it is only a holistic body-and-mind gym to open your mind. During this exercise, applied to our Darwin Ontology, I used the analogy between [keyword, subject] pairs and Chinese and Japanese ideograms. Of course they are different things in the rational domain, but they are analogous in an ideal “Zen domain”.

Let’s imagine a long-lived primitive mind, as smart as ours but without any written background memory. We may also imagine it as a collective mind embedded in nature. Let this collective mind be attached to a body: a sentient being endowed with senses and capable of handling a primitive set of sounds meaningfully. This mind will synthesize objects (a stone, a river, a bird); activities, from elementary ones like running, falling, starting, stopping, killing, attacking, dying, and being born, to more complex ones like building a fire, throwing a stone at…, approaching…, caressing…, watching…, taking care of…; and states (satisfied, happy, sad, angry, expectant, …). This primitive mind could be smart enough to discriminate, from the very beginning, different “classes” of “elementary messages” that deserve to be memorized and replicated for transmission to individuals or to other collective minds. It could do so by chaining elementary sounds, creating new sounds when necessary, and “sculpting” them as “ideograms” on rocks, soil and trees, or by just remembering them phonetically.

Here we may distinguish two main evolutionary strategies for recording and transmission: via “characters” and via “ideograms”. By characters we mean here vowels and consonants, and/or syllables and phonemes. Ideograms are ideal for keeping concepts alive because they work with our most powerful and multifaceted sense: sight. Another advantage of the ideogram system is its analogical power, because ideograms resemble objects, activities and states as they are graphically represented in our mind. Its limitation lies in the number of concepts that can be represented for a given culture, because of the limits of our brain’s capacity to scan and discriminate visual forms.

Let’s suppose now that the collective brain built a first glossary of 5,000 ideograms to represent the minimum set of actions, states and things needed to survive. As the number of distinct concepts of a modern culture is in the range of 10 million, this means an expansion of 1,000 to 2,000 times the primitive set. On the other hand, character systems may appear prima facie more versatile: a basic Common Words and Expressions set of 3,000 terms may be easily combined as dyads, triads, and n-ads to represent concepts.
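The orders of magnitude above can be checked with a few lines of arithmetic. This is only a rough sketch: the three figures come from the text, while treating dyads and triads as ordered sequences with repetition allowed is an assumption made for illustration, and the helper names are hypothetical.

```python
# Figures taken from the text above.
PRIMITIVE_IDEOGRAMS = 5_000
MODERN_CONCEPTS = 10_000_000
COMMON_TERMS = 3_000


def expansion_factor(modern: int, primitive: int) -> float:
    """How many times larger the modern concept set is than the primitive one."""
    return modern / primitive


def ordered_combinations(terms: int, n: int) -> int:
    """Upper bound on n-term sequences (assumption: order matters, repeats allowed)."""
    return terms ** n


if __name__ == "__main__":
    print("expansion factor:", expansion_factor(MODERN_CONCEPTS, PRIMITIVE_IDEOGRAMS))
    for n in (1, 2, 3):
        print(f"{n}-ads:", ordered_combinations(COMMON_TERMS, n))
```

Under these assumptions the upper end of the expansion range (2,000 times) corresponds to exactly 10 million concepts, dyads alone give 9 million candidate combinations, and triads go far beyond 10 million. Most of those combinations would be meaningless, so these are upper bounds, not counts of real concepts.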

However, both ways of expression, by ideograms and by characters, have reached their optimum by using the same principles: first define a single glossary of common words, either as ideograms or as strings of characters, and then define more complex and abstract derived concepts by combining common words sequentially. Ideograms evolved in much the same way: new ones are based on existing ideograms or on pieces of them, as you can see in the table below.

Humans and trees: Another analogy, in the Zen sense explained here, is the “tree”. Humans in all cultures expanded their knowledge while “navigating” through existing knowledge represented as a tree, in pretty much the same fashion. Even languages have a well-defined hierarchy that can be depicted thematically as trees. The same words have different meanings depending on the context in which they are used. Life, information, and knowledge are all concepts whose meaning depends on the context of use, but also on the cultural, social, political and economic context in which they are uttered.

To illustrate this tree pattern as a model of semantic perfection, and to introduce our second Darwin prototype, which maps the Art in the World, we selected the images depicted below. Life on earth grows upward along tree forms, keeping perfect balance. In a continuous struggle against the environment, apparent randomness in the micro becomes perfect order in the macro. Even the skeletons of trees surviving on cliffs are models of equilibrium, though sometimes this is not easy to appreciate!



Radia Perlman (for many, the mother of the Internet), the inventor of the Spanning Tree Protocol (STP) algorithm, summarized it in a poem titled "Algorhyme".
(This poem was modified from the original, entitled "Trees", by Joyce Kilmer.)

I think that I shall never see
A graph more lovely than a tree.
A tree whose crucial property
Is loop-free connectivity.
A tree which must be sure to span.
So packets can reach every LAN.
First the Root must be selected
By ID it is elected.
Least cost paths from Root are traced
In the tree these paths are placed.
A mesh is made by folks like me
Then bridges find a spanning tree.
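
The steps named in the poem (elect the Root by ID, then trace least cost paths from it) can be sketched in a few lines. This is a centralized illustration only: the real protocol is distributed, with bridges exchanging messages among themselves, whereas this sketch assumes the whole mesh is known in advance and uses Dijkstra's algorithm. The function name and the sample mesh are hypothetical.

```python
import heapq


def spanning_tree(links):
    """Build a loop-free tree from a mesh of bridges.

    links maps an undirected pair (bridge_a, bridge_b) to a link cost.
    Returns (root, parent), where parent[b] is b's next hop toward the root.
    """
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    root = min(graph)  # "By ID it is elected": lowest ID wins (assumption)

    best = {root: 0}   # cheapest known cost from the root to each bridge
    parent = {}
    done = set()
    heap = [(0, root, None)]
    while heap:
        dist, node, via = heapq.heappop(heap)
        if node in done:
            continue  # a cheaper path to this bridge was already placed
        done.add(node)
        parent[node] = via  # "In the tree these paths are placed"
        for neighbor, cost in graph[node]:
            candidate = dist + cost
            if neighbor not in done and candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor, node))
    return root, parent


if __name__ == "__main__":
    mesh = {(1, 2): 1, (2, 3): 1, (1, 3): 5, (3, 4): 1}
    print(spanning_tree(mesh))
```

In the sample mesh the expensive direct link between bridges 1 and 3 is left out of the tree, which is what removes the loop while every bridge stays reachable.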


Some Chinese ideograms explained


 

Back again to work
Stripping

A mess of words is stripped off: CW&E (Common Words and Expressions), quotations, established keywords, potential keywords, misspellings, neologisms, barbarisms, established subjects, potential subjects, images that appear only once, ambiguous and editing words, and commands. Tables are transformed into text chains, row by row and, within each row, column by column. Quotation marks, special characters and separators should be considered carefully; for instance, quotation marks may warn us about a potential keyword. For each subject, documents whose keyword count falls outside a 2-sigma interval of the keyword distribution could be ignored or set aside for later human review. Links deserve special consideration, for instance because they increase the probability of pointing to potential keywords.
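
The 2 sigma rule mentioned above could look roughly like the sketch below. It assumes that each document of a subject has already been reduced to a single keyword count, and that "2 sigma" means two population standard deviations around the mean of those counts. The function name and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def split_by_two_sigma(keyword_counts, k=2.0):
    """Split the documents of one subject into (kept, set_aside).

    keyword_counts maps a document id to the number of keywords found in it.
    Documents whose count lies outside mean +/- k * sigma are set aside
    for later human review instead of being processed automatically.
    """
    if not keyword_counts:
        return {}, {}
    values = list(keyword_counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    low, high = mu - k * sigma, mu + k * sigma
    kept = {doc: n for doc, n in keyword_counts.items() if low <= n <= high}
    set_aside = {doc: n for doc, n in keyword_counts.items() if doc not in kept}
    return kept, set_aside


if __name__ == "__main__":
    counts = {f"doc{i}": 10 for i in range(1, 10)}
    counts["doc10"] = 100  # a suspiciously keyword-heavy page
    kept, set_aside = split_by_two_sigma(counts)
    print(sorted(set_aside))
```

With small samples a single extreme page inflates sigma considerably, so the threshold `k` is better treated as a tunable parameter than as a fixed rule.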

Algorithms should detect changes of language within a page; one alternative would be to ignore documents that exceed a given percentage of words in a different language. Only pages addressed directly by their URLs will be considered. The distribution of page sizes should be computed, to distinguish everything from briefs and overviews to virtual e-books, trying to discriminate most types of documents: master’s and doctoral theses, news articles, directories, tutorials, etc.
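
One way to picture the percentage rule is the sketch below. The text does not say how a word is recognized as belonging to a different language, so this sketch assumes a known vocabulary for the expected language and counts every word outside it as foreign. The function names, the tiny vocabulary and the 30% threshold are all hypothetical.

```python
import re


def foreign_word_ratio(text, vocabulary):
    """Fraction of the words in text that are not in the expected vocabulary."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    if not words:
        return 0.0
    foreign = sum(1 for word in words if word not in vocabulary)
    return foreign / len(words)


def keep_document(text, vocabulary, max_foreign=0.3):
    """Keep a document only if its share of foreign words stays under the limit."""
    return foreign_word_ratio(text, vocabulary) <= max_foreign


if __name__ == "__main__":
    english = {"the", "tree", "is", "green"}
    print(foreign_word_ratio("The tree is green", english))
    print(keep_document("The tree es verde", english))
```

A real vocabulary would have to be large enough that keywords and neologisms, which are by definition not common words, are not mistaken for foreign text.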

Special sections: Titles, detected as <title> or inferred from styles or significant font-size changes. Bibliographies should be carefully detected and handled because of their potential to add superfluous noise.
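
Title detection could be sketched with Python's standard `html.parser` module, as below. The `<title>` lookup follows the text directly. Falling back to the first `<h1>` is only an assumed stand-in for "inferred from styles or significant font-size changes", which would need real style analysis. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> and the first <h1> of a page."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.found = {"title": [], "h1": []}

    def handle_starttag(self, tag, attrs):
        # Only start collecting if nothing was captured for this tag yet.
        if tag in self.found and not self.found[tag]:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current].append(data)


def extract_title(html):
    """Return the <title> text, else the first <h1> text, else an empty string."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    for tag in ("title", "h1"):
        text = " ".join("".join(parser.found[tag]).split())
        if text:
            return text
    return ""


if __name__ == "__main__":
    page = "<html><head><TITLE> Darwin  Ontology </TITLE></head><body></body></html>"
    print(extract_title(page))
```

Bibliographies are not handled here; detecting them would need its own heuristics.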