<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Giancarlo Niccolai</title>
    <link>http://www.niccolai.cc/</link>
    <description></description>
    <language>en-us</language>           
    <generator>Nucleus CMS v3.33</generator>
    <copyright>©</copyright>             
    <category>Weblog</category>
    <docs>http://backend.userland.com/rss</docs>
    <image>
      <url>http://www.niccolai.cc//nucleus/nucleus2.gif</url>
      <title>Giancarlo Niccolai</title>
      <link>http://www.niccolai.cc/</link>
    </image>
    <item>
 <title>... and the big circle is now closed!</title>
 <link>http://www.niccolai.cc/index.php?itemid=463</link>
<description><![CDATA[Wow, looking at the dates I see that more than 2 months have passed since my last article.<br />
<br />
The effort to start up my new business was quite draining, and the complexity of what we're doing in Falcon required more effort than I expected. In particular, I've been coding 12-16 hours a day since last Tuesday (it's Saturday) to complete the reflective syntax support; the basics and the skeleton of the system are now done, but I need some help to fill in the gaps, so I am describing the system here. This text will also go into the VM/Falcon specifications when we organize them.<br />
<br />
One quick note: as preliminary work for this step, I removed the PCode system. For those who didn't follow the development, a PCode was a set of pre-ordered expression PSteps to be specially executed by the VM. In the beginning, that seemed an optimization, since the PCode could run multiple expression PSteps before returning control to the VM for new code to run; but it required the expressions to behave differently from all the other PSteps (by not removing themselves when their execution was complete, and by considering their host CodeFrame untouchable), and it would also have required recompiling the host PCode whenever the expressions changed. In the meantime, I found a PStep execution pattern that doesn't require the explicit intervention of the VM and that is even more efficient than the original PCode idea; so we removed the PCode, and now expressions, statements and in general all the PSteps share exactly the same behavior and have very similar execution patterns.<br />
<br />
<br />
<h2>Syntax Tree Reflection</h2><br />
<br />
The Falcon VM directly executes source code represented as a tree, through a set of minimal code units called "PSteps" that are executed in turn. However, not every PStep represents a concrete entity of a source program; some of them are just generic instructions telling the VM to perform some task. <br />
<br />
<h3>Generics</h3><br />
<br />
A source program written in Falcon is represented by three categories of specialized PStep:<br />
<ul><br />
<li>Statement: A statement is roughly a line of code in a Falcon source, possibly comprising a list of statements to be conditionally executed. Branches, loops, or even single instructions are statements.</li><br />
<li>Expression: An expression is a tree of simpler expressions bound by operators, which evaluates to a single result. Many statements use an expression to obtain a value that is then used to configure their behavior; for instance, the <b>while</b> statement repeats all the sub-statements it holds while the expression it bears evaluates to "true". Such expressions are called "selectors". Expressions found alone on a line form a special statement called <b>AutoExpression</b>, whose sole purpose is to evaluate them and discard or use their value in some specific way.</li><br />
<li>SynTree: A SynTree is a collection of sequentially ordered Statements. It represents a block of code that must either be executed or skipped. Some of the statements in a SynTree may control its execution status. Also, SynTrees are provided with a selector expression, which might be used for special purposes depending on the statement they are included in, and with an optional <b>target</b> symbol that is bound to receive the result of the selector. For instance, the <b>if</b> statement holds a series of SynTrees, which are interpreted as alternative branches if they hold a selector. In a try/catch statement, the selector indicates the kind of data that is to be caught, and the target, if present, is used as the variable in which to store the caught value.</li><br />
</ul><br />
<br />
These entities fully represent a Falcon program source file, and they all descend from a common class called <b>TreeStep</b>. A TreeStep is an entity that can be represented as a Falcon source instruction or set of instructions; seen the other way around, it is the representation of a Falcon source token as an executable VM PStep element. The VM is not bound to execute just TreeSteps, as there are PSteps that can be injected in the VM as complements of the direct code tree representation (for instance, logic expression shortcut gates are PSteps that are owned by an Expression entity and injected in the VM when needed). However, all the code that can be represented as a Falcon source program is derived from the TreeStep class.<br />
<br />
This distinction matters because the TreeStep class has an important optional feature that is not provided to all the PSteps: reflection. Each TreeStep must expose a <b>Class</b>, an entity derived from the <b>Falcon::Class</b> base class, which represents a handler through which the Falcon VM and other PSteps can manipulate otherwise unknown data. In other words, TreeSteps are data known to the Falcon VM. As is known, the <b>Class</b> handler allows the VM to expose methods and properties to the user, to create new instances of the entity, to manage its serialization and deserialization, and to handle its lifetime through the GC marking system. <br />
<br />
<h3>Internals</h3><br />
<br />
Reflection is controlled by a set of files named <b>engine/synclasses*</b>, and the TreeSteps are under <b>engine/psteps/</b>. The reason for centralizing the Class handlers for the PSteps, instead of writing each of them next to its PStep, is twofold. First, a PStep should not care whether it is handled by a Class or not: other than declaring which class supports it, a TreeStep should totally ignore its handler. Second, most of the handler classes (about 90%) can be written by deriving from a common base <b>ClassTreeStep</b> and just adding the "virtual constructor" semantic needed to create an entity of the correct type on VM request. Also, many of the Class handlers that require a specific behavior (nearly all the statements and some expressions) can inherit 50% to 90% of their behavior from the base <b>ClassTreeStep</b> handler.<br />
<br />
As such, a dictionary of class handlers is provided in <b>include/falcon/synclasses_list.h</b>, and some preprocessor macros expand it to generate the classes. An <b>include/falcon/synclasses_id.h</b> file is provided to store some special class IDs used by the lexer and the parser to determine the context as they compile the source or, in some cases, at runtime by the interactive compiler.<br />
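<br />
The list-plus-macros arrangement resembles the classic "X-macro" technique. What follows is only a minimal sketch of that technique, not the actual content of synclasses_list.h: the SYNCLASS_LIST macro, its three entries, the id_* names and the reduced ClassTreeStep stand-in are all assumptions made for illustration.<br />
<br />
```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the common base handler.
struct ClassTreeStep {
    explicit ClassTreeStep(std::string name) : m_name(std::move(name)) {}
    virtual ~ClassTreeStep() = default;
    const std::string& name() const { return m_name; }
private:
    std::string m_name;
};

// Hypothetical handler list: one X(...) entry per script-visible name.
#define SYNCLASS_LIST(X) \
    X(While)             \
    X(If)                \
    X(GenArray)

// First expansion: declare one handler class per entry (ClassWhile, ClassIf, ...).
#define DECLARE_HANDLER(NAME)                   \
    struct Class##NAME : ClassTreeStep {        \
        Class##NAME() : ClassTreeStep(#NAME) {} \
    };
SYNCLASS_LIST(DECLARE_HANDLER)
#undef DECLARE_HANDLER

// Second expansion: give each handler a numeric ID.
enum SynClassId {
#define DECLARE_ID(NAME) id_##NAME,
    SYNCLASS_LIST(DECLARE_ID)
#undef DECLARE_ID
    id_count
};

// Third expansion: collect the script-visible names, in ID order.
inline std::vector<std::string> synClassNames() {
    std::vector<std::string> names;
#define PUSH_NAME(NAME) names.push_back(Class##NAME().name());
    SYNCLASS_LIST(PUSH_NAME)
#undef PUSH_NAME
    return names;
}
```
<br />
The point of the pattern is that adding a handler means adding one line to the list; the class declaration, its ID and any lookup table follow from the expansions.<br />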
<br />
<h3>Script interface</h3><br />
<br />
The <b>ClassTreeStep</b> base class exposes some methods to the scripts that are available for all the TreeStep elements.<br />
<br />
<b>Note</b>: By convention in the Falcon Class protocol, the handler of a class exposed as "Name" to the scripts is named ClassName. So, ClassWhile is seen by Falcon sources as a class named "While". <br />
<br />
<ul><br />
<li>arity: the number of elements that can be directly accessed.</li><br />
<li>Operator[]: the index operator can be used to set or get the nth element. Statements providing fixed but optional blocks can allow some elements to be nil. For instance, the <b>for</b> statement exposes the main block and the <b>forfirst</b>, <b>formiddle</b> and <b>forlast</b> blocks at index 0, 1, 2 and 3 respectively. Setting a block to nil removes it.</li><br />
<li>insert(pos, element): Inserts a new element. The element must be of the kind accepted by the parent element (a SynTree accepts Statements, a Statement accepts SynTrees, an Expression accepts other Expressions). If the receiving element has a fixed arity, an exception is raised. The position (pos) has the same semantics as the [] accessor (negative indexes count from the end), and if it is out of range, the new entity is inserted at the end (appended).</li><br />
<li>remove(pos): removes the nth element. Elements having a fixed arity will raise an exception.</li><br />
<li>selector: A property returning or accepting an Expression. Setting it to nil removes the selector. Some elements do not accept a selector; others require a selector and can't accept nil.</li><br />
<li>parent: the parent TreeStep. It's a read-only property; can be nil if the TreeStep is currently unparented.</li><br />
</ul><br />
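<br />
The indexing rules listed above can be sketched with a simplified C++ container. This is not the ClassTreeStep code: ElementList and demoInsert are hypothetical names, strings stand in for TreeSteps, and inserting <i>before</i> the given position is an assumption about the exact semantics.<br />
<br />
```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified element list; strings stand in for TreeSteps.
class ElementList {
public:
    ElementList() : m_fixed(false) {}
    // Fixed-arity variant: the slots exist up front and cannot be added or removed.
    explicit ElementList(std::size_t fixedSlots) : m_fixed(true), m_items(fixedSlots) {}

    std::size_t arity() const { return m_items.size(); }

    // Accessor: negative positions count from the end (-1 is the last element).
    std::string& at(long pos) {
        long idx = normalize(pos);
        if (!inRange(idx)) throw std::out_of_range("index out of range");
        return m_items[static_cast<std::size_t>(idx)];
    }

    // Insert before 'pos' (assumed); an out-of-range position appends instead.
    void insert(long pos, const std::string& element) {
        if (m_fixed) throw std::logic_error("fixed arity");
        long idx = normalize(pos);
        if (!inRange(idx)) m_items.push_back(element);
        else m_items.insert(m_items.begin() + idx, element);
    }

    void remove(long pos) {
        if (m_fixed) throw std::logic_error("fixed arity");
        long idx = normalize(pos);
        if (!inRange(idx)) throw std::out_of_range("index out of range");
        m_items.erase(m_items.begin() + idx);
    }

private:
    long normalize(long pos) const {
        return pos < 0 ? pos + static_cast<long>(m_items.size()) : pos;
    }
    bool inRange(long idx) const {
        return idx >= 0 && idx < static_cast<long>(m_items.size());
    }
    bool m_fixed;
    std::vector<std::string> m_items;
};

// Walks through the rules; the comments show the list after each call.
inline std::string demoInsert() {
    ElementList block;
    block.insert(0, "b");    // empty list: out of range, appended -> b
    block.insert(0, "a");    // -> a b
    block.insert(-1, "x");   // before the last element -> a x b
    block.insert(99, "z");   // out of range, appended -> a x b z
    block.remove(1);         // -> a b z
    return block.at(0) + block.at(1) + block.at(-1);
}
```
<br />
In the real engine the failure cases raise Falcon errors visible to the script; here plain C++ exceptions play that role.<br />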
<br />
The ClassSynTree handler adds a <b>target</b> property that exposes a Symbol (internally handled by ClassSymbol) and can accept nil in case the symbol is absent.<br />
<br />
Each TreeStep element has a parent, which is either 0 or a valid TreeStep. A TreeStep can accept another TreeStep as a sub-element only if the latter has no parent. Reparenting is not allowed. However, it is possible to take an element that currently has a parent and clone() it. The cloning process creates an identical subtree, but the cloned element is unparented and used as the root of the new tree, so it can be stored into any TreeStep accepting it.<br />
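<br />
A minimal C++ sketch of the parenting rule, under simplifying assumptions: the TreeStep class below is a reduced stand-in that models only parenting and cloning, the append() method and demoReparenting() are hypothetical names, and shared pointers are used only so that the rejected case can be shown.<br />
<br />
```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, reduced TreeStep: only parenting and cloning are modeled.
class TreeStep {
public:
    explicit TreeStep(std::string label) : m_label(std::move(label)) {}
    // Children that outlive their parent go back to being unparented.
    ~TreeStep() { for (auto& c : m_children) c->m_parent = nullptr; }

    TreeStep* parent() const { return m_parent; }
    std::size_t arity() const { return m_children.size(); }
    TreeStep* nth(std::size_t n) const { return m_children[n].get(); }

    // Accepts a sub-element only if it is unparented: no reparenting.
    bool append(const std::shared_ptr<TreeStep>& child) {
        if (!child || child.get() == this || child->m_parent != nullptr) return false;
        child->m_parent = this;
        m_children.push_back(child);
        return true;
    }

    // Deep copy of the subtree; the returned root has no parent.
    std::shared_ptr<TreeStep> clone() const {
        auto copy = std::make_shared<TreeStep>(m_label);
        for (const auto& c : m_children) copy->append(c->clone());
        return copy;
    }

private:
    std::string m_label;
    TreeStep* m_parent = nullptr;
    std::vector<std::shared_ptr<TreeStep>> m_children;
};

inline bool demoReparenting() {
    auto loop = std::make_shared<TreeStep>("while");
    auto body = std::make_shared<TreeStep>("body");
    auto other = std::make_shared<TreeStep>("if");
    bool first = loop->append(body);            // accepted: body was unparented
    bool second = other->append(body);          // rejected: body already has a parent
    bool third = other->append(body->clone());  // accepted: the clone is a fresh root
    return first && !second && third;
}
```
<br />
The sketch does not guard against appending a root to one of its own descendants; a real implementation would need that check as well.<br />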
<br />
<h3>Garbage collecting</h3><br />
<br />
Parenting is very important for GC marking. When a TreeStep entity is found in a variable, it is marked; however, the mark is not applied directly to it, but escalates up to the topmost parent. Since TreeSteps cannot be unparented once they have a parent, marking the topmost node of a tree (the unparented one) has the same meaning as marking the whole tree. The liveness check is performed on the topmost node, which alone can keep its tree alive. Values stored in the tree (for instance, items and symbols) are separately GC-locked, and cannot be disposed of until the tree they are hosted in is killed. This grants the topmost parent of a tree full ownership of its sub-elements and of every entity the sub-elements may relate to, so that it's possible to plainly delete the children of a node when the node is destroyed or substituted with another unparented node. Functions have a RootTreeStep, which is never exposed to the source files (it has no handler Class); it can parent the topmost TreeStep and propagates its marking to the host function, which might in turn propagate the marking to its host class, if it's a method, and/or module.<br />
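<br />
The mark escalation can be pictured with a few lines of C++. This is a sketch under assumptions, not the engine's collector: Node, gcMark, isAlive and the idea of a numeric mark "generation" are all illustrative choices.<br />
<br />
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node: only the fields needed for marking are modeled.
struct Node {
    Node* parent = nullptr;
    std::uint32_t mark = 0;   // meaningful on the topmost node only

    // Walk up to the unparented root of the tree.
    Node* root() {
        Node* n = this;
        while (n->parent != nullptr) n = n->parent;
        return n;
    }

    // Marking any node actually marks the topmost parent.
    void gcMark(std::uint32_t generation) { root()->mark = generation; }

    // Liveness is decided by the root's mark alone.
    bool isAlive(std::uint32_t generation) { return root()->mark >= generation; }
};

inline bool demoMarking() {
    Node top, mid, leaf;
    mid.parent = &top;
    leaf.parent = &mid;
    leaf.gcMark(7);   // found "in a variable": the mark lands on 'top'
    return top.mark == 7 && leaf.mark == 0
        && mid.isAlive(7) && !mid.isAlive(8);
}
```
<br />
Because no node can ever leave its tree, a single mark on the root is enough to answer the liveness question for every node below it.<br />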
 <br />
<h3>Serialization</h3><br />
<br />
Since full reflection is granted to each TreeStep element, the task of saving pre-compiled modules is fully delegated to the Store/Restore system. The TreeStep class flatten and unflatten methods perform automatic storage of all the elements that can be accessed through  TreeStep::selector, TreeStep::arity and TreeStep::nth virtual methods. <br />
<br />
The Store system automatically finds the generator class of an item; so the only requirement that a TreeStep subclass must respect in order to be serialized is to offer a public empty constructor. <br />
<br />
<b>Note</b>: The <b>Storer</b> system performs serialization in two steps: first, entities are flattened, which means they must declare whether some part of them is subject to separate serialization (i.e. whether they have other "items" that could be serialized); then they are stored. In the <b>Class::flatten</b> method, the class handler stores sub-elements that have their own class handler taking care of their own serialization, while in the <b>Class::store</b> method, the handler saves the low-level data specific to that entity, if there is any. On the way back, the <b>Class::restore</b> method is called to allow the handler to create a blank entity and possibly fill it with the specific data saved on the stream, and finally the <b>Class::unflatten</b> method is called to allow the entity to hook all the items that were separately restored.  <br />
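<br />
The flatten/store and restore/unflatten pairing can be sketched in C++ as follows. This is a simplified model, not the Storer code: Entity, Record and the free functions are hypothetical, and a vector of records stands in for the stream. It shows why the split helps with shared references and cycles: links travel as IDs and are hooked up only after every entity exists again.<br />
<br />
```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical entity: own data plus links to other entities (cycles allowed).
struct Entity {
    std::string data;
    std::vector<Entity*> links;
};

// What ends up "on the stream": own data, and link targets expressed as IDs.
struct Record {
    std::string data;
    std::vector<std::size_t> linkIds;
};

// Flatten: discover every reachable entity exactly once and give it an ID.
inline void flatten(Entity* e, std::map<Entity*, std::size_t>& ids,
                    std::vector<Entity*>& order) {
    if (ids.count(e)) return;   // already seen: shared reference or cycle
    std::size_t id = order.size();
    ids[e] = id;
    order.push_back(e);
    for (Entity* l : e->links) flatten(l, ids, order);
}

// Store: write each entity's low-level data and its links as IDs.
inline std::vector<Record> store(Entity* root) {
    std::map<Entity*, std::size_t> ids;
    std::vector<Entity*> order;
    flatten(root, ids, order);
    std::vector<Record> out;
    for (Entity* e : order) {
        Record r{e->data, {}};
        for (Entity* l : e->links) r.linkIds.push_back(ids[l]);
        out.push_back(r);
    }
    return out;
}

// Restore creates blank entities and fills their own data; unflatten then
// hooks the links, once every target exists.
inline std::vector<std::unique_ptr<Entity>> restore(const std::vector<Record>& in) {
    std::vector<std::unique_ptr<Entity>> pool;
    for (const Record& r : in)                         // restore
        pool.push_back(std::unique_ptr<Entity>(new Entity{r.data, {}}));
    for (std::size_t i = 0; i < in.size(); ++i)        // unflatten
        for (std::size_t id : in[i].linkIds)
            pool[i]->links.push_back(pool[id].get());
    return pool;
}

inline bool demoRoundTrip() {
    Entity a{"a", {}}, b{"b", {}};
    a.links = {&b, &b};   // two references to the same entity
    b.links = {&a};       // and a cycle back
    std::vector<Record> stream = store(&a);
    auto pool = restore(stream);
    return stream.size() == 2
        && pool[0]->data == "a" && pool[1]->data == "b"
        && pool[0]->links[0] == pool[1].get()
        && pool[0]->links[1] == pool[1].get()
        && pool[1]->links[0] == pool[0].get();
}
```
<br />
In the engine each step is a virtual method of the Class handler rather than a free function, so every kind of entity can decide what counts as its own data and what is a separately serialized item.<br />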
<br />
The vast majority of expressions and statements are fully described through their class, selector and elements, so it's unnecessary to provide them with specific flatten/store methods. For those entities having some special attributes (for instance, the <b>return</b> statement, which can have a <b>doubt</b> clause) and/or multiple selectors (for instance, the <b>for/to</b> statement, which has two or possibly three expressions defining the loop ends and its step, and a target symbol), it is necessary to reimplement the store/restore, and possibly the flatten/unflatten, methods. Of course, it is possible to use the base TreeStep::flatten/unflatten methods by invoking them directly, but the code is pretty simple anyway (it just iterates over TreeStep::nth() arity() times, using the child step's TreeStep::cls() to compose the item in the flattened array).<br />
<br />
<br />
<H2>Constructors and other details.</H2><br />
To expose the TreeSteps to source scripts so that programs can be manipulated at runtime, it is useful to provide a script-level constructor in the Class handling the TreeStep. This is done by implementing the op_create method, which must use the parameters to create and, if useful or necessary, configure the TreeStep before returning it to the script via VMContext::stackResult. As the entity is newly created, it should be handed to the GC via the common FALCON_GC_STORE, where the handler class is the same class creating the entity ("this" is ok), and the stored entity is the just created TreeStep.<br />
<br />
As Falcon can accept complex structures and variable parameter lists in function (and constructor) calls, the constructors exposed to Falcon need not mimic those available in C++. For instance, the GenArray constructor accepts as parameters all the expressions that, in C++, must be added later on.<br />
<br />
<b>Note</b>: Array class is the handler for the array items; GenArray is the handler for the ExprArray TreeStep, which GENERATES an array item. Similarly, all the expressions meant to generate a language item are prefixed Gen*.<br />
<br />
For some basic cases (fixed n-ary expressions, including zero-ary parameterless ones, and variable-length lists of similar tokens, as in the GenArray constructor), the support is included in the SynClasses system, which can generate standardized Class::op_create code. Specific behavior requires custom implementations.<br />
<br />
Specialized TreeSteps offer some elements which cannot be easily captured through the selector/arity abstractions that the basic ClassTreeStep offers. For instance, the <b>for</b> statements have targets (the for/in may have multiple ones), and expressions that are not seen as selectors. These special behaviors are to be exposed by reimplementing Class::op_setProperty and op_getProperty (which often also involves hasProperty, enumerateProperties and enumeratePV). <br />
<br />
<b>Note</b>: All the classes in the ClassTreeStep hierarchy are derived from <b>DerivedFrom</b>. This class abstracts the behavior of a single-parent Class exposed to scripts. To create a class that the scripts see as derived from TreeStep, you should <b>not</b> derive from ClassTreeStep, but instead derive from <b>DerivedFrom</b>, passing the concrete instance of the handler ClassTreeStep to it. Like any crucial class, ClassTreeStep is published by the engine, and it can be accessed through Engine::instance()->treeStepClass(). All the other TreeStep handler classes (at least, for the elements that are part of the core Falcon language) are members of SynClasses and are instantiated at the construction of the SynClasses singleton.<br />
 <br />
<br />
 <br />
  ]]></description>
 <category>Falcon</category>
<comments>http://www.niccolai.cc/index.php?itemid=463</comments>
 <pubDate>Sat, 31 Dec 2011 07:41:00 -0800</pubDate>
</item><item>
 <title>The big circle of serialization</title>
 <link>http://www.niccolai.cc/index.php?itemid=458</link>
<description><![CDATA[We're dealing with a pretty critical element of the new Falcon engine: the serialization process. Serialization is extremely important in the new engine because it serves many purposes.<br />
<ul><br />
<li>Storage of built-in type values on static streams.</li><br />
<li>Saving and restoring of pre-compiled source code modules.</li><br />
<li>Assistance in the creation of stateful programs across multiple sessions (game save/load, session data in web-based applications).</li><br />
<li>Cooperative programming and live data sharing across network nodes.</li><br />
</ul><br />
<br />
Most notably the serialization process must be able to restore the status of a program at a different time, at least for those items that have been serialized. For instance, if an object belonging to a certain foreign class (coded in C++) is serialized and it must be deserialized on another program, the system must be able to dynamically verify the presence of that foreign code on which the class is based, or, if not available, try and load it to make it locally visible to all the entities that must interact with that entity. The thing is even more critical if you think of the case of hyper classes, where Falcon classes and foreign classes are seamlessly merged, and they may come from different Falcon modules and native dynamic libraries.<br />
<br />
Of course, once such a complex system is set up, it is worth using it as pervasively as possible. To make it really useful in serializing any generic object, it must also provide a transparent mechanism to flatten cycles and multiple references to the same object. Without this feature, the serialization mechanism would be unsafe or partial even on simple arrays and dictionaries.<br />
<br />
And, since modules have items to serialize, and some of them may be as complex as object instances, it's worth integrating this mechanism in the precompiled module save/load process (fam). Contrary to what happens in other programming languages, fam files are totally standalone modules (similar to Java .class files, but with an internal link-time resolution process that allows compiling them in a place where not all the libraries they rely on need to be actually available).  <br />
<br />
In the old engine, module serialization was implemented by more or less sidestepping the problem: a fam module didn't contain the items, but just a set of instructions on how to generate them at link time. As a result, only a limited set of standard items could be stored in the module. For instance, even declaring a dictionary of fully known static data required the VM to construct the dictionary at runtime. This is how it's done in every scripting language I know of, but the fact that we did have a way to serialize dictionaries and that I could not apply it to module pre-compilation bugged me. The problem became more evident in the case of attributes (static data about symbols that can be queried by generic code without the need to run the module through a VM in advance). The data you could store in an attribute was limited to the data that the fam module generator was able to understand. At times, having an array of strings in an attribute would have been useful, but that was not possible.<br />
<br />
So, the ability to serialize any kind of value that the Falcon parser is able to understand, or even that the virtual machine is able to generate, was too important to be overlooked and relegated outside the fam module loading process. But, since we had items in modules, and ways to restore their value by associating their class to them, it was worth seeing whether the mechanism could be extended to the module code. <br />
<br />
In Falcon, data and code are different, even if they can be pretty seamlessly merged. However, the code is bound to be a directed acyclic graph, while data in items need not be. Also, the types of code entities are enumerable, while item types are not. This might have suggested a different approach to the serialization of code and data. But there was a detail that ruled out this naive approach: in the not too distant future, we want to introduce self-modifying code or, in other words, reflect statements into language items. This means that items might potentially contain code, and more specifically, they might directly point to a code entity they are a part of. While the code itself stays directed and acyclic, once the border of values in the code leaves has been crossed, it is possible that the entity they point to is the same code that contains them.<br />
<br />
With this in mind, extending the same serialization of the items to the fam processing ceased to be an option and became a requirement.<br />
<br />
Serialization of items happens through the help of their class, as items are opaque to the VM and the Falcon Class entity represents the item handlers. This means that every single grammar entity must be represented by a Class that the Falcon VM can use. For instance, if we have a child of PStep representing the While construct, we must have a class derived from Class called ClassWhile that knows how to serialize a While instance. In a Falcon program a While instance would be represented by an item containing a While C++ instance and a ClassWhile instance providing the scripting interface to the while construct.<br />
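<br />
The pairing of an opaque instance with its handler can be sketched like this. It is only an illustration with a drastically reduced interface: the store() signature, the Item layout and the textual output format are assumptions, not the engine's actual Falcon::Class API.<br />
<br />
```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced handler interface.
struct Class {
    virtual ~Class() = default;
    virtual std::string name() const = 0;
    // Serializes an instance that only this handler knows how to interpret.
    virtual std::string store(const void* instance) const = 0;
};

// A grammar entity, opaque to the VM.
struct While {
    std::string selector;   // the loop condition, kept as text for brevity
};

// The handler that understands While instances.
struct ClassWhile : Class {
    std::string name() const override { return "While"; }
    std::string store(const void* instance) const override {
        return "while " + static_cast<const While*>(instance)->selector;
    }
};

// An item pairs the opaque instance with the handler able to manipulate it.
struct Item {
    void* instance;
    const Class* handler;
};

// The VM side: no knowledge of While, everything goes through the handler.
inline std::string serialize(const Item& item) {
    return item.handler->name() + ":" + item.handler->store(item.instance);
}
```
<br />
Recording the handler name next to the data is what would let a reader find (or load) the right class before rebuilding the instance.<br />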
<br />
It is not necessary to fully expose the interface of all the grammar constructs to the scripts for now, but all the classes handling the serialization of the constructs need to be put in place. While this is surely a lot of work, it's not much more than writing a single class that knows how to deserialize any grammar item stored in a file. And it has an interesting advantage: the grammar structures need not be a closed enumeration anymore, and it becomes pretty simple to create new grammar constructs dynamically from third-party modules. As long as those modules are available on the target environment, it is no longer necessary that all the code entities are known by and declared in the central engine. While the parser is still not extensible (but it is easy to make it so, or even to derive a new parser and use that one instead), this means that new modules could bring in new processing modes, and even whole new programming paradigms, as the grammar elements (the PSteps) are exactly what the VM understands and processes.<br />
<br />
The serializer is already able to query the classes so that their instances have a say in the process. In other words, instances can provide serialization handlers that must be processed by the virtual machine. This means that, contrary to the old engine, a VM in place and ready to run is now a necessary part of the fam generation and restore process. However, in the most common cases, the VM won't be invoked, and the overhead with respect to a simple serialization is marginal (even more so when compared to disk write times). Of course, deserializing a tree of instructions, instead of a flat table of code, is far more complex and CPU/memory intensive. On the other hand, deserializing the code table is not the only thing that a scripting language must do in order to restore a pre-compiled module (and then run it), so I am pretty sure that the net cost of this serialization method is not as high as it might seem.<br />
<br />
<br />
<br />
]]></description>
 <category>General</category>
<comments>http://www.niccolai.cc/index.php?itemid=458</comments>
 <pubDate>Tue, 25 Oct 2011 05:21:05 -0700</pubDate>
</item><item>
 <title>Nesting Ajax</title>
 <link>http://www.niccolai.cc/index.php?itemid=456</link>
<description><![CDATA[Falcon has a web application platform of choice called "Nest" (Falcon Nest...), which we're developing intensely and which is getting closer to production readiness by the hour. <br />
<br />
Initially, the only modular element in the picture was the so-called "Service", a type of standardized module that could interact with other services in a page, and that had to be "configured" (actually, programmed) on-site. For instance, a "Form" could be a service, as could a whole "Wiki". The idea is quite interesting for self-contained entities such as the "Login" service, where the rendering of the login form, the check of the authorization level and other related activities can be traced back to a single conceptual entity; or in simple database-table editing, where the form must just read and then store data in the required record: you just need, and most of the time just want, a very simple way to perform some consistency checks before storing data in the database, and then you're done.<br />
<br />
But modern web application design requires far more. Initially, I thought that it was possible to simply delegate everything client-side to third-party client libraries such as jQuery or MooTools; in fact, Nest is pretty forgiving about the overall site structure and can be pretty shy, to the point that you won't even notice it's there... and so, not even use it. However, when it comes to AJAX in particular, while those client-side libraries offer interesting pure AJAX frameworks where you just have to provide your server-side programming, the concepts behind Nest were too heavyweight to delegate everything to a completely separate, stand-alone server-side AJAX interface. For instance, when the persistent login and database session management facilities are active, having to "javascriptize" all the data that the server-side function of an AJAX framework needs might be hard and clumsy. <br />
<br />
So I thought of a way around that, without necessarily overriding or ruling out client-side JavaScript libraries, and I came up with the idea of "widgets". They are visual elements of the final page that are rooted inside the Nest application, and thus they can use all the facilities used by Nest, such as the current status of the variables as they are set by existing services, or session data as it is handled by the session manager.<br />
<br />
Widgets expose an AJAX interface that is published by Nest, plus some simple JavaScript calls that trigger the exposed AJAX interface and carry out the actions it requests. A small piece of JavaScript glue code (approx. 150 lines) bridges the Nest server-side widget representation and the client-side rendering. <br />
<br />
Widgets can be configured right on-page (where they are rendered before being sent to the browser) so as to link with any foreign JavaScript code or library. Also, while they have a pretty useful AJAX interface of their own, they can use third-party AJAX support. They, or the rest of the web application, can interact with AJAX functions exposed by Nest independently of their widget AJAX interface, or could attach to the Falcon WebAPI simple AJAX framework (which doesn't require Nest to provide web AJAX features).<br />
<br />
At the moment this new feature is not very well integrated with the rest of Nest; e.g. the database-oriented Form service is not using widgets. It will take a few days to figure out how to work this feature into the existing framework.<br />
  ]]></description>
 <category>Falcon</category>
<comments>http://www.niccolai.cc/index.php?itemid=456</comments>
 <pubDate>Sun, 25 Sep 2011 15:38:36 -0700</pubDate>
</item><item>
 <title>First day back on Falcon</title>
 <link>http://www.niccolai.cc/index.php?itemid=455</link>
<description><![CDATA[I've been working full force on some other projects for some time now, but I felt the urge to get back to the new engine, also for business reasons. My intention is to dedicate a bit of care to it every day from now on, but today I worked on it all day to get back in touch with the new API and the problems that were left open.<br />
<br />
Today I have fixed a couple of nifty details that were left open: the booleanization of items (a = "value"; if a ...), which is now overridable, and the safe and consistent cleanup of the execution context in the interactive compiler.<br />
<br />
Also, I have added grammar for both range declaration ([x:y:z]) and range accessor (x[a:b:c]). I had put those off for a while because they were extremely delicate and complex to integrate with the new array declaration syntax (which actually opens a simple but powerful sub-parser for improved parsing of functional expressions).<br />
<br />
Now I am looking for some people at #falcon willing to implement range access in strings and arrays (the functions are already in; we just need to copy some glue code from the old engine).<br />
<br />
Well, we're back in action.<br />
]]></description>
 <category>Falcon</category>
<comments>http://www.niccolai.cc/index.php?itemid=455</comments>
 <pubDate>Thu, 22 Sep 2011 10:27:44 -0700</pubDate>
</item><item>
 <title>Target #3 Reached (and how)</title>
 <link>http://www.niccolai.cc/index.php?itemid=453</link>
<description><![CDATA[And so, we got the <b>try/catch</b> statement working in the new organic virtual machine. Funnily enough, it took just 3 days (less, actually) instead of the year it took with the old machine, and with just a few more touches I was even able to add a <b>finally</b> keyword that would have been nearly impossible to add in the old engine. Another proof, if one was needed, of the potential of the new engine. Here are some details about what it took and what it implies.<br />
Again, the twist was that of <i>finding a name</i> to solve the problem. The new name, and the new concept introduced in the virtual machine, is the <i>code barrier</i>.<br />
<br />
After fumbling around a bit with a separate try-frame stack, I tried to fit the "finally" construct in. Other languages that make good use of exception handling, such as Java, assign finally the role of cleaning up code sections not just when an error is thrown, but also when a block is exited early. This seemed desirable in Falcon as well. We have (deliberately) a limited scope protection, which makes variables declared inside the try block available outside it too. So, in Falcon it is possible to write cleanup strategies that in other languages can be implemented through the finally keyword only. For instance, in Falcon it is possible to write something like this:<br />
<br />
<code><br />
try<br />
   file = InputStream( ... )<br />
   ....<br />
catch IoError in e<br />
   ...<br />
end<br />
<br />
// if file is an open file, it will be true here, else it will still be nil, and so, false:<br />
if file: file.close()<br />
</code><br />
<br />
while this simple finalization rule <b>requires</b> a finally block in other languages. So, finally wasn't strictly necessary to implement cleanup strategies per se, but the idea of providing a cleanup strategy for unstructured instructions was terribly fascinating. Unstructured instructions are those statements that break the normal flow of the code. C has break, continue, return, goto and switch (partially unstructured), C++ adds throw; in Falcon we have:<br />
<ul><br />
<li>break</li>  <br />
<li>continue</li><br />
<li>return</li><br />
<li>raise</li><br />
</ul><br />
<br />
Moreover (this is something I haven't been able to write about on the blog yet), I have decided to experiment with "break values", that is, functional commands or values that instruct foreign code to perform unstructured operations. For instance, it is now possible to write:<br />
<code><br />
function breakMe()<br />
   ...<br />
   if <somecond>: break<br />
   ...<br />
end<br />
<br />
...<br />
while cond<br />
  ...<br />
   breakMe()<br />
  ...<br />
end<br />
</code><br />
Break and continue propagate through the call stack past the call frame, and reach the innermost loop instruction. Actually, this is not a <i>necessity</i> of the new engine, but somehow it seemed to me that it was an interesting ability; however, it's an experimental feature, and we might turn it off if it hurts more than it helps. Anyhow, the interesting thing is that you can now see "break" and "continue" as values, so you can "return break", and thus communicate the will to break out of functional loops in a nicer way than the old return oob(0).<br />
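<br />
To make the idea of "break" and "continue" as values a bit more concrete, here is a minimal sketch written in Python rather than Falcon. It is not the engine's code: the names BREAK, CONTINUE and functional_loop are invented for this illustration, and the only point is that a callee can return a marker value and the loop driver reacts to it.<br />

```python
# Hypothetical marker values standing in for "break" and "continue" seen as
# values; these names are assumptions made for this sketch only.
BREAK = object()
CONTINUE = object()


def functional_loop(items, callback):
    """Call callback on each item, honouring BREAK/CONTINUE return values."""
    results = []
    for item in items:
        outcome = callback(item)
        if outcome is BREAK:
            break          # the callee asked the loop to stop
        if outcome is CONTINUE:
            continue       # the callee asked to skip this item
        results.append(outcome)
    return results


def double_small_evens(n):
    if n > 6:
        return BREAK       # plays the role of "return break" in the text
    if n % 2:
        return CONTINUE
    return n * 2


print(functional_loop(range(1, 10), double_small_evens))   # [4, 8, 12]
```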
<br />
This feature already required the ability to unroll the code stack in search of some "landing point" where it was safe to resume VM control. Of course, the unroll process had to take the call stack status into consideration; since I wanted to enlarge the scope of break/continue statements (scoping them similarly to "raise", that is, globally), it was necessary to account for any call frames that may have to be unrolled while progressing backward in search of a landing code frame.<br />
<br />
So, this mechanism was present, but I didn't formalize it (didn't give it a "name") until I saw that "raise" and error catching were more or less doing the same thing. I then formalized the idea of "code barriers", that is, code frames that require a temporary or definitive stop of the code stack unroll process fired by an unstructured instruction.<br />
<ul><br />
<li>return: stops at code frames marked as function callers by a call frame (call barrier).</li><br />
<li>continue: stops at a "next loop" code barrier.</li><br />
<li>break: stops at a "loop cleanup" code barrier.</li><br />
<li>raise: stops at a "catch barrier" matching the raise item type.</li><br />
</ul><br />
<br />
Notice that the nature of these barriers is different. In the case of the return barrier, the barrier status is determined by a value stored in the return frame; also, the nature of this barrier allows the call frames to be unrolled without performing a reverse traversal, if conditions are favorable (but now you can consider that a mere optimization). In the case of the catch barrier, the barrier may be active or not depending on whether a catch clause can block the raised value. In the cases of continue and break, the barriers are mandatory and unconditionally active each time they are met.<br />
<br />
Yet, the mere fact of considering all these phenomena under a common name allowed me to simplify the code and use just one function to handle all of them. For optimization reasons, the function is actually an inline template, where the template parameter is a piece of code that implements the different behavior of each barrier.<br />
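<br />
The "one function for all barriers" idea can be sketched as follows. This is a Python illustration, not the engine's inline template: the CodeFrame layout, the barrier labels and the stops_* predicates are all assumptions made for the example, with a plain function argument standing in for the template parameter.<br />

```python
from dataclasses import dataclass


@dataclass
class CodeFrame:
    """Hypothetical code frame; the real engine's layout is not shown here."""
    name: str
    barrier: str = ""      # "", "call", "next_loop", "loop_cleanup" or "catch"
    catches: tuple = ()    # item types that a catch barrier can block


# One predicate per unstructured instruction: does this frame stop the unroll?
def stops_return(frame, item):
    return frame.barrier == "call"


def stops_continue(frame, item):
    return frame.barrier == "next_loop"


def stops_break(frame, item):
    return frame.barrier == "loop_cleanup"


def stops_raise(frame, item):
    # A catch barrier is active only if one of its clauses matches the item.
    return frame.barrier == "catch" and isinstance(item, frame.catches)


def unroll(code_stack, is_barrier, item=None):
    """Pop frames until is_barrier accepts one; return that frame, or None.

    The landing frame is left on the stack, so execution can resume from it.
    """
    while code_stack:
        frame = code_stack[-1]
        if is_barrier(frame, item):
            return frame
        code_stack.pop()
    return None


stack = [
    CodeFrame("main"),
    CodeFrame("try", barrier="catch", catches=(KeyError,)),
    CodeFrame("while", barrier="loop_cleanup"),
    CodeFrame("call breakMe", barrier="call"),
    CodeFrame("if"),
]

# A break issued inside breakMe() crosses the call frame and lands on the loop.
print(unroll(list(stack), stops_break).name)    # while
print(unroll(list(stack), stops_return).name)   # call breakMe
```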
<br />
But naming this operation "code unroll" (on unstructured statements) allowed a niftier thing. The finally block could be named a code barrier as well: an optional code barrier that suspends the code unroll process when met.<br />
<br />
This made it possible to deal almost transparently with break, continue and raise crossing call frames, and the fact that return stops at the first call frame could now be considered an incidental, irrelevant detail.<br />
<br />
But this even made it possible to sneak in a feature that is almost dreamed of by other languages, even major ones such as Java: the ability to deal cleanly with a double raise from finally blocks.<br />
<br />
In fact, the engine keeps track of the ongoing process during finally, which makes it possible to deal correctly with errors raised from within it. Our engine provides "sub-errors" natively, so it came pretty naturally to me to deal with errors raised from within finally handlers (even nested ones) by adding the error they throw to the ongoing, already-traveling error.<br />
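<br />
Here is a small Python sketch of that behavior, under simplifying assumptions: TravelingError, unroll_with_finally and the dict-based frames are invented for the example, and each frame is either a finally barrier or a catch barrier, never both. An error raised by a finally handler is attached to the error already traveling, so neither one is lost.<br />

```python
class TravelingError(Exception):
    """Hypothetical error type that can carry sub-errors."""

    def __init__(self, message):
        super().__init__(message)
        self.sub_errors = []


def unroll_with_finally(frames, error):
    """Unroll frames (innermost last) on behalf of a traveling error.

    Each frame is a dict with a "name" and either a "finally" callable or a
    truthy "catches" flag. Returns the name of the catching frame, or None.
    """
    while frames:
        frame = frames.pop()
        handler = frame.get("finally")
        if handler is not None:
            try:
                handler()                       # the unroll is suspended here
            except Exception as exc:            # a second raise, from finally
                error.sub_errors.append(exc)    # keep it with the ongoing error
        if frame.get("catches"):
            return frame["name"]
    return None


def failing_cleanup():
    raise RuntimeError("close failed")


log = []
frames = [
    {"name": "outer try", "catches": True},
    {"name": "outer finally", "finally": lambda: log.append("outer cleanup")},
    {"name": "inner finally", "finally": failing_cleanup},
]
err = TravelingError("original failure")
caught_by = unroll_with_finally(frames, err)

print(caught_by, [str(e) for e in err.sub_errors])   # outer try ['close failed']
```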
<br />
ATM, I think it's sensible to let finally blocks override the ongoing operation; for instance, if they were triggered by a break or continue, they could neutralize it by issuing an explicit return, or conversely, if triggered by the clean termination of a try block or by a raised exception, they could nullify it by issuing a continue or break operation. Also, as raise can throw not just exceptions but any kind of item, a plain item raised by something inside the try block could (and IMHO should) be overridden by an explicit item raise in the finally block. However, those override operations would all be explicit, and not a side effect of the language structure, as in the case of the pending error being lost when an exception is thrown from a Java finally block.<br />
<br />
Yet, as we have full knowledge of what's going on during the finally operation, we might decide to do something else and issue an error if finally tries to override the ongoing process, if we find this to be more sensible.<br />
<br />
One final note: actually, the time needed to code all these things in was about 4 hours. The rest of the 3 days was needed, in part, to refine the concepts of unroll process and code barriers, and in part to complete some elements of the engine that will be useful elsewhere: for instance, the Requirement class, a generalization of the Inheritance class, which allows for link-time resolution of extern symbols even in deep structures, such as the catch blocks deep inside try statements.<br />
<br />
(Actually, the catch blocks are implemented as an instance of the select statement, which switches over the type of a value -- so that statement's internal structure is complete as well).<br />
<br />
]]></description>
 <category>Falcon</category>
<comments>http://www.niccolai.cc/index.php?itemid=453</comments>
 <pubDate>Wed, 17 Aug 2011 16:46:25 -0700</pubDate>
</item><item>
 <title>Target #2 reached</title>
 <link>http://www.niccolai.cc/index.php?itemid=451</link>
<description><![CDATA[The new engine has now partial capability of load submodules and supports import/from statements in all flavors.<br />
<br />
What's still missing in the module loading process:<br />
<ul><br />
<li>Precompiled module serialization-deserialization.</li><br />
<li>Binary module load (but it's a cut-&-paste from the old engine).</li><br />
<li>Namespace definition.</li><br />
<li>Macrocompilation.</li><br />
</ul><br />
<br />
But as usual, this element of the new engine is already more powerful than the corresponding one we had in the old engine, being able to finely address sources or other module generation devices, differentiating module URIs from module logical names from before the load starts and making the load/import relationship explicit at every step of the process. This makes possible some improvements that were not feasible with the old engine, such as finer control of plugin modules (and their dependencies) and the import/in self statement, which shall allow copying the global namespaces of other modules (creating module "collections").<br />
<br />
Also, this second target makes the new engine <b>structurally complete</b>. This means that now the new engine, however experimental and rudimentary it still is, is able to stand alone and test itself. Every part of the "main loop" is complete; everything else that must be done from now on is just "gap filling" work. The overall structure of the engine, the modules, the garbage collecting system, the symbol integration and sharing, the item model and anything that's vital to the Falcon Programming Language is either completed, advanced, or sketched out; in any case, it is no longer just "on paper". It's here.<br />
<br />
This also means that "outsiders" might now be able to work on the new engine to help complete it. So, from now on, we should be able to proceed faster.<br />
]]></description>
 <category>General</category>
<comments>http://www.niccolai.cc/index.php?itemid=451</comments>
 <pubDate>Tue, 2 Aug 2011 14:54:09 -0700</pubDate>
</item><item>
 <title>Good Programming is giving names</title>
 <link>http://www.niccolai.cc/index.php?itemid=448</link>
<description><![CDATA[I have been enlightened by a Great Truth(TM) today. Programming (without adjectives) may be something else, but Good Programming (TM(tm)) is all about giving names to things, or, I may say, "dubbing".<br />
<br />
I've been struggling to reintroduce auto-operators (+=, ++, postfix ++ & family) in the Organic Virtual Machine for a couple of days; and a couple of "my" days, so, something like 16 work hours per day.<br />
<br />
<br />
The problem was not making the thing happen, but making it general enough not to occupy half the size of the VM by itself. Self-operators are not exactly a cakewalk to implement (in fact, many modern languages have dropped them, or just macro-expand them). This is especially true of postfix increment, which must retain the previous value of an item and then store the new value in the same place it came from. Consider:<br />
<br />
<code><br />
number = array[nextPos()]++<br />
</code><br />
<br />
If nextPos() returns a different index at each invocation, expanding ++ to this:<br />
<br />
<code><br />
temp = array[nextPos()]<br />
number = temp<br />
array[nextPos()] = temp + 1<br />
</code><br />
<br />
would lead to a disaster. <br />
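<br />
The disaster is easy to reproduce. The following Python sketch mirrors the naive expansion above; next_pos is a stand-in for nextPos() and the sample array is made up. The index expression is evaluated twice, so the value is read from one slot and the increment is written to another.<br />

```python
array = [10, 20, 30]
positions = iter([0, 1, 2])


def next_pos():
    """Return a different index at each call, like nextPos() in the text."""
    return next(positions)


# Naive macro-expansion of: number = array[nextPos()]++
temp = array[next_pos()]       # reads slot 0
number = temp
array[next_pos()] = temp + 1   # writes slot 1, which is the wrong element

print(number, array)   # 10 [10, 11, 30]
```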
<br />
What you have to do is to:<br />
<ol><br />
<li>Retrieve the coordinates of the entity to be accessed (in our example, call nextPos() and obtain the value of the 'array' symbol)</li><br />
<li>Save the coordinates for later use</li><br />
<li>Apply the coordinates in read-mode, to obtain the desired value.</li><br />
<li>If needed (in the case of a postfix operator), save the obtained value.</li><br />
<li>Apply the operator (increment, addition, multiplication etc).</li><br />
<li>Restore the access coordinates.</li><br />
<li>Apply the access coordinates in write-mode (which involves the obtained results).</li><br />
</ol><br />
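<br />
The steps above can be sketched in a few lines of Python. The function name postfix_increment and the way the coordinates are modeled (a container plus an index) are assumptions made for the illustration; the VM works on its own stacks, not on Python tuples.<br />

```python
def postfix_increment(container, index_expr):
    """Emulate container[index_expr()]++ following the steps listed above."""
    coords = (container, index_expr())   # steps 1-2: resolve once and save
    target, index = coords
    old_value = target[index]            # step 3: read through the coordinates
    saved = old_value                    # step 4: postfix keeps the old value
    new_value = old_value + 1            # step 5: apply the operator
    target, index = coords               # step 6: restore the coordinates
    target[index] = new_value            # step 7: write to the same place
    return saved


positions = iter([0, 1, 2])
array = [10, 20, 30]
number = postfix_increment(array, lambda: next(positions))

print(number, array)   # 10 [11, 20, 30]
```

Since the index expression runs exactly once, the read and the write are guaranteed to hit the same element.<br />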
<br />
And that's not all. Depending on the context, you may need to add extra space before starting the process, or remove extra data created by the process after it has completed. <br />
<br />
A complete discussion of the topic is not in the scope of this entry. Or, I may say, I don't feel like explaining all the details; but one thing should be noted: we have three kinds of accessors that might require auto operator application (dot-access, array-index-access and direct symbol access), and three kinds of accesses requiring different setups <b>and</b> in-between operations (auto-op, prefix inc/dec and postfix inc/dec). <br />
<br />
After some non-elementary study and some failed tries, a "brute force" attack began to seem an honorable way out. A total of 9 different procedures to handle each possible combination was still manageable, and had the advantage of minimizing the steps the VM would have been required to perform. But, on one hand, the mechanism would have been extremely rigid and inelegant, and on the other, it would have been hell to test and to maintain.<br />
<br />
So I studied more, and found a better solution, which was to:<br />
<ul><br />
<li>Add a "pre-compile" method specific for auto-operators, with parameters allowing the auto-operators to specify what they required the pre-compiled expression to perform.</li><br />
<li>Add more PSteps to the expressions exposing the ability to accept l-values (we might say, assignment requests), taking care of doing what is required by auto-operators.</li><br />
</ul><br />
<br />
The idea stood, and was relatively elegant once fine-tuned. However, I had consistent but apparently inexplicable crashes in the destructor of PSteps. PSteps are virtual classes, and Expressions are PSteps. I vaguely remember some warning about having virtually destroyed classes held statically inside other virtually destroyed ones, but this case didn't seem to match. However, after some hours of debugging, I definitely excluded both double-delete and memory corruption due to e.g. buffer overruns or underruns. The damn thing just crashed -- independently of the operations actually performed; it depended only on how many PSteps were declared as members of other PSteps. <br />
<br />
However, I am glad that happened. I was bugged by the size that each expression had. An expression takes about 32 bytes, but adding 5-6 PSteps to handle all the possible phases meant it would have gone up to about 128 bytes. Still acceptable considering that you create them once and run them millions of times, but ... And also, I was bugged by the 6 (5 + 1) destructors that were called in order to get rid of the items. The destructors themselves are empty, but they must exist to ensure proper memory collection across DLL boundaries. The fact that there was something crashy in the process, which I couldn't track down, gave me the final motivation to change the plan. <br />
<br />
After all, those PSteps were not even referencing the expression they came from, they were just doing things to the context at the right place in the right moment. They were so general, so ... <i>standard</i>...<br />
<br />
And so, <i>I was enlightened</i>. PSteps are meant to be general instructions, similar to the PCODE of traditional VMs, or to machine code interpreted by silicon CPUs. The fact that up to now all the PSteps had a specific purpose was just incidental, not programmatic. All the auto-operators could make use of standard PSteps that know nothing about the context or the purpose they are used for. As such, we could expose those standard PSteps in the engine, without any need for them to be related to the expression where they "belong". <br />
<br />
So, I added a StdPSteps class that acts as storage for the PSteps doing generic tasks, such as pushing, popping, duplicating, swapping or saving items in the data stack. This class resides in the engine and is created and destroyed with it. The size of the involved expressions has shrunk, the need for extra creation and destruction steps is obviated and the resulting code is both more elegant and more expressive.<br />
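<br />
As a rough picture of the idea, here is a Python sketch of context-agnostic steps stored once and shared by several expressions. Only the StdPSteps name comes from the engine; the Context class, the PStep interface, the step names (dupl, pop, swap, add_one) and the two step sequences are assumptions made up for the example.<br />

```python
class Context:
    """Hypothetical VM context: just a data stack here."""

    def __init__(self):
        self.data = []


class PStep:
    """A generic instruction: it only knows how to act on a context."""

    def __init__(self, name, action):
        self.name = name
        self.action = action

    def apply(self, ctx):
        self.action(ctx)


class StdPSteps:
    """Created once with the engine; every expression shares these objects."""

    def __init__(self):
        self.dupl = PStep("dupl", lambda ctx: ctx.data.append(ctx.data[-1]))
        self.pop = PStep("pop", lambda ctx: ctx.data.pop())
        self.swap = PStep("swap", self._swap)

    @staticmethod
    def _swap(ctx):
        ctx.data[-1], ctx.data[-2] = ctx.data[-2], ctx.data[-1]


std = StdPSteps()
add_one = PStep("add_one", lambda ctx: ctx.data.append(ctx.data.pop() + 1))

# Two "expressions" reuse the very same standard steps instead of owning copies.
postfix_inc = [std.dupl, add_one, std.swap]   # leaves [new, old] on the stack
prefix_inc = [add_one, std.dupl]              # leaves [new, new] on the stack

ctx = Context()
ctx.data.append(41)
for step in postfix_inc:
    step.apply(ctx)

print(ctx.data)   # [42, 41]
```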
<br />
But that was not the enlightenment, even if finding this solution was a pleasurable side effect of it. The real thing that I discovered is the mental process through which I reached this solution, and even the previous solution of having a method specific for pre-compilation of auto-ops, and many, many other "brilliant" solutions that up to that moment I never knew how I was able to achieve.<br />
<br />
<b>I give names to things</b>. <br />
<br />
Giving a name to a thing is more than defining it. It's mapping it in your knowledge space, making room in your mind to deal with that entity, or we may say, creating an entity to deal with. If the name is powerful enough, if the name is <i>the right name</i>, then every problem seems to solve itself. Every relation with the surroundings comes together, and both the data and the operations fall into a natural layout.<br />
<br />
Somehow, that reminded me of <a href="http://en.wikipedia.org/wiki/A_Wizard_of_Earthsea">Earthsea</a>, a fantasy saga written by Ursula Le Guin, where the most powerful magic was calling things by their True Names. <br />
<br />
The moment I understood those steps were "standard", when I gave a name to those PSteps, dubbing them in a meaningful way, everything came down to a simple model where I can present those general entities to their users. Similarly (but at the time I hadn't yet realized this mechanism), when I was able to find a name for the process of accepting auto-ops requirements (and requests), dubbing it <b>precompileAutoLValue</b>, I was able to relate that process to how the rest of the system wanted to use it and expected it to work. Even if it was a "process" and not just a "thing", a process is still a "thing", and can be named. <br />
<br />
The solution I found is so clear and powerful that I have been able to redesign some common parts, making them simpler, more efficient and more elegant.<br />
<br />
But the most precious treasure I had been able to dig up today is the understanding of my own solution finding scheme. I hope it can be useful to you as well.<br />
 <br />
]]></description>
 <category>Falcon</category>
<comments>http://www.niccolai.cc/index.php?itemid=448</comments>
 <pubDate>Sat, 30 Jul 2011 17:38:49 -0700</pubDate>
</item><item>
 <title>Reintroduction of references</title>
 <link>http://www.niccolai.cc/index.php?itemid=446</link>
<description><![CDATA[After a couple of days of work, I have found that the only sensible way to allow dynamic loading of modules is that to create references for static data and let those to be separately managed by the garbage collector.<br />
<br />
Consider strings, arrays of flat items or even user-defined classes that might go in the static data of a dynamically loaded module. If some item is taken by the user of the loaded module and sent around, it must stay alive as long as needed. If necessary, it must even keep alive the module that is hosting it; otherwise it should just stay around even after the module has died. I wouldn't love to keep a plugin around just because it has returned "OK" from inside a function it exposes, and that OK is kept somewhere for future records...<br />
<br />
Adding the requirement that anything that goes or may ever go outside the dynamic module boundaries must reference the module back would be a nightmare. It may make sense for (some) classes and functions (and I think we can lower this requirement to the external/native code ones) but it's totally unreasonable for e.g. strings. Also the reverse requirement, that is, that a module must stay alive until all the data <b>born</b> from it are gone, is pretty heavy. <br />
<br />
So, the only sensible way to handle plugin-based dynamic (dischargeable) modules is to turn their static data into references as soon as the module is linked dynamically in a virtual machine. When the module is gone, the reference items might still be around, keeping the related data alive.<br />
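<br />
The ownership model can be pictured with a small Python sketch, where Python's own memory management stands in for the Falcon garbage collector. The Ref and Module classes and the status() method are invented for the illustration; the point is only that the data handed out does not point back to the module, so the module can be discharged while the "OK" string lives on.<br />

```python
import gc
import weakref


class Ref:
    """Hypothetical reference item: holds the data independently of the module."""

    def __init__(self, value):
        self.value = value


class Module:
    def __init__(self, name, static_data):
        self.name = name
        # At link time, every static datum becomes a separately managed Ref.
        self.statics = {key: Ref(val) for key, val in static_data.items()}

    def status(self):
        # Hands out the Ref itself, which knows nothing about the module.
        return self.statics["ok"]


plugin = Module("plugin", {"ok": "OK"})
kept = plugin.status()
probe = weakref.ref(plugin)   # lets us observe when the module is collected

del plugin                    # the plugin is discharged
gc.collect()

print(probe() is None, kept.value)   # True OK
```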
<br />
For this reason, re-introducing the references was in order before completing the module loading system. <br />
<br />
In the process, I also improved the usability of references, which was somewhat limited in the old engine. Here follows a talk about it in our #falcon channel at irc.freenode.net:<br />
 <code><br />
jonnymind this is crucial to module loading, as I have found that the only consistent way to ensure static data stability in dynamic modules is to create a reference out of them.<br />
jonnymind Notice the power of the new references:<br />
jonnymind Prototype{val=Ref {"World"}}<br />
jonnymind >>> x = 100<br />
jonnymind 100<br />
jonnymind >>> proto<br />
jonnymind Prototype{val=Ref {100}}<br />
jonnymind ATM I decided that reference value can be altered only writing directly to a symbol.<br />
jonnymind for instance,<br />
jonnymind >>> proto.val = 1<br />
jonnymind 1<br />
jonnymind >>> proto<br />
jonnymind Prototype{val=1}<br />
jonnymind The reference is now broken.<br />
jonnymind BUT<br />
jonnymind >>> proto.mody = {v => val = self.val; val = v }<br />
jonnymind (Method .anonymous())<br />
jonnymind >>> proto.val = x<br />
jonnymind Ref {100}<br />
jonnymind >>> proto.mody( "Hello again" )<br />
jonnymind >>> proto<br />
jonnymind Prototype{mody=(Method .anonymous()), val=Ref {"Hello again"}}<br />
jonnymind >>> x<br />
jonnymind Ref {"Hello again"}<br />
jonnymind I think it's important to consider the references alterable only as values. In this way altering properties or vector elements doesn't bring into surprises,<br />
jonnymind and you always have a way to address the reference, if you need to.<br />
jonnymind Also, foreign class may have different ideas about storing items.<br />
jonnymind suppose I intercept x.v = prop<br />
jonnymind The fact that I am sending a reference there is not so transparent if you're saving i.e. integers and expect to receive integers.<br />
jonnymind OTOH,<br />
jonnymind we may decide that the behavior of OUR classes is that of respecting references.<br />
jonnymind ---<br />
jonnymind like this:<br />
jonnymind >>> x = "Hello"<br />
jonnymind "Hello"<br />
jonnymind >>> proto = p{<br />
jonnymind ...   var = $x<br />
jonnymind ... } <br />
jonnymind Prototype{var=Ref {"Hello"}}<br />
jonnymind >>> proto.var = "World"<br />
jonnymind "World"<br />
jonnymind >>> x<br />
jonnymind Ref {"World"}<br />
jonnymind But then we need a way to break the reference.<br />
billykater when would you need to break a reference? ( I am currently thinking c++ references ^^)<br />
jonnymind For instance, when var needs to be something that's not x anymore.<br />
jonnymind or when X need to do other things.<br />
jonnymind Suppose that I want to assign var to another reference.<br />
jonnymind i.e. because var is SEMANTICALLY a pointer to a falcon object that I decide somewhere else.<br />
jonnymind and I want to take another reference.<br />
jonnymind doing a thing like<br />
jonnymind proto.var = $v1 won't work, as would cause var to reference v<br />
jonnymind This is giving me an idea.<br />
jonnymind Handling references transparently is a *hard* thing.<br />
jonnymind error-prone and heavy<br />
billykater hm the problem will be that every program will have to explicitly decide what to do with references it gets handed. suppose you have a function with 3 parameters that get stored internally. each program writer now has to decide how references are handled<br />
jonnymind If we just had references as a language structure,<br />
jonnymind ...<br />
jonnymind but then, it would make no difference from a simple<br />
jonnymind class Ref(x)<br />
jonnymind  data = x<br />
jonnymind end<br />
jonnymind No, references must give you some automatism.<br />
billykater currently thinking about a way to handle it to not let them be the class you wrote above<br />
jonnymind In short, you shouldn't even know you have a reference.<br />
billykater class MyClass<br />
billykater     function func(x)<br />
billykater       self.x = x<br />
billykater     end<br />
billykater end<br />
billykater for the first c = MyClass(); c.func(ref) it will initialize the reference and the second c.func(someValue) would then set the reference to point to this value. this will become a nightmare for most programs.<br />
billykater References should really be used with care<br />
billykater or adding the reference as key to a dictionary. there can be some really interesting situations from this event<br />
billykater d = [ ref => "a" ]; ref = "b" ^^<br />
lumo_e what is wrong with our current model? (I mean 0.9.6.9)<br />
jonnymind Found<br />
jonnymind lumo_e: a couple of things,<br />
jonnymind for instance, the fact that you couldn't post references into objects or vector.<br />
jonnymind However, I found a solution.<br />
jonnymind The new engine has a method called Item::assign()<br />
jonnymind it keeps track of a few things. Mainly, of copy-on-use patterns.<br />
jonnymind It MUST be used when storing a value into a visible variable.<br />
jonnymind i.e. into an array element, a property or a symbol.<br />
jonnymind By just adding the dereference logic there...<br />
jonnymind :-) done<br />
jonnymind http://www.ideone.com/kUvg9<br />
billykater so assigning a reference to a reference will not make one reference hold the other one<br />
jonnymind Exactly.<br />
jonnymind Theoretically, it could be possible (but you still must mask references to yourself, the last test I checked in the paste).<br />
jonnymind But practically, it's a nightmare.<br />
jonnymind You can do that when you have explicit ways to access the referenced items;<br />
jonnymind for instance, in lisp is legal to reference symbols from symbols, and even reference your own symbol.<br />
jonnymind but you EXPLICITLY ask for dereference (which matches with evaluation).<br />
jonnymind like<br />
jonnymind (defq x 'x)<br />
jonnymind x now evaluates to ... x<br />
jonnymind But you always evaluate a single reference, and you are always told by the user when this evaluation must take place.<br />
jonnymind here it's different.<br />
jonnymind the different scheme of infix expressions suggest that references must be deep 1 step at worst.<br />
jonnymind i.e. references can't reference other references.<br />
jonnymind That makes things a lot easier in an infix-expression based language.<br />
</code>]]></description>
 <category>Falcon</category>
<comments>http://www.niccolai.cc/index.php?itemid=446</comments>
 <pubDate>Thu, 28 Jul 2011 09:39:27 -0700</pubDate>
</item><item>
 <title>Editline and readline</title>
 <link>http://www.niccolai.cc/index.php?itemid=444</link>
<description><![CDATA[As a part of the redesign of the new Falcon engine, I have cleaned the dependencies of the Falcon interactive mode to readline (GNU/GPL) and editline (BSD). For who doesn't know, these are two libraries that allow history navigation and a bunch of other features at a console prompt, which came pretty handy while trying code snippets and expressions in the interactive mode.<br />
<br />
Readline is widely available, embedded in bash and a bunch of other important system tools all over the POSIX world. Editline is a son of a lesser God, but it still does the job fine, and has a more detailed, extensible API. Unluckily, readline has superior support for internationalization, while utf-8 and wchar_t structures have been introduced in editline only recently, and they are supported only through the system-wide setlocale() facility. Finally, editline is hardly portable outside the POSIX world (but that's a minor problem, as the MS-Windows console now has a "reduced" but still usable line editor built in).<br />
<br />
We'll have to improve the editline support. I think we can find volunteers to help out with this task, and then send the patches upstream.]]></description>
 <category>General</category>
<comments>http://www.niccolai.cc/index.php?itemid=444</comments>
 <pubDate>Tue, 26 Jul 2011 15:27:42 -0700</pubDate>
</item><item>
 <title>Where to store symbols</title>
 <link>http://www.niccolai.cc/index.php?itemid=441</link>
<description><![CDATA[The problem with symbols in a programming language is in the fact that they represent a value, but are NOT a value. You can take them as flexible pointers to values, and this means that the real values must be stored somewhere else. <br />
<br />
There is also another problem: symbols are themselves data, and so, they themselves must reside somewhere. Since values and symbols are never in a 1:1 relationship, you need a way to ensure the following conditions at the same time:<br />
<ul><br />
<li>The value of a variable (named by a symbol) must stay alive at least as long as the symbol is accessible.</li><br />
<li>Symbols must stay alive as long as there is some grammar element or code referencing them.</li><br />
<li>Values, symbols and their container must be collected as soon as possible when they are not used anymore. In short, they must not leak.</li><br />
</ul><br />
<br />
<br />
<br />
Of course, cross referencing between values, symbols and grammar is possible, but error prone and so CPU intensive that you wouldn't want to use it. Using the garbage collector for that would be theoretically possible, but again extremely complex and CPU intensive (even if less than in the case of the reference option).<br />
<br />
Storing the symbols in their module (in the module where they are declared), or in the function where they are declared in the case of local symbols, is a solution. The data generated by the module, or the choice between static and dynamic modules, will keep the module alive, and its functions and symbols with it.<br />
<br />
There are two problems with this scheme: first, symbol importing from remote modules (i.e. in the case of an explicit import directive), and then dynamic code generation, which can create code snippets besides functions; for instance, on-the-fly code compilation, if not auto-generation or script-based self-modification. In the first case, the host module importing symbols from the remote module will also want to reference the remote module and keep it alive (and its code with it) for as long as the host itself is alive; if it's a static module, possibly as long as the engine runs. <br />
<br />
In the second case, we have a real issue. We need to have variable names to access data created locally, but we can't store these names (symbols) anywhere other than alongside the code they serve.<br />
<br />
I think the correct solution for symbols generated by dynamic code is to put them in the SynTree where they belong. This means adding an optional Symbol Table to the SynTree class; and since a SynTree represents a code block at grammar level, this means that we would gain a long-wished-for feature (wished for by some users, actually, not by me) for free: variable scoping.]]></description>
 <category>General</category>
<comments>http://www.niccolai.cc/index.php?itemid=441</comments>
 <pubDate>Mon, 25 Jul 2011 14:13:19 -0700</pubDate>
</item>
  </channel>
</rss>
