Categories
HSQL

Week 6: Plots and TextMate Grammars

This week’s plan was to implement the PLOT statement, and then to see how to update the TextMate grammar to add in the new syntax.

Plots

Continuing from last week, we have auto-imports and an AST for plots. With a little code generation, we can have a functioning PLOT statement.

Here’s the implementation I ended up choosing out of the 3 possible ways:

PARALLEL(OUTPUT(,,NAMED('<name>')),<module>.<function template>);

PARALLEL allows for the actions to be executed in parallel, or rather, parallel given the best ordering that the ECL Compiler finds.

The code generation was added using the idea above. Care was taken to not emit the plot statement (and to raise an error) if the plot method was not found, since in that case there is no template for the function call.
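To make that step concrete, here is a minimal sketch in TypeScript. It is not the actual HSQL compiler code: `generatePlot`, `PlotTemplates` and the error text are names made up for illustration, and it assumes each plot method is stored as a call template in which `{0}` stands for the plot name.

```typescript
// Hypothetical sketch of PLOT code generation; all names are illustrative only.
type PlotTemplates = Map<string, string>; // plot type (lowercase) -> call template

interface PlotResult {
    code: string | null; // null when nothing should be emitted
    errors: string[];
}

function generatePlot(
    source: string,
    title: string,
    plotType: string,
    moduleName: string,
    templates: PlotTemplates
): PlotResult {
    const template = templates.get(plotType.toLowerCase());
    if (template === undefined) {
        // No template means no function call can be built: report it and emit nothing.
        return { code: null, errors: [`Plot method '${plotType}' not found`] };
    }
    // Assumption: {0} is the placeholder for the plot name.
    const call = template.split("{0}").join(title);
    const code = `PARALLEL(OUTPUT(${source},,NAMED('${title}')),${moduleName}.${call});`;
    return { code, errors: [] };
}

const plotTemplates: PlotTemplates = new Map([
    ["bubble", "TwoD.Bubble('{0}',,'{0}')"],
]);

const okPlot = generatePlot("marksPlottable", "hello2", "bubble", "Visualizer", plotTemplates);
const missingPlot = generatePlot("marksPlottable", "hello3", "pie", "Visualizer", plotTemplates);
console.log(okPlot.code);
console.log(missingPlot.errors);
```

The first call produces a PARALLEL action, while the second only reports an error, so no half-formed statement reaches the ECL output.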

Let’s take this sample code for reference:

import marks;

-- lets get all the average marks of each subject
counting = select DISTINCT subid,AVG(marks) as avgmarks from marks.marks group by subid;
// join this to get the subject names
marksJoined = select * from counting join marks.subs on counting.subid=marks.subs.subid;

output counting title marksAvg;
// filter out what we need for plotting

marksPlottable = select subname, avgmarks from marksJoined;
// some different plots
PLOT from marksPlottable title hello type bar;
PLOT from marksPlottable title hello2 type bubble;

Running code generation on this sample after the implementation, we get this translation:

IMPORT Visualizer;
IMPORT marks;
__r_action_0 := FUNCTION
__r_action_1 := TABLE(marks.marks,{ subid,avgmarks := AVE(GROUP,marks) },subid);
__r_action_2 := DEDUP(__r_action_1,ALL);
RETURN __r_action_2;
END;
counting := __r_action_0;
__r_action_3 := FUNCTION
somesource := JOIN(counting,marks.subs,LEFT.subid = RIGHT.subid);
__r_action_4 := TABLE(somesource,{ somesource });
RETURN __r_action_4;
END;
marksJoined := __r_action_3;
OUTPUT(counting,,NAMED('marksAvg'));
__r_action_5 := FUNCTION
__r_action_6 := TABLE(marksJoined,{ subname,avgmarks });
RETURN __r_action_6;
END;
marksPlottable := __r_action_5;
PARALLEL(OUTPUT(marksPlottable,,NAMED('hello')),Visualizer.MultiD.bar('hello',,'hello'));
PARALLEL(OUTPUT(marksPlottable,,NAMED('hello2')),Visualizer.TwoD.Bubble('hello2',,'hello2'));

Submitting this job to the local cluster, we can browse the Visualizations by going over to the workunit’s ‘Resources’ tab.

And, we have plots!

TextMate Grammars

The current grammar comes from the JavaScript version, and is a bit finicky. When setting it up, I used an approach similar to modelling a CFG: I mapped the ANTLR statements over to syntax-matching rules. That’s not the best idea.

I looked up and read a few grammars for existing languages, giving some extra time to SQL. Turns out, most of them do not use the method above (not a surprise to anyone).

When doing syntax highlighting, the developer also needs feedback as the code is being typed. If you wait for the whole statement to match, the whole statement stays plain white until the ending ‘;’.

A much better way is to recognize specific parts of the language quickly, and colour them accordingly.

Since HSQL has a lot of tokens, this approach suits it well and syntax highlighting can be done pretty effectively. Adding in the tokens for the language, we see some good highlighting, which the user can take advantage of as they are typing.

That took a while to get right, and the result solves two issues:

  1. Performance and complexity: The regexes and captures were long, difficult and annoying to read. By just identifying and colouring the tokens, the grammar is much simpler and easier to read (and maintain).
  2. Tying into the ANTLR grammar: As the earlier TextMate grammar was identifying whole statements, the two grammars needed to be completely in sync, which is difficult considering that the language is being actively worked on. Identifying tokens simplifies things: as the highlighting no longer depends on the ANTLR grammar, that grammar can be worked on and the syntax highlighting will still work very well. Only every now and then, new keywords that become a part of the language need to be added to the TextMate grammar.
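
As a rough illustration of the token-based approach, here is a sketch in TypeScript. The keyword list is a small hypothetical subset, and the scope names are just conventional TextMate-style names that I am assuming, not the ones in the actual HSQL extension. The grammar object mirrors the shape of a TextMate grammar (normally stored as JSON), and the small helper uses a plain JavaScript regex to show that keywords are picked up even when the statement is unfinished.

```typescript
// Hypothetical token list; the real HSQL grammar has many more keywords.
const hsqlKeywords = ["select", "from", "where", "group", "by", "as", "join", "on",
    "import", "output", "plot", "title", "type", "distinct"];
const keywordAlternation = hsqlKeywords.join("|");

// Shape of a token-based TextMate grammar: each pattern colours one kind of token.
const hsqlGrammar = {
    scopeName: "source.hsql",
    patterns: [
        { name: "comment.line.double-slash.hsql", match: "//.*$" },
        { name: "comment.line.double-dash.hsql", match: "--.*$" },
        { name: "string.quoted.single.hsql", match: "'[^']*'" },
        // (?i) is Oniguruma syntax for case-insensitive matching.
        { name: "keyword.control.hsql", match: `(?i)\\b(${keywordAlternation})\\b` },
        { name: "constant.numeric.hsql", match: "\\b\\d+(\\.\\d+)?\\b" },
    ],
};

// Demo with a JavaScript regex: find keywords without needing a complete statement.
function findKeywords(line: string): string[] {
    const re = new RegExp(`\\b(${keywordAlternation})\\b`, "gi");
    return line.match(re) ?? [];
}

// The statement is cut off mid-word, yet the finished tokens are still recognized.
const partialTokens = findKeywords("PLOT from marksPlottable ti");
console.log(partialTokens);
console.log(hsqlGrammar.patterns.length);
```

Adding a new keyword here means appending one word to the list, with no statement-level rule to keep in sync.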

Wrapping up

Week 6 completes, and so do the midterm goals that we had planned for. The plan for the next week is roughly:

  1. Decide on the plan ahead
  2. Fix VSCode extension related bugs: The addition of the File Providers in earlier weeks caused some breakage, so quite a few bug fixes have to be done for the HSQL extension.
  3. Add in DISTRIBUTE BY to Select – This would correspond to DISTRIBUTE in ECL
  4. Investigate grammar for WRITE TO variant of the OUTPUT statement for writing to files.

Week 5: Imports, more imports and plots

This week’s plan is to work on imports, as a step leading to the implementation of the PLOT statement. The idea is to be able to locate ECL imports, and hint to the compiler what is contained in an ECL file.

DHSQL

Inspired by the relationship between JavaScript and TypeScript, the idea is to have “type definition” files for ECL files, to describe what is contained in them.

Let’s consider an example for an ECL file, a.ecl:

export a:= MODULE
  export t1:= DATASET([{1,2},{3,4},{4,5}],{integer c1,integer c2});
  export t2:=DATASET([{1,3},{3,5},{3,5}],{integer c3,integer c4});
END;

Essentially, two datasets are being exported. Currently, HSQL resolves it to an AnyModule, meaning that it won’t offer any type suggestions for the module when imported. To solve this, we can add an a.dhsql in the same folder, which is intended to look something like this:

declare t1 as TABLE (integer c1,integer c2);
declare t2 as TABLE (integer c3,integer c4);

Now, when doing an import a;, HSQL can look up this definition file, and use it to get an idea of what has been defined in the ECL file. Implementing this requires another piece of work: import resolution and priority.

Import resolution and priority

ECL has an interesting property where a folder may also be imported, and it is treated as a module, where all its children files/folders are members of that module. Although beyond our scope (and priority) as of now, it’s important to remember this as many modules use this structure too.

Now, considering all this, when we say import a; it can refer to 4 different sources:

  1. HSQL – a.hsql
  2. DHSQL – a.dhsql
  3. ECL – a.ecl
  4. A plain folder – a/...

It seems that ECL files are preferred over folders, and for our case, we need to refer to the DHSQL file if present (and of course .hsql files get more priority). So this gives us a nice order to refer to files in our system.
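
Here is a minimal sketch of that priority order in TypeScript. The names (`importCandidates`, `resolveImport`) and the `exists` callback are hypothetical; a real implementation would check the file system rather than a Set.

```typescript
// Candidate sources for `import a;`, highest priority first.
const importCandidates = [
    { kind: "hsql", suffix: ".hsql" },
    { kind: "dhsql", suffix: ".dhsql" },
    { kind: "ecl", suffix: ".ecl" },
    { kind: "folder", suffix: "/" },
] as const;

type ImportKind = (typeof importCandidates)[number]["kind"];

// `exists` is a hypothetical lookup standing in for a file system check.
function resolveImport(
    moduleName: string,
    exists: (path: string) => boolean
): { kind: ImportKind; path: string } | undefined {
    for (const candidate of importCandidates) {
        const path = moduleName + candidate.suffix;
        if (exists(path)) {
            return { kind: candidate.kind, path };
        }
    }
    return undefined; // nothing matched this module name
}

// a.ecl, a.dhsql and a folder a/ all exist; the DHSQL file should win.
const onDisk = new Set(["a.ecl", "a.dhsql", "a/"]);
const resolvedImport = resolveImport("a", (p) => onDisk.has(p));
console.log(resolvedImport);
```

With both a.ecl and a.dhsql present, the definition file is picked, which is what lets HSQL offer type information for an ECL module.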

Adding to this, the ECL compiler itself has an order for scanning directories. It searches the standard libraries, the bundles and then the current directory as required. As we need to replicate this order too, the idea is to first query eclcc for these paths. Once we have these, we can start looking at the paths in this general order:

  1. Standard Library includes – This essentially includes the standard library.
  2. ECL Bundles – This includes any bundles that have been installed by the user.
  3. Current directory – This includes imports from the current directory.

This gives us a good way to search for files. However, we also need to add some additional layers to this, to cater to some important features:

  1. A standard library type definitions layer: placed right at the top, the idea is to add definitions for standard library functions in this way.
  2. Memory map: Placed above the current directory file map (or replacing it), this can act as:
    1. An override for on-disk files. This is common for files that are open in an IDE, but are unsaved. In this case, the in-memory contents are used instead.
    2. An in-memory store of files. This is useful if HSQL needs to be used as a compile server.
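
Putting the layers together, a lookup could be sketched like this in TypeScript. Everything here is an assumption for illustration: the `SearchLayer` shape, the use of Maps in place of real directories, and the file names and contents.

```typescript
// Each layer maps a file path to its contents; Maps stand in for real directories.
interface SearchLayer {
    label: string;
    files: Map<string, string>;
}

// Walk the layers in order; the first layer that has the path wins.
function findFile(
    layers: SearchLayer[],
    path: string
): { layer: string; contents: string } | undefined {
    for (const layer of layers) {
        const contents = layer.files.get(path);
        if (contents !== undefined) {
            return { layer: layer.label, contents };
        }
    }
    return undefined;
}

// Search order: type definitions, standard library, bundles, memory map, current directory.
const searchLayers: SearchLayer[] = [
    { label: "standard library type definitions", files: new Map<string, string>() },
    { label: "standard library", files: new Map<string, string>() },
    { label: "bundles", files: new Map([["SomeBundle.ecl", "(bundle source)"]]) },
    { label: "memory map", files: new Map([["a.hsql", "(unsaved editor buffer)"]]) },
    {
        label: "current directory",
        files: new Map([["a.hsql", "(saved on disk)"], ["b.hsql", "(saved on disk)"]]),
    },
];

const unsavedHit = findFile(searchLayers, "a.hsql");
const diskHit = findFile(searchLayers, "b.hsql");
console.log(unsavedHit);
console.log(diskHit);
```

Because the memory map sits above the current directory, an unsaved buffer for a.hsql shadows the copy on disk, while b.hsql still falls through to the directory.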

DHSQL works!

After adding in the AST Generator, we have to take slight care that output generation is skipped for DHSQL files (because they are declaration files). With this in, we can have much better queries. Last week’s examples were shown with patched-in code, and would show warnings otherwise. Let’s see what this addition brings us:

Before: Warnings about types – HSQL can’t help in detecting errors
After: Now that we have a DHSQL File, HSQL can tell what’s wrong and right about the table

Nice! Now when a select statement doesn’t give any errors, it should ideally work without issues in ECL too, given that the DHSQL file is correct.

DHSQL – Declaring Plots

As a precursor to making the plot statement, the idea is to have DHSQL files emit methods for plots. Let’s take a look at the syntax I have in mind.

Say we are making Visualizer.dhsql; the idea is to make it somewhat like this:

declare Bubble as plot on 'TwoD.Bubble(\'{0}\',,\'{0}\')';

I think the statement is self-explanatory given enough familiarity with the Visualizer bundle. The idea is to supply HSQL with a little bit of information (a macro of sorts) on what it should call. {0} is the plot name here, and will be supplied automatically.
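
To show how such a template might be expanded, here is a small TypeScript sketch. `fillTemplate` is a hypothetical helper, not a function from the HSQL codebase, and supporting indexes beyond {0} is my own assumption.

```typescript
// Replace every {n} placeholder with the n-th argument; unknown indexes are left as-is.
function fillTemplate(template: string, args: string[]): string {
    return template.replace(/\{(\d+)\}/g, (whole: string, index: string) => {
        const value = args[Number(index)];
        return value !== undefined ? value : whole;
    });
}

// The template from the declaration above, with the string escapes already removed.
const bubbleTemplate = "TwoD.Bubble('{0}',,'{0}')";
const bubbleCall = fillTemplate(bubbleTemplate, ["hello2"]);
console.log(`Visualizer.${bubbleCall}`);
```

Both occurrences of {0} are replaced with the same plot name, so the call and the named output it reads from stay in step.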

Picking up from the previous topic, we can add in plot methods by injecting a Visualizer.dhsql.

Lots of plot methods!

Now, when we import Visualizer, it should pull in these methods onto our Variable table list. But before that, there’s one more thing to do.

Auto Imports

Auto imports are a great way to simplify a language. Here, the idea is to automatically pull in Visualizer, and later ML_Core, as we use their respective statements. Last time around, we set values in the Parser module and retrieved them. This had the disadvantage of needing to pull them out of the parser and push them back into the visitor where the processing is happening. An easier way is to use ANTLR locals. These are dynamically scoped variables that can be accessed by any child rule (read as: really, really useful if you just need to set some flags).

Locals, declaring and referencing

As these are members of the ProgramContext type itself, it becomes super easy to reference and get the values out during visiting (and skips some hoops that we had).

Where does this bring us? To auto imports. If the flags are set by the time we start visiting, we can be assured that there’s a plot statement somewhere in the program (because parsing is over, and the flags are set during parsing).
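
As a sketch of what happens with those flags, here is some TypeScript. The `ParseFlags` shape and the flag names are hypothetical stand-ins for the values stored on the root parse context; only the module names Visualizer and ML_Core come from the text above.

```typescript
// Hypothetical flags, standing in for values set on the root context during parsing.
interface ParseFlags {
    usesPlot: boolean;
    usesML: boolean;
}

// Prepend the imports implied by the flags, skipping ones the user already wrote.
function withAutoImports(flags: ParseFlags, userImports: string[]): string[] {
    const auto: string[] = [];
    if (flags.usesPlot) auto.push("Visualizer");
    if (flags.usesML) auto.push("ML_Core");
    const missing = auto.filter((m) => userImports.indexOf(m) === -1);
    return missing.concat(userImports);
}

// A program that uses PLOT and imports only `marks` itself.
const finalImports = withAutoImports({ usesPlot: true, usesML: false }, ["marks"]);
console.log(finalImports.map((m) => `IMPORT ${m};`).join("\n"));
```

Since the flags are complete before visiting starts, the extra imports can be decided up front instead of being patched in afterwards.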

The culmination of the above work

Finally, it became possible to test the workings.
The AST and codegen for plots are not present at this point, which simply means the plot statement code will not be generated.

Using a debugger, we can break just before the statements are pushed out from the AST Generator. And the results are promising!

Look at all the plot methods!

With this, we have a general idea of how the PLOT statement will be represented in the AST. I stopped here, to continue next week.

Wrapping up – Week 6

Week 6 is the midpoint. The goal was to set up at least 2 plot statements, but I can comfortably say the whole set we showed above will be possible. So, for Week 6, the idea is to finish:

  1. Plot codegen and output – This involves using the template and generating the PLOT statement.
  2. Add the new PLOT and OUTPUT statements to the VSCode TextMate grammar. This requires some research into how TextMate grammars are actually made in VSCode (we’ve been doing a lot of straight implementations so far).