Announcing: Andl Workbench

Andl now has an interactive program for writing and experimenting with queries, called Andl Workbench.

Andl Workbench should be familiar to anyone who has used MySql Workbench or Microsoft SQL Server Management Studio. It shows the current contents of a database catalog, and allows queries to be written and tested. Here is what it looks like.

AndlWorkbench

Get the latest release form the Downloads page.

Leave a Comment

Filed under Release

Announcing: Andl with Thrift is here

Thrift is a platform-independent API technology, see http://thrift.apache.org/. It’s a “software framework for scalable cross-language services development, combining a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages”. And now with Andl.

Andl provides the following Thrift-related capabilities.

  1. The Andl compiler can generate a Thrift Interface Definition Language (IDL) file as an optional output.
  2. The Andl runtime performs data conversion to and from the format required by Thrift classes for wire transmission.

That’s all you really need. To put together a complete application involves the following steps.

  1. Write the database backend in Andl.
  2. Compile with the /t switch to generate the Thrift IDL. Compiled data and code are stored in a database catalog.
  3. Run the Thrift IDL compiler to generate an API of callable functions in the target language.
  4. The release includes a generic server written in C# using Thrift classes, which can be modified to suit specific needs. It simply needs to load the catalog.
  5. The release also includes sample client programs using both Thrift classes and the target language API functions, but containing no Andl  code or functions at all.
  6. The developer should modify the client programs or create new ones that call the target language API according to the needs of the application.

And that’s it. Mostly it is just Thrift and the Thrift documentation applies. The end result is that you can write an Andl program that exposes a set of callable entry points, and then call them from any Thrift-supported language.

Check the Downloads page for the current release.

 

Leave a Comment

Filed under Language, Release

Alice and the Relational Algebra

This is a summary largely based on the ‘Alice’ book: http://webdam.inria.fr/Alice/.

The non-redundant set of operations for a maximally effective Relational Algebra that is restricted to safe, domain independent, non-recursive, type safe queries is:

  • Select (restrict/where)
  • Project
  • Rename
  • Join (natural)
  • Union

This set is referred to as SPRJU. This set is incomplete but ‘safe’ because it excludes negation. It can be completed by adding:

  • Minus (set difference).

This set could be referred to as SPRJUM, and from this foundation all the other operations of the Relational Algebra can be created. That includes other set operations, and other kinds of join such as Antijoin or Not Matching.

To this the authors propose the addition of

  • While (recursion)

The complete set could be referred to as SPRJUMW. It is able to handle recursive joins, similar to the SQL Recursive Common Table Expression.

This is the endpoint for domain-independent queries. There is nothing to add (or take away). This is a complete Relational Algebra, capable of the full range of safe, domain-independent queries (including recursive queries).

However, this Algebra is unable to perform a range of familiar tasks based on computation, which means that it cannot return any attribute values that are not already in the database and it cannot perform computation-like activities such as counting. It is not Turing Complete, and cannot be made so.

Computation requires abandoning domain-independence. To compute, one needs a type system. The SELECT of SQL and the EXTEND of Tutorial D require computation and a type system. These issues are not directly addressed in this book or anywhere else that I know of.

Some useful extensions that lie outside the domain-independent Relational Algebra include:

  • Introducing an attribute type system and calculated attributes (EXTEND in Tutorial D, SELECT in SQL)
  • Introducing algorithmic or calculated relations (see Hall-Hitchcock-Todd or Appendix A)
  • An extended While (see book for details).

Appendix A shows that the addition of calculated relations can provide a substitute for Select/Restrict/Where. Operators that provide recursion can substitute for While. Ordering or relaxing type safety provide further choices. At some point there are just too many options available. However, aggregation is not adequately treated in any of the above. The Alice book has a brief section of little practical value.

It is my view that this is an adequate theoretical foundation for a Relational Algebra that is complete within the limits of safe, domain-independent, type safe queries. A foundation for the various methods of introducing computation is still lacking, especially with respect to aggregation and ordering.

Leave a Comment

Filed under Relational Algebra, TTM

Recursive Queries: Sudoku Solver

This Sudoku solver uses a number of recursive queries, first to generate fixed data structures and then to drive the search for a solution.

This solver simply applies the rules.

  1. Every row, column and box (3×3) must have the digits 1 to 9 once and once only.
  2. If only one digit can go in a cell then it must go there.
  3. If a digit can go in only one place in a row, column or box then it must go there.

Here is the code.

First, the fixed data structures.

digits := {{ sdigit:= '1', ndigit := 1}} recurse( {{ sdigit:=text(ndigit+1), ndigit:=ndigit+1 }} [?(ndigit <= 9)] )
digitsx := digits union {{ sdigit := '.', ndigit := 0 }}
units := {{ index := 0, row := 0, col := 0, box := 0 }} recurse( 
        {{ index := index + 1, 
          row := (index + 1) div 9, 
          col := (index + 1) mod 9, 
          box := (index + 1) div 3 mod 3 + (index + 1) div 27 * 3 }} [?(index <= 80)] )
poss := units [{ index }] join digits [{ ndigit }]
possu := units join digits [{ ndigit }]

Now some useful functions.

showb(t:text) => do {
     seq(11)[{ N, line:=
         if(N mod 4 = 3,
            fill('-', 9),
            right(left(t, 9 + (N - N div 4) * 9), 9))}]
 }  
// Show a set of knowns. First fill out all index values, then convert to text
 showunk(k:poss) => do {
 t := (k union (units ajoin k)[{ index, ndigit := 0}]) join digitsx[{ ndigit, sdigit }]
 showb(t [$(index)][{ fold(&, sdigit) }])
}

The original raw data

inp := {{ sud := '53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79' }}
 //inp := {{ sud := '1....7.9..3..2...8..96..5....53..9...1..8...26....4...3......1..4......7..7...3..' }}
 inp
 board := ((units join inp) [{ * sud, sdigit := right(left(sud, index + 1), 1) }] compose digitsx) [{ index, ndigit }]
 knowns := board [?( ndigit <> 0)]
 'Knowns=' & knowns.count
 showunk(knowns)

This the solver, which recurses as long as it can make progress. After this you have to guess.

solution := knowns recurse(
 do {
 // start with the 729 possiblities, progressively remove conflicts with knowns
 knownsu := knowns join units
 allowedu := possu ajoin knownsu[{ index }]
 ajoin knownsu[{ row, ndigit }]
 ajoin knownsu[{ col, ndigit }]
 ajoin knownsu[{ box, ndigit }]

// algorithm 1 - a cell with only one possible digit must be that digit
 new1 := allowedu [{ index, tot:=fold(+,1) }] [?(tot=1)] join allowedu

// algorithm 2 - a digit with only one place in a unit must go there
 new2a := allowedu [{ ndigit, row, tot:=fold(+,1) }] [?(tot=1)] join allowedu
 new2b := allowedu [{ ndigit, col, tot:=fold(+,1) }] [?(tot=1)] join allowedu
 new2c := allowedu [{ ndigit, box, tot:=fold(+,1) }] [?(tot=1)] join allowedu

 new1[{ index, ndigit }] union new2a union new2b union new2c
 }
)
showunk(solution)
And here is the grid, before and after solving.
N      | line
------------------
     0 | 53..7....
     1 | 6..195...
     2 | .98....6.
     3 | ---------
     4 | 8...6...3
     5 | 4..8.3..1
     6 | 7...2...6
     7 | ---------
     8 | .6....28.
     9 | ...419..5
    10 | ....8..79

N      | line
------------------
     0 | 534678912
     1 | 672195348
     2 | 198342567
     3 | ---------
     4 | 859761423
     5 | 426853791
     6 | 713924856
     7 | ---------
     8 | 961537284
     9 | 287419635
    10 | 345286179

			

Leave a Comment

Filed under Code sample, Language

Recursive Queries: the Mandelbrot Set

Recursive queries can be used to solve a variety of problems that require multiple passes, not just recursive data structures. Here is an implementation of the Mandelbrot set as ASCII art.

xaxis := {{ x:=-2.0 }} recurse( {{ x:=x+0.05 }} [?(x<1.2)] )
yaxis := {{ y:=-1.0 }} recurse( {{ y:=y+0.1 }} [?(y<1.1)] )
m := ({{ iter:=0, x:=0, y:=0 }} join xaxis[{ cx:=x }] join yaxis[{ cy:=y }]) 
    recurse( {{ iter:=iter+1, x := x*x-y*y+cx, y:=2*x*y+cy, cx, cy, }} [?(x*x+y*y<4.0 and iter<28)] )
m.count
m2 := m[{ iter := fold(max,iter), cx, cy }] [$(cy,cx)]
m2.count
a := m2 [ { cy, t := fold(&, right(left(' .+*#', 1 + iter div 6), 1)) }]
a

This query iterates each point up to 28 times to find those that lie on the border of the Mandelbrot set. Each point that is still within range is iterated until no more can be found. The number of iterations selects which character to display.

Here is the result.

cy     | t
--------------------------------------------------------------
  -1.0 |                                     ....#            
  -0.9 |                                    ..#*..            
  -0.8 |                                  ..+####+.           
  -0.7 |                             .......+####+...   +     
  -0.6 |                            ..##+###########*.++++    
  -0.5 |                           .+.##################*.    
  -0.4 |               .............*###################+.*   
  -0.3 |               ..+*.+#+....*#####################+.   
  -0.2 |              ...+#######++#######################.   
  -0.1 |           ...++*################################.    
   0.0 |  #############################################...    
   0.1 |           ...++*################################.    
   0.2 |              ...+#######++#######################.   
   0.3 |               ..+*.+#+....*#####################+.   
   0.4 |               .............*###################+.*   
   0.5 |                           .+.##################*.    
   0.6 |                            ..##+###########*.++++    
   0.7 |                             .......+####+...   +     
   0.8 |                                  ..+####+.           
   0.9 |                                    ..#*..            
   1.0 |                                     ....#

 

Leave a Comment

Filed under Code sample, Language

Recursive Queries

Andl now supports recursive queries. This can be seen as a direct extension of the Relational Algebra for the situation where a query cannot be satisfied in a single operation.

Imagine a company organisation chart. One query will find every employee who reports to a given boss, but it takes a succession of queries to find out who reports to that employee, and who reports to them, until at last we reach the workers and no-one reports to them.

Here is the data for a simple orgchart (data thanks to SQLite).

name  | boss
-------------
Alice |
Bob   | Alice
Cindy | Alice
Dave  | Bob
Emma  | Bob
Fred  | Cindy
Gail  | Cindy

Here is the Andl code to query it.

def orgchart:db(csv)
orgchart
ua := {{ name:= 'Alice', level := 0 }} recurse( {{ boss := name, level := level+1 }} compose orgchart)
ua
ua [{ t:=fill('.', level*3) & name }]

And here is the result. The recurse function repeatedly evaluates the join with successive tuples starting from the seed, until no more can be found.

name  | level
--------------
Alice |      0
Bob   |      1
Cindy |      1
Dave  |      2
Emma  |      2
Fred  |      2
Gail  |      2
t
----------
Alice
...Bob
...Cindy
......Dave
......Emma
......Fred
......Gail

Leave a Comment

Filed under Code sample, Language

Sample Andl code: the Dbix CD Sample

One thing I knew and has only been reinforced in early feedback about Andl: you can’t just talk about a new language, you have to show it.

So here is my version of a sample application I found on CPAN: http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Manual/Example.pod. They’re not strictly comparable, so just take this is as a sample of what Andl code looks like for creating some data and executing some simple queries.

// translation of sample application from http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Manual/Example.pod

// create the database relvars
artist  := {{ artistid:0, name:'' }}
cd      := {{ cdid:0, artistid:0, title:'', year:0 }}
track   := {{ trackid:0, cdid:0, title:'' }}

// some data in temporary relvars, with auto-generated ordinals
$artist_data := {{name:''}
  ( 'Michael Jackson' ), 
  ( 'Eminem' ),
} [{ *artistid := ord() }]

$cd_data := {{ title:'', name:'' }
  ( 'Thriller',                'Michael Jackson' ),
  ( 'Bad',                     'Michael Jackson' ),
  ( 'The Marshall Mathers LP', 'Eminem' ),
} [{ *cdid := ord() }]

$track_data := {{ title:'', cd:'' }
   ( 'Beat It'         , 'Thriller' ),
   ( 'Billie Jean'     , 'Thriller' ),
   ( 'Dirty Diana'     , 'Bad' ),
   ( 'Smooth Criminal' , 'Bad' ),
   ( 'Leave Me Alone'  , 'Bad' ),
   ( 'Stan'            , 'The Marshall Mathers LP' ),
   ( 'The Way I Am'    , 'The Marshall Mathers LP' ),
 } [{ *trackid := ord() }]

// update the database relvars
artist := union $artist_data
cd := union ($cd_data join artist) [{ title, cdid, artistid, year:=0}]
track := union ($track_data join cd [{ *cd := title }]) [{ trackid, title, cdid }]

// functions to answer various queries
get_tracks_by_cd(t) => cd[ ?(title = t) { *title } ] join track
get_tracks_by_artist(a) => (artist[ ?(name = a) { *name }] join cd) [{ cdid }] join track
get_cd_by_track(t) => track [ ?(title = t) { cdid } ] join cd
get_cds_by_artist(a) => artist [?(name = a) { artistid } ] join cd
get_artist_by_track(t) => (track [?(title = t) { cdid }] join cd) [{ artistid }] join artist
get_artist_by_cd(t) => (cd [?(title = t) { cdid }] join cd) [{ artistid }] join artist

// first show the raw data

crlf := h'd a'
output := crlf & "=== Sample data ===" & crlf
output := artist.pp
output := crlf.pp
output := cd.pp
output := crlf.pp
output := track.pp

// now do the queries

show(title, data:{{ str:'' }}) => do {
    output := title & crlf & data[ {fold(&, "  " & str & crlf)} ]
}

output := crlf & "=== Query results ===" & crlf
show("Track title:", get_tracks_by_cd('Bad') [ {str:=title} ])
show("Track title:", get_tracks_by_artist('Michael Jackson') [ {str:=title} ])
show("CD title:", get_cd_by_track('Stan') [ {str:=title} ])
show("CD title:", get_cds_by_artist('Michael Jackson') [ {str:=title} ])
show("Artist:", get_artist_by_track('Dirty Diana') [ {str:=name} ])
show("Artist:", get_artist_by_cd('The Marshall Mathers LP') [ {str:=name} ])

The output from compiling and running this script looks as follows.

=== Sample data ===

artistid | name           
--------------------------
       0 | Michael Jackson
       1 | Eminem         

text: '
'
cdid   | artistid | title                   | year  
----------------------------------------------------
     0 |        0 | Thriller                |      0
     1 |        0 | Bad                     |      0
     2 |        1 | The Marshall Mathers LP |      0

text: '
'
trackid | cdid   | title          
----------------------------------
      0 |      0 | Beat It        
      1 |      0 | Billie Jean    
      2 |      1 | Dirty Diana    
      3 |      1 | Smooth Criminal
      4 |      1 | Leave Me Alone 
      5 |      2 | Stan           
      6 |      2 | The Way I Am   


=== Query results ===

Track title:
  Dirty Diana
  Smooth Criminal
  Leave Me Alone

Track title:
  Beat It
  Billie Jean
  Dirty Diana
  Smooth Criminal
  Leave Me Alone

CD title:
  The Marshall Mathers LP

CD title:
  Thriller
  Bad

Artist:
  Michael Jackson

Artist:
  Eminem

Leave a Comment

Filed under Code sample

Announcement: Andl on SQLite is here!

Andl can now use Sqlite as a backend database. In other words, the entire language, type system and relational operations now work on a native SQLite database.

The language supported is identical to the in-memory version already released. The type system, expression evaluation and relational algebra are identical. There are no (known or intended) incompatibilities or restrictions, other than as below. The performance is reasonable (considering the stage of development), but a couple of areas need some work.

All relation and attribute types are supported, including DEE and DUM (relations with no attributes), Relation Valued Attributes, Tuple Valued Attributes, User Defined Types and arbitrary aggregation using FOLD(). There is one area not yet implemented due to challenges in the platform/SQL, and that is Ordered Aggregation (grouping). It’s just plain hard!

The implementation now also contains a Catalog, which stores compiled functions, user-defined types, application variables of all types and links to database relvars. Any program can create its own new catalog, import an existing one, and/or save its changes. The database can be local, by persisting the in-memory relvars to a folder, or it can be SQLite.

For those interested, the strategy is a mix of SQL generation and a runtime Virtual Machine. The implementation strategy contains only a modest amount of Sqlite specific code. The exact same method should work with any of the main RDBMS. I would plan to tackle Postgres next (requires Java Interop), and then perhaps SQL Server.

A release is available now from the Downloads page.

2 Comments

Filed under Backend

Announcement: Andl on SQLite

This is a preliminary announcement that Andl can now use Sqlite as a backend database. In other words, it has a preliminary implementation of The Third Manifesto Prescriptions 13, 14, 16 and 17 with respect to database relvars stored in Sqlite.

The language supported is identical to the in-memory version already released. The type system, expression evaluation and relational algebra are identical. There are no (known or intended) incompatibilities or restrictions, other than as below. The performance is reasonable (considering the stage of development), but a couple of areas need some work.

All relation and attribute types are supported, including DEE and DUM (relations with no attributes), Relation Valued Attributes, Tuple Valued Attributes and User Defined Types. There are however two areas not yet implemented due to challenges in the platform/SQL. They’re just plain hard!

  1. Aggregation (FOLD)
  2. Ordered aggregation (grouping)

For those interested, the strategy is a mix of SQL generation and a runtime Virtual Machine. The implementation strategy contains only a modest amount of Sqlite specific code. The exact same method should work with any of the main RDBMS. I would plan to tackle Postgres next (requires Java Interop), and then perhaps SQL Server.

I shall make a release available soon, hopefully when I have solved the above, it’s a bit more robust, and it performs a bit better.

Leave a Comment

Filed under Release

Why do we need a new database language?

As it stands, The Third Manifesto satisfies an academic need. It sets out in great detail a proposed foundation for database programming, along with a language and type system, with the clear intention of replacing and surpassing SQL.

But the question is: what purpose does it serve in the wider IT community? Who has a problem that it will solve? Who needs products built to TTM guidelines?

I think the ‘smoking gun’ of current business application development is the ‘object relational impedance mismatch’. The plethora of ‘object relational mappers’ or ORMs out there provide evidence for that view. The story is something like this.

  1. The ‘object relational impedance mismatch’ is a real impediment to productive application development. The ORM layer consumes large amounts of programmer time and energy to develop and maintain, it’s a major source of bugs and a major source of performance issues. The ORM is a solution to a problem that should not exist. It’s time for it to go.
  2. Attempts to resolve this on the database side (by creating ‘object databases’) have been a dismal failure, at least for general purpose application development. The ‘NoSQL’ databases have done better, at least for a certain range of applications, but there is no sign that either is capable of taking over as the next generation.
  3. Relational theory is a proven foundation for building robust databases over a wide range of size and function. Relational databases are here for the long haul.
  4. Therefore the ORM problem must be resolved on the language side (by creating ‘relational languages’). The essential business logic of any application has to be expressed in a language that has native access to relational data, without the need to translate the data into objects and back again.
  5. The first attempt at such a language was SQL, and it has failed to deliver.
    1. SQL at the 1992 level is widely used as a data sublanguage in conjunction with an ORM. It cannot be used to develop business logic.
    2. SQL since 1996 has included SQL/PSM, which can be used to develop business logic (often referred to as stored procedures). However actual implementations of SQL/PSM very widely in compliance and capabilities; there are multiple partial and incompatible implementations, and in many environments there is none (eg Sqlite on Android).
    3. SQL is a very old language (since the 1970s) and has many technical deficiencies including an awkward syntax, difficulties in fully expressing the Relational Algebra and limited capabilities for extension and defining types.
    4. SQL does not provide or is incompatible with modern language development systems: IDE, debugger, version control, build systems, etc.
  6. So the need is for a modern language that
    1. Can be used instead of, as well as or alongside SQL
    2. Can be used to code the essential business logic of applications
    3. Has native access to relational data and data types
    4. Has a sophisticated extensible type system
    5. Plays well with other languages and technologies
    6. Has no ‘object relational impedance mismatch’ (by design)
    7. Is competitive with other modern languages in its use of and compatibility with modern language development systems
    8. Is available as the same language on all possible platforms.

That’s what The Third Manifesto describes and that’s what Andl aspires to be.

Leave a Comment

Filed under Pondering, TTM