Announcement: Andl on SQLite is here!

Andl can now use Sqlite as a backend database. In other words, the entire language, type system and relational operations now work on a native SQLite database.

The language supported is identical to the in-memory version already released. The type system, expression evaluation and relational algebra are identical. There are no (known or intended) incompatibilities or restrictions, other than as below. The performance is reasonable (considering the stage of development), but a couple of areas need some work.

All relation and attribute types are supported, including DEE and DUM (relations with no attributes), Relation Valued Attributes, Tuple Valued Attributes, User Defined Types and arbitrary aggregation using FOLD(). There is one area not yet implemented due to challenges in the platform/SQL, and that is Ordered Aggregation (grouping). It’s just plain hard!

The implementation now also contains a Catalog, which stores compiled functions, user-defined types, application variables of all types and links to database relvars. Any program can create its own new catalog, import an existing one, and/or save its changes. The database can be local, by persisting the in-memory relvars to a folder, or it can be SQLite.

For those interested, the strategy is a mix of SQL generation and a runtime Virtual Machine. The implementation strategy contains only a modest amount of Sqlite specific code. The exact same method should work with any of the main RDBMS. I would plan to tackle Postgres next (requires Java Interop), and then perhaps SQL Server.

A release is available now from the Downloads page.

2 Comments

Filed under Backend

Announcement: Andl on SQLite

This is a preliminary announcement that Andl can now use Sqlite as a backend database. In other words, it has a preliminary implementation of The Third Manifesto Prescriptions 13, 14, 16 and 17 with respect to database relvars stored in Sqlite.

The language supported is identical to the in-memory version already released. The type system, expression evaluation and relational algebra are identical. There are no (known or intended) incompatibilities or restrictions, other than as below. The performance is reasonable (considering the stage of development), but a couple of areas need some work.

All relation and attribute types are supported, including DEE and DUM (relations with no attributes), Relation Valued Attributes, Tuple Valued Attributes and User Defined Types. There are however two areas not yet implemented due to challenges in the platform/SQL. They’re just plain hard!

  1. Aggregation (FOLD)
  2. Ordered aggregation (grouping)

For those interested, the strategy is a mix of SQL generation and a runtime Virtual Machine. The implementation strategy contains only a modest amount of Sqlite specific code. The exact same method should work with any of the main RDBMS. I would plan to tackle Postgres next (requires Java Interop), and then perhaps SQL Server.

I shall make a release available soon, hopefully when I have solved the above, it’s a bit more robust, and it performs a bit better.

Leave a Comment

Filed under Release

Why do we need a new database language?

As it stands, The Third Manifesto satisfies an academic need. It sets out in great detail a proposed foundation for database programming, along with a language and type system, with the clear intention of replacing and surpassing SQL.

But the question is: what purpose does it serve in the wider IT community? Who has a problem that it will solve? Who needs products built to TTM guidelines?

I think the ‘smoking gun’ of current business application development is the ‘object relational impedance mismatch’. The plethora of ‘object relational mappers’ or ORMs out there provide evidence for that view. The story is something like this.

  1. The ‘object relational impedance mismatch’ is a real impediment to productive application development. The ORM layer consumes large amounts of programmer time and energy to develop and maintain, it’s a major source of bugs and a major source of performance issues. The ORM is a solution to a problem that should not exist. It’s time for it to go.
  2. Attempts to resolve this on the database side (by creating ‘object databases’) have been a dismal failure, at least for general purpose application development. The ‘NoSQL’ databases have done better, at least for a certain range of applications, but there is no sign that either is capable of taking over as the next generation.
  3. Relational theory is a proven foundation for building robust databases over a wide range of size and function. Relational databases are here for the long haul.
  4. Therefore the ORM problem must be resolved on the language side (by creating ‘relational languages’). The essential business logic of any application has to be expressed in a language that has native access to relational data, without the need to translate the data into objects and back again.
  5. The first attempt at such a language was SQL, and it has failed to deliver.
    1. SQL at the 1992 level is widely used as a data sublanguage in conjunction with an ORM. It cannot be used to develop business logic.
    2. SQL since 1996 has included SQL/PSM, which can be used to develop business logic (often referred to as stored procedures). However actual implementations of SQL/PSM very widely in compliance and capabilities; there are multiple partial and incompatible implementations, and in many environments there is none (eg Sqlite on Android).
    3. SQL is a very old language (since the 1970s) and has many technical deficiencies including an awkward syntax, difficulties in fully expressing the Relational Algebra and limited capabilities for extension and defining types.
    4. SQL does not provide or is incompatible with modern language development systems: IDE, debugger, version control, build systems, etc.
  6. So the need is for a modern language that
    1. Can be used instead of, as well as or alongside SQL
    2. Can be used to code the essential business logic of applications
    3. Has native access to relational data and data types
    4. Has a sophisticated extensible type system
    5. Plays well with other languages and technologies
    6. Has no ‘object relational impedance mismatch’ (by design)
    7. Is competitive with other modern languages in its use of and compatibility with modern language development systems
    8. Is available as the same language on all possible platforms.

That’s what The Third Manifesto describes and that’s what Andl aspires to be.

Leave a Comment

Filed under Pondering, TTM

The Intended Role of Andl

The intended role of Andl is to be the implementation language for the data model of an application. That needs some explanation.

Andl does what SQL does, but it is not SQL. It is possible to code the business model of an application in SQL dialects such as PL/PSM (SQL Standard), PL/SQL (Oracle) or Transact-SQL (Microsoft) but SQL has many problems (see here) and few people make this choice. Andl aims to provide a language free of these problems.

Andl has been designed to be used as follows. First, an application developer takes a set of business requirements that comprise a data model, a set of operations on that model, and a set of requirements for users to interact with the model. The data model is implemented in the form of tables in a relational database management system, and the user interface and related business requirements is implemented using some chosen front end technology: web, desktop or mobile as needed. Andl is used to implement the operations on the data model, and the means by which the user interface queries and updates the data in the model.

The effect is that the ‘object relational mismatch’ disappears, and there is no longer any need for an ‘object relational mapper’. No objects are needed between the data model and the user interface, just data as individual items, rows or tables.

There is still a role for a general purpose programming language like C# or Java, acting as glue to join various parts together and also in implementing various parts of the type system that Andl depends on. There is still a role for a front end technology based on languages such as HTML and JavaScript since Andl does not offer a user interface.

But for the heart of the business application, for operations on the data and for implementing the rules, there is Andl.

Leave a Comment

Filed under Rationale

What exactly is so bad about SQL?

That’s the question I’ve been asked: What exactly is so bad about SQL?. It’s not easy to produce a satisfying answer.

Most of the SQL that most people use is roughly SQL-1992, with perhaps a few bits from SQL_2003 like WITH. That’s a data sub-language, and while you can only write a small part of an application in this language, most of the critical logic has to be written in a general purpose object-oriented programming language. That’s where the ‘object relational impedance mismatch’ comes from.

Here is a list of the ‘Fatal Flaws’ of this kind of SQL, taken from the Askew Wall by Hugh Darwen.

  • Anonymous columns (partly addressed in 1992)
  • FROM clause restricted to named tables (fixed in 1992)
  • Duplicate column names
  • Order of columns is significant
  • Duplicate rows
  • NULL
  • Failure to support degenerate cases (e.g. columnless tables)
  • Failure to support “=“ properly
  • and lots more, and probably to come.

Chris Date and Hugh Darwen wrote a book called Database, Types and the Relational Model. It contains the rational for the Third Manifesto, and also an assessment of how well SQL stacks up against the Manifesto. The version of SQL targeted is SQL:2003 (with PL/PSM). The book does an extensive analysis of this version of SQL against TTM, and identifies sins of omission and commission. Obviously SQL inherits many legacy features and the consequences of decisions made years ago, however many of the gaps appear to have been filled in. Rather than focussing on the bad things one can do in SQL, my question comes down to these.

  1. If you set out to code in a TTM compliant style, how close can you get in SQL:2003?
  2. If a TTM language generated SQL:2003 as a backend target, how close could it get to TTM compliance?

Obviously in either case:

  1. NULL should never be used (and some extra code is needed to avoid it arising during evaluations, eg in aggregates)
  2. Some comparisons need to be coded carefully to avoid mistakes (eg trailing spaces in CHAR)
  3. All queries should have DISTINCT in effect to avoid duplicate rows
  4. Column attributes should be unique and in the same order in all tables (that have the same columns)
  5. Keys should be defined as compliant candidate keys.
  6. Tuple level operations should be avoided (FETCH, WHERE CURRENT)
  7. Much care should be taken to use certain syntactic features and not use others.

The only failures (from that list) that I can find are:

  1. There are severe limitations in the type system (possreps, components, selectors, constraints, tuple types, relation types).
  2. No assignment (or multiple assignment) for relation types; emulation via DELETE/INSERT runs into the multiple assignment problem.
  3. No relation-valued attributes (but might be emulated using MULTISET)
  4. No public relvars (can only share database tables)
  5. No nested transactions; some other issues around transactions.

I should point out here that this only applies directly to SQL:2003, but in broad terms it probably applies to later versions of the standard and implementations that claim substantial compliance with the standard (perhaps with different syntax).

So apart from the type system, the horrible syntax and the plethora of ways to shoot yourself in the foot, SQL:2003 is not all that bad! My answer to my two questions is:

  1. Quite close, apart from the seriously deficient type system.
  2. Very close indeed, if the compiler can provide its own type system on top of that provided by SQL, its own syntax, and then generate bog standard SQL (or some dialect to suit a particular engine).

You can write good code in SQL:2003 or later, but it’s not easy. A better language sitting on top of SQL might make it a whole lot easier. Is that where Andl fits?

Leave a Comment

Filed under Backend, Rationale

Could SQLite be the engine for Andl?

I have been looking into using SQLite as the ‘backend’ for Andl. It has:

  • Most of ‘standard SQL’
    • CREATE, ALTER, DROP TABLE
    • TRANSACTIONS
    • SELECT, INSERT, UPDATE, DELETE
    • INDEXES, VIEWS, TRIGGERS
  • Minimal types: null, integer, real, text, blob
  • Parameters for prepared statements
  • Minimal functions, including aggregates
    • Minimal (open) expressions
    • User-defined functions in C

It does not have:

  • A ‘real’ type system, including relational types
  • Stored code: statements, functions, etc
  • Session variables

The results of a reasonably detailed examination of this 155,000 lines of dense C code:

  • Hand-coded lexer
  • LALR parser
  • No real symbol table, just the schema, some fixed tables and the parse tree.
  • Builds an AST-like tree structure called Parse.
  • Progressively converts that into a VM opcode tree called Vdbe.
  • The VM tree is manipulated directly by the query planner based on stats (if available) and execution time estimates.
  • Relational operators (JOIN, etc) are implemented as nested loops in VM opcodes (there is no pipe).
  • A ‘prepare’ operation returns a prepared statement which is simply the VM tree.
  • An ‘execute’ operation binds variable values.
  • A ‘step’ operation runs the VM program until a row is available.

[If you want to see the VM opcodes and the join algorithms, just use “EXPLAIN query”.]

My problem is that the Parse structure is tightly bound to the SQL language, and the Vdbe level sits after the query planner. There is no obvious layer at which it’s possible to create VM programs independent of SQL but still taking advantage of JOIN and other algorithms.

This does not look like a good prospect. I shall keep looking.

Leave a Comment

Filed under Backend, Pondering

A Paraphrase of the Third Manifesto

The Third Manifesto is the highly authoritative result of over two decades of work by CJ Date and Hugh Darwen. It sets out their view on the future for database management systems, and on a language in particular. To a large extent Andl is based on that work.

But TTM (as it is known) is not an easy read. It is written in academic language for an academic audience, and not everyone will find that accessible. This TTM Paraphrase is my attempt to express the ideas of TTM in language more familiar and more accessible to the general IT reader. It is the document I use when I want to quickly remind myself of some key points and terminology. It’s my ‘cheat sheet’ for TTM.

So I’m publishing this draft in the hope that it will be useful, and to solicit feedback. It is not a substitute for, an improvement on or even a refinement of TTM. At best it may help some people to understand TTM. I know writing it has helped me.

The TTM paraphrase, initial draft release, with minor updates: TTM-para-d26.

Leave a Comment

Filed under TTM

Progress report

Andl has reached the six month mark. I started working on it in October 2014 and the first check in of some actual code was on the 27th. Now it’s April 2015, so where am I up to?

First, how big is it? Basic statistics: Andl is currently about 7500 lines of C# code, 1500 lines of Andl code and other bits and pieces. I’ve probably written twice that to get it to here.

What can it do? A lot. Most of the things that most people write in SQL can easily be handled by Andl code. In my opinion it’s a nice little language of a rather functional kind, and I’ve been surprised at how powerful the relational paradigm is, even when it’s the only collection type and the problem doesn’t look like data.

It implements large chunks of The Third Manifesto, but definitely not all.

  • The type system is reasonably complete, but there are no type constraints and no subtyping. There are components, selectors and getters, but not alternative ‘possreps’.
  • Support for the Relational Algebra and a host of related capabilities is complete for relations. RA support for tuples is indirect, requiring them to be wrapped in a relation first.
  • Relvars can be persisted, but there are no relvar constraints, no real concept of a database and no transactions.

The net effect is that it implements the TTM requirements for a D language, but excluding RM Prescriptions 14, 15, 16, 17, 23, 24 and 25, and OO Prescriptions 4 and 5. It’s a work in progress.

I have written two scripts of sample code to show off most of what Andl can do.

Leave a Comment

Filed under Backend, Rationale

Sample Code 2

This is a page of sample code, showing basic relational capabilities. It will be part of the download, when available.

// Andl samples -- relational
// Aim is to show an example of every feature
// Also useful as a quick smoke test

// ===== Relational types =====

// Tuples - not used much
{}
{name := 'Smith', age := 17}
{age := 17, name := 'Smith'}

// Relations - used heavily
{{:}}       // empty relation, no attributes, no tuples
{{:}{}}     // ditto, one tuple
{{}}        // same, with type derived by inference
{{},{},{},{},{},{},{},{}}        // exactly the same value (duplicates discarded)

// all the same -- order does not matter
{{name := 'Smith', age := 17}}
{{name:'',age:0}{'Smith', 17}}
{{name:text,age:number}{'Smith', 17}}

// all the same -- differences in order and syntax
{{name := 'Smith', age := 17},{name := 'Jones', age := 35},{age :=199,name:='Frankenstein' }}
{{name := 'Smith', age := 17},{age :=199,name:='Frankenstein' },{age := 35, name := 'Jones'}}
{{name:,age:0}{'Smith', 17}{'Jones', 35}{'Frankenstein',199 }}

// Built in functions
sequence(5)     // relation of N integers {{ N:number }}

r1 := {{name:,age:0}{'Smith', 17}{'Jones', 35}{'Frankenstein',199 }}
r1.schema       // relation of attributes
r1.count        // cardinality
r1.degree       // degree

//===== Basic operations =====

// Load some data from CSV files
S := source('csv:', 'S.csv')
P := source('csv:', 'P.csv')
SP := source('csv:', 'SP.csv')
S
S.schema
S.count
S.degree

//==== monadics =====
// all monadic operations are inside [], in a fixed sequence ?$%{}

// restriction - remove rows
S [ ?(CITY = 'Paris') ]
S [ ?(STATUS > 15 or CITY = 'London') ]

// --- rename - change column names
// rename all
S [ { F1 := S#, F2 := SNAME, F3 := STATUS, F4 := CITY }]
// rename some, the * means keep the rest unchanged
S [ { * F1 := SNAME }]

// --- projection - remove columns
// name all to be kept
S [ { S#, SNAME, CITY }]
// now * means keep all but the ones named
S [ { * STATUS }]

// --- extension - add new columns
// Here * means keep all, add new ones
S [ { * Initial := left(SNAME, 1) }]

// --- combine all three
S [ { CITY, F := STATUS, Initial := left(SNAME, 1) }]
S [ { * SNAME, Initial := left(SNAME, 1) }]

// --- aggregated projection - projection with totalling
S [ { CITY, 
    total := fold(+,STATUS), 
    average := fold(+,STATUS)/fold(+,1) 
} ]

// Note: fold() is only allowed in projection, but looks nicer in a function
sum(n:0) => fold(+,n)
ave(n:0) => fold(+,n)/fold(+,1)
S [ { CITY, total := sum(STATUS), average := ave(STATUS) } ]

// --- ordered extension - means extension with access to other rows
// ordered on CITY but no grouping, so all in one group
S [ $(CITY) { *  
    ord:=ord(),     // unique index based on input, not output
    ordg:=ordg(),   // ord value for first member of group
    lag:=lag(STATUS,1),     // previous value in group, or default
    lead:=lead(STATUS,1),   // next value in group, or default
    nth:=nth(STATUS,1),     // nth value in group, or default
} ]
// ordered and grouped on CITY
S [ $(%CITY) { *  
    ord:=ord(),
    ordg:=ordg(),
    lag:=lag(STATUS,1), 
    lead:=lead(STATUS,1), 
    nth:=nth(STATUS,1), 
} ]
// ordered and grouped on CITY descending, with subtotalling/running sum
S [ $(%-CITY) { *  
    ord:=ord(),
    ordg:=ordg(),
    lag:=lag(STATUS,1), 
    lead:=lead(STATUS,1), 
    nth:=nth(STATUS,1), 
    sum:=fold(+,STATUS),    // running sum within group
    ave:=fold(+,STATUS)/fold(+,1),
} ]

// Ordered used just for display sort
P[$(WEIGHT)]
P[$(COLOR,-WEIGHT)]     // descending

// --- lift anonymous value out of singleton relation
S [ { sum(STATUS) } ]
S [ { ave(STATUS) } ]

//--- nested relation
nr1 := {{ name := 'S', contents := S }}
nr1
nr2 := {
    { name := 'S1', contents := S [?( CITY = 'London')] },
    { name := 'S2', contents := S [?( CITY = 'Paris')] },
    { name := 'S2', contents := S [?( CITY = 'Athens')] } }
nr2
// retrieve one row as relation
nr2 [?(name='S1') { contents }]
// put the relation back together again using fold and union
nr2 [ { fold(union,contents) } ]

//==== dyadics =====

// prepare some subsets
S3 := S [?( S# = 'S3')]    // one single supplier S3
SX := S [?( S# <> 'S3')]   // all suppliers except S3 to make this work better
SY := S [?( S# <> 'S1')]   // all suppliers except S1

// set membership -- all true
S3 sub S        // subset
S sup SX        // superset
S3 sep SX       // separate

// joins

S join SP       // natural join preserves all columns for matching tuples
S compose SP    // projects onto non-common attributes
S semijoin SP   // projects onto left and common attributes
S divide SP     // projects onto left only attributes
S rsemijoin SP  // projects onto right and common attributes
S rdivide SP    // projects onto right only attributes

// antijoins

SX ajoin SP      // antijoin preserves all left attributes for non-matching
SX ajoinl SP     // projects onto left only attributes
SX rajoin SP     // reverse antijoin has right and common attributes
SX rajoinr SP    // projects onto right only attributes

// set operations

SX union SY      // combines all tuples
SX intersect SY  // keep common tuples
SX symdiff SY    // keep non-common tuples
SX minus SY      // keep left minus right
SX rminus SY     // keep right minus left

// all set operations project onto common attributes
SZ := {{ S#:='S99', STATUS:= 999, CITY:='Paris' }}
S union SZ
S minus SZ

// ===== Advanced Usage =====

// --- Nest: replace each tuple of S by one converted into a singleton relation
// {{*}} means 'the current tuple as a singleton relation'
ES1 := S [{ embed := {{*}} }]
ES1

// --- Unnest: using fold union and lift -- advanced usage!
ES1 [{ fold(union,embed) }]

// --- Image relation -- extend S with a partion of SP, removing common fields
// note that S5 gets a partition that is an empty relation
ES2 := S [{ * partition := ( {{*}} rdivide SP) }]
ES2

// Report total no of parts and qty (including S5 which supplies no parts)
ES2 [{ S#, parts:=partition.count, qtys:=partition[{sum(QTY)}] }]

// --- Transitive closure
// MM has tuples reprenting part/subpart assemblies
MM := source('csv:','MM.csv')
MM

// define a relational type
xyt := {{x:='',y:=''}}
// define a recursive function that takes a relation of that type as an argument
tranclo:xyt(xy:xyt) => do {
        ttt := xy[{*z := y}] compose xy[{*z := x}] union xy
        if(ttt = xy, ttt, tranclo(ttt))
    }
// call it with MM as argument, renaming attributes to match
tranclo(MM[ {x:=MAJOR_P#, y:= MINOR_P# } ]) [{MAJOR_P#:=x, MINOR_P#:=y }]

// ===== Updates =====

// Define the 3 updates

// Insert: argument is relation of same heading
up1 => S := union {{ S#:='S9', SNAME:='Moriarty', STATUS:=99, CITY:='Timbuktu' }}
// Delete: read as replace matching rows by nothing
up2 => S := [ ?(S#='S3') ]
// Update: make changes to matching rows
up3 => S := [ ?(S#='S4') { *STATUS:= -10 } ]
// Now perform each update in turn
S // original
up1
S // add S9
up2
S // delete S3
up3
S // update STATUS of S4

// Persistence
// any relvar starting with ^ is persisted

^S := S

// end

Leave a Comment

Filed under Code sample

Sample Code 1

This is a page of sample code, showing basic non-relational capabilities. It will be part of the download, when available.

// Andl samples -- scalar
// Aim is to show an example of every feature
// Also useful as a quick smoke test

//=== Tokens ===
// A free expression is evaluated immediately
'hello world!'
// Single and double quoted strings are concatenated immediately
"Hel""lo" ' World' "!"
// d-string is decimal Unicode char
"Hello" & d'32 87' & "orld"
// h-string is hex Unicode char
"Hello" & h'20 57' & "orld"
// t-string is a Time
t'2015/12/31'
// i-string is an identifier
i'x( )' := 'hello world!'
i'x(' h'20' ')'

//=== Expressions ===
// Logical: all true
true = not false
3 <> 4
'ABC' > 'abc'
t'2015/12/31' < t'01/01/2016'
'Z' >= 'A' and (3<=4 or (true xor false))

// Numeric: all 42
5-8*6/4+7**2
(58 div 8) * (86 mod 8)
(9 or 7) and (9 xor 3) xor 32

// String: mostly Hello World!
'Hello' & " " & 'World!'
trim('Hello ') & ' ' & trim(' World') & trim('   !   ')
fill(' ',10) & fill('Hello Planet???', 5) & fill(' World!', 35)
before(after("@@@>>>Hello World!<<<@@@", '>>>'), '<<<')
toupper('h') & tolower('ELLO') & toupper(' w') & tolower('ORLD!')
// String: other
'hello'.length = length('world')
"Interpolate date: " & t'2015/12/31' & " number:" & 12345 & " logical:" & (2=2)

// Date: 
d1 := dateymd(2015,1,31)
"Date: " & d1 & " Year:" & d1.year & " month:" & d1.month & " day:" & d1.day & " dow:" & d1.dow

// Special: if(), only evaluates one of its arguments
if(true,'Hello World!', "goodbye!")
if(2>2,1/0, 7*6)

//=== Statements ===
// Assignment -- evaluated once
v1 := 'Hello World!'
v1
// Assignment to out sends direct to output
out := v1
// Deferred assignment -- evaluated every time
v2 => v1
v2

// do block creates a local scope and returns value of last expression (statement returns void)
do {
  v2 := v1
  v2
}
// Deferred evaluation wiht do block
v3 => do {
  v2 := v1
  v2
}
v3
// Deferred evaluation with do block and arguments
v4(a) => do {
  v2 := a
  v2
}
// Arguments with literal types (default is text)
v5(a:'',b:0,c:false,d:d1) => do {
  a & b & c & d
}
v5(v1,42,true,dateymd(2015,1,1))
// Arguments with named types
v6(a:text,b:number,c:bool,d:date) => do {
  a & b & c & d
}
v6(v1,42,true,dateymd(2015,1,1))
// Recursion, function name must be typed
fact:0(n:0) => if(n<=1,1,n*fact(n-1))
fact(20)

// ===== Types =====

u1 := ut1 { n:=42, t:=v1, d:= d1 }
"n:" & u1.n & " t:" & u1.t & " d:" & u1.d
u2 := ut1 { n:=41, t:="!"&v1, d:= dateymd(d1.year+1,d1.month,d1.day) }
"n:" & u2.n & " t:" & u2.t & " d:" & u2.d
// Comparison left-to-right
u1 > u2
// Deferred function
f7(u:ut1) => do {
    "n:" & u.n & " t:" & u.t & " d:" & u.d
}
f7(u1)
f7(u2)

// done

Leave a Comment

Filed under Code sample