Category Archives: Backend

Support for PostgreSQL released

Andl now supports PostgreSQL as a backend database.

The latest release of Andl makes it possible to use PostgreSQL as a backend database, with Andl relations stored as native PostgreSQL tables and Andl queries translated into PostgreSQL SQL for execution.

All implemented features of Andl work identically on every supported platform. Open functions and aggregations (evaluated from within a query) are fully supported, as are Andl native types and user-defined types. Currently the only features not implemented are ‘while’ queries (similar to SQL recursive common table expressions) and ordered queries (similar to SQL window functions).
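
For readers who have not met them, the SQL analogue of a ‘while’ query is a recursive common table expression. A minimal sketch, assuming a hypothetical edges(src, dst) table:

    -- Transitive closure: all nodes reachable from 'a'.
    WITH RECURSIVE reachable(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst
        FROM edges e
        JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable;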

Andl scripts can be compiled, and Andl functions can be stored in the Andl catalog for later execution (similar to SQL stored procedures). Andl is implemented as a PostgreSQL installed language, so it is possible to call Andl functions from SQL. Server-based access to those functions through Web API, REST or Thrift interfaces will be part of a later release.
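
As a sketch of what an installed language makes possible (the language name plandl appears in the next post; the function name and signature here are hypothetical, not part of the release):

    -- A function whose body is Andl source, compiled and stored by the
    -- plandl handler, then callable from ordinary SQL.
    CREATE FUNCTION total_supplied(p_pno TEXT) RETURNS INTEGER AS $$
        -- placeholder: Andl source would go here
    $$ LANGUAGE plandl;

    SELECT total_supplied('P1');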

The Github project is here. The release is here.


Postgres, meet Andl

Andl is coming to Postgres.

Postgres is “the world’s most advanced open source database”. It is a highly capable database server, ordinarily accessed using SQL queries. Postgres is also highly customisable, with the ability to add new functions, new data types and even new languages. There are already several language implementations for Postgres, including PL/pgSQL (similar to SQL/PSM), Python and Perl. The aim of this project is to add Andl to that list, but as more than ‘just another language’.

The core purpose of Andl is to provide a relational language that is not SQL, and that can do things that SQL cannot.

  • Andl code can execute on any platform for which there is an implementation (in-memory database, SQLite or Postgres), providing identical results.
  • A compiled Andl system provides a set of useful functions stored in a catalog, which can be called directly from other languages using native data types, with no intervening mapper.
  • The Andl Thrift server allows calls from any language supported by Thrift (Java, C++, C#, Python, Perl, etc).
  • The Andl Web server allows calls using Web API or REST conventions and JSON data types.
  • Application programs do not require any Andl code, any data conversions or any relational mapper. They just call functions using their own standard data types. The main purpose of an application is to provide a user interface, leaving the rest to Andl.

Here is an overview of the steps required.

  1. Use CREATE FUNCTION to install a new language handler for plandl (see the sketch after this list).
  2. The language handler plandl is a DLL written in C, which calls a C++ function, which in turn uses COM to start up the CLR runtime and load the Andl Postgres entry point. Similar capabilities can be provided for Mono.
  3. The Andl entry point connects to its catalog (a table in the Postgres database) and initialises the compiler and runtime. It also creates a compile function.
  4. The compile function is called passing in Andl source code. The code is compiled and executed, creating types, operators and relations in the catalog.
  5. Andl compiled code is executed using its own runtime engine. Relational expressions are converted into equivalent SQL and passed back to the Postgres SPI interface for execution.
  6. All Andl operators are registered as Postgres functions, and may be called recursively from within relational expressions.
  7. Applications call Andl operators by connecting to the Thrift or Web servers.
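
A minimal sketch of step 1 in PostgreSQL’s own DDL; the handler name follows the post, but the exact declarations are assumptions rather than the project’s published code:

    -- Register the C entry point exported by the plandl DLL as a language
    -- handler, then declare the language itself.
    CREATE FUNCTION plandl_call_handler() RETURNS language_handler
        AS 'plandl' LANGUAGE C;

    CREATE LANGUAGE plandl HANDLER plandl_call_handler;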

The Postgres implementation is still some way from being released, but progress has been encouraging so far. The first 5 steps are largely complete and operational.


Announcement: Andl on SQLite is here!

Andl can now use SQLite as a backend database. In other words, the entire language, type system and relational operations now work on a native SQLite database.

The language supported is identical to the in-memory version already released. The type system, expression evaluation and relational algebra are identical. There are no (known or intended) incompatibilities or restrictions, other than those noted below. Performance is reasonable (considering the stage of development), but a couple of areas need some work.

All relation and attribute types are supported, including DEE and DUM (relations with no attributes), Relation Valued Attributes, Tuple Valued Attributes, User Defined Types and arbitrary aggregation using FOLD(). There is one area not yet implemented due to challenges in the platform/SQL, and that is Ordered Aggregation (grouping). It’s just plain hard!
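
For comparison, this is the kind of ordered query that SQL expresses with window functions, which SQLite did not support at the time (table and column names are the usual suppliers-and-parts examples):

    -- A running total per supplier: ordered aggregation in SQL form.
    SELECT sno, pno, qty,
           SUM(qty) OVER (PARTITION BY sno ORDER BY pno) AS running_qty
    FROM sp;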

The implementation now also contains a Catalog, which stores compiled functions, user-defined types, application variables of all types and links to database relvars. Any program can create its own new catalog, import an existing one, and/or save its changes. The database can be local, by persisting the in-memory relvars to a folder, or it can be SQLite.

For those interested, the strategy is a mix of SQL generation and a runtime virtual machine, with only a modest amount of SQLite-specific code. The same method should work with any of the main RDBMSs. I plan to tackle Postgres next (requires Java interop), and then perhaps SQL Server.

A release is available now from the Downloads page.


What exactly is so bad about SQL?

That’s the question I’ve been asked: what exactly is so bad about SQL? It’s not easy to produce a satisfying answer.

Most of the SQL that most people use is roughly SQL-1992, with perhaps a few bits of SQL:2003 such as WITH. That’s a data sub-language: only a small part of an application can be written in it, and most of the critical logic has to be written in a general-purpose object-oriented programming language. That’s where the ‘object relational impedance mismatch’ comes from.

Here is a list of the ‘Fatal Flaws’ of this kind of SQL, taken from the Askew Wall by Hugh Darwen.

  • Anonymous columns (partly addressed in 1992)
  • FROM clause restricted to named tables (fixed in 1992)
  • Duplicate column names
  • Order of columns is significant
  • Duplicate rows
  • NULL
  • Failure to support degenerate cases (e.g. columnless tables)
  • Failure to support “=” properly
  • and lots more, with probably more to come.

Chris Date and Hugh Darwen wrote a book called Database, Types and the Relational Model. It contains the rationale for the Third Manifesto, and also an assessment of how well SQL stacks up against the Manifesto. The version of SQL targeted is SQL:2003 (with SQL/PSM). The book does an extensive analysis of this version of SQL against TTM, and identifies sins of omission and commission. Obviously SQL inherits many legacy features and the consequences of decisions made years ago; however, many of the gaps appear to have been filled in. Rather than focussing on the bad things one can do in SQL, my questions come down to these.

  1. If you set out to code in a TTM compliant style, how close can you get in SQL:2003?
  2. If a TTM language generated SQL:2003 as a backend target, how close could it get to TTM compliance?

Obviously in either case:

  1. NULL should never be used (and some extra code is needed to prevent it arising during evaluation, e.g. in aggregates)
  2. Some comparisons need to be coded carefully to avoid mistakes (e.g. trailing spaces in CHAR)
  3. All queries should have DISTINCT in effect, to avoid duplicate rows
  4. Column names should be unique, and columns should appear in the same order in all tables that share the same columns
  5. Keys should be defined as compliant candidate keys.
  6. Tuple level operations should be avoided (FETCH, WHERE CURRENT)
  7. Much care should be taken to use certain syntactic features and not use others.
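
A minimal sketch of rules 1 and 3 in practice, using the usual suppliers-and-parts tables:

    -- Rule 3: DISTINCT in effect, so the result is a set rather than a bag.
    SELECT DISTINCT city FROM s;

    -- Rule 1: COALESCE prevents SUM over an empty set from yielding NULL.
    SELECT COALESCE(SUM(qty), 0) AS total_qty
    FROM sp
    WHERE pno = 'P1';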

The only failures (from that list) that I can find are:

  1. There are severe limitations in the type system (possreps, components, selectors, constraints, tuple types, relation types).
  2. No assignment (or multiple assignment) for relation types; emulation via DELETE/INSERT runs into the multiple assignment problem.
  3. No relation-valued attributes (but they might be emulated using MULTISET; see the sketch after this list)
  4. No public relvars (can only share database tables)
  5. No nested transactions; some other issues around transactions.
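
For item 3, SQL:2003 defines a MULTISET constructor, though few engines implement it; a sketch using suppliers-and-parts names:

    -- A nested multiset of supplier numbers standing in for a
    -- relation-valued attribute (SQL:2003 syntax, rarely implemented).
    SELECT p.pno,
           MULTISET(SELECT sno FROM sp WHERE sp.pno = p.pno) AS suppliers
    FROM p;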

I should point out here that this only applies directly to SQL:2003, but in broad terms it probably applies to later versions of the standard and implementations that claim substantial compliance with the standard (perhaps with different syntax).

So apart from the type system, the horrible syntax and the plethora of ways to shoot yourself in the foot, SQL:2003 is not all that bad! My answers to my two questions are:

  1. Quite close, apart from the seriously deficient type system.
  2. Very close indeed, if the compiler provides its own type system on top of SQL’s, its own syntax, and then generates bog-standard SQL (or a dialect to suit a particular engine).

You can write good code in SQL:2003 or later, but it’s not easy. A better language sitting on top of SQL might make it a whole lot easier. Is that where Andl fits?


Could SQLite be the engine for Andl?

I have been looking into using SQLite as the ‘backend’ for Andl. It has:

  • Most of ‘standard SQL’
    • CREATE, ALTER, DROP TABLE
    • TRANSACTIONS
    • SELECT, INSERT, UPDATE, DELETE
    • INDEXES, VIEWS, TRIGGERS
  • Minimal types: null, integer, real, text, blob
  • Parameters for prepared statements
  • Minimal functions, including aggregates
    • Minimal (open) expressions
    • User-defined functions in C

It does not have:

  • A ‘real’ type system, including relational types
  • Stored code: statements, functions, etc
  • Session variables

Here are the results of a reasonably detailed examination of its roughly 155,000 lines of dense C code:

  • Hand-coded lexer
  • LALR parser
  • No real symbol table, just the schema, some fixed tables and the parse tree.
  • Builds an AST-like tree structure called Parse.
  • Progressively converts that into a VM opcode tree called Vdbe.
  • The VM tree is manipulated directly by the query planner based on stats (if available) and execution time estimates.
  • Relational operators (JOIN, etc.) are implemented as nested loops in VM opcodes (there is no operator pipeline).
  • A ‘prepare’ operation returns a prepared statement which is simply the VM tree.
  • An ‘execute’ operation binds variable values.
  • A ‘step’ operation runs the VM program until a row is available.

[If you want to see the VM opcodes and the join algorithms, just prefix a query with EXPLAIN.]
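
For example (table names hypothetical):

    -- EXPLAIN lists the VDBE program; opcodes such as OpenRead, Rewind,
    -- Column, Next and ResultRow spell out the nested-loop join.
    EXPLAIN
    SELECT s.sname, sp.qty
    FROM s JOIN sp ON s.sno = sp.sno;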

My problem is that the Parse structure is tightly bound to the SQL language, and the Vdbe level sits after the query planner. There is no obvious layer at which it’s possible to create VM programs independent of SQL but still taking advantage of JOIN and other algorithms.

This does not look like a good prospect. I shall keep looking.


Progress report

Andl has reached the six-month mark. I started working on it in October 2014, and the first check-in of some actual code was on the 27th. Now it’s April 2015, so where am I up to?

First, how big is it? Basic statistics: Andl is currently about 7,500 lines of C# code, plus 1,500 lines of Andl code and other bits and pieces. I’ve probably written twice that to get here.

What can it do? A lot. Most of the things that most people write in SQL can easily be handled by Andl code. In my opinion it’s a nice little language of a rather functional kind, and I’ve been surprised at how powerful the relational paradigm is, even when it’s the only collection type and the problem doesn’t look like data.

It implements large chunks of The Third Manifesto, but definitely not all.

  • The type system is reasonably complete, but there are no type constraints and no subtyping. There are components, selectors and getters, but not alternative ‘possreps’.
  • Support for the Relational Algebra and a host of related capabilities is complete for relations. RA support for tuples is indirect, requiring them to be wrapped in a relation first.
  • Relvars can be persisted, but there are no relvar constraints, no real concept of a database and no transactions.

The net effect is that it implements the TTM requirements for a D language, but excluding RM Prescriptions 14, 15, 16, 17, 23, 24 and 25, and OO Prescriptions 4 and 5. It’s a work in progress.

I have written two scripts of sample code to show off most of what Andl can do.
