This the first of a series of thoughts about implementing a language capability similar to Andl (which is in turn based on The Third Manifestoon C# for use by programs written in C# (as against Andl which has entirely its own type system). The intention is not to focus on C# specifics, but issues relevant to any strongly-typed language, with perhaps an OO focus.
The Relational Algebra can be seen as dealing with streams of tuples, where
- A tuple value is assumed to be a native type (plain old object/record/class/struct) already existing in the language
- The input stream source will create the values; two streams of the same tuple type might have different native types.
- The output stream is also a native type, not necessarily the same as any input. Client software needs familiar data sources.
- The native type for those values cannot be relied upon to provide value semantics, or anything else required by the implementation.
Consider the implementation of a simple algorithm such as Union:
- There are three types (left input, right input, output) which should be the same tuple type but may be different native types
- Removing duplicates can be done with a hash set, which in turn requires a hashing function that depends only on tuple type (not native type).
- If native types are used in implementation algorithms, they need to be augmented post hoc with additional operations (equality, hashing, cloning). That’s hard.
- If the implementation uses its own augmented native types then every input tuple has to be copied, but outputs can be consumed directly by clients. Dynamically creating native types is hard too.
- The implementation could convert all tuple data into dicts (hashes) or arrays of values; but this requires copy/conversion on both input and output. This is probably the easiest (but not where I started out).
In C# much of the implementation requires reflection. The Andl implementation is similar to (3).
As it stands, The Third Manifesto satisfies an academic need. It sets out in great detail a proposed foundation for database programming, along with a language and type system, with the clear intention of replacing and surpassing SQL.
But the question is: what purpose does it serve in the wider IT community? Who has a problem that it will solve? Who needs products built to TTM guidelines?
I think the ‘smoking gun’ of current business application development is the ‘object relational impedance mismatch’. The plethora of ‘object relational mappers’ or ORMs out there provide evidence for that view. The story is something like this.
- The ‘object relational impedance mismatch’ is a real impediment to productive application development. The ORM layer consumes large amounts of programmer time and energy to develop and maintain, it’s a major source of bugs and a major source of performance issues. The ORM is a solution to a problem that should not exist. It’s time for it to go.
- Attempts to resolve this on the database side (by creating ‘object databases’) have been a dismal failure, at least for general purpose application development. The ‘NoSQL’ databases have done better, at least for a certain range of applications, but there is no sign that either is capable of taking over as the next generation.
- Relational theory is a proven foundation for building robust databases over a wide range of size and function. Relational databases are here for the long haul.
- Therefore the ORM problem must be resolved on the language side (by creating ‘relational languages’). The essential business logic of any application has to be expressed in a language that has native access to relational data, without the need to translate the data into objects and back again.
- The first attempt at such a language was SQL, and it has failed to deliver.
- SQL at the 1992 level is widely used as a data sublanguage in conjunction with an ORM. It cannot be used to develop business logic.
- SQL since 1996 has included SQL/PSM, which can be used to develop business logic (often referred to as stored procedures). However actual implementations of SQL/PSM very widely in compliance and capabilities; there are multiple partial and incompatible implementations, and in many environments there is none (eg Sqlite on Android).
- SQL is a very old language (since the 1970s) and has many technical deficiencies including an awkward syntax, difficulties in fully expressing the Relational Algebra and limited capabilities for extension and defining types.
- SQL does not provide or is incompatible with modern language development systems: IDE, debugger, version control, build systems, etc.
- So the need is for a modern language that
- Can be used instead of, as well as or alongside SQL
- Can be used to code the essential business logic of applications
- Has native access to relational data and data types
- Has a sophisticated extensible type system
- Plays well with other languages and technologies
- Has no ‘object relational impedance mismatch’ (by design)
- Is competitive with other modern languages in its use of and compatibility with modern language development systems
- Is available as the same language on all possible platforms.
That’s what The Third Manifesto describes and that’s what Andl aspires to be.
Filed under Pondering, TTM
I have been looking into using SQLite as the ‘backend’ for Andl. It has:
- Most of ‘standard SQL’
- CREATE, ALTER, DROP TABLE
- SELECT, INSERT, UPDATE, DELETE
- INDEXES, VIEWS, TRIGGERS
- Minimal types: null, integer, real, text, blob
- Parameters for prepared statements
- Minimal functions, including aggregates
- Minimal (open) expressions
- User-defined functions in C
It does not have:
- A ‘real’ type system, including relational types
- Stored code: statements, functions, etc
- Session variables
The results of a reasonably detailed examination of this 155,000 lines of dense C code:
- Hand-coded lexer
- LALR parser
- No real symbol table, just the schema, some fixed tables and the parse tree.
- Builds an AST-like tree structure called Parse.
- Progressively converts that into a VM opcode tree called Vdbe.
- The VM tree is manipulated directly by the query planner based on stats (if available) and execution time estimates.
- Relational operators (JOIN, etc) are implemented as nested loops in VM opcodes (there is no pipe).
- A ‘prepare’ operation returns a prepared statement which is simply the VM tree.
- An ‘execute’ operation binds variable values.
- A ‘step’ operation runs the VM program until a row is available.
[If you want to see the VM opcodes and the join algorithms, just use “EXPLAIN query”.]
My problem is that the Parse structure is tightly bound to the SQL language, and the Vdbe level sits after the query planner. There is no obvious layer at which it’s possible to create VM programs independent of SQL but still taking advantage of JOIN and other algorithms.
This does not look like a good prospect. I shall keep looking.
I start by observing that scalar values in D/TD have a type, that every value can belong to only one type, so that the type of any given value or of a scalar variable can be determined by inspection. No type declaration is required.
Then I note that tuple values have a name and a value for each attribute, and again the type of each value and thus the type of the tuple can be determined by inspecting its values. Again no type declaration is required.
The type of any relation value can likewise be determined by inspecting the values of the attributes of any of its tuples. With one exception. It is not possible to determine the type of a relation that has attributes but is empty (contains no tuples) by inspecting its values, because there are none.
This leads to an interesting special case for a compiler that is trying to perform type inference. But apart from that, I am left with the question: how does an empty set have a type? If I consider the set of odd numbers divisible by two, and the set of circles with corners, I see that both are empty sets. But are they the same empty set, or do they preserve a ‘type’ which keeps them somehow distinct?
What does an empty relation signify? In what sense is it meaningful to have a relation that claims to have attributes of some type(s) but asserts no facts?
If we had some ham we could have ham and eggs, if we had some eggs. Or so my father used to say.