The Third Manifesto by C. J. Date and Hugh Darwen is the culmination of more than two decades of work on defining a foundation for the future of managing data. The authors view SQL and the relational database management systems that use it as deeply flawed, and have set down their views on a language to address those flaws. The title reflects two prior attempts (manifestos) by other authors.
The intention here is to paraphrase, generalise and shorten the wording and to use terminology familiar to a general IT audience, while retaining the original intent, meaning and numbering. This has involved some reorganisation and some inclusion of background material from other writings to ensure that the all the terms used can be understood within one document.
Filed under Rationale, TTM
The intended role of Andl is to be the implementation language for the data model of an application. That needs some explanation.
Andl does what SQL does, but it is not SQL. It is possible to code the business model of an application in SQL dialects such as PL/PSM (SQL Standard), PL/SQL (Oracle) or Transact-SQL (Microsoft) but SQL has many problems (see here) and few people make this choice. Andl aims to provide a language free of these problems.
Andl has been designed to be used as follows. First, an application developer takes a set of business requirements that comprise a data model, a set of operations on that model, and a set of requirements for users to interact with the model. The data model is implemented in the form of tables in a relational database management system, and the user interface and related business requirements is implemented using some chosen front end technology: web, desktop or mobile as needed. Andl is used to implement the operations on the data model, and the means by which the user interface queries and updates the data in the model.
The effect is that the ‘object relational mismatch’ disappears, and there is no longer any need for an ‘object relational mapper’. No objects are needed between the data model and the user interface, just data as individual items, rows or tables.
But for the heart of the business application, for operations on the data and for implementing the rules, there is Andl.
That’s the question I’ve been asked: What exactly is so bad about SQL?. It’s not easy to produce a satisfying answer.
Most of the SQL that most people use is roughly SQL-1992, with perhaps a few bits from SQL_2003 like WITH. That’s a data sub-language, and while you can only write a small part of an application in this language, most of the critical logic has to be written in a general purpose object-oriented programming language. That’s where the ‘object relational impedance mismatch’ comes from.
Here is a list of the ‘Fatal Flaws’ of this kind of SQL, taken from the Askew Wall by Hugh Darwen.
- Anonymous columns (partly addressed in 1992)
- FROM clause restricted to named tables (fixed in 1992)
- Duplicate column names
- Order of columns is significant
- Duplicate rows
- Failure to support degenerate cases (e.g. columnless tables)
- Failure to support “=“ properly
- and lots more, and probably to come.
Chris Date and Hugh Darwen wrote a book called Database, Types and the Relational Model. It contains the rational for the Third Manifesto, and also an assessment of how well SQL stacks up against the Manifesto. The version of SQL targeted is SQL:2003 (with PL/PSM). The book does an extensive analysis of this version of SQL against TTM, and identifies sins of omission and commission. Obviously SQL inherits many legacy features and the consequences of decisions made years ago, however many of the gaps appear to have been filled in. Rather than focussing on the bad things one can do in SQL, my question comes down to these.
- If you set out to code in a TTM compliant style, how close can you get in SQL:2003?
- If a TTM language generated SQL:2003 as a backend target, how close could it get to TTM compliance?
Obviously in either case:
- NULL should never be used (and some extra code is needed to avoid it arising during evaluations, eg in aggregates)
- Some comparisons need to be coded carefully to avoid mistakes (eg trailing spaces in CHAR)
- All queries should have DISTINCT in effect to avoid duplicate rows
- Column attributes should be unique and in the same order in all tables (that have the same columns)
- Keys should be defined as compliant candidate keys.
- Tuple level operations should be avoided (FETCH, WHERE CURRENT)
- Much care should be taken to use certain syntactic features and not use others.
The only failures (from that list) that I can find are:
- There are severe limitations in the type system (possreps, components, selectors, constraints, tuple types, relation types).
- No assignment (or multiple assignment) for relation types; emulation via DELETE/INSERT runs into the multiple assignment problem.
- No relation-valued attributes (but might be emulated using MULTISET)
- No public relvars (can only share database tables)
- No nested transactions; some other issues around transactions.
I should point out here that this only applies directly to SQL:2003, but in broad terms it probably applies to later versions of the standard and implementations that claim substantial compliance with the standard (perhaps with different syntax).
So apart from the type system, the horrible syntax and the plethora of ways to shoot yourself in the foot, SQL:2003 is not all that bad! My answer to my two questions is:
- Quite close, apart from the seriously deficient type system.
- Very close indeed, if the compiler can provide its own type system on top of that provided by SQL, its own syntax, and then generate bog standard SQL (or some dialect to suit a particular engine).
You can write good code in SQL:2003 or later, but it’s not easy. A better language sitting on top of SQL might make it a whole lot easier. Is that where Andl fits?
Andl has reached the six month mark. I started working on it in October 2014 and the first check in of some actual code was on the 27th. Now it’s April 2015, so where am I up to?
First, how big is it? Basic statistics: Andl is currently about 7500 lines of C# code, 1500 lines of Andl code and other bits and pieces. I’ve probably written twice that to get it to here.
What can it do? A lot. Most of the things that most people write in SQL can easily be handled by Andl code. In my opinion it’s a nice little language of a rather functional kind, and I’ve been surprised at how powerful the relational paradigm is, even when it’s the only collection type and the problem doesn’t look like data.
It implements large chunks of The Third Manifesto, but definitely not all.
- The type system is reasonably complete, but there are no type constraints and no subtyping. There are components, selectors and getters, but not alternative ‘possreps’.
- Support for the Relational Algebra and a host of related capabilities is complete for relations. RA support for tuples is indirect, requiring them to be wrapped in a relation first.
- Relvars can be persisted, but there are no relvar constraints, no real concept of a database and no transactions.
The net effect is that it implements the TTM requirements for a D language, but excluding RM Prescriptions 14, 15, 16, 17, 23, 24 and 25, and OO Prescriptions 4 and 5. It’s a work in progress.
I have written two scripts of sample code to show off most of what Andl can do.
These are questions to be resolved because my vision for Andl depends on them. The questions are: how to build on the work of TTM and TD to better address the ability to do:
- Complex aggregation (including things like
- statistical variance
- running sum
- aggregations across ‘outer joins’.
- Deep ‘self joins’, that is relations that form graphs and trees (and including aggregation on them, such as pricing a bill of materials).
- Deep nesting, that is relations that are not in first normal form, with RVAs (relation valued attributes) that in turn include RVAs to form trees and graphs in a single relation.
- Relational (joinable) functions.
- Relational Updates (not so SQL-like).
- Quantifiable restriction (equivalents for SQL TOP/LIMIT; list parts obtainable from at least 2 suppliers)
I think the ground work has been laid already, and all I need to do is to draw it together.
A really good paper. Definitely worth a read. See here: http://shaffner.us/cs/papers/tarpit.pdf. Ben Moseley and Peter Marks.
Complexity is the single major difficulty in the successful development of large-scale software systems. Following Brooks we distinguish accidental from essential difficulty, but disagree with his premise that most complexity remaining in contemporary systems is essential. We identify common causes of complexity and discuss general approaches which can be taken to eliminate them where they are accidental in nature. To make things more concrete we then give an outline for a potential complexity-minimizing approach based on functional programming and Codd’s relational model of data.
There is nothing in the paper that should have surprised a relational audience. Indeed they might have sat back and quietly nodded or applauded at the appropriate places. Mind you, it’s pretty basic stuff and lacks higher order operations.
There was quite a bit to puzzle or even offend a TTM advocate, in the comments about a suitable language. The type system, the structure and even the scope and purpose of the language would be hard to reconcile with D. Some of these are mentioned on p63.
I saw a Feeder as a way to obtain data from a non-relational source, which will necessarily result in the execution of an INSERT/UPDATE/DELETE operation. An Observer would be a way for external logic to execute as a consequence of a change in relational state (the issue of trigger vs polling is unimportant). The result could be as simple as updating a screen display or something more complex like sending an email or synchronising with another system. These are very MVC-like concepts.
My main disappointment with the paper is that the (hoped for) final section is missing or severely truncated. The expose of problems is excellent as far as it goes, but the fragments of concrete solution presented are unsatisfying. This paper got some attention here: http://lambda-the-ultimate.org/node/1446. Moseley released some source code, but does not seem to have worked on this much since about 2006. See: https://groups.google.com/forum/?fromgroups#!topic/frp-discuss/BNmBgtqRUFY.
In a nutshell, he captures what I would like to do, but doesn’t help all that much with solving the problem of how to do it.
Much of TTM and related writings deals with what’s wrong with SQL and how it should be done better. SQL is an essential part of the communications between applications and databases, in conjunction with an ORM of some kind. Which kind of points to a similar role for the language D. At least that’s the way it has seemed to me.
My question is: would it be better to think of D not so much as a replacement for SQL as the language in which to code an application data model?
Using a slightly modified version of my 4 layers, and just thinking about a modest web app:
- UI access: coded in HTML, CSS and JS.
- Glue code: written in GP language: Java, C#, ruby, etc
- Data model: coded in GP language.
- DBMS access: coded in GP language and SQL.
By data model here I mean the totality of the state of the business data that models the application, both transient and persistent. The idea would be to code layer 3 entirely in a suitable D. It would draw together data from a variety of sources, and is free to use SQL and a DBMS for persisting or retrieving data.
And that leaves me pondering two questions.
- What specific D features are required to fully implement a data model? Scoped constructs like functions or modules are vital, I think.
- What should the API between application and data model look like? POCOs rather than relvars I think.
All of a sudden this sounds like a bigger project.
Filed under Rationale, TTM
I’d like to ask what TTM really needs of a type system. My opinion:
- Is a named set of values and operations on those values.
- The set of values is fixed, but the set of operations may be extensible.
- An attribute of a tuple belongs to a type, and every value it can take on is a value of that type. (No NULLs unless NULL happens to be a value of that type).
- Values are associated with an underlying implementation-defined bit pattern, and every value has a unique bit pattern. Not every bit pattern is a value.
A ‘type system’:
- Is an interoperable set of types, of sufficient scope for its intended purpose.
- A minimalistic abstract type system consists of Number, Character, Binary, Tuple and Relation. All of these are ‘infinite’.
- A practical concrete type system includes members consistent or compatible with common programming language and SQL types, such as Boolean, Integer, Float, Decimal, Time, etc. All of these are ‘finite’.
- An open type system is extensible by some means to add members that are valuable in a particular problem domain. Otherwise it is closed.
- Provides conversions from one type to another, which may be ‘implicit’ (preserves all values) or ‘explicit’ (some values lost or altered).
Note that all tuples are of Tuple type and all relations of Relation type. The header is not part of the type, so it must be part of the value.
- Is a composition of literal values, variable values, attribute values and operations, which can be evaluated to return a single value (closed), multiple values (open) or a (possibly ordered) collection of values (array).
- Provides support for the chosen Relational Algebra, in particular for creating, extending and filtering tuples and relations (NEW, EXTEND, WHERE).
- Provides support for an external programming language to evaluate expressions and retrieve values and arrays of values.
That’s a summary of where I’m up to. I don’t see a need for type inheritance, given implicit conversions. I don’t see a need for building the extension capabilities into a new language. The above looks complete, consistent and implementable.
I’m happy with the Rel approach using Java to extend the type system. In that case the type system comprises the built in types, type definitions in Tutorial D code and the jar file that extends the type system.