An Extended Relational Algebra

The following is a comprehensive set of algebraic operations that comprise an Extended Relational Algebra. They are atomic, in the sense that they each carry out precisely one operation and can be combined as needed. There is some redundancy, by design.

Each operation is a function with one or two relation values as arguments and returns one as its result. Each operation is generic, and has to be specialised for each use. It is specialised by:

  • The heading of each of its relation arguments
  • One or more attribute names, where the operation involves adding, removing or substituting attributes.
  • A function name and appropriate arguments, as attribute names or literal values, where the operation requires a computation.

Each operation returns a value with a heading determined entirely by the specialisation as set out above. The ERA depends on a library of pre-defined functions that are compatible with the types of the various attributes. The library and those types are specified separately. The ERA has type system, and no means to access tuple or attribute values.

The ERA also includes an assignment operation, which updates the value of a pseudo-variable, identified by a literal value.

Operator Specialisers Description
select<args,bfn>(r) Arguments to Boolean function Restricts set membership
remove<name>(r) An attribute to remove Removes an attributes
rename<name1,name2>(r) An attribute and its new name Renames one attribute
extend<args,fn,name>(r) Arguments to function, name for new value Appends one attribute
replace<args,fn,name>(r) Arguments to function, name of replaced value Replaces the value of one attribute
join(r1,r2) Auto for join Natural join
semijoin(r1,r2) Auto for join Natural semijoin
antijoin(r1,r2) Auto for join Natural antijoin
compose(r1,r2) Auto for join Join removing join attributes
union(r1,r2) None Set union
minus(r1,r2) None Set minus
intersect(r1,r2) None Set intersection
difference(r1,r2) None Set difference
aggregate<names,arg,afn>(r) Grouping attributes, argument to aggregating function Replace one attribute by grouped aggregation
while<rfn>(r) Relational function Fixed point recursion
var<lit> Literal name of pseudo-variable Reference to a stored value
assign<lit>(r)
insert<lit>(r) delete<lit,args,bfn>(r) replace<lit,args,fn,name>(r)
Literal name of pseudo-variable Arguments to Boolean function Arguments to function, name of replaced value Update a stored value

Leave a Comment

Filed under Language, Relational Algebra

Andl Adds Relational Algebra to Knime

Knime is a visual programming language for analysing and displaying data. It models the flow of data in tables as a graphical workflow, made up of nodes and edges. To learn more about Knime, go here: https://www.knime.com/.

AndlRaKnime provides a set of nodes that implement the Relational Algebra. Using these nodes allows you to perform SQL-like queries on a wide variety of non-SQL data sources, such as CSV files or data retrieved online. For more about the Relational Algebra see here: https://en.wikipedia.org/wiki/Relational_algebra.

The set of nodes is currently:

  • Selection (as a Boolean expression)
  • Projection
  • Join (and Semijoin, Antijoin)
  • Rename
  • Union (and Minus, Intersection, Difference)
  • New Value (as a JEXL expression).

For more about JEXL see here: http://commons.apache.org/proper/commons-jexl/.

Sample workflows are included to demonstrate each of these these capabilities. Here is an example.

Knime workflow

An early version of AndlRaKnime has been released for comment and feedback.

You can find it here: https://github.com/david-pfx/AndlRaKnime

Leave a Comment

Filed under Knime, Relational Algebra

A Paraphrase of The Third Manifesto

The Third Manifesto by C. J. Date and Hugh Darwen is the culmination of more than two decades of work on defining a foundation for the future of managing data. The authors view SQL and the relational database management systems that use it as deeply flawed, and have set down their views on a language to address those flaws. The title reflects two prior attempts (manifestos) by other authors.

The intention here is to paraphrase, generalise and shorten the wording and to use terminology familiar to a general IT audience, while retaining the original intent, meaning and numbering. This has involved some reorganisation and some inclusion of background material from other writings to ensure that the all the terms used can be understood within one document.

Read more

2 Comments

Filed under Rationale, TTM

First release of Andl.NET

Andl.NET is a pure implementation of the Relational Algebra for manipulating
relational data in any .NET language. While Andl itself is a full programming
language, Andl.NET relies on another language such as C# and simply provides
the higher order queries.

Andl.NET can perform relational queries at or beyond the capabilities
of any SQL dialect. It can do all the ordinary things like select, where
and join but it can also do generative queries, self-joins, complex
aggregations, subtotals and running totals (a bit like SQL recursive
common table expressions and windowing).

Andl.NET has its own in-memory database so it can provide a complete
application backend for any kind of user interface on any platform. It can
easily be used to program a data model as a set of tables just like SQL, but
including all the access routines, without the need for an Object Relational
Mapper.

Andl.NET can retrieve data from Csv, Txt, Sql, Odbc and Oledb but does not
provide any persistence mechanism. (Left as an exercise for the caller!)

The core feature of Andl.NET is an implementation of the generic interface
IRelatable<T>. This provides a series of extension methods similar in flavour
to Linq’s IEnumerable<T> and IQueryable<T>, but directly implementing the core
features of the Relational Algebra.

The main differences from SQL are:
* all joins are natural (by attribute name)
* relations (tables) have no duplicate tuples (rows)
* data values cannot be null.

The main differences from Linq are:
* provides sources, stores and updates as well as queries
* many additional relational operations

Sample programs are included to demonstrate these capabilities. Familiarity
with Linq will help in reading them.

A future release of Andl.NET will generate SQL so that queries can be
executed on a relational database backend.

The release is here.

The licence is Licence.

Leave a Comment

Filed under Andl.Net, Release

Translating SQL into Andl.NET

I have just completed translating a series of SQL exercises into Andl.NET. It really is rather straightforward, surprisingly so.

Here is the source code. It should be self-explanatory. The original is here: https://web.njit.edu/~hassadi/Dbase_Courses/CIS631/Ex_03.html

      //----------------------------------------------------------------------
      // Q1. Get suppliers names who supply part 'P2'.
      // SQL>select sname from s, sp where s.s#=sp.s# and sp.p#='P2'; 
      // SQL>select distinct s.sname from s where s.s# IN (select sp.s# from sp where sp.p# ='P2'); 

      Show("Q1. Get suppliers names who supply part 'P2'",
        SP.Where(sp => sp.Pno == "P2")
          .Join(S, (sp, s) => new { s.Sname })
          .AsEnumerable()
          .Select(s => $"{s.Sname}")
        );

      Show("Q1. Get suppliers names who supply part 'P2'",
        S .Where(s => SP
            .Where(sp => sp.Pno == "P2")
            .Select(sp => new { sp.Sno })
            .Contains(new { Sno = s.Sno }))
          .AsEnumerable()
          .Select(s => $"{s.Sname}")
        );

Continue reading

Leave a Comment

Filed under Andl.Net, Code sample

Andl.Net – making progress

Working surprisingly well. As purely a query language, it has:

  • Equals, subset, distinct
  • Where
  • Select (combines Rename, Project, Extend)
  • Set ops: union, minus, intersect, symdiff
  • Join and Antijoin
  • Order by (for export purposes)
  • Skip Take

The code looks like this:

      class TT1 : NdlTuple<TT1> {
      public string A2;
      public int A1;
      public decimal A3;
      public static TT1[] R1() {
        return new TT1[] {
          new TT1 { A1 = 42, A2 = "hello", A3 = 1.2m },
          new TT1 { A1 = 42, A2 = "world", A3 = 1.2m },
        };
      }
      public static NdlRelation<TT1> NDR_1() {
        return new NdlRelation<TT1>(R1());
      }
    }

    var v1 = TT1.NDR_1();
    var v2 = v1.Where(t => t.A1 == 42);
    var v3 = v2.Select(t => new TT2 { A1 = t.A1, A2 = "constant", A3 = t.A3 });

I’m making heavy use of NUnit and finding plenty of bugs. No updates yet; no data import; no aggregation; no while/recursion; no ordered queries. But it all seems surprisingly possible, even without a lot of help from the compiler. And much easier than writing compilers.

Leave a Comment

Filed under Andl.Net

Andl.Net – the type system

To recap: this is about attempting to reproduce the essential capabiities of Andl (based on TTM/D) as a set of extensions to a standard OO language. I am using C#, but my comments should apply equally to Java, C++, Rust, Dlang, Go, etc.

In my opinion the TTM/D type system can be substantially reproduced as follows.

  1. Scalar types: the requirements are satisfied by any type that is immutable and provides value semantics. In practice this means a class in which:
  • All members are (recursively) scalar, tuple or relation types
  • Members can only be set to a value at construction time
  • Operator ‘=’ implements equality of value
  • The hash value depends only on the value (therefore is constant)
  1. Tuple types: the requirements are satisfied by a type that is immutable and has value semantics, and also:
  • All members are scalar types or (recursively) tuple types
  • Operator ‘=’ implements equality of value by member name (only if same heading)
  • The hash value depends only on the value (therefore is constant)
  • A value can be created by member-wise copy of another tuple type (only if same heading)
  1. Relation types: the requirements are satisfied by a type that:
  • Can provide an enumerator over tuple types for others to use
  • All operations on relations are then implemented by enumerating streams of tuples (pipelining).
  1. Relvar: an object of relational type with the additional feature that:
  • It can accept an enumeration over tuple types and apply it to a persistence store.

Tuple types are the core of the system. In the absence of compiler support, every tuple type consumed or emitted by any relational operator has to be individually declared, but the common functionality comes from inheriting a base class and related interfaces.

Relation types are interesting. A lengthy sequence of operations of the Relational Algebra is just a set of chained functions calls and enumerators. It does not in fact cause any processing of data until a caller either (a) retrieves data to be sent elsewhere or (b) commits a transaction. A relvar is a relation type from which an enumerator can be obtained and which accepts updates. It is not a collection, just an adapter to someone else’s collection.

I have a crude but working implementation (and I have learned a lot about C# generics). Still hard to tell whether it leads anywhere, but I should be able to hook it up to the Andl database backend and then we’ll see. An Sql implementation can then generate Sql and enumerate the result set instead of the raw data (Andl does that).

 

Leave a Comment

Filed under Andl.Net

Andl.Net — is it possible?

This the first of a series of thoughts about implementing a language capability similar to Andl (which is in turn based on The Third Manifestoon C# for use by programs written in C# (as against Andl which has entirely its own type system). The intention is not to focus on C# specifics, but issues relevant to any strongly-typed language, with perhaps an OO focus.

The Relational Algebra can be seen as dealing with streams of tuples, where

  • A tuple value is assumed to be a native type (plain old object/record/class/struct) already existing in the language
  • The input stream source will create the values; two streams of the same tuple type might have different native types.
  • The output stream is also a native type, not necessarily the same as any input. Client software needs familiar data sources.
  • The native type for those values cannot be relied upon to provide value semantics, or anything else required by the implementation.

Consider the implementation of a simple algorithm such as Union:

  • There are three types (left input, right input, output) which should be the same tuple type but may be different native types
  • Removing duplicates can be done with a hash set, which in turn requires a hashing function that depends only on tuple type (not native type).

Possibilities:

  1. If native types are used in implementation algorithms, they need to be augmented post hoc with additional operations (equality, hashing, cloning). That’s hard.
  2. If the implementation uses its own augmented native types then every input tuple has to be copied, but outputs can be consumed directly by clients. Dynamically creating native types is hard too.
  3. The implementation could convert all tuple data into dicts (hashes) or arrays of values; but this requires copy/conversion on both input and output. This is probably the easiest (but not where I started out).

In C# much of the implementation requires reflection. The Andl implementation is similar to (3).

Leave a Comment

Filed under Andl.Net, Pondering

Lambdas and functions as values

Andl now has ‘functions as values’, and lambda literals.

A lambda is the literal form of a value of type ‘function’. It corresponds to a defined function but can be treated as a first class value: it can be

  • stored in a variable, component member of a user-defined type or attribute of a tuple
  • passed as a parameter
  • executed by the postfix operator of parentheses enclosing an argument list.

The syntactic definition is similar to a function definition omitting the name.

// function literal
def funlit(def(a:'') => a & a)
funlit(‘ab’)
// function variable
var funval := def def(a:'') => a & a
funval(‘ab’)
// direct call on function value
(def def(a:'') => a & a)(‘ab’)

In this example, funlit, funval and the literal lambda can be called interchangeably. The value can be stored in a variable, user type component or tuple attribute and can be passed as a parameter. There is no type declaration as such, just literals of the type. Treating the function call on values as a postfix operator produces an odd asymmetry:

funval(‘ab’) // legal
funlit(‘ab’) // legal
(funval)(‘ab’) // legal
(funlit)(‘ab’) // illegal! [Built in and user-defined operators are still second class.]

Please note that these are individual function values, not methods. They have access to local and global scope, but there is no object or instance scope.

The latest release provides the working system and samples. Download from github.

Leave a Comment

Filed under Language, Release

Support for PostgreSQL released

Andl now supports PostgreSQL as a backend database.

The latest release of Andl now makes it possible to use PostgreSQL as a backend database, with Andl relations stored as native PostgreSQL tables and Andl queries translated into PostgreSQL SQL for execution.

All features of Andl work identically on all supported platforms, where implemented. Open functions and aggregations (evaluated from within a query) are fully supported, as are Andl native types and user-defined types. Currently the only features not implemented are ‘while’ queries (similar to SQL common table expression recursive) and ordered queries (similar to SQL window functions).

Andl scripts can be compiled and Andl functions stored in the Andl catalog for later execution (similar to SQL stored procedures). Andl is implemented as a PostgreSQL installed language, so it is possible to call Andl functions from SQL. The server-based access to those functions using Web API, REST or Thrift interfaces will be part of a later release.

The Github project is here. The release is here.

 

Leave a Comment

Filed under Backend, Release