Category Archives: Relational Algebra

An Extended Relational Algebra

The following is a comprehensive set of algebraic operations that comprise an Extended Relational Algebra. They are atomic, in the sense that they each carry out precisely one operation and can be combined as needed. There is some redundancy, by design.

Each operation is a function with one or two relation values as arguments and returns one as its result. Each operation is generic, and has to be specialised for each use. It is specialised by:

  • The heading of each of its relation arguments
  • One or more attribute names, where the operation involves adding, removing or substituting attributes.
  • A function name and appropriate arguments, as attribute names or literal values, where the operation requires a computation.

Each operation returns a value with a heading determined entirely by the specialisation as set out above. The ERA depends on a library of pre-defined functions that are compatible with the types of the various attributes. The library and those types are specified separately. The ERA has type system, and no means to access tuple or attribute values.

The ERA also includes an assignment operation, which updates the value of a pseudo-variable, identified by a literal value.

Operator Specialisers Description
select<args,bfn>(r) Arguments to Boolean function Restricts set membership
remove<name>(r) An attribute to remove Removes an attributes
rename<name1,name2>(r) An attribute and its new name Renames one attribute
extend<args,fn,name>(r) Arguments to function, name for new value Appends one attribute
replace<args,fn,name>(r) Arguments to function, name of replaced value Replaces the value of one attribute
join(r1,r2) Auto for join Natural join
semijoin(r1,r2) Auto for join Natural semijoin
antijoin(r1,r2) Auto for join Natural antijoin
compose(r1,r2) Auto for join Join removing join attributes
union(r1,r2) None Set union
minus(r1,r2) None Set minus
intersect(r1,r2) None Set intersection
difference(r1,r2) None Set difference
aggregate<names,arg,afn>(r) Grouping attributes, argument to aggregating function Replace one attribute by grouped aggregation
while<rfn>(r) Relational function Fixed point recursion
var<lit> Literal name of pseudo-variable Reference to a stored value
assign<lit>(r)
insert<lit>(r) delete<lit,args,bfn>(r) replace<lit,args,fn,name>(r)
Literal name of pseudo-variable Arguments to Boolean function Arguments to function, name of replaced value Update a stored value

Leave a Comment

Filed under Language, Relational Algebra

Andl Adds Relational Algebra to Knime

Knime is a visual programming language for analysing and displaying data. It models the flow of data in tables as a graphical workflow, made up of nodes and edges. To learn more about Knime, go here: https://www.knime.com/.

AndlRaKnime provides a set of nodes that implement the Relational Algebra. Using these nodes allows you to perform SQL-like queries on a wide variety of non-SQL data sources, such as CSV files or data retrieved online. For more about the Relational Algebra see here: https://en.wikipedia.org/wiki/Relational_algebra.

The set of nodes is currently:

  • Selection (as a Boolean expression)
  • Projection
  • Join (and Semijoin, Antijoin)
  • Rename
  • Union (and Minus, Intersection, Difference)
  • New Value (as a JEXL expression).

For more about JEXL see here: http://commons.apache.org/proper/commons-jexl/.

Sample workflows are included to demonstrate each of these these capabilities. Here is an example.

Knime workflow

An early version of AndlRaKnime has been released for comment and feedback.

You can find it here: https://github.com/david-pfx/AndlRaKnime

Leave a Comment

Filed under Knime, Relational Algebra

Alice and the Relational Algebra

This is a summary largely based on the ‘Alice’ book: http://webdam.inria.fr/Alice/.

The non-redundant set of operations for a maximally effective Relational Algebra that is restricted to safe, domain independent, non-recursive, type safe queries is:

  • Select (restrict/where)
  • Project
  • Rename
  • Join (natural)
  • Union

This set is referred to as SPRJU. This set is incomplete but ‘safe’ because it excludes negation. It can be completed by adding:

  • Minus (set difference).

This set could be referred to as SPRJUM, and from this foundation all the other operations of the Relational Algebra can be created. That includes other set operations, and other kinds of join such as Antijoin or Not Matching.

To this the authors propose the addition of

  • While (recursion)

The complete set could be referred to as SPRJUMW. It is able to handle recursive joins, similar to the SQL Recursive Common Table Expression.

This is the endpoint for domain-independent queries. There is nothing to add (or take away). This is a complete Relational Algebra, capable of the full range of safe, domain-independent queries (including recursive queries).

However, this Algebra is unable to perform a range of familiar tasks based on computation, which means that it cannot return any attribute values that are not already in the database and it cannot perform computation-like activities such as counting. It is not Turing Complete, and cannot be made so.

Computation requires abandoning domain-independence. To compute, one needs a type system. The SELECT of SQL and the EXTEND of Tutorial D require computation and a type system. These issues are not directly addressed in this book or anywhere else that I know of.

Some useful extensions that lie outside the domain-independent Relational Algebra include:

  • Introducing an attribute type system and calculated attributes (EXTEND in Tutorial D, SELECT in SQL)
  • Introducing algorithmic or calculated relations (see Hall-Hitchcock-Todd or Appendix A)
  • An extended While (see book for details).

Appendix A shows that the addition of calculated relations can provide a substitute for Select/Restrict/Where. Operators that provide recursion can substitute for While. Ordering or relaxing type safety provide further choices. At some point there are just too many options available. However, aggregation is not adequately treated in any of the above. The Alice book has a brief section of little practical value.

It is my view that this is an adequate theoretical foundation for a Relational Algebra that is complete within the limits of safe, domain-independent, type safe queries. A foundation for the various methods of introducing computation is still lacking, especially with respect to aggregation and ordering.

Leave a Comment

Filed under Relational Algebra, TTM