Category Archives: Relational Algebra

Formal Definitions for an Extended Relational Algebra

1.    Introduction

These are the formal definitions of an Extended Relational Algebra in a style similar to Algebra A in Appendix A of DTATRM by D&D. The operators of this ERA are used to construct shorthands, intended to form the basis of a relational programming language. It should be noted that Appendix A takes the reduction path from language shorthand to algebra operation. Here the path is taken in the opposite direction, defining algebraic operations from which the shorthands of a language may be constructed.

These operators do not provide or require any particular type system. Any use of the algebra is dependent on an external provider of a type system and functions over those types. Shorthands such as GROUP and UNGROUP, WRAP and UNWRAP require that relations and tuples be provided by that type system.

The naming conventions are adapted from Appendix A. In what follows:

A is an attribute name

T is a type

v is a value

<A,T> is an attribute (member of a heading)

<A,T,v> is an attribute value (member of a tuple)

r is a relation

Hr is the heading of r

Br is the body of r

tr is a tuple of r

2.    Basics from Appendix A

These are the basic 5 definitions adapted from Appendix A.

a)      NOT

Given a relation r then s = NOT(r) is defined as follows.

Hs = Hr

Bs = { ts : ∃ tr ∉ Br, ts = tr

The NOT operator yields the complement s of a given relation r. The heading of s is the heading of r. The body of s contains every tuple with that heading that is not in the body of r.

The NOT operator is used in shorthands that involve negation, such as ANTIJOIN, MINUS.

b)      REMOVE

Given a relation r and a type T such that <A,T> ∈ Hr then s = REMOVE(A) is defined as follows.

Hs = Hr minus { <A,T> }

Bs = { ts : ∃ tr ∉ Br, ∃ v ∈ T, <A,T,v> ∈ tr, ts = tr minus { <A,T,v> }

The REMOVE operator yields a relation s formed by removing a given attribute A from a given relation r. The operation is equivalent to taking the projection of r over all of its attributes except A. The heading of s is the heading of r minus the ordered pair <A,T>. The body of s contains every tuple that conforms to the heading of s and is a subset of some tuple of r.

The REMOVE operator is used in shorthands that remove one or more attributes, such as PROJECT.

c)      RENAME

Given a relation r, a type T such that <A,T> ∈ Hr and no type T such that <B,T> ∈ Hr then s = RENAME(A,B) is defined as follows.

Hs = ( Hr minus { <A,T> } ) union { <B,T> }

Bs = { ts : ∃ tr ∈ Br, ∃ v ∈ T, <A,T,v> ∈ tr,

   ts = ( tr minus { <A,T,v> } ) union { <B,T,v> } }

The RENAME operator yields a relation s that differs from a given relation r only in the name of one of its attributes, which is changed from A to B. The heading of s is the heading of r except that the ordered pair <A,T> is replaced by the ordered pair <B,T>. The body of s consists of every tuple of the body of r, except that in each such tuple the triple <A,T,v> is replaced by the triple <B,T,v>.

The RENAME operator is used in shorthands that rename one or more attributes, such as RENAME.

d)      AND

Given relations r1 and r2 such that <A,T1> ∈ Hr1 and <A,T2> ∈ Hr2 implies T1 = T2 then s = AND(r1,r2) is defined as follows.

Hs = Hr1 union Hr2

Bs = { ts : ∃ tr1 ∈ Br1, ∃ tr2 ∈ Br2, ts = tr1 union tr2 }

The AND operator is relational conjunction, which is usually known as a natural join. The heading of s is the union of the headings of r1 and r2. The body of s contains every tuple that conforms to the heading of s and is a superset of both some tuple in the body of r1 and some tuple in the body of r2.

The AND operator is used in join shorthands such as JOIN, SEMIJOIN, ANTIJOIN, COMPOSE, TIMES.

e)      OR

Given relations r1 and r2 such that <A,T1> ∈ Hr1 and <A,T2> ∈ Hr2 implies T1 = T2 then s = OR(r1,r2) is defined as follows.

Hs = Hr1 union Hr2

Bs = { ts : ∃ tr1, ∃ tr2, ( tr1 ∈ Br1 or tr2 ∈ Br2), ts = tr1 union tr2 }

The OR operator is relational disjunction, being a generalization of what is usually known as union. In the special case where the given relations r1 and r2 have the same heading, the result s is in fact the union of those two relations in the traditional sense. The heading of s is the union of the headings of r1 and r2. The body of s contains every tuple that conforms to the heading of s and is a superset of either some tuple in the body of r1 or some tuple in the body of r2.

The OR operator is used in set shorthands such as UNION, MINUS, SYMDIFF, INSERT.

3.    Relational Constant

This is the formalisation of a relational constant or ‘relcon’, and relies on some previously defined function f. Separate formalisations are provided for monadic and dyadic functions, in both predicate and value-returning forms. Extending them to n-adic functions is left as an exercise.

a)      Monadic predicate

Given attribute names X, types Tx and function f with the type signature Tx->Boolean then s = RELCON(X,f) is defined as follows.

Hs = {<X,Tx>}

Bs = { ts : ∃ vx ∈ Tx, f(vx) = true, ts = {<X,Tx,vx>} }

b)      Monadic value

Given attribute names X,Y, types Tx,Ty and function f with the type signature Tx->Ty then s = RELCON(X,Y,f) is defined as follows.

Hs = {<X,Tx>, <Y,Ty>}

Bs = { ts : ∃ vx ∈ Tx, ∃ vy ∈ Ty, f(vx) = vy,

   ts = {<X,Tx,vx>, <Y,Ty,vy>} }

c)      Dyadic predicate

Given attribute names X,Y, types Tx,Ty and function f with the type signature Tx->Ty->Boolean then s = RELCON(X,Y,f) is defined as follows.

Hs = {<X,Tx>, <Y,Ty>}

Bs = { ts : ∃ vx ∈ Tx, ∃ vy ∈ Ty, f(vx,vy) = true),

   ts = {<X,Tx,vx>, <Y,Ty,vy} }

d)      Dyadic value

Given attribute names X,Y,Z, types Tx,Ty,Tz and function f with the type signature Tx->Ty->Tz then s = RELCON(X,Y,Z,f) is defined as follows.

Hs = {<X,Tx>, <Y,Ty>, <Z,Tz>}

Bs = { ts : ∃ vx ∈ Tx, ∃ vy ∈ Ty, ∃ vz ∈ Tz, f(vx,vy) = vz),

   ts = {<X,Tx,vx>, <Y,Ty,vy, <Z,Tz,vz>} }

[Note: for the relcon PLUS in App-A, types Tx, Ty and Tz are all INTEGER, and the function f is the scalar operator “+”.]

The RELCON operator predicate forms are used together with AND in shorthands such as WHERE/RESTRICT.

The RELCON operator value-returning forms are used together with AND in shorthands such as EXTEND, SUMMARIZE, UPDATE and DELETE.

4.    Aggregation

This is the formalisation of simple aggregation, similar to SUM, MAX or MIN but allowing for a wider range of functions. It relies on some previously defined aggregating function f. An aggregating function has an argument which is a list of values (a ‘bag’ or ‘multiset’ rather than a ‘set’) and returns a single value. Extending this approach to more complex aggregations is left as an exercise.

Given a relation r, a type T such that <A,T> ∈ Hr and an aggregating function f with the type signature T[]->Ta then s = AGGREGATE(r,A,f) is defined as follows.

Hs = ( Hr minus {<A,T>} ) union {<A,Ta>}

Bs = { ts : ∃ v ∈ T, ∃ tr ∈ r, ∃ <A,T,v> ∈ tr, v ∈ v[],

   ts = tr minus {<A,T,v>} union {<A,Ta,f(v[])} }

Note: the term v[] should be read as: a list of the values of v for some value of ts.

The AGGREGATION operator is used in shorthands such as SUMMARIZE.

5.    Transitive Closure

As transitive closure cannot be defined directly in set-builder notation, this formalisation defines it indirectly by means of a recurrence relation. This consists of a starting value (given) and a sequence of successors (defined by set-builder). The transitive closure is the fix-point union of that sequence.

The starting value (‘seed’) represents known edges in a directed graph; the end value is all the possible paths through the graph.

a)      Simple Transitive Closure

Given a relation r with the heading {<A,T>,<B,T>} for some type T then the successor relation r’ is defined as follows.

Hr’ = Hr

Br’ = { tr’ : tr’ ∈ Br or

   ∃ v1 ∈ T, ∃ v2 ∈ T, ∃ v3 ∈ T, ∃ v4 ∈ T,

   ∃ tr1 ∈ Br, tr1 = {<A,T,v1>, <B,T,v2>},

   ∃ tr2 ∈ Br, tr2 = {<A,T,v2>, <B,T,v3>},

   tr’ = {<A,T,v1>, <B,T,v3>} }

Then the transitive closure s = TCLOSE(r) is defined as follows.

S = r1 U r2 U r3 … r¥

This is a linear recurrence, which can be shown to reach a fix-point termination.

The TCLOSE operator is used in a shorthand of the same name.

b)      Extended Transitive Closure

In this case each tuple is associated with a value, and this definition relies on some previously defined dyadic function f that takes values of that type as its argument and return value. The value computed represents in some sense the cost of that path.

Given a relation r with heading {<A,T>,<B,T>,<C,Tv>} for some types T and Tv, and a dyadic function f with the type signature Tv->Tv->Tv then the successor relation r’ is defined as follows.

Hr’ = Hr
Br’ = { tr’ : tr’ ∈ Br or
 ( ∃ v1 ∈ T, ∃ v2 ∈ T, ∃ v3 ∈ T, ∃ v4 ∈ T,
   ∃ w1 ∈ Tv, ∃ w2 ∈ Tv,
   ∃ tr1 ∈ Br, tr1 = {<A,T,v1>, <B,T,v2>, <C,Tv,w1>},
   ∃ tr2 ∈ Br, tr2 = {<A,T,v2>, <B,T,v3>, <C,Tv,w2>},
   tr’ = {<A,T,v1>, <B,T,v3>, <C,Tv,f(w1,w2)} ) }

Then the extended transitive closure s = ETCLOSE(r,f) is defined as follows.

s = r1 U r2 U r3 … r¥

This is a linear recurrence, which can be shown to reach a fix-point termination.

The ETCLOSE operator is used in combination with AGGREGATE in a shorthand to perform Generalised Transitive Closure as defined by D&D (RM VSS 5).

Leave a Comment

Filed under Language, Relational Algebra

An Extended Relational Algebra

The following is a comprehensive set of algebraic operations that comprise an Extended Relational Algebra. They are atomic, in the sense that they each carry out precisely one operation and can be combined as needed. There is some redundancy, by design.

Each operation is a function with one or two relation values as arguments and returns one as its result. Each operation is generic, and has to be specialised for each use. It is specialised by:

  • The heading of each of its relation arguments
  • One or more attribute names, where the operation involves adding, removing or substituting attributes.
  • A function name and appropriate arguments, as attribute names or literal values, where the operation requires a computation.

Each operation returns a value with a heading determined entirely by the specialisation as set out above. The ERA depends on a library of pre-defined functions that are compatible with the types of the various attributes. The library and those types are specified separately. The ERA has type system, and no means to access tuple or attribute values.

The ERA also includes an assignment operation, which updates the value of a pseudo-variable, identified by a literal value.

Operator Specialisers Description
select<args,bfn>(r) Arguments to Boolean function Restricts set membership
remove<name>(r) An attribute to remove Removes an attributes
rename<name1,name2>(r) An attribute and its new name Renames one attribute
extend<args,fn,name>(r) Arguments to function, name for new value Appends one attribute
replace<args,fn,name>(r) Arguments to function, name of replaced value Replaces the value of one attribute
join(r1,r2) Auto for join Natural join
semijoin(r1,r2) Auto for join Natural semijoin
antijoin(r1,r2) Auto for join Natural antijoin
compose(r1,r2) Auto for join Join removing join attributes
union(r1,r2) None Set union
minus(r1,r2) None Set minus
intersect(r1,r2) None Set intersection
difference(r1,r2) None Set difference
aggregate<names,arg,afn>(r) Grouping attributes, argument to aggregating function Replace one attribute by grouped aggregation
while<rfn>(r) Relational function Fixed point recursion
var<lit> Literal name of pseudo-variable Reference to a stored value
assign<lit>(r)
insert<lit>(r) delete<lit,args,bfn>(r) replace<lit,args,fn,name>(r)
Literal name of pseudo-variable Arguments to Boolean function Arguments to function, name of replaced value Update a stored value

Leave a Comment

Filed under Language, Relational Algebra

Andl Adds Relational Algebra to Knime

Knime is a visual programming language for analysing and displaying data. It models the flow of data in tables as a graphical workflow, made up of nodes and edges. To learn more about Knime, go here: https://www.knime.com/.

AndlRaKnime provides a set of nodes that implement the Relational Algebra. Using these nodes allows you to perform SQL-like queries on a wide variety of non-SQL data sources, such as CSV files or data retrieved online. For more about the Relational Algebra see here: https://en.wikipedia.org/wiki/Relational_algebra.

The set of nodes is currently:

  • Selection (as a Boolean expression)
  • Projection
  • Join (and Semijoin, Antijoin)
  • Rename
  • Union (and Minus, Intersection, Difference)
  • New Value (as a JEXL expression).

For more about JEXL see here: http://commons.apache.org/proper/commons-jexl/.

Sample workflows are included to demonstrate each of these these capabilities. Here is an example.

Knime workflow

An early version of AndlRaKnime has been released for comment and feedback.

You can find it here: https://github.com/david-pfx/AndlRaKnime

Leave a Comment

Filed under Knime, Relational Algebra

Alice and the Relational Algebra

This is a summary largely based on the ‘Alice’ book: http://webdam.inria.fr/Alice/.

The non-redundant set of operations for a maximally effective Relational Algebra that is restricted to safe, domain independent, non-recursive, type safe queries is:

  • Select (restrict/where)
  • Project
  • Rename
  • Join (natural)
  • Union

This set is referred to as SPRJU. This set is incomplete but ‘safe’ because it excludes negation. It can be completed by adding:

  • Minus (set difference).

This set could be referred to as SPRJUM, and from this foundation all the other operations of the Relational Algebra can be created. That includes other set operations, and other kinds of join such as Antijoin or Not Matching.

To this the authors propose the addition of

  • While (recursion)

The complete set could be referred to as SPRJUMW. It is able to handle recursive joins, similar to the SQL Recursive Common Table Expression.

This is the endpoint for domain-independent queries. There is nothing to add (or take away). This is a complete Relational Algebra, capable of the full range of safe, domain-independent queries (including recursive queries).

However, this Algebra is unable to perform a range of familiar tasks based on computation, which means that it cannot return any attribute values that are not already in the database and it cannot perform computation-like activities such as counting. It is not Turing Complete, and cannot be made so.

Computation requires abandoning domain-independence. To compute, one needs a type system. The SELECT of SQL and the EXTEND of Tutorial D require computation and a type system. These issues are not directly addressed in this book or anywhere else that I know of.

Some useful extensions that lie outside the domain-independent Relational Algebra include:

  • Introducing an attribute type system and calculated attributes (EXTEND in Tutorial D, SELECT in SQL)
  • Introducing algorithmic or calculated relations (see Hall-Hitchcock-Todd or Appendix A)
  • An extended While (see book for details).

Appendix A shows that the addition of calculated relations can provide a substitute for Select/Restrict/Where. Operators that provide recursion can substitute for While. Ordering or relaxing type safety provide further choices. At some point there are just too many options available. However, aggregation is not adequately treated in any of the above. The Alice book has a brief section of little practical value.

It is my view that this is an adequate theoretical foundation for a Relational Algebra that is complete within the limits of safe, domain-independent, type safe queries. A foundation for the various methods of introducing computation is still lacking, especially with respect to aggregation and ordering.

Leave a Comment

Filed under Relational Algebra, TTM