Introducing Projections

Published by cce on 2011-02-18

This past week we committed provisional code for projections, a virtual table which consists of distinct values from a particular expression. This is comparable to SQL’s DISTINCT / GROUP BY.

Background

In HTSQL, each query typically starts with a driving table such that rows in the output correspond to records from the driving table. For example, /department returns one row per record in the department table. Links from this driving table are either singular or plural, where a plural link requires an aggregate function. In the example below, we have two links: a singular link from department to school, and a plural link from department to course.

Projections

The projection feature creates virtual table where each record in the table corresponds to distinct values for a particular set of expressions. For instance, the following query returns distinct degree values from the program table.

degree
ba
bs
ct
ma
ms
pb
ph

This virtual table has one plural link which leads back to seed table; this link is represented by the caret ^. For example, the following query shows the number of programs for each distinct degree.

degree count(^)
ba 10
bs 10
ct 4
ma 7
ms 6
pb 1
ph 1

Here the asterisk * returns all columns from the projected table, which is degree. The caret ^ indicates a link to the underlying seed table, in this case program. Hence, count(^) returns the number of programs corresponding to each degree.

For projections, we introduce several terms. The set of distinct expressions is called the kernel . The original table is called the seed. The plural link from the projected table back to the seed table is called the complement.

Additional Examples

In the following example, we return schools and for each school, the count of distinct degrees offered by that school.

This query returns distinct values of (year of birth, gender) combinations. For each combination, the number of students and the average number of enrollments.

This last query returns distinct degrees, but only for programs in the school of art. For each distinct degree, the count of those programs and count of associated students.

Open Questions

The use of ^ and * may be less than intuitive. Currently you can not refer to the kernel nor the complement by name. But, it seems natural that an expression like (program^degree) should have two fields, program which refers to the complement and degree which refers to the kernel. Finally, it may be helpful to have a functional form for the projection operator, called distinct().

With these changes, /(program^degree){*,count(^)} could be written:

/distinct(program{degree}) {degree, count(program)}

We are also considering a short-cut syntax:

/program{^degree, count(^)}
blog comments powered by Disqus