Data Flows

A central concept in HTSQL is data flow, a sequence of homogeneous values.

HTSQL is a data flow transformation language. Every HTSQL expression operates on flows; that is, it accepts an input flow and transforms it into an output flow.

(diagram: input flow -> expression -> output flow)

The initial input flow consists of a single empty record. HTSQL processor interprets the given HTSQL query as a sequence (or rather a directed graph) of flow operations, which it applies one by one. The resulting flow is then displayed to the user in a tabular form.

Different operations affect the flow in various ways: multiply it, or remove elements from it, apply a scalar function to each element, etc. In the next sections, we discuss different types of flow operations.

Scalar Expressions

A simplest example of a flow operation is an application of some expression to each value in a flow. The output flow consists of a results of the expression.

That kind of expression does not change the number of elements in the flow; we call such expressions scalar.

In this example, a scalar expression (3+4)*6 is applied to the initial flow; the value of this expression forms the resulting flow.

(diagram: [] -> (3+4)*6 -> 42)

code count(department)
art 1
bus 3
edu 2
eng 4

In this example, two scalar expressions code and count(department) are applied to the flow consisting of school records. For each school entity, they extract the value of the attribute code and the number of associated departments.

(diagram: [art], [bus], [edu], [eng], ...
-> {‘art’,2}, {‘bus’,3}, {‘edu’,2}, {‘eng’,4}, ...)

A scalar expression is an example of a singular expression; one which does not increase the number of elements in the flow, as opposed to a plural expression, which may produce more output elements than in the input flow.

Filtering

A sieve expression filters the input flow leaving only those elements which satisfy the given condition.

A sieve expression takes one argument: a scalar logical expression called the filter. It applies the filter to each element of the input flow. The output flow consists of those elements of the input flow for which the filter is evaluated to TRUE value.

code name campus
eng School of Engineering north
la School of Arts and Humanities old
mus School of Music & Dance south
ns School of Natural Sciences old

(diagram)

In this example, the sieve expression evaluates a filter condition count(department)>3 for each record from the school class; those records for which this condition is valid generate the output of the query.

A sort expression reorders elements in the flow according to a given argument.

code name campus
edu College of Education old
ph Public Honorariums
art School of Art & Design old
la School of Arts and Humanities old

(diagram)

In this example, the school records are ordered in the ascending order with respect to the value of name attribute.

A truncation operation makes a slice of the input flow.

code name campus
art School of Art & Design old
bus School of Business south
edu College of Education old

(diagram)

In this case, we take the top 3 records from the school class.

Aggregates

An aggregate function converts a plural expression into a scalar.

The argument of an aggregate function must be a plural expression. Then for each element of the input flow, the aggregate evaluates the respective sub-flow and applies a set function to the result to generate a scalar value.

(diagram)

In this example, count() aggregates produces the number of elements in the flow generated by expression school.

code max(course.credits)
acc 6
arthis 6
astro 5
be 8

(diagram)

In this example, max(course.credits) starts with evaluating the flow department.course.credits. Then for each department record of the input flow, max() finds the maximum value in the respective sub-flow.

Projection

A projection expression takes a scalar argument called the kernel. The output flow of projection consists of all unique values of the kernel as it runs over the input flow.

campus
north
old
south

(diagram)

The output of this query consists of all distinct values of school.campus attribute.