HTSQL Grammar

A valid input of an HTSQL processor is called an HTSQL query.

In a regular mode of operation, an HTSQL processor is started as a web service and accepts queries as HTTP GET requests. However, an HTSQL query can also be executed using a command-line utility htsql-ctl or via internal Python API.

Encoding

An HTSQL query is a string of characters in UTF-8 encoding. Octets composing the string could be written literally or percent-encoded. A percent-encoded octet is serialized as three characters: % followed by two hexdecimal digits encoding the octet value.

‘HTSQL’ ‘HTSQL’ ‘HTSQL’
HTSQL HTSQL HTSQL

Percent-encoding is useful for transmitting an HTSQL query via channels that forbid certain characters in literal form. The list of characters that should be encoded depends on the channel type, but the percent (%) character itself must always be percent-encoded.

‘%25’
%

A NUL character cannot appear in an HTSQL query, neither in literal nor in percent-encoded form.

The HTSQL processor decodes percent-encoded octets before parsing the query. As a consequence, a percent-encoded punctuation or operator character still plays its syntax role.

Lexical Structure

An HTSQL query is parsed into a sequence of tokens. The following tokens are recognized.

Name

A sequence of alphanumeric characters that does not start with a digit.

code name campus
art School of Art & Design old
bus School of Business south
edu College of Education old

Number

A numeric literal: integer, decimal and exponential notations are recognized.

60 2.125 271828e-5
60 2.125 2.71828

String

A string literal enclosed in single quotes; any single quote character should be doubled.

‘HTSQL’ ‘O’‘Reilly’
HTSQL O’Reilly

Symbol

A valid symbol in the HTSQL grammar; that includes operators and punctuation characters. Some symbols are represented by more than one character (e.g. <=, !~).

Individual tokens may be separated by whitespace characters.

See htsql.tr.scan.Scanner for detailed description of HTSQL tokens.

Syntax Structure

A sequence of HTSQL tokens must obey the HTSQL grammar.

An HTSQL query starts is a segment expression optionally followed by a sequence of commands.

The following table lists HTSQL operations in the order of precedence, lowest to highest.

Operation Description Example Input Output
/ T segment /school  
x :fn infix function call 'HTSQL':length 5
x :fn y 1/3 :round 2 0.33
x :fn (y,z,...) 'HTSQL':slice(1,-1) 'TSQ'
x +, x - sorting direction program{degree+}  
T ? p sieve program?degree='ms'  
T ^ x projection program^degree  
T {x,y,...} selection school{code,name}  
p | q logical OR true|false true
p & q logical AND true&false false
! p logical NOT !true false
x = y, x != y, comparison 2+2=4 true
x == y, x !== y 'HTSQL'==null false
x ~ y, x !~ y 'HTSQL'~'SQL' true
x < y, x <= y, 12<7 false
x > y, x >= y 12>=7 true
x + y, x - y addition, subtraction 'HT'+'SQL' 'HTSQL'
x * y, x / y multiplication, division 12*7 84
- x negation -42  
x -> T attachment 'south' -> school{campus}  
T := x assignment num_prog := count(program)  
S . T composition school.program  
T [id] locator course[comp.304]  
@ T detachment @school  
{x,y,...} record {'bs','ms'}  
(...) grouping (7+4)*2 22
* wildcard selection school.*  
* number school.*1  
^ projection complement count(^)  
$ name reference $code  
fn (...) function call round(1/3,2) 0.33
name   school  
number   60, 2.125, 271828e-5  
string   'HTSQL'  

A command starts with / followed by a function application in infix notation.

code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…

Next we describe individual syntax elements.

Atomic Expressions

An atomic expression is a basic syntax unit. HTSQL recognizes the following atoms.

Identifier

An identifier is a sequence of characters which contains Latin letters, underscores (_), decimal digits and those Unicode characters that are classified as alphanumeric. An identifier must not start with a digit.

In HTSQL, identifiers are case-insensitive.

Identifiers are used to refer to database entities such as tables and attributes, to define calculated attributes, and to call functions.

name,count(department)
School of Art & Design,1
School of Business,3
…

In this example, four identifiers school, name, count and department represent respectively a table, a table attribute, a built-in function and a table link.

Literal

HTSQL supports two types of literal values: quoted and unquoted.

An unquoted (or numeric) literal is a number written in integer, decimal or exponential notation.

60 2.125 271828e-5
60 2.125 2.71828

The range of allowed numeric values depends on the database backend. The type of a numeric literal is determined from notation: literals written in integer, decimal and exponential notation are assigned to integer, decimal and float data type respectively.

A quoted literal is a (possibly empty) sequence of arbitrary characters enclosed in single quotes. Any single quote in the value must be doubled.

‘HTSQL’ ‘O’‘Reilly’
HTSQL O’Reilly

The data type of a quoted literal is inferred from the context in which the literal is used; the default data type is string.

Wildcard

A wildcard selection (*) selects all output columns of the table.

code name campus code name school_code
bus School of Business south acc Accounting bus
la School of Arts and Humanities old arthis Art History la
ns School of Natural Sciences old astro Astronomy ns

When followed by an integer literal N, a wildcard selects N-th output column of the table. N starts from 1 and should not exceed the number of output columns.

name count(department)
School of Engineering 4
School of Arts and Humanities 6
School of Music & Dance 4

Complement

A projection complement (^) represents a complement link from a projection to the projected flow.

Do not confuse a projection complement with a binary projection operator, which is also represented with the ^ character.

degree count(^)
ba 10
bs 10
ct 4

In this example, the first and the second occurrences of ^ indicate a projection operator and a projection complement respectively.

Grouping

Any expression enclosed in parentheses ((...)) is treated syntactically as a single atom. Use grouping to override the default operator precedence.

Do not confuse a grouping operation with a function call, which also uses parentheses.

Record

A comma-separated list of expressions enclosed in curly brackets ({...}) is called a record expression. Some functions and operators accept records as a way to specify multiple values.

code name campus
eng School of Engineering north
ns School of Natural Sciences old

Reference

A reference is an identifier preceded by a dollar sign ($). A reference is used to access a value defined in a different naming scope.

department_code no title credits description
acc 315 Financial Accounting 5 Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting.
acc 511 Audit 5 This course considers legal, procedural and ethical aspects of audit in accounting.
acc 620 Accounting Internship 6 Supervised internship in a business or nonprofit organization on an accounting position.

In this example, a reference $avg_credits is defined in the root scope, but accessed in the scope of course.

For a more detailed description of references, see the section on naming scopes.

Function Calls

HTSQL has a large library of built-in functions and can be extended with user-defined functions.

A function call is represented as a function name followed by (, a comma-separated list of arguments, and ).

A function may accept no arguments, but the parentheses are still required.

2013-01-31

For functions with at least one argument, HTSQL supports an alternative infix call notation. In this notation, the expression starts with the first argument followed by : and a function name, and then the rest of the arguments. The trailing arguments must be enclosed in parentheses if their number is greater than one.

today() :year 1/3 :round 2 ‘HTSQL’ :slice(1,-1)
2013 0.33 TSQ

This example could be equivalently expressed as

year(today()) round(1/3,2) slice(‘HTSQL’,1,-1)
2013 0.33 TSQ

Infix function calls are composable and have the lowest precedence among the operators.

‘h’+’t’+’t’+’p’ :replace(‘tp’,’sql’) :upper
HTSQL

A function which argument is a segment expression is called a command.

code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…

To use infix call notation with a command, prepend the : indicator with /.

code,name,campus
art,School of Art & Design,old
bus,School of Business,south
…

Consider the difference between regular infix function call and a command application.

code,name,campus
art,School of Art & Design,old

This example could be equivalently expressed as

code,name,campus
art,School of Art & Design,old

For a list and description of built-in functions, see Functions and Operators.

Operators

An HTSQL operator is denoted by a special character or a sequence of characters (e.g. +, <=). HTSQL has infix, prefix and postfix operators, and some operators admit all three forms.

The current version of HTSQL does not support user-defined operators; future versions may add this ability.

In HTSQL, the order in which operators are applied is determined by operator precedence. For example, multiplication and division operators have a higher precedence than addition and subtraction.

Some HTSQL operators are composable (e.g. arithmetic operators) and some are not (e.g. equality operators). We call the former associative and the latter non-associative.

Below we describe the syntax of HTSQL operators. For a more comprehensive description, see Functions and Operators.

Logical Operators

HTSQL supports the following logical operators:

logical OR
p | q
logical AND
p & q
logical NOT
! p

In this list, the operators are sorted by the order of precedence, from lowest to highest. All logical operators are left-associative.

true|false true&false !false
true false true

Comparison Operators

HTSQL supports the following comparison operators:

equality operators
x = y, x != y, x == y, x !== y
containing operators
x ~ y, x !~ y
ordering operators
x < y, x <= y, x > y, x >= y
2+2=4 ‘HTSQL’~’SQL’ 12>7&7>=2
true true true

All comparison operators have the same precedence and are not associative.

Future versions of HTSQL may make ordering operators left-associative to express between operation (e.g. a <= x <= b).

Arithmetic Operators

HTSQL supports the usual set of arithmetic operators:

addition
x + y
subtraction
x - y
multiplication
x * y
division
x / y
negation
- x

Arithmetic operators have standard precedence and associativity.

Flow Operators

HTSQL supports specialized operators to work with flow expressions:

sieve
T ? p
projection
T ^ x
selection
T {x,y,...}

The sieve operator (T ? p) produces rows of T satisfying condition p.

code name campus
bus School of Business south
mus School of Music & Dance south

The projection operator (T ^ x) produces a flow of unique values of x as it ranges over T. Do not confuse the projection operator with a projection complement.

degree
ba
bs
ct

The selection operator (T {x,y,...}) specifies output columns.

code name
art School of Art & Design
bus School of Business
edu College of Education

Sieve, projection and selection operators have the same precedence and are left-associative.

campus avg(school.count(department))
north 4.0
old 5.0
south 3.5

Composition

HTSQL supports a flow composition operator:

T . x

The composition operator evaluates x for each row of the flow T. The values of x form the resulting flow.

code name school_code
acc Accounting bus
econ Economics bus
mm Management & Marketing bus

The composition operator is left-associative.

Locator

The locator operation selects an individual entity by its identity:

T [id]

The identity is a sequence of attributes and links which uniquely identify an entity. Values of separate attributes are separated by a period (.). An individual value could be left unquoted when it consists of digits, alphabetical characters or dash (-). Values could be grouped using pairs of brackets ([]) or parentheses (()).

code name school_code
comp Computer Science eng
department_code no title credits description
comp 515 Software Design 3 Study of good software development techniques: UML, object-oriented design, design patterns, GUI design principles, testing, debugging and profiling.
department_code course_no year season section instructor_code class_seq
comp 515 2010 spring 001 rmejia 1405

Attachment and Detachment

HTSQL has two operators for generating ad-hoc links:

detachment
@ T
attachment
x -> T

The detachment operator (@ T) generates an ad-hoc link to T by associating each row from the input flow with all rows from T.

department_code no title credits description
acc 315 Financial Accounting 5 Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting.
acc 511 Audit 5 This course considers legal, procedural and ethical aspects of audit in accounting.
acc 620 Accounting Internship 6 Supervised internship in a business or nonprofit organization on an accounting position.

The precedence of the detachment operator is higher than of any other operator. Therefore, to apply the operator to any expression other than an identifier or a function call, use parentheses.

The attachment operator (x -> T) generates ad-hoc link to T by associating each row from the input flow with all rows from T such that values of expression x evaluated against respective rows coincide.

name dob
Virginia Hester 1986-03-20
Marvin Hall 1986-03-20
Douglas Robles 1986-03-20

Segment Operator

A segment expression is a prefix operator:

/ T

A segment expression converts a flow to a list value.

name department
name
School of Art & Design Studio Art
School of Business Accounting
Economics
Management & Marketing

Sorting Decorators

The following postfix decorators indicate ascending and descending sorting order respectively:

x +, x -
department_code no title credits description
acc 620 Accounting Internship 6 Supervised internship in a business or nonprofit organization on an accounting position.
acc 315 Financial Accounting 5 Integration of the conceptual and computational aspects of asset, liability and stockholders equity accounting.
acc 511 Audit 5 This course considers legal, procedural and ethical aspects of audit in accounting.

Sorting decorators have the same precedence as infix function call.

Sorting decorators are only meaningful when used as arguments of the sort() function and in a selector expression.

Assignment

An assignment expression has the form:

T := x

The left side of an assignment expression indicates the name and formal parameters (if any) of a calculated attribute. It must be an identifier, a reference or a function call and can be preceded by an optional dot-separated sequence of identifiers.

The right side of an assignment is an arbitrary expression indicating the value of a calculated attribute.

name num_dept
School of Engineering 4
School of Arts and Humanities 6
School of Music & Dance 4

An assignment expression could be used only as an argument of functions define() and where(), or in a selector expression.