Copyright 2010-2012 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQL Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.
HyperSQL data access and data change statements are fully compatible with the latest SQL:2008 Standard. There are a few extensions and some relaxation of rules, but these do not affect statements that are written to the Standard syntax. There is full support for classic SQL, as specified by SQL-92, and many enhancements added in later versions of the standard.
An SQL statement can be executed in two ways. One way is to use the
java.sql.Statement
interface. The Statement object
can be reused to execute completely different SQL statements.
Alternatively, a PreparedStatement
can be used to
execute an SQL statement repeatedly, and the statements can be
parameterized. Using either form, if the SQL statement is a query
expression, a ResultSet
is returned.
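A minimal sketch of the two approaches is given below; the table T, its column A, and the already-open connection object are assumptions, not part of any particular schema.
Statement st = connection.createStatement();        // reusable for completely different SQL
ResultSet rs = st.executeQuery("SELECT A FROM T");
int deleted = st.executeUpdate("DELETE FROM T WHERE A < 0");
// PreparedStatement: compiled once, executed repeatedly with parameters
PreparedStatement ps = connection.prepareStatement("SELECT A FROM T WHERE A > ?");
ps.setInt(1, 10);
ResultSet prs = ps.executeQuery();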
In SQL, when a query expression (SELECT or similar SQL statement) is
executed, an ephemeral table is created. When this table is returned to
the application program, it is returned as a result set, which is accessed
row-by-row by a cursor. A JDBC ResultSet
represents
an SQL result set and its cursor.
The minimal definition of a cursor is a list of rows with a position that can move forward. Some cursors also allow the position to move backwards or jump to any position in the list.
An SQL cursor has several attributes. These attributes depend on the
query expression. Some of these attributes can be overridden by specifying
qualifiers in the SQL statement or by specifying values for the parameters
of the JDBC Statement
or
PreparedStatement
.
The columns of the rows of the result set are determined by the
query expression. The number of columns and the type and name
characteristics of each column are known when the query expression is
compiled and before its execution. This metadata information remains
constant regardless of changes to the contents of the tables used in the
query expression. The metadata for the JDBC
ResultSet
is in the form of a
ResultSetMetaData
object. Various methods of the
ResultSetMetaData
interface return different
properties of each column of the
ResultSet
.
A result set may contain 0 or more rows. The rows are determined by the execution of the query expression.
The setMaxRows(int)
method of JDBC
Statement
allows limiting the number of rows
returned by the statement. This limit is conceptually applied after the
result has been built, and the excess rows are discarded.
A cursor is either scrollable or not. Scrollable cursors allow accessing rows by absolute or relative positioning. No-scroll cursors only allow moving to the next row. The cursor can optionally be declared with the SQL qualifiers SCROLL or NO SCROLL. The JDBC statement parameter can be specified as either TYPE_FORWARD_ONLY or TYPE_SCROLL_INSENSITIVE. The JDBC type TYPE_SCROLL_SENSITIVE is not supported by HSQLDB.
The default is NO SCROLL or TYPE_FORWARD_ONLY.
When a JDBC ResultSet
is opened, it is
positioned before the first row. Using the
next()
method the position is moved to the
first row. While the ResultSet
is positioned on a
row, various getter methods can be used to access the columns of the
row.
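A short sketch of this row-by-row access is given below; the query and the column names are hypothetical.
Statement st = connection.createStatement();               // TYPE_FORWARD_ONLY by default
ResultSet rs = st.executeQuery("SELECT ID, NAME FROM T");
while (rs.next()) {                                        // moves to the first row, then forward
    int id = rs.getInt(1);                                 // access by column position
    String name = rs.getString("NAME");                    // or by column label
}
rs.close();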
The result returned by some query expressions is updatable. HSQLDB supports core SQL updatability features, plus some enhancements from the SQL optional features.
A query expression is updatable if it is a SELECT from a single underlying base table (or updatable view) either directly or indirectly. A SELECT statement featuring DISTINCT or GROUP BY or FETCH, LIMIT, OFFSET is not updatable. In an updatable query expression, one or more columns are updatable. An updatable column is a column that can be traced directly to the underlying table. Therefore, columns that contain expressions are not updatable. Examples of updatable query expressions are given below. The view V is updatable when its query expression is updatable. The SELECT statement from this view is also updatable:
SELECT A, B FROM T WHERE C > 5
SELECT A, B FROM (SELECT * FROM T WHERE C > 10) AS TT WHERE TT.B < 10
CREATE VIEW V(X,Y) AS SELECT A, B FROM T WHERE C > 0 AND B < 10
SELECT X FROM V WHERE Y = 5
If a cursor is declared with the SQL qualifier, FOR
UPDATE OF <column name list>
, then only the stated
columns in the result set become updatable. If any of the stated columns
is not actually updatable, then the cursor declaration will not
succeed.
If the SQL qualifier, FOR UPDATE is used, then all the updatable columns of the result set become updatable.
If a cursor is declared with FOR READ ONLY, then it is not updatable.
In HSQLDB, if FOR READ ONLY or FOR UPDATE is not used then all the updatable columns of the result set become updatable. This relaxes the SQL standard rule that in this case limits updatability to only simply updatable SELECT statements (where all columns are updatable).
In JDBC, CONCUR_READ_ONLY or CONCUR_UPDATABLE can be specified for
the Statement
parameter. CONCUR_UPDATABLE is
required if the returning ResultSet is to be updatable. If
CONCUR_READ_ONLY, which is the default, is used, then even an updatable
ResultSet becomes read-only.
When a ResultSet
is updatable, various
setter methods can be used to modify the column values. The names of the
setter methods begin with "update". After all the updates on a row are
done, the updateRow()
method must be called to
finalise the row update.
An updatable ResultSet
may or may not be
insertable-into. In an insertable ResultSet
, all
columns of the result are updatable and any column of the base table
that is not in the result must be a generated column or have a default
value.
In the ResultSet
object, a special
pseudo-row, called the insert row, is used to populate values for
insertion into the ResultSet
(and consequently,
into the base table). The setter methods must be used on all the
columns, followed by a call to
insertRow()
.
Individual rows from all updatable result sets can be deleted one
at a time. The deleteRow()
is called when the
ResultSet
is positioned on a row.
While using an updatable ResultSet to modify data, it is recommended not to change the same data using another ResultSet and not to execute SQL data change statements that modify the same data.
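The sketch below shows the three kinds of modification through an updatable ResultSet; the table T, its columns ID and NAME, and the open connection c with autocommit disabled are assumptions.
Statement st = c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
ResultSet rs = st.executeQuery("SELECT ID, NAME FROM T");
while (rs.next()) {
    if (rs.getInt("ID") == 10) {
        rs.updateString("NAME", "new name");  // setter methods begin with "update"
        rs.updateRow();                       // finalises the update of the current row
    } else if (rs.getInt("ID") == 11) {
        rs.deleteRow();                       // deletes the row the cursor is positioned on
    }
}
rs.moveToInsertRow();                         // switch to the special insert row
rs.updateInt("ID", 12);
rs.updateString("NAME", "another name");
rs.insertRow();                               // inserts into the ResultSet and the base table
rs.moveToCurrentRow();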
The sensitivity of the cursor relates to visibility of changes made to the data by the same transaction but without using the given cursor. While the result set is open, the same transaction may use statements such as INSERT or UPDATE, and change the data of the tables from which the result set data is derived. A cursor is SENSITIVE if it reflects those changes. It is INSENSITIVE if it ignores such changes. It is ASENSITIVE if behaviour is implementation dependent.
The SQL default is ASENSITIVE, i.e., implementation dependent.
In HSQLDB all cursors are INSENSITIVE. They do not reflect changes to the data made by other statements.
A cursor is holdable if the result set is not automatically closed when the current transaction is committed. Holdability can be specified in the cursor declaration using the SQL qualifiers WITH HOLD or WITHOUT HOLD.
In JDBC, holdability is specified using either of the following values for the Statement parameter: HOLD_CURSORS_OVER_COMMIT, or CLOSE_CURSORS_AT_COMMIT.
The SQL default is WITHOUT HOLD.
The JDBC default for HSQLDB result sets is WITH HOLD for read-only result sets and WITHOUT HOLD for updatable result sets.
If the holdability of a ResultSet
is
specified in a conflicting manner in the SQL statement and the JDBC
Statement
object, the JDBC setting takes
precedence.
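A short sketch of requesting holdability through JDBC is given below; the constants are the standard java.sql.ResultSet values and the connection c is assumed to be open.
// read-only result set, scrollable, and kept open across commits
Statement st = c.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                                 ResultSet.CONCUR_READ_ONLY,
                                 ResultSet.HOLD_CURSORS_OVER_COMMIT);
// updatable result sets are always closed at commit
Statement ust = c.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                  ResultSet.CONCUR_UPDATABLE,
                                  ResultSet.CLOSE_CURSORS_AT_COMMIT);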
The autocommit property of a connection is a feature of JDBC and
ODBC and is not part of the SQL Standard. In autocommit mode, all
transactional statements are followed by an implicit commit. In
autocommit mode, all ResultSet
objects are
read-only and holdable.
The JDBC settings ResultSet.CONCUR_READ_ONLY and ResultSet.CONCUR_UPDATABLE are the available alternatives for read-only or updatable result sets. The default is ResultSet.CONCUR_READ_ONLY.
The JDBC settings, ResultSet.TYPE_FORWARD_ONLY, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.TYPE_SCROLL_SENSITIVE are the available alternatives for both scrollability (navigation) and sensitivity. HyperSQL does not support ResultSet.TYPE_SCROLL_SENSITIVE. The two other alternatives can be used for both updatable and read-only result sets.
The JDBC settings ResultSet.CLOSE_CURSORS_AT_COMMIT and ResultSet.HOLD_CURSORS_OVER_COMMIT are the alternatives for the lifetime of the result set. The default is ResultSet.CLOSE_CURSORS_AT_COMMIT. The other setting can only be used for read-only result sets.
Examples of creating statements for updatable result sets are given below:
Connection c = newConnection();
Statement st;
c.setAutoCommit(false);
st = c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
st = c.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
When a JDBC PreparedStatement or CallableStatement is used with an SQL statement that contains dynamic parameters, the data types of the parameters are resolved and determined by the engine when the statement is prepared. The SQL Standard has detailed rules to determine the data types and imposes limits on the maximum length or precision of the parameter. HyperSQL applies the standard rules with two exceptions for parameters with String and BigDecimal Java types. HyperSQL ignores the limits when the parameter value is set, and only enforces the necessary limits when the PreparedStatement is executed. In all other cases, parameter type limits are checked and enforced when the parameter is set.
In the example below the setString() calls do not raise an exception, but one of the execute() statements does.
// table definition: CREATE TABLE T (NAME VARCHAR(12), ...)
Connection c = newConnection();
PreparedStatement st = c.prepareStatement("SELECT * FROM T WHERE NAME = ?");
// type of the parameter is VARCHAR(12), which limits length to 12 characters
st.setString(1, "Eyjafjallajokull"); // string is longer than type, but no exception is raised here
st.execute(); // executes with no exception and does not find any rows
// but if an UPDATE is attempted, an exception is raised
st = c.prepareStatement("UPDATE T SET NAME = ? WHERE ID = 10");
st.setString(1, "Eyjafjallajokull"); // string is longer than type, but no exception is raised here
st.execute(); // exception is thrown when HyperSQL checks the value for update
All of the above also applies to setting the values in new and updated rows in updatable ResultSet objects.
JDBC parameters can be set with any compatible type, as supported by the JDBC specification. For CLOB and BLOB types, you can use streams, or create instances of BLOB or CLOB before assigning them to the parameters. You can even use CLOB or BLOB objects returned from connections to other RDBMS servers. The Connection.createBlob() and createClob() methods can be used to create the new LOBs. For very large LOBs the stream methods are preferable as they use less memory.
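A sketch of setting a CLOB parameter in the two ways described above follows; the DOCUMENTS table, the file name and the open connection c are assumptions, and exception handling is omitted.
PreparedStatement ps = c.prepareStatement("INSERT INTO DOCUMENTS (ID, BODY) VALUES (?, ?)");
ps.setLong(1, 1001);
Clob clob = c.createClob();                   // create the LOB explicitly ...
clob.setString(1, "a short document body");
ps.setClob(2, clob);
ps.executeUpdate();

ps.setLong(1, 1002);                          // ... or stream it, which uses less memory
ps.setCharacterStream(2, new java.io.FileReader("large_document.txt"));
ps.executeUpdate();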
For array parameters, you must use a
java.sql.Array
object that contains the array
elements before assigning to JDBC parameters. The
Connection.createArrayOf(...)
method can be used to create
a new object, or you can use an Array returned from connections to other
RDBMS servers.
Data change statements, also called data manipulation (DML) statements, such as INSERT, UPDATE and MERGE, can be called with the different executeUpdate() methods of java.sql.Statement and java.sql.PreparedStatement. Some of these methods allow you to specify how values for generated columns of the table are returned. These methods are documented in the JavaDoc for org.hsqldb.jdbc.JDBCStatement and org.hsqldb.jdbc.JDBCPreparedStatement. HyperSQL can return not just the generated columns, but any set of columns of the table. You can use this to retrieve the column values that may be modified by a BEFORE TRIGGER on the table.
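The sketch below asks for two columns of the inserted row to be returned; the INVOICE table, its generated ID column, and the idea that a BEFORE trigger might adjust TOTAL are assumptions used only for illustration.
PreparedStatement ps = c.prepareStatement(
    "INSERT INTO INVOICE (CUSTOMER_ID, TOTAL) VALUES (?, ?)",
    new String[] { "ID", "TOTAL" });          // columns of the new row to return
ps.setLong(1, 5);
ps.setBigDecimal(2, new java.math.BigDecimal("99.95"));
ps.executeUpdate();
ResultSet keys = ps.getGeneratedKeys();
if (keys.next()) {
    long id = keys.getLong("ID");                              // generated identity value
    java.math.BigDecimal total = keys.getBigDecimal("TOTAL");  // value as possibly changed by a trigger
}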
The JDBC CallableStatement interface is used to call Java or SQL procedures that have been defined in the database. The SQL statement is in the form CALL procedureName ( ... ), with constant-value arguments or with parameter markers. Note that you must use a parameter marker for OUT and INOUT arguments of the procedure you are calling. The OUT arguments should not be set before executing the callable statement.
After executing the statement, you can retrieve the OUT and INOUT parameters with the appropriate getXXX() method.
Procedures can also return one or more result sets. You should call the getResultSet() and getMoreResults() methods to retrieve the result sets one by one.
SQL functions can also return a table. You can call such functions the same way as procedures and retrieve the table as a ResultSet.
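The sketch below calls a hypothetical procedure get_customer with one IN and one OUT parameter and then reads any returned result sets; the procedure name and parameter types are assumptions.
CallableStatement cs = c.prepareCall("CALL get_customer(?, ?)");
cs.setInt(1, 10);                                    // IN argument
cs.registerOutParameter(2, java.sql.Types.VARCHAR);  // OUT argument, left unset
cs.execute();
ResultSet rs = cs.getResultSet();                    // first returned result set, if any
while (rs != null && rs.next()) {
    // process the rows
}
while (cs.getMoreResults()) {                        // further result sets, if any
    ResultSet next = cs.getResultSet();
}
String name = cs.getString(2);                       // read the OUT parameter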
The methods of the JDBC ResultSet interface can be used to return values and to convert value to different types as supported by the JDBC specification.
When a CLOB or BLOB object is returned from a ResultSet, no data is transferred until the data is read by the various methods of java.sql.Clob and java.sql.Blob. Data is streamed in large blocks to avoid excessive memory use.
Array objects are returned as instances of java.sql.Array.
The DECLARE CURSOR statement is used within an SQL PROCEDURE body. In the early releases of HyperSQL 2.0, the cursor is used only to return a result set from the procedure. Therefore the cursor must be declared WITH RETURN and can only be READ ONLY.
DECLARE CURSOR
declare cursor statement
<declare cursor> ::= DECLARE <cursor
name>
[ { SENSITIVE | INSENSITIVE | ASENSITIVE } ] [ { SCROLL |
NO SCROLL } ]
CURSOR [ { WITH HOLD | WITHOUT HOLD } ] [ { WITH RETURN |
WITHOUT RETURN } ]
FOR <query expression>
[ FOR { READ ONLY | UPDATE [ OF <column name list>
] } ]
The query expression is a SELECT statement or similar, and is
discussed in the rest of this chapter. In the example below a cursor is
declared for a SELECT statement. It is later opened to create the result
set. The cursor is specified WITHOUT HOLD, so the result set is not kept
after a commit. Use WITH HOLD to keep the result set. Note that you need
to declare the cursor WITH RETURN as it is returned by the
CallableStatement
.
DECLARE thiscursor SCROLL CURSOR WITHOUT HOLD WITH RETURN FOR SELECT * FROM INFORMATION_SCHEMA.TABLES;
--
OPEN thiscursor;
The syntax elements that can be used in data access and data change statements are described in this section. The SQL Standard has a very extensive set of definitions for these elements. The BNF definitions given here are sometimes simplified.
Literals are used to express constant values. The general type of a literal is known by its format. The specific type is based on conventions.
unicode escape elements
<Unicode escape specifier> ::= [ UESCAPE
<quote><Unicode escape character><quote> ]
<Unicode escape value> ::= <Unicode 4 digit
escape value> | <Unicode 6 digit escape value> | <Unicode
character escape value>
<Unicode 4 digit escape value> ::= <Unicode
escape
character><hexit><hexit><hexit><hexit>
<Unicode 6 digit escape value> ::= <Unicode
escape character><plus sign>
<hexit><hexit><hexit><hexit><hexit><hexit>
<Unicode character escape value> ::= <Unicode
escape character><Unicode escape character>
<Unicode escape character> ::= a single
character other than a <hexit> (a-f, A-F, 0-9), <plus sign>,
<quote>, <double quote>, or <white
space>
character literal
<character string literal> ::= [
<introducer><character set specification> ] <quote> [
<character representation>... ] <quote> [ {
<separator> <quote> [ <character representation>... ]
<quote> }... ]
<introducer> ::=
<underscore>
<character representation> ::= <nonquote
character> | <quote symbol>
<nonquote character> ::= any character apart
from the quote symbol.
<quote symbol> ::=
<quote><quote>
<national character string literal> ::= N
<quote> [ <character representation>... ] <quote> [ {
<separator> <quote> [ <character representation>... ]
<quote> }... ]
<Unicode character string literal> ::= [
<introducer><character set specification> ]
U<ampersand><quote> [ <Unicode representation>... ]
<quote> [ { <separator> <quote> [ <Unicode
representation>... ] <quote> }... ] <Unicode escape
specifier>
<Unicode representation> ::= <character
representation> | <Unicode escape value>
The type of a character literal is CHARACTER. The length of the string literal is the character length of the type. If the quote character is used in a string, it is represented with two quote characters. Long literals can be divided into multiple quoted strings, separated with a space or end-of-line character.
Unicode literals start with U& and can contain ordinary characters and unicode escapes. A unicode escape begins with the backslash ( \ ) character and is followed by four hexadecimal characters which specify the character code.
Examples of character literals are given below:
'a literal'
' string separated'
' into parts'
'a string''s literal form with quote character'
U&'Unicode string with Greek delta \0394 and phi \03a6 letters'
binary literal
<binary string literal> ::= X <quote> [
<space>... ] [ { <hexit> [ <space>... ] <hexit>
[ <space>... ] }... ] <quote> [ { <separator>
<quote> [ <space>... ] [ { <hexit> [ <space>...
] <hexit> [ <space>... ] }... ] <quote> }...
]
<hexit> ::= <digit> | A | B | C | D | E |
F | a | b | c | d | e | f
The type of a binary literal is BINARY. The octet length of the binary literal is the length of the type. Case-insensitive hexadecimal characters are used in the binary string. Each pair of characters in the literal represents a byte in the binary string. Long literals can be divided into multiple quoted strings, separated with a space or end-of-line character.
X'1abACD34' 'Af'
bit literal
<bit string literal> ::= B <quote> [
<bit> ... ] <quote> [ { <separator> <quote> [
<bit>... ] <quote> }... ]
<bit> ::= 0 | 1
The type of a bit literal is BIT. The bit length of the bit literal is the length of the type. Digits 0 and 1 are used to represent the bits. Long literals can be divided into multiple quoted strings, separated with a space or end-of-line character.
B'10001001' '00010'
numeric literal
<signed numeric literal> ::= [ <sign> ]
<unsigned numeric literal>
<unsigned numeric literal> ::= <exact numeric
literal> | <approximate numeric literal>
<exact numeric literal> ::= <unsigned
integer> [ <period> [ <unsigned integer> ] ] |
<period> <unsigned integer>
<sign> ::= <plus sign> | <minus
sign>
<approximate numeric literal> ::=
<mantissa> E <exponent>
<mantissa> ::= <exact numeric
literal>
<exponent> ::= <signed
integer>
<signed integer> ::= [ <sign> ]
<unsigned integer>
<unsigned integer> ::=
<digit>...
The type of an exact numeric literal without a decimal point is INTEGER, BIGINT, or DECIMAL, depending on the value of the literal (the smallest type that can represent the value is the type).
The type of an exact numeric literal with a decimal point is DECIMAL. The precision of a decimal literal is the total number of digits of the literal. The scale of the literal is the total number of digits to the right of the decimal point.
The type of an approximate numeric literal is DOUBLE. An approximate numeric literal always includes the mantissa and exponent, separated by E.
12 34.35 +12E-2
boolean literal
<boolean literal> ::= TRUE | FALSE |
UNKNOWN
The boolean literal is one of the specified keywords.
datetime and interval literal
<datetime literal> ::= <date literal> |
<time literal> | <timestamp literal>
<date literal> ::= DATE <date
string>
<time literal> ::= TIME <time
string>
<timestamp literal> ::= TIMESTAMP <timestamp
string>
<date string> ::= <quote> <unquoted
date string> <quote>
<time string> ::= <quote> <unquoted
time string> <quote>
<timestamp string> ::= <quote>
<unquoted timestamp string> <quote>
<time zone interval> ::= <sign> <hours
value> <colon> <minutes value>
<date value> ::= <years value> <minus
sign> <months value> <minus sign> <days
value>
<time value> ::= <hours value>
<colon> <minutes value> <colon> <seconds
value>
<interval literal> ::= INTERVAL [ <sign> ]
<interval string> <interval qualifier>
<interval string> ::= <quote> <unquoted
interval string> <quote>
<unquoted date string> ::= <date
value>
<unquoted time string> ::= <time value> [
<time zone interval> ]
<unquoted timestamp string> ::= <unquoted
date string> <space> <unquoted time
string>
<unquoted interval string> ::= [ <sign> ]
{ <year-month literal> | <day-time literal>
}
<year-month literal> ::= <years value> [
<minus sign> <months value> ] | <months
value>
<day-time literal> ::= <day-time interval>
| <time interval>
<day-time interval> ::= <days value> [
<space> <hours value> [ <colon> <minutes value>
[ <colon> <seconds value> ] ] ]
<time interval> ::= <hours value> [
<colon> <minutes value> [ <colon> <seconds
value> ] ] | <minutes value> [ <colon> <seconds
value> ] | <seconds value>
<years value> ::= <datetime
value>
<months value> ::= <datetime
value>
<days value> ::= <datetime
value>
<hours value> ::= <datetime
value>
<minutes value> ::= <datetime
value>
<seconds value> ::= <seconds integer
value> [ <period> [ <seconds fraction> ]
]
<seconds integer value> ::= <unsigned
integer>
<seconds fraction> ::= <unsigned
integer>
<datetime value> ::= <unsigned
integer>
The type of a datetime or interval literal is specified in the literal itself. The fractional second precision is the number of digits in the fractional part of the literal. Details are described in the SQL Language chapter.
DATE '2008-08-08'
TIME '20:08:08'
TIMESTAMP '2008-08-08 20:08:08.235'
INTERVAL '10' DAY
INTERVAL -'08:08' MINUTE TO SECOND
References are identifier chains: either a single identifier, or several identifiers chained together with the period symbol.
identifier chain
<identifier chain> ::= <identifier> [ {
<period> <identifier> }... ]
<basic identifier chain> ::= <identifier
chain>
A period-separated chain of identifiers. The identifiers in an identifier chain can refer to database objects in a hierarchy. The possible hierarchies are as follows. In each hierarchy, elements from the start or the end can be missing, but the order of elements cannot be changed.
catalog, schema, database object
catalog, schema, table, column
correlation name, column
Examples of identifier chains are given below:
SELECT MYCAT.MYSCHEMA.MYTABLE.MYCOL FROM MYCAT.MYSCHEMA.MYTABLE
DROP TABLE MYCAT.MYSCHEMA.MYTABLE CASCADE
ALTER SEQUENCE MYCAT.MYSCHEMA.MYSEQUENCE RESTART WITH 100
column reference
<column reference> ::= <basic identifier
chain> | MODULE <period> <qualified identifier>
<period> <column name>
Reference a column or a routine variable.
SQL parameter reference
<SQL parameter reference> ::= <basic
identifier chain>
Reference an SQL routine parameter.
contextually typed value specification
<contextually typed value specification> ::=
<null specification> | <default
specification>
<null specification> ::=
NULL
<default specification> ::=
DEFAULT
Specify a value whose data type or value is inferred from its context. DEFAULT is used for assignments to table columns that have a default value, or to table columns that are generated either as an IDENTITY value or as an expression. NULL can be used only in a context where the type of the value is known. For example, a NULL can be assigned to a column of the table in an INSERT or UPDATE statement, because the type of the column is known. But if NULL is used in a SELECT list, it must be used within a CAST expression.
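The statements below illustrate both specifications; the table T and its columns are hypothetical.
-- assuming CREATE TABLE T (A INT, B INT DEFAULT 5)
INSERT INTO T (A, B) VALUES (NULL, DEFAULT)    -- NULL is typed by column A; DEFAULT resolves to 5
UPDATE T SET B = DEFAULT WHERE A IS NULL       -- assigns the declared default value
SELECT A, CAST(NULL AS INTEGER) FROM T         -- NULL in a SELECT list needs a CAST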
Value expression is a general name for all expressions that return a value. Different types of expressions are allowed in different contexts.
value expression primary
<value expression primary> ::= <parenthesized
value expression> | <nonparenthesized value expression
primary>
<parenthesized value expression> ::= <left
paren> <value expression> <right
paren>
<nonparenthesized value expression primary> ::=
<unsigned value specification> | <column reference> |
<set function specification> | <scalar subquery> | <case
expression> | <cast specification> | <next value
expression> | <current value expression> | <routine
invocation>
Specify a value that is syntactically self-delimited.
value specification
<value specification> ::= <literal> |
<general value specification>
<unsigned value specification> ::= <unsigned
literal> | <general value specification>
<target specification> ::= <host parameter
specification> | <SQL parameter reference> | <column
reference> | <dynamic parameter
specification>
<simple target specification> ::= <host
parameter specification> | <SQL parameter reference> |
<column reference> | <embedded variable
name>
<host parameter specification> ::= <host
parameter name> [ <indicator parameter> ]
<dynamic parameter specification> ::=
<question mark>
Specify one or more values, host parameters, SQL parameters, dynamic parameters, or host variables.
row value expression
<row value expression> ::= <row value special
case> | <explicit row value constructor>
<row value predicand> ::= <row value special
case> | <row value constructor predicand>
<row value special case> ::=
<nonparenthesized value expression primary>
<explicit row value constructor> ::= <left
paren> <row value constructor element> <comma> <row
value constructor element list> <right paren>
|
ROW <left paren> <row value constructor
element list> <right paren> | <row
subquery>
Specify a row consisting of one or more elements. A comma separated list of expressions, enclosed in brackets, with the optional keyword ROW. In SQL, a row containing a single element can often be used where a single value is expected.
set function specification
<set function specification> ::= <aggregate
function> | <grouping operation>
<grouping operation> ::= GROUPING <left
paren> <column reference> [ { <comma> <column
reference> }... ] <right paren>
Specify a value derived by the application of a function to an
argument. Early releases of HyperSQL 2.0 do not support
<grouping operation>
.
COALESCE
coalesce expression
<coalesce expression> ::= COALESCE <left
paren> <value expression> { <comma> <value
expression> }... <right paren>
Replace null values with another value. The coalesce expression
has two or more instances of <value expression>. If the first
<value expression> evaluates to a non-null value, it is returned
as the result of the coalesce expression. If it is null, the next
<value expression>
is evaluated and if it
evaluates to a non-null value, it is returned, and so on.
The type of the return value of a COALESCE expression is the
aggregate type of the types of all the <value
expression>
instances. Therefore, any value returned is
implicitly cast to this type. HyperSQL also features built-in functions
with similar functionality.
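A short example, using hypothetical column names, returns the first non-null contact value for each row:
SELECT id, COALESCE(phone, mobile, 'no contact number') FROM customer_contact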
NULLIF
nullif expression
<nullif expression> ::= NULLIF <left paren>
<value expression> <comma> <value expression>
<right paren>
Return NULL if two values are equal. If the result of the first
<value expression>
is not equal to the result
of the second, then it is returned, otherwise NULL is returned. The type
of the return value is the type of the first <value
expression>
.
SELECT i, NULLIF(n, 'not defined') FROM t
CASE
case specification
<case specification> ::= <simple case> |
<searched case>
<simple case> ::= CASE <case operand>
<simple when clause>... [ <else clause> ]
END
<searched case> ::= CASE <searched when
clause>... [ <else clause> ] END
<simple when clause> ::= WHEN <when operand
list> THEN <result>
<searched when clause> ::= WHEN <search
condition> THEN <result>
<else clause> ::= ELSE
<result>
<case operand> ::= <row value predicand> |
<overlaps predicate part 1>
<when operand list> ::= <when operand> [ {
<comma> <when operand> }... ]
<when operand> ::= <row value predicand> |
<comparison predicate part 2> | <between predicate part 2> |
<in predicate part 2> | <character like predicate part 2> |
<octet like predicate part 2> | <similar predicate part 2> |
<regex like predicate part 2> | <null predicate part 2> |
<quantified comparison predicate part 2> | <match predicate
part 2> | <overlaps predicate part 2> | <distinct predicate
part 2>
<result> ::= <result expression> |
NULL
<result expression> ::= <value
expression>
Specify a conditional value. The result of a case expression is always a value. All the values introduced with THEN must be of the same type.
Some simple examples of the CASE expression are given below. The first two examples return 'Britain', 'Germany', or 'Other country' depending on the value of dialcode. The third example uses IN and less-than predicates.
CASE dialcode WHEN 44 THEN 'Britain' WHEN 49 THEN 'Germany' ELSE 'Other country' END
CASE WHEN dialcode=44 THEN 'Britain' WHEN dialcode=49 THEN 'Germany' WHEN dialcode < 0 THEN 'bad dial code' ELSE 'Other country' END
CASE dialcode WHEN IN (44, 49, 30) THEN 'Europe' WHEN IN (86, 91, 91) THEN 'Asia' WHEN < 0 THEN 'bad dial code' ELSE 'Other continent' END
The case statement can be far more complex and involve several conditions.
CAST
cast specification
<cast specification> ::= CAST <left paren>
<cast operand> AS <cast target> <right
paren>
<cast operand> ::= <value expression> |
<implicitly typed value specification>
<cast target> ::= <domain name> | <data
type>
Specify a data conversion. Data conversion takes place automatically among variants of a general type. For example numeric values are freely converted from one type to another in expressions.
Explicit type conversion is necessary in two cases. One case is to determine the type of a NULL value. The other case is to force conversion for special purposes. Values of data types can be cast to a character type. The exception is BINARY and OTHER types. The result of the cast is the literal expression of the value. Conversely, a value of a character type can be converted to another type if the character value is a literal representation of the value in the target type. Special conversions are possible between numeric and interval types, which are described in the section covering interval types.
The examples below show casts together with their results:
CAST (NULL AS TIMESTAMP)
CAST (' 199 ' AS INTEGER) = 199
CAST ('tRue ' AS BOOLEAN) = TRUE
CAST (INTERVAL '2' DAY AS INTEGER) = 2
CAST ('1992-04-21' AS DATE) = DATE '1992-04-21'
NEXT VALUE FOR
next value expression
<next value expression> ::= NEXT VALUE FOR
<sequence generator name>
Return the next value of a sequence generator. This expression can be used as a select list element in queries, or in assignments to table columns in data change statements. If the expression is used more than once in a single row that is being evaluated, the same value is returned for each invocation. After evaluation of the particular row is complete, the sequence generator will return a different value from the old value. The new value is generated by the sequence generator by adding the increment to the last value it generated. In the example below the expression is used in an insert statement:
INSERT INTO MYTABLE(COL1, COL2) VALUES 2, NEXT VALUE FOR MYSEQUENCE
CURRENT VALUE FOR
current value expression
<current value expression> ::= CURRENT VALUE FOR
<sequence generator name>
Return the latest value that was returned by the NEXT VALUE FOR expression for a sequence generator. In the example below, the value that was generated by the sequence for the first insert, is reused for the second insert:
INSERT INTO MYTABLE(COL1, COL2) VALUES 2, NEXT VALUE FOR MYSEQUENCE;
INSERT INTO CHILDTABLE(COL1, COL2) VALUES 10, CURRENT VALUE FOR MYSEQUENCE;
value expression
<value expression> ::= <numeric value
expression> | <string value expression> | <datetime value
expression> | <interval value expression> | <boolean value
expression> | <row value expression>
An expression that returns a value. The value can be a single value, or a row consisting of more than one value.
numeric value expression
<numeric value expression> ::= <term> |
<numeric value expression> <plus sign> <term> |
<numeric value expression> <minus sign>
<term>
<term> ::= <factor> | <term>
<asterisk> <factor> | <term> <solidus>
<factor>
<factor> ::= [ <sign> ] <numeric
primary>
<numeric primary> ::= <value expression
primary> | <numeric value function>
Specify a numeric value. The BNF indicates that
<asterisk>
and
<solidus>
(the operators for multiplication and
division) have precedence over <minus sign>
and
<plus sign>
.
numeric value function
<numeric value function> ::= <position
expression> | <extract expression> | <length expression>
...
Specify a function yielding a value of type numeric. The supported numeric value functions are listed and described in the Built In Functions chapter.
string value expression
<string value expression> ::= <string
concatenation> | <string factor>
<string factor> ::= <value expression
primary> | <string value function>
<string concatenation> ::= <string value
expression> <concatenation operator> <string
factor>
<concatenation operator> ::=
||
Specify a character string value, a binary string value, or a
bit string value. The BNF indicates that a string value expression can
be formed by concatenation of two or more <value expression
primary>
. The types of the <value expression
primary>
elements must be compatible, that is, all must be
string, or binary or bit string values.
character value function
string value function
<string value function> ::=
...
Specify a function that returns a character string or binary string. The supported character value functions are listed and described in the Built In Functions chapter.
datetime value expression
<datetime value expression> ::= <datetime
term> | <interval value expression> <plus sign>
<datetime term> | <datetime value expression> <plus
sign> <interval term> | <datetime value expression>
<minus sign> <interval term>
<datetime term> ::= <datetime
factor>
<datetime factor> ::= <datetime primary> [
<time zone> ]
<datetime primary> ::= <value expression
primary> | <datetime value function>
<time zone> ::= AT <time zone
specifier>
<time zone specifier> ::= LOCAL | TIME ZONE
<interval primary>
Specify a datetime value. Details are described in the SQL Language chapter.
datetime value function
<datetime value function> ::=
...
Specify a function that returns a datetime value. The supported datetime value functions are listed and described in the Built In Functions chapter.
interval term
interval value expression
<interval value expression> ::= <interval
term> | <interval value expression 1> <plus sign>
<interval term 1> | <interval value expression 1> <minus
sign> <interval term 1> | <left paren> <datetime value
expression> <minus sign> <datetime term> <right
paren> <interval qualifier>
<interval term> ::= <interval factor> |
<interval term 2> <asterisk> <factor> | <interval
term 2> <solidus> <factor> | <term>
<asterisk> <interval factor>
<interval factor> ::= [ <sign> ]
<interval primary>
<interval primary> ::= <value expression
primary> [ <interval qualifier> ] | <interval value
function>
<interval value expression 1> ::= <interval
value expression>
<interval term 1> ::= <interval
term>
<interval term 2> ::= <interval
term>
Specify an interval value. Details are described in the SQL Language chapter.
interval absolute value function
interval value function
<interval value function> ::= <interval
absolute value function>
<interval absolute value function> ::= ABS
<left paren> <interval value expression> <right
paren>
Specify a function that returns the absolute value of an interval. If the interval is negative, it is negated, otherwise the original value is returned.
boolean value expression
<boolean value expression> ::= <boolean
term> | <boolean value expression> OR <boolean
term>
<boolean term> ::= <boolean factor> |
<boolean term> AND <boolean factor>
<boolean factor> ::= [ NOT ] <boolean
test>
<boolean test> ::= <boolean primary> [ IS
[ NOT ] <truth value> ]
<truth value> ::= TRUE | FALSE |
UNKNOWN
<boolean primary> ::= <predicate> |
<boolean predicand>
<boolean predicand> ::= <parenthesized
boolean value expression> | <nonparenthesized value expression
primary>
<parenthesized boolean value expression> ::=
<left paren> <boolean value expression> <right
paren>
Specify a boolean value.
Predicates are conditions with two sides and evaluate to a
boolean value. The left side of the predicate, the <row
value predicand>
, is the common element of all predicates.
This element is a generalisation of both <value
expression>
, which is a scalar, and of
<explicit row value constructor>
, which is a
row. The two sides of a predicate can be split in CASE statements where
the <row value predicand>
is part of multiple
predicates.
The number of fields in all <row value
predicand>
used in predicates must be the same and the
types of the fields in the same position must be compatible for
comparison. If either of these conditions does not hold, an exception is
raised. The number of fields in a row is called the
degree.
In many types of predicates (but not all of them), if the
<row value predicand>
evaluates to NULL, the
result of the predicate is UNKNOWN. If the <row value
predicand>
has more than one element, and one or more of
the fields evaluate to NULL, the result depends on the particular
predicate.
comparison predicand
comparison predicate
<comparison predicate> ::= <row value
predicand> <comp op> <row value
predicand>
<comp op> ::= <equals operator> | <not
equals operator> | <less than operator> | <greater than
operator> | <less than or equals operator> | <greater than
or equals operator>
Specify a comparison of two row values. If either
<row value predicand>
evaluates to NULL, the
result of <comparison predicate>
is UNKNOWN.
Otherwise, the result is TRUE, FALSE or UNKNOWN.
If the degree of <row value
predicand>
is larger than one, comparison is performed
between each field and the corresponding field in the other
<row value predicand>
from left to right, one
by one.
When comparing two elements, if either field is NULL then the result is UNKNOWN.
For <equals operator>
, if the result
of comparison is TRUE for all fields, the result of the predicate is
TRUE. If the result of comparison is FALSE for one field, the result of
the predicate is FALSE. Otherwise the result is UNKNOWN.
The <not equals operator>
is
translated to NOT (<row value predicand> = <row value
predicand>)
.
The <less than or equals operator>
is
translated to (<row value predicand> = <row value
predicand>) OR (<row value predicand> < <row value
predicand>)
. The <greater than or equals
operator>
is translated similarly.
For the <less than operator>
and
<greater than operator>
, if two fields at a
given position are equal, then comparison continues to the next field.
Otherwise, the result of the last performed comparison is returned as
the result of the predicate. This means that if the first field is NULL,
the result is always UNKNOWN.
The logic that governs NULL values and UNKNOWN result is as follows: Suppose the NULL values were substituted by arbitrary real values. If substitution cannot change the result of the predicate, then the result is TRUE or FALSE, based on the existing non-NULL values, otherwise the result of the predicate is UNKNOWN.
The examples of comparison given below use literals, but the literals actually represent the result of evaluation of some expression.
((1, 2, 3, 4) = (1, 2, 3, 4)) IS TRUE
((1, 2, 3, 4) = (1, 2, 3, 5)) IS FALSE
((1, 2, 3, 4) < (1, 2, 3, 4)) IS FALSE
((1, 2, 3, 4) < (1, 2, 3, 5)) IS TRUE
((NULL, 1, NULL) = (NULL, 1, NULL)) IS UNKNOWN
((NULL, 1, NULL) = (NULL, 2, NULL)) IS FALSE
((NULL, 1, NULL) <> (NULL, 2, NULL)) IS TRUE
((NULL, 1, 2) <all operators> (NULL, 1, 2)) IS UNKNOWN
((1, NULL, ...) < (1, 2, ...)) IS UNKNOWN
((1, NULL, ...) < (2, NULL, ...)) IS TRUE
((2, NULL, ...) < (1, NULL, ...)) IS FALSE
BETWEEN
between predicate
<between predicate> ::= <row value
predicand> <between predicate part 2>
<between predicate part 2> ::= [ NOT ] BETWEEN [
ASYMMETRIC | SYMMETRIC ] <row value predicand> AND <row value
predicand>
Specify a range comparison. The default is ASYMMETRIC. The
expression X BETWEEN Y AND Z
is equivalent to
(X >= Y AND X <= Z)
. Therefore if Y > Z, the
BETWEEN expression is never true. The expression X BETWEEN
SYMMETRIC Y AND Z
is equivalent to (X >= Y AND X
<= Z) OR (X >= Z AND X <= Y)
. The expression
Z NOT BETWEEN ...
is equivalent to NOT (Z
BETWEEN ...)
. If any of the three <row value
predicand>
evaluates to NULL, the result is
UNKNOWN.
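For example, the following two conditions are equivalent; invoice_date is a hypothetical column:
invoice_date BETWEEN DATE '2010-01-01' AND DATE '2010-12-31'
invoice_date >= DATE '2010-01-01' AND invoice_date <= DATE '2010-12-31'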
IN
in predicate
<in predicate> ::= <row value predicand> [
NOT ] IN <in predicate value>
<in predicate value> ::= <table subquery>
| <left paren> <in value list> <right paren>
| <left paren> UNNEST <left paren>
<array value expression> <right paren> <right
paren>
<in value list> ::= <row value expression>
[ { <comma> <row value expression> }...
]
Specify a quantified comparison. The expression X NOT
IN Y is
equivalent to NOT (X IN Y)
. The
( <in value list> )
is converted into a table
with one or more rows. The expression X IN Y
is
equivalent to X = ANY Y
, which is a
<quantified comparison predicate>
.
If the <table subquery>
returns no
rows, the result is FALSE. Otherwise the <row value
predicand>
is compared one by one with each row of the
<table subquery>
.
If the comparison is TRUE for at least one row, the result is TRUE. If the comparison is FALSE for all rows, the result is FALSE. Otherwise the result is UNKNOWN.
HyperSQL supports an extension to the SQL Standard to allow an array to be used in the <in predicate value>. This is intended to be used with prepared statements where a variable length array of values can be used as the parameter value for each call. The example below shows how this is used in SQL. The JDBC code must create a new java.sql.Array object that contains the values and set the parameter with this array.
SELECT * FROM customer WHERE firstname IN ( UNNEST(?) )

Connection conn;
PreparedStatement ps;
// conn and ps are instantiated here
Array arr = conn.createArrayOf("INTEGER", new Integer[] {1, 2, 3});
ps.setArray(1, arr);
ResultSet rs = ps.executeQuery();
LIKE
like predicate
<like predicate> ::= <character like
predicate> | <octet like predicate>
<character like predicate> ::= <row value
predicand> [ NOT ] LIKE <character pattern> [ ESCAPE <escape
character> ]
<character pattern> ::= <character value
expression>
<escape character> ::= <character value
expression>
<octet like predicate> ::= <row value
predicand> [ NOT ] LIKE <octet pattern> [ ESCAPE <escape
octet> ]
<octet pattern> ::= <binary value
expression>
<escape octet> ::= <binary value
expression>
Specify a pattern-match comparison for character or binary
strings. The <row value predicand>
is always a
<string value expression>
of character or
binary type. The <character pattern>
or
<octet pattern>
is a <string value
expression>
in which the underscore and percent characters
have special meanings. The underscore means match any one character,
while the percent means match a sequence of zero or more characters. The
<escape character>
or <escape
octet>
is also a <string value
expression>
that evaluates to a string of exactly one
character length. If the underscore or the percent is required as normal
characters in the pattern, the specified <escape
character>
or <escape octet>
can
be used in the pattern before the underscore or the percent. The
<row value predicand>
is compared with the
<character pattern>
and the result of
comparison is returned. If any of the expressions in the predicate
evaluates to NULL, the result of the predicate is UNKNOWN. The
expression A NOT LIKE B
is equivalent to NOT
(A LIKE B)
. If the length of the escape is not 1 or it is used
in the pattern not immediately before an underscore or a percent
character, an exception is raised.
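The examples below, on a hypothetical table, show a simple pattern and the use of an escape character so that the percent sign is matched literally:
SELECT * FROM mytable WHERE name LIKE 'Jo_n%'
SELECT * FROM mytable WHERE remark LIKE '%50!%%' ESCAPE '!'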
IS NULL
null predicate
<null predicate> ::= <row value predicand>
IS [ NOT ] NULL
Specify a test for a null value. The expression X IS
NOT NULL
is NOT equivalent to NOT (X IS
NULL)
if the degree of the <row value
predicand>
is larger than 1. The rules are: If all fields
are null, X IS NULL
is TRUE and X IS NOT
NULL
is FALSE. If only some fields are null, both X
IS NULL
and X IS NOT NULL
are FALSE. If all
fields are not null, X IS NULL
is FALSE and
X IS NOT NULL
is TRUE.
ALL and ANY
quantified comparison predicate
<quantified comparison predicate> ::= <row
value predicand> <comp op> <quantifier> <table
subquery>
<quantifier> ::= <all> |
<some>
<all> ::= ALL
<some> ::= SOME | ANY
Specify a quantified comparison. For a quantified comparison,
the <row value predicand>
is compared one by
one with each row of the <table sub
query>
.
If the <table subquery>
returns no
rows, then if ALL
is specified the result is TRUE,
but if SOME
or ANY
is specified
the result is FALSE.
If ALL
is specified, if the comparison is
TRUE for all rows, the result of the predicate is TRUE. If the
comparison is FALSE for at least one row, the result is FALSE. Otherwise
the result is UNKNOWN.
If SOME
or ANY
is
specified, if the comparison is TRUE for at least one row, the result is
TRUE. If the comparison is FALSE for all rows, the result is FALSE.
Otherwise the result is UNKNOWN. Note that the IN predicate is
equivalent to the SOME or ANY predicate using the <equals
operator>
.
In the examples below, the date of an invoice is compared to holidays in a given year. In the first example the invoice date must equal one of the holidays, in the second example it must be later than all holidays (later than the last holiday), in the third example it must be on or after some holiday (on or after the first holiday), and in the fourth example, it must be before all holidays (before the first holiday).
invoice_date = SOME (SELECT holiday_date FROM holidays)
invoice_date > ALL (SELECT holiday_date FROM holidays)
invoice_date >= ANY (SELECT holiday_date FROM holidays)
invoice_date < ALL (SELECT holiday_date FROM holidays)
EXISTS
exists predicate
<exists predicate> ::= EXISTS <table
subquery>
Specify a test for a non-empty set. If the evaluation of
<table subquery>
results in one or more rows,
then the expression is TRUE, otherwise FALSE.
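For example, the following query, on hypothetical tables, returns only the customers that have at least one invoice:
SELECT * FROM customer c WHERE EXISTS (SELECT 1 FROM invoice i WHERE i.customer_id = c.id)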
UNIQUE
unique predicate
<unique predicate> ::= UNIQUE <table
subquery>
Specify a test for the absence of duplicate rows. The result of
the test is either TRUE or FALSE (never UNKNOWN). The rows of the
<table subquery>
that contain one or more NULL
values are not considered for this test. If the rest of the rows are
distinct from each other, the result of the test is TRUE, otherwise it
is FALSE. The distinctness of rows X and Y is tested with the predicate
X IS DISTINCT FROM Y
.
MATCH
match predicate
<match predicate> ::= <row value
predicand> MATCH [ UNIQUE ] [ SIMPLE | PARTIAL | FULL ] <table
subquery>
Specify a test for matching rows. The default is MATCH SIMPLE without UNIQUE. The result of the test is either TRUE or FALSE (never UNKNOWN).
The interpretation of NULL values is different from other
predicates and quite counter-intuitive. If the <row value
predicand>
is NULL, or all of its fields are NULL, the
result is TRUE.
Otherwise, the <row value predicand>
is compared with each row of the <table
subquery>
.
If SIMPLE is specified, if some field of the <row value predicand> is NULL, the result is TRUE. Otherwise, if the <row value predicand> is equal to one or more rows of the <table subquery>, the result is TRUE if UNIQUE is not specified, or if UNIQUE is specified and only one row matches. Otherwise the result is FALSE.
If PARTIAL is specified, if the non-null values of the <row value predicand> are equal to those in one or more rows of the <table subquery>, the result is TRUE if UNIQUE is not specified, or if UNIQUE is specified and only one row matches. Otherwise the result is FALSE.
If FULL is specified, if some field of the <row value predicand> is NULL, the result is FALSE. Otherwise, if the <row value predicand> is equal to one or more rows of the <table subquery>, the result is TRUE if UNIQUE is not specified, or if UNIQUE is specified and only one row matches. Otherwise the result is FALSE.
Note that MATCH can also be used in FOREIGN KEY constraint definitions. The exact meaning is described in the Schemas and Database Objects chapter.
OVERLAPS
overlaps predicate
<overlaps predicate> ::= <row value
predicand> OVERLAPS <row value predicand>
Specify a test for an overlap between two datetime periods.
Each <row value predicand>
must have two fields
and the fields together represent a datetime period. So the predicate
is always in the form (X1, X2) OVERLAPS (Y1, Y2)
. The
first field is always a datetime value, while the second field is either
a datetime value or an interval value.
If the second value is an interval value, it is replaced with
the sum of the datetime value and the interval, for example (X1, X1 +
X2) OVERLAPS (Y1, Y1 + Y2).
If any of the values is NULL, the result is UNKNOWN.
The expression is true if there is any overlap between the two datetime periods. In the example below, the period is compared with a week-long period ending yesterday.
(startdate, enddate) OVERLAPS (CURRENT_DATE - 7 DAY, CURRENT_DATE - 1 DAY)
IS DISTINCT
is distinct predicate
<distinct predicate> ::= <row value
predicand> IS [ NOT ] DISTINCT FROM <row value
predicand>
Specify a test of whether two row values are distinct. The
result of the test is either TRUE or FALSE (never UNKNOWN). The
degree of the two <row value
predicand>
must be the same. Each field of the first
<row value predicand>
is compared to the field
of the second <row value predicand>
at the same
position. If one field is NULL and the other is not NULL, or if the
elements are NOT equal, then the result of the expression is TRUE. If no
comparison result is TRUE, then the result of the predicate is FALSE.
The expression X IS NOT DISTINCT FROM Y
is equivalent
to NOT (X IS DISTINCT FROM Y)
. The following check
returns true if startdate is not equal to enddate. It also returns true
if either startdate or enddate is NULL. It returns false in other
cases.
startdate IS DISTINCT FROM enddate
aggregate function
<aggregate function> ::= COUNT <left
paren> <asterisk> <right paren> [ <filter clause> ]
| <general set function> [ <filter clause> ] | <array
aggregate function> [ <filter clause> ]
<general set function> ::= <set function
type> <left paren> [ <set quantifier> ] <value
expression> <right paren>
<set function type> ::= <computational
operation>
<computational operation> ::= AVG | MAX | MIN |
SUM | EVERY | ANY | SOME | COUNT | STDDEV_POP | STDDEV_SAMP | VAR_SAMP |
VAR_POP | MEDIAN
<set quantifier> ::= DISTINCT |
ALL
<filter clause> ::= FILTER <left paren>
WHERE <search condition> <right paren>
<array aggregate function> ::= { ARRAY_AGG |
GROUP_CONCAT } <left paren> [ <set quantifier> ] <value
expression> [ <order by clause> ] [ SEPARATOR <separator>
] <right paren>
<separator> ::= <character string
literal>
Specify a value computed from a collection of rows.
An aggregate function is used exclusively in a
<query specification>
and its use transforms a
normal query into an aggregate query returning a single row instead of
the multiple rows that the original query returns. For example,
SELECT acolumn <table expression>
is a query
that returns the value of acolumn for all the rows that satisfy the given
condition. But SELECT MAX(acolumn) <table
expression>
returns only one row, containing the largest
value in that column. The query SELECT COUNT(*) <table
expression>
returns the count of rows, while
SELECT COUNT(acolumn) <table expression>
returns the count of rows where acolumn IS NOT
NULL
.
If the <table expression>
is a grouped
table (has a GROUP BY
clause), the aggregate function
returns the result of the COUNT
or
<computational operation>
for each group. In
this case the result has the same number of rows as the original grouped
query. For example SELECT SUM(acolumn) <table
expression>
when <table
expression>
has a GROUP BY
clause,
returns the sum of values for acolumn
in each
group.
The SUM operations can be performed on numeric and interval expressions only. AVG and MEDIAN can be performed on numeric, interval or datetime expressions. AVG returns the average value, while SUM returns the sum of all values. If all values are NULL, the operations return NULL. MEDIAN returns the middle value in the sorted list of values.
MAX and MIN can be performed on all types of expressions and return the minimum or the maximum value. If all values are NULL, the operations return NULL.
COUNT(*)
returns the count of all values,
including nulls, while COUNT(<value
expression>)
returns the count of non-NULL values. COUNT
with DISTINCT also accepts multiple arguments. In this usage the
distinct combinations of the arguments are counted. Examples
below:
SELECT COUNT(DISTINCT firstname, lastname) FROM customer
SELECT COUNT(DISTINCT (firstname, lastname)) FROM customer
The EVERY, ANY and SOME operations can be performed on boolean expressions only. EVERY returns TRUE if all the values are TRUE, otherwise FALSE. ANY and SOME are the same operation and return TRUE if one of the values is TRUE, otherwise it returns FALSE.
The other operations perform the statistical functions STDDEV_POP, STDDEV_SAMP, VAR_SAMP, VAR_POP on numeric values. NULL values are ignored in calculations.
User defined aggregate functions can be defined and used instead of the built-in aggregate functions. Syntax and examples are given in the SQL-Invoked Routines chapter.
The <filter clause>
allows you to add a
search condition. When the search condition evaluates to TRUE for a row,
the row is included in aggregation. Otherwise the row is not included.
In the example below a single query returns two different filtered
counts:
SELECT COUNT(ITEM) FILTER (WHERE GENDER = 'F') AS "FEMALE COUNT", COUNT(ITEM) FILTER (WHERE GENDER = 'M') AS "MALE COUNT" FROM PEOPLE
ARRAY_AGG is different from all other aggregate functions, as
it does not ignore the NULL values. This set function returns an array
that contains all the values, for different rows, for the
<value expression>
. For example, if the
<value expression>
is a column reference, the
SUM function adds the values for all the rows together, while the
ARRAY_AGG function adds the value for each row as a separate element of
the array. ARRAY_AGG can include an optional <order by clause>. If
this is used, the elements of the returned array are sorted according to
the <order by clause>
, which can reference all
the available columns of the query, not just the <value
expression>
that is used as the ARRAY_AGG argument. The
<order by clause>
can have multiple elements
and each element can include NULLS LAST or DESC qualifiers. No
<separator>
is used with this
function.
GROUP_CONCAT is a specialised function derived from ARRAY_AGG.
This function computes the array in the same way as ARRAY_AGG, removes
all the NULL elements, then returns a string that is a concatenation of
the elements of the array. If <separator>
has
been specified, it is used to separate the elements of the array.
Otherwise the comma is used to separate the elements.
The example below shows a grouped query with ARRAY_AGG and GROUP_CONCAT. The CUSTOMER table that is included for tests in the DatabaseManager GUI app is the source of the data.
SELECT LASTNAME, ARRAY_AGG(FIRSTNAME ORDER BY FIRSTNAME) FROM Customer GROUP BY LASTNAME

LASTNAME  C2
--------- ----------------------------------------------------------
Steel     ARRAY['John','John','Laura','Robert']
King      ARRAY['George','George','James','Julia','Robert','Robert']
Sommer    ARRAY['Janet','Robert']

SELECT LASTNAME, GROUP_CONCAT(DISTINCT FIRSTNAME ORDER BY FIRSTNAME DESC SEPARATOR ' * ') FROM Customer GROUP BY LASTNAME

LASTNAME  C2
--------- -------------------------------------------------
Steel     Robert * Laura * John
King      Robert * Julia * James * George
Sommer    Robert * Janet
search condition
<search condition> ::= <boolean value
expression>
Specify a condition that is TRUE, FALSE, or UNKNOWN. A search condition is often a predicate.
PATH
path specification
<path specification> ::= PATH <schema name
list>
<schema name list> ::= <schema name> [ {
<comma> <schema name> }... ]
Specify an order for searching for a user-defined SQL-invoked routine. This is not currently supported by HyperSQL.
routine invocation
<routine invocation> ::= <routine name>
<SQL argument list>
<routine name> ::= [ <schema name>
<period> ] <qualified identifier>
<SQL argument list> ::= <left paren> [
<SQL argument> [ { <comma> <SQL argument> }... ] ]
<right paren>
<SQL argument> ::= <value expression> |
<target specification>
Invoke an SQL-invoked routine. Examples are given in the SQL-Invoked Routines chapter.
COLLATE
collate clause
<collate clause> ::= COLLATE <collation
name>
Specify a collation for a column or for an ORDER BY expression. This collation is used for comparing the values of the column in different rows. Comparison can happens during the execution of SELECT, UPDATE or DELETE statements, when a UNIQUE constraint or index is defined on the column, or when the the rows are sorted by an ORDER BY clause.
CONSTRAINT
constraint name definition
<constraint name definition> ::= CONSTRAINT
<constraint name>
<constraint characteristics> ::= <constraint
check time> [ [ NOT ] DEFERRABLE ] | [ NOT ] DEFERRABLE [
<constraint check time> ]
<constraint check time> ::= INITIALLY DEFERRED |
INITIALLY IMMEDIATE
Specify the name of a constraint and its characteristics. This is an optional element of CONSTRAINT definition, not yet supported by HyperSQL.
HyperSQL fully supports all of SQL-92 data access statements, plus some additions from SQL:2008. Due to time constraints, the current version of this Guide does not cover the subject fully. You are advised to consult an SQL book such as the recent O'Reilly title "SQL and Relational Theory" by C. J. Date.
Database queries are data access statements. The most commonly used data access statement is the SELECT statement, but there are other statements that perform a similar role. Data access statements access tables and return result tables. The returned result tables are falsely called result sets, as they are not necessarily sets of rows, but multisets of rows.
Result tables are formed by performing the following operations on base tables and views. These operations are loosely based on Relational Algebra.
JOIN operations
SET and MULTISET operations
SELECTION
PROJECTION
COMPUTING
COLUMN NAMING
GROUPING and AGGREGATION
SELECTION AFTER GROUPING OR AGGREGATION
SET and MULTISET (COLLECTION) OPERATIONS
ORDERING
SLICING
Conceptually, the operations are performed one by one in the above order if they apply to the given data access statement. In the example below a simple select statement is made more complex by adding various operations.
CREATE TABLE atable (a INT, b INT, c INT, d INT, e INT, f INT); /* in the next SELECT, no join is performed and no further operation takes place */ SELECT * FROM atable /* in the next SELECT, selection is performed by the WHERE clause, with no further action */ SELECT * FROM atable WHERE a + b = c /* in the next SELECT, projection is performed after the other operations */ SELECT d, e, f FROM atable WHERE a + b = c /* in the next SELECT, computation is performed after projection */ SELECT (d + e) / f FROM atable WHERE a + b = c /* in the next two SELECT statements, column naming is performed in different ways*/ SELECT (a + e) / f AS calc, f AS div FROM atable WHERE a + b = c SELECT dcol, ecol, fcol FROM atable(acol, bcol, ccol, dcol, ecol, fcol) WHERE acol + bcol = ccol /* in the next SELECT, both grouping and aggregation is performed */ SELECT d, e, SUM(f) FROM atable GROUP BY d, e /* in the next SELECT, selection is performed after grouping and aggregation is performed */ SELECT d, e, SUM(f) FROM atable GROUP BY d, e HAVING SUM(f) > 10 /* in the next SELECT, a UNION is performed on two selects from the same table */ SELECT d, e, f FROM atable WHERE d = 3 UNION SELECT a, b, c FROM atable WHERE a = 30 /* in the next SELECT, ordering is performed */ SELECT (a + e) / f AS calc, f AS div FROM atable WHERE a + b = c ORDER BY calc DESC, div NULLS LAST /* in the next SELECT, slicing is performed after ordering */ SELECT * FROM atable WHERE a + b = c ORDER BY a FETCH 5 ROWS ONLY
The next sections discuss various types of tables and operations involved in data access statements.
The SELECT statement itself does not cover all types of data
access statements, which may combine multiple SELECT statements. The
<query specification>
is the most common data
access statement and begins with the SELECT keyword.
SELECT STATEMENT
select statement (general)
Users generally refer to the SELECT statement when they mean a
<query specification>
or <query
expression>
. If a statement begins with SELECT and has no
UNION or other set operations, then it is a <query
specification>
. Otherwise it is a <query
expression>
.
In data access statements, a table can be a database table (or view) or an ephemeral table formed for the duration of the query. Some types of table are <table primary> and can participate in joins without the use of extra parentheses. The BNF in the Table Primary section below lists different types of <table primary>:
Tables can also be formed by specifying the values that are contained in them:
<table value constructor> ::= VALUES <row
value expression list>
<row value expression list> ::= <table row
value expression> [ { <comma> <table row value
expression> }... ]
In the example below a table with two rows and 3 columns is constructed out of some values:
VALUES (12, 14, null), (10, 11, CURRENT_DATE)
When a table is used directly in a UNION or similar operation, the keyword TABLE is used with the name:
<explicit table> ::= TABLE <table or query
name>
In the examples below, all rows of the two tables are included in the union. The keyword TABLE is used in the first example. The two examples below are equivalent.
TABLE atable UNION TABLE anothertable SELECT * FROM atable UNION SELECT * FROM anothertable
A subquery is simply a query expression in brackets. A query expression is usually a complete SELECT statement and is discussed in the rest of this chapter. A scalar subquery returns one row with one column. A row subquery returns one row with one or more columns. A table subquery returns zero or more rows with one or more columns. The distinction between different forms of subquery is syntactic. Different forms are allowed in different contexts. If a scalar subquery or a row subquery return more than one row, an exception is raised. If a scalar or row subquery returns no row, it is usually treated as returning a NULL. Depending on the context, this has different consequences.
<scalar subquery> ::= <subquery>
<row subquery> ::= <subquery>
<table subquery> ::= <subquery>
<subquery> ::= <left paren> <query
expression> <right paren>
A query specification is also known as a SELECT statement. It is
the most common form of <derived table>
. A
<table expression>
is a base table, a view or
any form of allowed derived table. The SELECT statement performs
projection, naming, computing or aggregation on the rows of the
<table expression>
.
<query specification> ::= SELECT [ DISTINCT |
ALL ] <select list> <table expression>
<select list> ::= <asterisk> | <select
sublist> [ { <comma> <select sublist> }... ]
<select sublist> ::= <derived column> |
<qualified asterisk>
<qualified asterisk> ::= <asterisked
identifier chain> <period> <asterisk>
<asterisked identifier chain> ::= <asterisked
identifier> [ { <period> <asterisked identifier> }... ]
<asterisked identifier> ::=
<identifier>
<derived column> ::= <value expression> [
<as clause> ]
<as clause> ::= [ AS ] <column name>
The qualifier DISTINCT or ALL apply to the results of the SELECT statement after all other operations have been performed. ALL simply returns the rows, while DISTINCT compares the rows and removes the duplicate ones.
Projection is performed by the <select
list>
.
A single <asterisk>
means all columns of
the <table expression>
are included, in the
same order as they appear in the <table
expression>
. An asterisk qualified by a table name means
all the columns of the qualifier table name are included. If an
unqualified asterisk is used, then no other items are allowed in the
<select list>
. When the the <table
expression>
is the direct result of NATURAL or USING joins,
the use of <asterisk>
includes the columns used
for the join before the other columns. A qualified asterisk does not
cover the join columns.
A derived column is a <value expression>
,
optionally named with the <as clause>
. A
<value expression>
can be many things. Common
types include: the name of a column in the <table
expression>
; an expression based on different columns or
constant values; a function call; an aggregate function; a CASE WHEN
expression.
A table expression is part of the SELECT statement and consists of the FROM clause with optional other clauses that performs selection (of rows) and grouping from the table(s) in the FROM clause.
<table expression> ::= <from clause> [
<where clause> ] [ <group by clause> ] [ <having
clause> ]
<from clause> ::= FROM <table reference> [ {
<comma> <table reference> }... ]
<table reference> ::= <table primary> |
<joined table>
<table primary> ::= <table or query name>
[ [ AS ] <correlation name> [ <left paren> <derived
column list> <right paren> ] ]
| <derived table> [ AS ] <correlation
name> [ <left paren> <derived column list> <right
paren> ]
| <lateral derived table> [ AS ] <correlation
name> [ <left paren> <derived column list> <right
paren> ]
| <collection derived table> [ AS ]
<correlation name> [ <left paren> <derived column
list> <right paren> ]
| <table function derived table> [ AS ]
<correlation name> [ <left paren> <derived column
list> <right paren> ]
| <parenthesized joined table> [ AS ]
<correlation name> [ <left paren> <derived column
list> <right paren> ]
<where clause> ::= WHERE <boolean value
expression>
<group by clause> ::= GROUP BY [ <set
quantifier> ] <grouping element> [ { <comma> <grouping
element> }... ]
<having clause> ::= HAVING <boolean value
expression>
The <from clause>
contains one or more
<table reference>
separated by commas. A table
reference is often a table or view name or a joined table.
The <where clause>
filters the rows of
the table in the <from clause> and removes the rows for which the
search condition is not TRUE.
The <group by clause>
is a comma
separated list of columns of the table in the <from
clause>
or expressions based on the columns.
When a <group by clause>
is used, only
the columns used in the <group by clause>
or
expressions used there, can be used in the <select
list>
, together with any <aggregate
function>
on other columns. A <group by
clause>
compares the rows and groups together the rows that
have the same values in the columns of the <group by
clause>
. Then any <aggregate
function>
in the <select list>
is
performed on each group, and for each group, a row is formed that
contains the values of the columns of the <group by
clause>
and the values returned from each
<aggregate function>.
In the first example
below, a simple column reference is used in GROUP BY, while in the
second example, an expression is used.
CREATE TABLE atable (a INT, b INT, c INT, d INT, e INT, f INT); SELECT d, e, f FROM atable WHERE a + b = c GROUP BY d, e, f SELECT d + e, SUM(f) FROM atable WHERE a + b = c GROUP BY d + e HAVING SUM(f) > 2 AND d + e > 4
A <having clause>
filters the rows of the
table that is formed after applying the <group by
clause>
using its search condition. The search condition
must be an expression based on the expressions in the GROUP BY list or
the aggregate functions used.
Table primary refers to different forms of table reference in the FROM clause.
The simplest form of reference is simply a name. This is the name of a table, a view, a transition table in a trigger definition, or a query name specified in the WITH clause of a query expression.
<table or query name> ::= <table name> |
<transition table name> | <query name>
derived table
A query expression that is enclosed in parentheses and returns
from zero to many rows is a <table subquery>
.
In a <derived table>
the query expression is
self contained and cannot reference the columns of other table
references. This is the traditional and most common form of use of a
<table subquery>
.
<derived table> ::= <table
subquery>
LATERAL
When the word LATERAL is used before a <table subquery>, it means the query expression can reference the columns of other table references that go before it.
<lateral derived table> ::= LATERAL <table
subquery>
The use of <lateral derived table> completely transforms the way a query is written. For example, the two queries below are equivalent, but with different forms. The query with LATERAL is evaluated separately for each row of the first table that satisfies the WHERE condition. The example below uses the tables that are created and populated in DatabaseManagerSwing with the "Insert test data" menu option. The first query uses a scalar subquery to compute the sum of invoice values for each customer. The second query is equivalent and uses a join with a LATERAL table.
SELECT firstname, lastname, (SELECT SUM(total) FROM invoice WHERE customerid = customer.id) s FROM customer SELECT firstname, lastname, a.c FROM customer, LATERAL(SELECT SUM(total) FROM invoice WHERE customerid = customer.id) a (c)
UNNEST
UNNEST is similar to LATERAL, but instead of a query expression, one or more expressions that return an array are used. These expressions are converted into a table which has one column for each expression and contains the elements of the array. If WITH ORDINALITY is used, an extra column that contains the index of each element is added to this table. The number or rows in the table equals the length of the largest arrays. The smaller arrays are padded with NULL values. If an <array value expression> evaluates to NULL, an empty array is used in its place. The array expression can contain references to any column of the table references preceding the current table reference.
<collection derived table> ::= UNNEST <left
paren> <array value expression>, ... <right paren> [ WITH
ORDINALITY ]
The <array value expression>
can be the
result of a function call. If the arguments of the function call are
values from the tables on the left of the UNNEST, then the function is
called for each row of table.
In the first example below, UNNEST is used with the built in-function SEQUENCE_ARRAY to build a table containing dates for the last seven days and their ordinal position. In the second example, a select statement returns costs for the last seven days . In the third example, the WITH clause turns the two selects into named subqueries which are used in a SELECT statement that uses a LEFT join.
SELECT * FROM UNNEST(SEQUENCE_ARRAY(CURRENT_DATE - 7 DAY, CURRENT_DATE - 1 DAY, 1 DAY)) WITH ORDINALITY AS T(D, I) D I ---------- - 2010-07-25 1 2010-07-26 2 2010-07-27 3 2010-07-28 4 2010-07-29 5 2010-07-30 6 2010-07-31 7 CREATE TABLE expenses (item_date DATE, cost DECIMAL(8,2)) -- SELECT item_date, SUM(cost) AS s FROM expenses WHERE item_date >= CURRENT_DATE - 7 DAY GROUP BY item_date ITEM_DATE S ---------- ------ 2010-07-27 100.12 2010-07-29 50.45 WITH costs(i_d, s) AS (SELECT item_date, SUM(cost) AS s FROM expenses WHERE item_date >= CURRENT_DATE - 7 DAY GROUP BY item_date), dates(d, i) AS (SELECT * FROM UNNEST(SEQUENCE_ARRAY(CURRENT_DATE - 7 DAY, CURRENT_DATE - 1 DAY, 1 DAY)) WITH ORDINALITY) SELECT i, d, s FROM dates LEFT OUTER JOIN costs ON dates.d = costs.i_d I D S - ---------- ------ 1 2010-07-25 (null) 2 2010-07-26 (null) 3 2010-07-27 100.12 4 2010-07-28 (null) 5 2010-07-29 50.45 6 2010-07-30 (null) 7 2010-07-31 (null)
TABLE
When TABLE is used in this context, the <collection value expression> must be the result of a function call to a built-in function or user-defined function that returns an array or a table. When the function returns an array, this array is converted into a table, similar to the way UNNEST operates. When the function returns a table, the result is a MULTISET and is used as is.
<table function derived table> ::= TABLE <left
paren> <collection value expression> <right
paren>
column name list
The column list that is specified for the table reference must contain names that are unique within the list
<derived column list> ::= <column name
list>
<column name list> ::= <column name> [ {
<comma> <column name> }... ]
A parenthesized joined table is simply a joined table contained in parentheses. Joined tables are discussed below.
<parenthesized joined table> ::= <left paren>
<parenthesized joined table> <right paren> | <left
paren> <joined table> <right paren>
Joins are operators with two table as the operands, resulting in a third table, called joined table. All join operators are evaluated left to right, therefore, with multiple joins, the table resulting from the first join operator becomes an operand of the next join operator. Parentheses can be used to group sequences of joined tables and change the evaluation order. So if more than two tables are joined together with join operators, the end result is also a joined table. There are different types of join, each producing the result table in a different way.
<joined table> ::= <cross join> |
<qualified join> | <natural join>
<cross join> ::= <table reference> CROSS JOIN
<table factor>
<qualified join> ::= <table reference> | [
<join type> ] JOIN <table reference> <join
specification>
<natural join> ::= <table reference> NATURAL
[ <join type> ] JOIN <table factor>
<join specification> ::= <join condition> |
<named columns join>
<join condition> ::= ON <search
condition>
<named columns join> ::= USING <left paren>
<join column list> <right paren>
<join type> ::= INNER | <outer join type> [
OUTER ]
<outer join type> ::= LEFT | RIGHT |
FULL
<join column list> ::= <column name
list>
CROSS JOIN
The simplest form of join is CROSS JOIN. The CROSS JOIN of two tables is a table that has all the columns of the first table, followed by all the columns of the second table, in the original order. Each row of the first table is combined with each row of the second table to fill the rows of the new table. If the rows of each table form a set, then the rows of the CROSS JOIN table form the Cartesian product of the rows of the two table operands.
Conditions are not allowed as part of a cross join, which is
simply A CROSS JOIN B
. Any conditions in a WHERE
clause are later applied to the table resulting from the cross
join.
Tables in the FROM CLAUSE separated with commas, are equivalent to
cross joins between the tables. Two joined tables separated with a
comma, such as A, B
, is equivalent to (A) CROSS JOIN
(B), which means the joined tables A and B are populated separately
before they are joined.
CROSS JOIN is not is not generally very useful, as it returns large result tables unless WHERE conditions are used.
UNION JOIN
The UNION JOIN has limited use in queries. The result table has
the same columns as that of CROSS JOIN. Each row of the first table is
extended to the right with nulls and added to the new table. Each row of
the second table is extended to the left with nulls and added to the new
table. The UNION JOIN is expressed as A UNION JOIN B
.
This should not be confused with A UNION B
, which is
a set operation. Union join is for special applications and is not
commonly used.
JOIN ... ON
The condition join is similar to CROSS JOIN, but a condition is
tested for each row of the new table and the row is created only if the
condition is true. This form of join is expressed as A JOIN B
ON (<search condition>)
.
Equijoin is a condition join in which the search condition is an equality condition between on or more pairs of columns from the two table. Equijoin is the most commonly used type of join.
SELECT a.*, b.* FROM a INNER JOIN b ON a.col_one = b.col_two
JOIN ... USING
NATURAL JOIN
Joins with USING or NATURAL keywords joins are similar to an
equijoin but they cannot be replaced simply with an equijoin. The new
table is formed with the specified or implied shared columns of the two
tables, followed by the rest of the columns from each table. In NATURAL
JOIN, the shared columns are all the column pairs that have the same
name in the first and second table. In JOIN USING, only columns names
that are specified by the USING clause are shared. The joins are
expressed as A NATURAL JOIN B
, and A JOIN B
USING (<comma separated column name list>)
.
The columns of the joined table are formed by the following procedures: In JOIN ... USING the shared columns are added to the joined table in the same order as they appear in the column name list. In NATURAL JOIN the shared columns are added to the joined table in the same order as they appear in the first table. In both forms of join, the non-shared columns of the first table are added in the order they appear in the first table, finally the non-shared columns of the second table are added in the order they appear in the second table.
The type of each shared column of the joined table is based on the type of the columns in the original tables. If the original types are not exactly the same, the type of the shared column is formed by type aggregation. Type aggregations selects a type that can represent values of both aggregated types. Simple type aggregation picks one of the types. For example SMALLINT and INTEGER, results in INTEGER, or VARCHAR(10) and VARCHAR(20) results in VARCHAR(20). More complex type aggregation inherits properties from both types. For example DECIMAL(8) and DECIMAL (6,2) results in DECIMAL (8,2).
In the examples below, the rows are joined exactly the same way, but the first query contains a.col_two and b.col_two together with all the rest of the columns of both tables, while the second query returns only one copy of col_two.
SELECT * FROM a INNER JOIN b ON a.col_two = b.col_two SELECT * FROM a INNER JOIN b USING (col_two)
OUTER JOIN
LEFT, RIGHT and FULL OUTER JOIN
The three qualifiers can be added to all types of JOIN except
CROSS JOIN and UNION JOIN. First the new table is populated with the
rows from the original join. If LEFT is specified, all the rows from the
first table that did not make it into the new table are extended to the
right with nulls and added to the table. If RIGHT is specified, all the
rows from the second table that did not make it into the new table are
extended to the left with nulls and added to the table. If FULL is
specified, the addition of leftover rows is performed from both the
first and the second table. These forms are expressed by prefixing the
join specification with the given keyword. For example A LEFT
OUTER JOIN B ON (<search condition>)
or A
NATURAL FULL OUTER JOIN B
or A FULL OUTER JOIN B
USING (<comma separated column name list>)
.
SELECT a.*, b.* FROM a LEFT OUTER JOIN b ON a.col_one = b.col_two
Despite the name, selection has nothing to do with the list of columns in a SELECT statement. In fact, it refers to the search condition used to limit the rows that from a result table (selection of rows, not columns). In SQL, simple selection is expressed with a WHERE condition appended to a single table or a joined table. In some cases, this method of selection is the only method available. For example in DELETE and UPDATE statements. But when it is possible to perform the selection with join conditions, this is the better method, as it results in a clearer expression of the query.
Projection is selection of the columns from a simple or joined table to form a result table. Explicit projection is performed in the SELECT statement by specifying the select column list. Some form of projection is also performed in JOIN ... USING and NATURAL JOIN.
The joined table has columns that are formed according to the rules mentioned above. But in many cases, not all the columns are necessary for the intended operation. If the statement is in the form, SELECT * FROM <joined table>, then all the columns of <joined table> are returned. But normally, the columns to be returned are specified after the SELECT keyword, separated from each other with commas.
In the select list, it is possible to use expressions that reference any columns of <joined table>. Each of these expressions forms a computed column. It is computed for each row of the result table, using the values of the columns of the <joined table> for that row.
Naming is used to hide the original names of tables or table columns and to replace them with new names in the scope of the query. Naming is also used for defining names for computed columns.
Without explicit naming, the name of a column is a predefined name. If the column is a column of a table, or is a named parameter, the name is of the table column or parameter is used. Otherwise it is generated by the database engine. HyperSQL generates column names such as C1, C2, etc. As generated naming is implementation defined according to the SQL Standard, it is better to explicitly name the computed and derived columns in your applications.
Naming in Joined Table
Naming is performed by adding a new name after a table's real name
and by adding a list of column names after the new table name. Both
table naming and column naming are optional, but table naming is
required for column naming. The expression A [AS] X (<comma
separated column name list>)
means table A is used in the
query expression as table X and its columns are named as in the given
list. The original name A, or its original column names, are not visible
in the scope of the query. The BNF is given below. The
<correlation name>
can be the same or different
from the name of the table. The <derived column
list>
is a comma separated list of column names. The degree
of this list must be equal to the degree of the table. The column names
in the list must be distinct. They can be the same or different from the
names of the table's columns.
<table or query name> [ [ AS ] <correlation
name> [ <left paren> <derived column list> <right
paren> ] ]
In the examples below, the columns of the original tables are named (a, b, c, d, e, f). The two queries are equivalent. In the second query, the table and its columns are renamed and the new names are used in the WHERE clauses:
CREATE TABLE atable (a INT, b INT, c INT, d INT, e INT, f INT); SELECT d, e, f FROM atable WHERE a + b = c SELECT x, y, z FROM atable AS t (u, v, w, x, y, z) WHERE u + v = w
Naming in Select List
Naming in the SELECT list logically takes place after naming in the joined table. The new names for columns are not visible in the immediate query expression or query expression. They become visible in the ORDER BY clause and in the result table that is returned to the user. Or if the query expression is used as a derived table in an enclosing query expression.
In the example below, the query is on the same table but with column renaming in the Select list. The new names are used in the ORDER BY clause:
SELECT x + y AS xysum, y + z AS yzsum FROM atable AS t (u, v, w, x, y, z) WHERE u + v = w ORDER BY xysum, yzsum
If the names xysum
or yzsum
are not used, the computed columns cannot be referenced in the ORDER BY
list.
Name Resolution
In a joined table, if a column name appears in tables on both sides then any reference to the name must use the table name in order to specify which table is being referred to.
Grouping Operations
Grouping results in the elimination of duplicate rows. A grouping operation is performed after the operations discussed above. A simple form of grouping is performed by the use of DISTINCT after SELECT. This eliminates all the duplicate rows (rows that have the same value in each of their columns when compared to another row). The other form of grouping is performed with the GROUP BY clause. This form is usually used together with aggregation.
Aggregation is an operation that computes a single value from the values of a column over several rows. The operation is performed with an aggregate function. The simplest form of aggregation is counting, performed by the COUNT function.
Other common aggregate functions return the maximum, minimum and average value among the values in different rows.
Set and Multiset Operations
While join operations generally result in laterally expanded tables, SET and COLLECTION operations are performed on two tables that have the same degree and result in a table of the same degree. The SET operations are UNION, INTERSECT and EXCEPT (difference). When each of these operations is performed on two tables, the collection of rows in each table and in the result is reduced to a set of rows, by eliminating duplicates. The set operations are then performed on the two tables, resulting in the new table which itself is a set of rows. Collection operations are similar but the tables are not reduced to sets before or after the operation and the result is not necessarily a set, but a collection of rows.
The set operations on two tables A and B are: A UNION
[DISTINCT] B
, A INTERSECT [DISTINCT] B
and
A EXCEPT [DISTINCT] B
. The result table is formed in
the following way: The UNION operation adds all the rows from A and B
into the new table, but avoids copying duplicate rows. The INTERSECT
operation copies only those rows from each table that also exist in the
other table, but avoids copying duplicate rows. The EXCEPT operation
copies those rows from the first table which do not exist in the second
table, but avoids copying duplicate rows.
The collection operations are similar to the set operations, but
can return duplicate rows. They are A UNION ALL B
,
A INTERSECT ALL B
and A EXCEPT ALL
B
. The UNION ALL operation adds all the rows from A and B into
the new table. The INTERSECT operation copies only those rows from each
table that also exist in the other table. If n copies of a rows exists
in one table, and m copies in the other table, the number of copies in
the result table is the smaller of n and m. The EXCEPT operation copies
those rows from the first table which do not exist in the second table.
If n copies of a row exist in the first table and m copies in the second
table the number of copies in the result table is n-m, or if n < m,
then zero.
The optional WITH clause can be used in a query expression. The
WITH clause lists one or more named ephemeral tables that can be
referenced in the query expression body. The ephemeral tables are
created and populated before the rest of the query expression is
executed. Their contents do not change during the execution of the
<query expression body>
.
<with clause> ::= WITH [ RECURSIVE ] <with
list>
<with list> ::= <with list element> [ {
<comma> <with list element> }... ]
<with list element> ::= <query name> [
<left paren> <with column list> <right paren> ] AS
<left paren> <query expression> <right paren>
<with column list> ::= <column name
list>
An example of the use of the WITH clause is given above under
UNNEST. The <query expression>
in the WITH
clause is evaluated once and its result table can be referenced in the
body of the main <query expression body>
using
the specified <query name>.
The RECURSIVE keyword changes the way the elements of the
<with list>
are interpreted. The
<query expression>
contained in the
<with list element>
must be the UNION or UNION
ALL of two <query expression body> elements (simple VALUES or
SELECT statements). The left element of the UNION is evaluated first and
its result becomes the result of the <with list
element>
. After this step, the current result of the
<with list element> is referenced in the right element (a SELECT
statement) of the UNION, the UNION is performed between the result and
previous result of the <with list element>
,
which is enlarged by this operation. The UNION operation is performed
again and again, until the result of the <with list
element>
stops changing. The result of the
<with list element>
is now complete and is
later used in the execution of the <query expression
body>
. When RECURSIVE is used, the <with column
list>
must be defined.
HyperSQL limits recursion to 265 rounds. If this is exceeded, an error is raised.
A trivial example of a recursive query is given below. Note the first column GEN. For example, if each row of the table represents a member of a family of dogs, together with its parent, the first column of the result indicates the calculated generation of each dog, ranging from first to fourth generation.
CREATE TABLE pptree (pid INT, id INT); INSERT INTO pptree VALUES (NULL, 1) ,(1,2), (1,3),(2,4),(4,5),(3,6),(3,7); WITH RECURSIVE tree (gen, par, child) AS ( VALUES(1, CAST(null as int), 1) UNION SELECT gen + 1, pid, id FROM pptree, tree WHERE pid = child ) SELECT * FROM tree; GEN PAR CHILD --- ------ ----- 1 (null) 1 2 1 2 2 1 3 3 2 4 3 3 6 3 3 7 4 4 5
if recursive queries become complex they also become very difficult to develop and debug. HyperSQL provides an alternative to this with user-defined SQL functions which return tables. Functions can perform any complex, repetitive task with better control, using loops, variables and, if necessary, recursion.
A query expression consists of an optional WITH clause and a query expression body. The optional WITH clause lists one or more named ephemeral tables that can be referenced, just like the database tables in the query expression body.
<query expression> ::= [ <with clause> ]
<query expression body>
A query expression body refers to a table formed by using UNION and other set operations. The query expression body is evaluated from left to right and the INTERSECT operator has precedence over the UNION and EXCEPT operators. A simplified BNF is given below:
<query expression body> ::= <query term> |
<query expression body> UNION | EXCEPT [ ALL | DISTINCT ] [
<corresponding spec> ] <query term>
<query term> ::= <query primary> |
<query term> INTERSECT [ ALL | DISTINCT ] [ <corresponding
spec> ] <query term>
<query primary> ::= <simple table> |
<left paren> <query expression body> [ <order by
clause> ] [ <result offset clause> ] [ <fetch first
clause> ] <right paren>
<simple table> ::= <query specification> |
<table value constructor> | <explicit table> <explicit
table> ::= TABLE <table or query name>
<corresponding spec> ::= CORRESPONDING [ BY
<left paren> <column name list> <right paren>
]
A <query term>
and a <query
primary>
can be a SELECT statement, an
<explicit table>
, or a <table value
constructor>
.
The CORRESPONDING clause is optional. If it is not specified, then
the <query term>
and the <query
primary>
must have the same number of columns. If
CORRESPONDING is specified, the two sides need not have the same number
of columns. If no column list is used with CORRESPONDING, then all the
column names that are common in the tables on two sides are used in the
order in which they appear in the first table. If a columns list is
used, it allows you to select only some columns of the tables on the
left and right side to create the new table. In the example below the
columns named u and v from the two SELECT statements are used to create
the UNION table.
SELECT * FROM atable UNION CORRESPONDING BY (u, v) SELECT * FROM anothertable
The type of each column of the query expression is determined by combining the types of the corresponding columns from the two participating tables.
When the rows of the result table have been formed, it is possible to specify the order in which they are returned to the user. The ORDER BY clause is used to specify the columns used for ordering, and whether ascending or descending ordering is used. It can also specify whether NULL values are returned first or last.
SELECT x + y AS xysum, y + z AS yzsum FROM atable AS t (u, v, w, x, y, z) WHERE u + v = w ORDER BY xysum NULLS LAST, yzsum NULLS FIRST
The ORDER BY clause specifies one or more <value
expressions>
. The list of rows is sorted according to the
first <value expression>
. When some rows are
sorted equal then they are sorted according to the next
<value expression>
and so on.
<order by clause> ::= ORDER BY <sort
specification> [ { <comma> <sort specification> }...
]
<sort specification> ::= <value expression> [
<collate clause> ] [ ASC | DESC ] [ NULLS FIRST | NULLS LAST
]
The defaults are ASC and NULLS FIRST.
A collation is used for columns of an ORDER BY expression that are
of the type CHAR or VARCHAR. If a <collate
clause>
is not specified then the collation of the column,
or the default collation of the database is used.
An ORDER BY operation can sometimes be optimised by the engine when it can use the same index for accessing the table data and ordering. Optimisation can happen both with DESC + NULLS LAST and ASC + NULLS FIRST.
sort specification list
sort specification list
<sort specification list> ::= <value
expression> [ASC | DESC] [NULLS FIRST | NULLS
LAST]
Specify a sort order. A sort operation is performed on the
result of a <query expression>
or
<query specification>
and sorts the result
according to one or more <value expression>
.
The <value expression>
is usually a single
column of the result, but in some cases it can be a column of the
<table expression>
that is not used in the
select list. The default is ASC and NULLS FIRST.
A different form of limiting the rows can be performed on the result table after it has been formed according to all the other operations (selection, grouping, ordering etc.). This is specified by the FETCH ... ROWS and OFFSET clauses of a SELECT statement. In this form, the specified OFFSET rows are removed from start of the table, then up to the specified FETCH rows are kept and the rest of the rows are discarded.
<result offset clause> ::= OFFSET <offset row
count> { ROW | ROWS }
<fetch first clause> ::= FETCH { FIRST | NEXT } [
<fetch first row count> ] { ROW | ROWS } ONLY [ USING INDEX
]
<limit clause> ::= LIMIT <fetch first row
count> [ USING INDEX ]
A slicing operation takes the result set that has been already processed and ordered. It then discards the specified number of rows from the start of the result set and returns the specified number of rows after the discarded rows. The <offset row count> and <fetch first row count> can be constants, dynamic variables, routine parameters, or routine variables. The type of the constants must be INTEGER.
SELECT a, b FROM atable WHERE d < 5 ORDER BY absum OFFSET 3 FETCH 2 ROWS ONLY SELECT a, b FROM atable WHERE d < 5 ORDER BY absum OFFSET 3 LIMIT 2 /* alternative keyword */
When the FETCH keyword is used, the specified number of rows must be at least 1, otherwise an error is returned. This behaviour is consistent with the SQL Standard. When the LIMIT keyword is used, the specified number of rows can be zero, which means return all rows (no LIMIT). In MySQL and PostgreSQL syntax modes, zero limit means no rows (empty result).
If there is an index on all the columns specified in the ORDER BY clause, it is normally used for slicing. In some queries, an index on another column may take precedence. It is possible to add USING INDEX to the end of the slicing clause to force the use of the index for ordering and slicing.
DELETE FROM
delete statement: searched
<delete statement: searched> ::= DELETE FROM
<target table> [ [ AS ] <correlation name> ] [ WHERE
<search condition> ]
Delete rows of a table. The search condition is a
<boolean value expression>
that is evaluated
for each row of the table. If the condition is true, the row is deleted.
If the condition is not specified, all the rows of the table are
deleted. In fact, an implicit SELECT is performed in the form of
SELECT * FROM <target table> [ WHERE <search
condition>]
and the selected rows are deleted. When used in
JDBC, the number of rows returned by the implicit SELECT is returned as
the update count.
If there are FOREIGN KEY constraints on other tables that reference the subject table, and the FOREIGN KEY constraints have referential actions, then rows from those other tables that reference the deleted rows are either deleted, or updated, according to the specified referential actions.
In the second example below the rows that have the maximum value for column A are deleted;
DELETE FROM T WHERE C > 5 DELETE FROM T AS TT WHERE TT.A = (SELECT MAX(A) FROM T)
TRUNCATE TABLE
truncate table statement
<truncate table statement> ::= TRUNCATE TABLE
<target table> [ <identity column restart option> ] [
<truncate options> ]
<identity column restart option> ::= CONTINUE
IDENTITY | RESTART IDENTITY
<truncate options> ::= AND COMMIT [ NO CHECK
]
Delete all rows of a table without firing its triggers. This
statement can only be used on base tables (not views). If the table is
referenced in a FOREIGN KEY constraint defined on another table, the
statement causes an exception. Triggers defined on the table are not
executed with this statement. The default for <identity
column restart option>
is CONTINUE
IDENTITY
. This means no change to the IDENTITY sequence of the
table. If RESTART IDENTITY
is specified, then the
sequence is reset to its start value.
TRUNCATE is faster than ordinary DELETE. The TRUNCATE statement is an SQL Standard data change statement, therefore it is performed under transaction control and can be rolled back if the connection is not in the auto-commit mode.
HyperSQL also supports the optional AND COMMIT and NO CHECK options. If AND COMMIT is used, then the transaction is committed with the execution of the TRUNCATE statement. The action cannot be rolled back. If the additional NO CHECK option is also specified, then the TRUNCATE statement is executed even if the table is referenced in a FOREIGN KEY constraint defined on another, non-empty table. This form of TRUNCATE is faster than the default form and does not use much memory.
TRUNCATE SCHEMA
truncate schema statement
<truncate schema statement> ::= TRUNCATE SCHEMA
<target schema> [ <identity column restart option> ] AND
COMMIT [ NO CHECK ]
Performs the equivalent of a TRUNCATE TABLE ... AND COMMIT on all the table in the schema. If the additional NO CHECK option is also specified, then the TRUNCATE statement is executed even if any of the tables in the schema is referenced in a FOREIGN KEY constraint defined on a non-empty table in a different schema.
If RESTART IDENTITY is specified, all table IDENTITY sequences and all SEQUENCE objects in the schema are reset to their start values.
Use of this statement requires schema ownership or administrative privileges.
INSERT INTO
insert statement
<insert statement> ::= INSERT INTO <target
table> <insert columns and source>
<insert columns and source> ::= <from
subquery> | <from constructor> | <from
default>
<from subquery> ::= [ <left paren>
<insert column list> <right paren> ] [ <override
clause> ] <query expression>
<from constructor> ::= [ <left paren>
<insert column list> <right paren> ] [ <override
clause> ] <contextually typed table value
constructor>
<override clause> ::= OVERRIDING USER VALUE |
OVERRIDING SYSTEM VALUE
<from default> ::= DEFAULT
VALUES
<insert column list> ::= <column name
list>
Insert new rows in a table. An INSERT statement inserts one or more rows into the table.
The special form, INSERT INTO <target table>
DEFAULT VALUES
can be used with tables which have a default
value for each column.
With the other forms of INSERT, the optional
(<insert column list>)
specifies to which
columns of the table the new values are assigned.
In one form, the inserted values are from a <query
expression>
and all the rows that are returned by the
<query expression>
are inserted into the table.
If the <query expression>
returns no rows,
nothing is inserted.
In the other form, a comma separated list of values called
<contextually typed table value constructor>
is
used to insert one or more rows into the table. This list is
contextually typed, because the keywords NULL and DEFAULT can be used
for the values that are assigned to each column of the table. The
keyword DEFAULT means the default value of the column and can be used
only if the target column has a default value or is an IDENTITY or
GENERATED column of the table.
The <override clause>
must be used
when a value is explicitly assigned to a column that has been defined as
GENERATED ALWAYS AS IDENTITY. The clause, OVERRIDE SYSTEM VALUE means
the provided values are used for the insert, while OVERRIDING USER VALUE
means the provided values are simply ignored and the values generated by
the system are used instead.
An array can be inserted into a column of the array type by using literals, by specifying a parameter in a prepared statement or an existing array returned by query expression. The last example below inserts an array.
The rows that are inserted into the table are checked against all the constraints that have been declared on the table. The whole INSERT operation fails if any row fails to inserted due to constraint violation. Examples:
INSERT INTO T DEFAULT VALUES /* all columns of T have DEFAULT clauses */ INSERT INTO T (SELECT * FROM Z) /* table Z has the same columns as table T */ INSERT INTO T (A,B) VALUES ((1,2),(3,NULL), (DEFAULT,6)) /* three rows are inserted into table T */ INSERT INTO T VALUES 3, ARRAY['hot','cold']
If the table contains an IDENTITY column, the value for this column for the last row inserted by a session can be retrieved using a call to the IDENTITY() function. This call returns the last value inserted by the calling session. When the insert statement is executed with a JDBC Statement or PreparedStatement method, the getGeneratedKeys() method of Statement can be used to retrieve not only the IDENTITY column, but also any GENERATED computed column, or any other column. The getGeneratedKeys() returns a ResultSet with one or more columns. This contains one row per inserted row, and can therefore return all the generated columns for a multi-row insert.
There are three methods of specifying which generated keys should be returned. The first method does not specify the columns of the table. With this method, the returned ResultSet will have a column for each column of the table that is defined as GENERATED ... AS IDENTITY or GENERATED ... AS (<expression>). The two other methods require the user to specify which columns should be returned, either by column indexes, or by column names. With these methods, there is no restriction on which columns of the inserted values to be returned. This is especially useful when some columns have a default clause which is a function, or when there are BEFORE triggers on the table that may provide the inserted value for some of the columns.
UPDATE
update statement: searched
<update statement: searched> ::= UPDATE
<target table> [ [ AS ] <correlation name> ] SET <set
clause list> [ WHERE <search condition> ]
Update rows of a table. An UPDATE statement selects rows from
the <target table>
using an implicit SELECT
statement formed in the following manner:
SELECT * FROM <target table> [ [ AS ]
<correlation name> ] [ WHERE <search condition>
]
Then it applies the SET <set clause
list>
expression to each selected row.
If the implicit SELECT returns no rows, no update takes place. When used in JDBC, the number of rows returned by the implicit SELECT is returned as the update count.
If there are FOREIGN KEY constraints on other tables that reference the subject table, and the FOREIGN KEY constraints have referential actions, then rows from those other tables that reference the updated rows are updated, according to the specified referential actions.
The rows that are updated are checked against all the constraints that have been declared on the table. The whole UPDATE operation fails if any row violates any constraint.
set clause list
set clause list
<set clause list> ::= <set clause> [ {
<comma> <set clause> }... ]
<set clause> ::= <multiple column
assignment> | <set target> <equals operator> <update
source>
<multiple column assignment> ::= <set target
list> <equals operator> <assigned
row>
<set target list> ::= <left paren> <set
target> [ { <comma> <set target> }... ] <right
paren>
<assigned row> ::= <contextually typed row
value expression>
<set target> ::= <column
name>
<update source> ::= <value expression> |
<contextually typed value specification>
Specify a list of assignments. This is used in UPDATE, MERGE and SET statements to assign values to a scalar or row target.
Apart from setting a whole target to a value, a SET statement can set individual elements of an array to new values. The last example below shows this form of assignment to the array in the column named B.
In the examples given below, UPDATE statements with single and multiple assignments are shown. Note in the third example, a SELECT statement is used to provide the update values for columns A and C, while the update value for column B is given separately. The SELECT statement must return exactly one row . In this example the SELECT statement refers to the existing value for column C in its search condition.
UPDATE T SET A = 5 WHERE ... UPDATE T SET (A, B) = (1, NULL) WHERE ... UPDATE T SET (A, C) = (SELECT X, Y FROM U WHERE Z = C), B = 10 WHERE ... UPDATE T SET A = 3, B[3] = 'warm'
MERGE INTO
merge statement
<merge statement> ::= MERGE INTO <target
table> [ [ AS ] <merge correlation name> ] USING <table
reference> ON <search condition> <merge operation
specification>
<merge correlation name> ::= <correlation
name>
<merge operation specification> ::= <merge
when clause>...
<merge when clause> ::= <merge when matched
clause> | <merge when not matched clause>
<merge when matched clause> ::= WHEN MATCHED
THEN <merge update specification>
<merge when not matched clause> ::= WHEN NOT
MATCHED THEN <merge insert specification>
<merge update specification> ::= UPDATE SET
<set clause list>
<merge insert specification> ::= INSERT [
<left paren> <insert column list> <right paren> ] [
<override clause> ] VALUES <merge insert value
list>
<merge insert value list> ::= <left paren>
<merge insert value element> [ { <comma> <merge insert
value element> }... ] <right paren>
<merge insert value element> ::= <value
expression> | <contextually typed value
specification>
Update rows, or insert new rows into the <target
table>
. The MERGE statement uses a second table, specified
by <table reference>
, to determine the rows to
be updated or inserted. It is possible to use the statement only to
update rows or to insert rows, but usually both update and insert are
specified.
The <search condition>
matches each
row of the <table reference>
with each row of
the <target table>
. If the two rows match then
the UPDATE clause is used to update the matching row of the target
table. Those rows of <table reference>
that
have no matching rows are then used to insert new rows into the
<target table>
. Therefore, a MERGE statement
can update between 0 and all the rows of the <target
table>
and can insert between 0 and the number of the rows
in <table reference>
into the
<target table>
. If any row in the
<target table>
matches more than one row in
<table reference>
a cardinality error is
raised. On the other hand, several rows in the <target
table>
can match a single row in <table
reference>
without any error. The constraints and
referential actions specified on the database tables are enforced the
same way as for an update and an insert statement.
The MERGE statement can be used with only the WHEN NOT MATCHED clause as a conditional INSERT statement that inserts a row if no existing rows match a condition.
In the first example below, the table originally contains two
rows for different furniture. The <table
reference>
is the (VALUES(1, 'conference table'),
(14, 'sofa'), (5, 'coffee table'))
expression, which evaluates
to a table with 3 rows. When the x value for a row matches an existing
row, then the existing row is updated. When the x value does not match,
the row is inserted. Therefore one row of table t is updated from
'dining table' to 'conference table', and two rows are inserted into
table t. The second example uses a SELECT statement as the source of the
values for the MERGE.
In the third example, a new row in inserted into the table only when the primary key for the new row does not exist. This example uses parameters and should be executed as a JDBC PreparedStatement.
CREATE TABLE t (id INT PRIMARY KEY, description VARCHAR(100)) INSERT INTO t VALUES (1, 'dining table'), (2, 'deck chair') MERGE INTO t USING (VALUES(1, 'conference table'), (14, 'sofa'), (5, 'coffee table')) AS vals(x,y) ON t.id = vals.x WHEN MATCHED THEN UPDATE SET t.description = vals.y WHEN NOT MATCHED THEN INSERT VALUES vals.x, vals.y MERGE INTO t USING (SELECT * FROM tt WHERE acol = 2) AS vals(x,y) ON t.id = vals.x WHEN MATCHED THEN UPDATE SET t.description = vals.y WHEN NOT MATCHED THEN INSERT VALUES vals.x, vals.y MERGE INTO t USING (VALUES(CAST(? AS INT))) AS vals(x) ON t.id = vals.x WHEN NOT MATCHED THEN INSERT VALUES vals.x, ?
HyperSQL supports some SQL statements, expressions, functions and Java methods that report on the most recently executed statement.
The IDENTITY()
function returns the last inserted
identity value for the current session.
The GET DIAGNOSTICS
statement is supported to a
limited extent. The built-in function DIAGNOSTICS()
is
an alternative. These are normally used in SQL/PSM routines to check the
result of the last data update operation.
GET DIAGNOSTICS
get diagnostics statement
<get diagnostics statement> ::= GET DIAGNOSTICS
<simple target value specification> = ROW_COUNT
The <simple target value specification>
is
a session variable, or a routine variable or OUT parameter.
The keyword ROW_COUNT
specifies the row count
returned by the last executed statement. For INSERT, UPDATE, DELETE and
MERGE statements, this is the number of rows affected by the statement.
This is the same value as returned by JDBC execute() methods. For all
other statements, zero is returned.
The value of ROW_COUNT
is stored in the specified
target.
In future versions, more options will be supported for diagnostics values.
$Revision: 4864 $