Introduction to CQL

This article gives a high-level introduction to the Constellation Query Language (CQL), which is the central component of Infinuendo’s data modeling approach and technologies.  CQL is a member of the family of Fact-Based Modeling languages.  It was originally developed in 2008 by Clifford Heath, one of Infinuendo’s co-founders.

Fact-Based Modeling is a semantically-rich conceptual approach for modeling, querying and transforming information.  It is based on the following principles:

  • All facts of interest are conceptually represented in terms of attribute-free structures known as fact types.
  • Conceptual models are developed from concrete examples of the required information.
  • Conceptual models are verbalised in a controlled natural language.

Fact-Based Modeling originated in Europe in the 1970s and has since evolved into a family of related languages.  One of the best known Fact-Based Modeling approaches is Object-Role Modeling (ORM), developed by Terry Halpin and his colleagues in the 1990s. ORM provides both a textual modeling language and an equivalent graphical modeling language.

CQL is a plain-text representation of ORM, extended to cover queries. Some ideas were also adopted from the Semantics of Business Vocabulary and Business Rules (SBVR).  One of the design objectives of CQL was to create a concise and precise structured nearly-natural language for conceptual modeling that is readily understood and verified by domain experts.

Details on how to download, install and use afgen, the open source implementation of the CQL compiler, along with links to further resources, can be found at the end of the article.

Constellation Query Language (CQL)

The Constellation Query Language (CQL) is a language for constructing and querying fact-based information models.

A CQL model is composed of the following kinds of statements:

  • Object Type definitions, each designated by a name (which may be multiple words, conventionally using initial upper-case).  An object type is either:
    • a value type, which is the type of a single value (e.g., number, name, date, etc.)
    • an entity type, which identifies a class of things of interest (e.g., country, product, asset)
    • a named fact type, which is a named relationship (e.g., employment, production forecast)
  • Fact Types that declare the relationship between object types or a (boolean) characteristic of a single object type.  A fact type is designated by one or more readings, each of which gives a verbal description of the object types’ relationship.
  • Constraints that state conditions which restrict the allowed object instances and facts in the model.
  • Units that are used to automate value conversion.
  • Instances of object types and facts conforming to fact types, which are used as examples, reference data, or for test scenarios.
  • Queries, which are asked in the modeling language and return either true/false or a result set. A query can also be projected as a Derived Fact Type, akin to a view.

CQL is written as a series of controlled natural language statements.  A number of key phrases have special meaning in the language, but the majority of the statements are written in close to conventional natural language with a minimum of markup or special typography. This facilitates communication over conventional channels and with untrained people.

Value Type

A value type identifies a concept in the model with a single value (e.g., number, name, date, etc.).  A new value type is based on a parent value type with possible range restrictions and parameters.  A value type is introduced with an “is written as” statement:

each VALUE_TYPE_ID is written as a BASE_VALUE_TYPE_ID(PARAMS)

For example:

each Quantity is written as an Unsigned Integer(32);

CQL itself has no built-in value types (you can use whatever names you like), but certain types are special when mapping to database or programming technologies, including String, Signed and Unsigned Integer, Real, Boolean, Date. Each value instance is identified by its canonical written form (so Real 2.00 is the same value as Real 2).

Value Type Constraint

Value type constraints are introduced with a “restricted to” key phrase and are written inside curly brackets as a list of values or value ranges (including ranges with either end open):

each VALUE_TYPE_ID is written as a BASE_VALUE_TYPE_ID restricted to { RESTRICTION }

For example:

each Season is written as a String restricted to {'Autumn', 'Spring', 'Summer', 'Winter'};
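
The effect of a restriction can be pictured as a membership test. The Python sketch below is only an illustration of that idea and is not part of CQL or afgen: the helper name allowed, and the encoding of a restriction as a list of values and (low, high) pairs with None standing for an open end, are assumptions made for this example.

```python
# Hypothetical helper (an assumption, not CQL or afgen): a restriction is a
# list of allowed values and (low, high) ranges; None marks an open end.
def allowed(value, restriction):
    for item in restriction:
        if isinstance(item, tuple):
            low, high = item
            if (low is None or low <= value) and (high is None or value <= high):
                return True
        elif value == item:
            return True
    return False

SEASON = ['Autumn', 'Spring', 'Summer', 'Winter']  # a list of values
MONTH_NR = [(1, 12)]                               # a closed range, like {1..12}
NON_NEGATIVE = [(0, None)]                         # a range with its upper end open

print(allowed('Summer', SEASON), allowed(13, MONTH_NR))
```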

Entity Type

An entity type identifies an object type whose instances are identified through relationships with other objects (e.g., country, product, asset).  An entity type is introduced with an “is identified by” statement. The long form lists the identifying roles and then enumerates the fact types, but a short-hand form introduced by “its” (a reference mode) provides a default “has”/“is of” fact type for a single role:

each ENTITY_TYPE_ID is identified by OBJECT_TYPE_ROLE … where
        FACT_TYPE_READINGS

each ENTITY_TYPE_ID is identified by its OBJECT_TYPE_ID

For example:

each Australian Company is identified by ACN where 
    Australian Company has one ACN,
    ACN applies to at most one Australian Company;
each Product is identified by its Name;

Subtyping

An entity type may be declared to be a subtype of one or more supertype entity types.  All the characteristics and relationships of a supertype apply to the subtype. Subtyping is introduced with an “is a kind of” key phrase. A subtype may also introduce separate identification:

each ENTITY_TYPE_ID is a kind of SUPERTYPE_ENTITY_TYPE_ID

each ENTITY_TYPE_ID is a kind of SUPERTYPE_ENTITY_TYPE_ID identified by its OBJECT_TYPE_ID

For example:

each Person is a kind of Party;
each Taxpayer is a kind of Person identified by its Tax File Number;

Fact Type

A fact type declares the relationship between two or more object types or a boolean property of a single object type.  A fact type is designated by one or more readings separated by commas. Each reading gives a verbal description (a name for the predicate) of the relationship between the roles taken by the object types in the fact type (roles are explained below).  A binary fact type is a relationship between two objects, written with connecting verb(s).

OBJECT_TYPE_ROLE verb OBJECT_TYPE_ROLE

For example:

Customer raises Order;

A ternary or greater fact type is a relationship between three or more objects, and will often be written with a verb between the first two objects and prepositions between the later objects.

OBJECT_TYPE_ROLE verb OBJECT_TYPE_ROLE preposition OBJECT_TYPE_ROLE 

For example:

Person plays Sport for Country;

A unary fact type declares a boolean property of a single object type.

OBJECT_TYPE_ROLE verb

For example:

Person is deceased;
Person smokes;

Note that CQL does not understand grammar, so you are free to use any expression that makes sense to you and does not introduce key phrases in inappropriate contexts.

In addition, a fact type reading may contain text before its first role and after its last role.  For example, the following statement is allowed:

role of Character was played by Cast Member in Production;

 

Multiple readings and quantifiers

For binary and greater fact types, multiple readings can be given for the different object role orderings to denote the natural way of expressing the relationship in different contexts.

The final object role in a reading for a fact type may be preceded by a quantifier to constrain the cardinality of the final object role (with respect to all but this object role).  Valid quantifiers include:

some
that
one
exactly QUANTITY
at least QUANTITY
at most QUANTITY
at least QUANTITY and at most QUANTITY
from QUANTITY to QUANTITY

where QUANTITY can be one or a number.

Here is an example of multiple readings for a fact type concerned with oil supply (note the “one” on Quantity, which implies that exactly one production forecast is being made):

some Refinery in some Supply Period will make some Product in one Quantity,
that Refinery will make that Product in that Supply Period in that Quantity,
that Refinery will make that Quantity of that Product in that Supply Period;
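
One way to read a quantifier on the final role is as a bound on how many distinct values that role may take for each combination of the other roles. The Python sketch below illustrates that reading only; the function name within_bounds and the tuple encoding of facts are assumptions for this example, not part of CQL. It inspects only combinations that actually occur in the facts, so it cannot detect a mandatory fact that is missing altogether.

```python
from collections import defaultdict

def within_bounds(facts, at_least=0, at_most=None):
    """facts: tuples of role values, with the quantified role last."""
    seen = defaultdict(set)
    for *others, last in facts:
        seen[tuple(others)].add(last)  # distinct final-role values per combination
    return all(
        len(values) >= at_least and (at_most is None or len(values) <= at_most)
        for values in seen.values()
    )

# Roles in reading order: Refinery, Supply Period, Product, Quantity
# (illustrative data; the Supply Period is abbreviated to a string here).
forecasts = [
    ('Geelong', '2015-06', 'Super98', 20),
    ('Geelong', '2015-06', 'Diesel', 35),
]
# A second Quantity for the same Refinery, Supply Period and Product
# would violate the "one Quantity" quantifier.
conflicting = forecasts + [('Geelong', '2015-06', 'Super98', 25)]

print(within_bounds(forecasts, at_most=1), within_bounds(conflicting, at_most=1))
```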

Named Fact Type

A fact type (including its multiple readings) may be named so it can be used as an object type in the same way as a value or entity type. Each instance of that fact type is regarded as an object, so we say the fact type is objectified.  A named fact type is introduced by the “is where” key phrase:

each NAMED_FACT_TYPE_ID is where FACT_TYPE

For example:

each Production Forecast is where
        Refinery in Supply Period will make Product in one Quantity,
        Refinery will make Product in Supply Period in Quantity,
        Refinery will make Quantity of Product in Supply Period;

Named Roles in Fact Types, Constraints and Queries

Each fact type definition includes roles, each played by one object type. An object type may appear more than once in the same fact type, representing different roles in the relationship.  For example, the Party object type appears twice in a Related Party relationship.  To distinguish them in natural language they might be known as, say, the First Party and the Second Party.  CQL offers several ways of indicating the different roles for the same object type in a fact type: as a named role, using adjectives (bound initially by a hyphen – this is special markup), or by subscripting.

A named object role is introduced by a “(as …)” key phrase. Where a role name is used, it may be introduced anywhere in the statement (even after it has been first used!) but other occurrences of that role must all use the role name:

OBJECT_TYPE_ID ( as OBJECT_TYPE_ROLE )

For example:

Celebration was organised by one Organiser,
    Person (as Organiser) organised Celebration;

An adjectival object role is introduced by a hyphenated adjective either before or after the object type id:

adjective- OBJECT_TYPE_ID
OBJECT_TYPE_ID -adjective

Both forms are supported to accommodate languages and customs that place the adjective either before or after the noun. Variations even allow multiple words or hyphenated words to be used, but this detail is not described here.

The first instance of an object type role in a fact type can be an object type id (i.e. the role is implicitly named), but the second and subsequent instances of an object type must take a named role to avoid ambiguity.  Each reading of a fact type has the same role naming for all its object roles.

For example, here are two readings of a fact type where the role of the first instance of Product is implicitly named, and the second instance is adjectivally named:

Product may be substituted by alternate-Product in Season,
alternate-Product is an acceptable substitute for Product in Season;

Finally, where neither of the above forms applies, a numeric subscript in parentheses may be used:

Person(1) sent friendship request to Person(2),
    Person(2) received friendship request from Person(1);

Constraints

Constraints state conditions that restrict the allowed facts and object instances in the model.

Some types of constraints are embedded in other kinds of CQL declarations (some of these have already been introduced):

  1. Value type constraints were introduced as value type restrictions.
  2. Single-role mandatory, uniqueness and frequency constraints were introduced as quantifiers attached to fact type readings.
  3. Role value constraints are like value type constraints, but are embedded into a reading.

Other types of constraints are not so easily embedded, so they are introduced as separate declarations within the model. These constraints operate on the set of values a role can take in a fact type compared to the set of values in other fact types.

Subset constraints

A subset constraint declares that the set of values of a role in one fact type is a subset of the set of values of the same role in another fact type.  A subset constraint is introduced by the “only if” key phrase or using “if A then B”:

FACT_TYPE_READING_2 only if FACT_TYPE_READING_1

if FACT_TYPE_READING_2 then FACT_TYPE_READING_1

The two fact type readings must have at least one role in common.  For the common roles, the set of values for these roles in reading 2 is a subset of the set of values in reading 1.  For example, in a medical context that models whether a patient smokes and whether a patient is at risk of cancer, if the set of patients who smoke is a subset of the set of patients who are at risk of cancer, then we could declare the first constraint below.  The second constraint shows the “only if” form:

if some Patient smokes then that Patient is at risk of cancer;
some Person may log in only if that Person has some validated Email Address;
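
In set terms, a subset constraint compares the projections of two fact populations onto their common roles. The Python sketch below illustrates this under assumptions of its own: facts are encoded as dictionaries keyed by role name, and the helper names project and subset_constraint_holds are hypothetical, not part of CQL or afgen.

```python
def project(facts, roles):
    """Collect the distinct value combinations that the given roles take."""
    return {tuple(fact[role] for role in roles) for fact in facts}

def subset_constraint_holds(facts_2, facts_1, common_roles):
    """True if the values of the common roles in facts_2 all occur in facts_1."""
    return project(facts_2, common_roles) <= project(facts_1, common_roles)

# Illustrative populations for the two readings of the "only if" example.
may_log_in = [{'Person': 'Ann'}]
has_validated_email = [
    {'Person': 'Ann', 'Email Address': 'ann@example.com'},
    {'Person': 'Raj', 'Email Address': 'raj@example.com'},
]

print(subset_constraint_holds(may_log_in, has_validated_email, ['Person']))
```

Replacing the subset test with an equality test on the two projections gives the corresponding check for an equality constraint.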

Equality constraints

An equality constraint declares the set of values of a role in one fact type is equal to the set of values of the same role in another fact type.  An equality constraint between two readings is introduced by the “if and only if” key phrase:

FACT_TYPE_READING_1 if and only if FACT_TYPE_READING_2

The two fact type readings must have at least one role in common.  For the common roles, the set of values for these roles in reading 2 is equal to the set of values in reading 1.  For example, consider a medical context that models the collection of blood pressure data.  If systolic and diastolic blood pressure readings are taken at the same time and neither may be recorded without the other, then the set of patients who have a systolic blood pressure is the same as the set of patients who have a diastolic blood pressure, and we could declare the constraint:

some Patient has some systolic-BloodPressure
    if and only if that Patient has some diastolic-BloodPressure;

Ring constraints

A ring constraint may arise when an object type plays two roles in the one fact type.  This situation often occurs in hierarchies such as where there is a “parent of” relationship for one object type; one role in the relationship is the parent and the other role in the relationship is the child.  Ring constraints put conditions on how the roles in one, two or many relationships are treated as valid in the model.

CQL allows the declaration of many types of ring constraint, but they are not fully explained here.

Ring type              Description
transitive             If A is related to B and B is related to C, then A is related to C
intransitive           If A is related to B and B is related to C, then A cannot be related to C
strongly intransitive  If A is related to B and B has a path of one or more instances of this relation to C, then A cannot be related to C
acyclic                There is no path of one or more relations from A back to A
symmetric              A is related to B if and only if B is related to A
antisymmetric          If A is related to B and A is not B, then B cannot be related to A
reflexive              A is related to A
irreflexive            A cannot be related to A

 

Because of the difficulty of adequately verbalising such restrictions, ring constraints are introduced into CQL by attaching an annotation containing the ring constraints wrapped in square brackets “[ … ]” at the end of a fact reading.

Here are some examples:

Value Type is subtype of at most one super-Value Type (as Supertype) [acyclic, transitive];
Product may be substituted by alternate-Product in Season [acyclic, intransitive];
Topic belongs to at most one parent-Topic [acyclic];
User is friendly with other-User [symmetric];
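
To make a few of these ring types concrete, the following Python sketch checks them against a population held as a set of (A, B) pairs. It illustrates the definitions only and says nothing about how afgen enforces them; the function names and the sample data are assumptions for this example.

```python
def is_irreflexive(pairs):
    return all(a != b for a, b in pairs)

def is_symmetric(pairs):
    return all((b, a) in pairs for a, b in pairs)

def is_intransitive(pairs):
    # No A-B and B-C may be accompanied by a direct A-C.
    return not any((a, d) in pairs for a, b in pairs for c, d in pairs if b == c)

def is_acyclic(pairs):
    # Follow the relation outward from each start; reaching the start again is a cycle.
    successors = {}
    for a, b in pairs:
        successors.setdefault(a, set()).add(b)
    for start in successors:
        frontier, visited = list(successors[start]), set()
        while frontier:
            node = frontier.pop()
            if node == start:
                return False
            if node not in visited:
                visited.add(node)
                frontier.extend(successors.get(node, ()))
    return True

# Illustrative populations for "Topic belongs to parent-Topic" and
# "User is friendly with other-User".
topic_parent = {('Ruby', 'Languages'), ('Languages', 'Computing')}
friendly = {('u1', 'u2'), ('u2', 'u1')}

print(is_acyclic(topic_parent), is_symmetric(friendly))
```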

Disjunctive, mandatory and external uniqueness constraints

A simple either/or syntax and a generalised syntax allow a variety of other types of constraint to be defined:

either FACT_TYPE_1 or FACT_TYPE_2;

either FACT_TYPE_1 or FACT_TYPE_2 but not both;

each [combination] Role1[, Role2…] occurs QUANTIFIER time[s] in FACT_TYPE_READINGS;

The following examples show, in order, a disjunctive mandatory (exclusive-or) constraint, a two-role uniqueness constraint and a three-role mandatory constraint:

either Employee is supervised by some Manager or that Employee is ceo but not both;
each combination Name, Year occurs at most one time in
    Film has Name, Film was made in Year;
each Contact Method occurs at least one time in
    Contact Method is by mail to Address,
    Contact Method is by Email,
    Contact Method is by Phone;
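
The two-role uniqueness constraint above can be read as a count over value combinations. The Python sketch below illustrates that reading; the row encoding (one dictionary per Film, joining the two fact types) and the helper name occurs_at_most are assumptions for this example, not part of CQL.

```python
from collections import Counter

def occurs_at_most(rows, roles, limit=1):
    """True if no combination of values for the given roles occurs more than limit times."""
    counts = Counter(tuple(row[role] for role in roles) for row in rows)
    return all(n <= limit for n in counts.values())

# Illustrative population: each row joins "Film has Name" with "Film was made in Year".
films = [
    {'Film': 1, 'Name': 'Solaris', 'Year': 1972},
    {'Film': 2, 'Name': 'Solaris', 'Year': 2002},
]
# A third film repeating an existing Name and Year breaks the constraint.
duplicated = films + [{'Film': 3, 'Name': 'Solaris', 'Year': 1972}]

print(occurs_at_most(films, ['Name', 'Year']), occurs_at_most(duplicated, ['Name', 'Year']))
```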

Units

Units are used to automate value conversions.  A unit definition defines a new unit identifier in terms of one or more base units each possibly raised to a power and multiplied by an optional coefficient (a real number or integer fraction), plus an offset.   A unit can be defined in singular and plural forms if they are different.  A unit definition may be approximate, or even ephemeral (time-varying but provided by lookup to a web service). A unit definition is introduced by the “converts to” key phrase.  The syntax rule for a unit definition from a single base unit is:

COEFFICIENT BASE_UNIT OFFSET converts to UNIT

Here are a few examples, including more complex ones:

25.4 millimeters converts to inch/inches;
kelvin + 273.15 converts to celsius;
5.0/9 degC + 32.0 converts to degF;
9.80665 m sec^-2 converts to gravity;
365.24219879 day converts to year approximately;
0.00000000011125945705385 C^2 N^-1 e^-2 electronmass^-1 hbar^2 m^-2 converts to bohrradius;
auDollar converts to usDollar ephemera http://convert.currency.org/aud/usd;
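
As a rough illustration of the single-base-unit rule, the Python sketch below treats a definition as a linear mapping. It assumes that one UNIT equals COEFFICIENT base units plus OFFSET; that direction is an interpretation checked only against the inch and celsius lines above, and the function names are hypothetical. It does not attempt powers, compound units, approximate or ephemeral definitions.

```python
def to_base(value, coefficient=1.0, offset=0.0):
    """Convert a value expressed in the defined unit into the base unit (assumed direction)."""
    return value * coefficient + offset

def from_base(value, coefficient=1.0, offset=0.0):
    """Inverse mapping: convert a value in the base unit into the defined unit."""
    return (value - offset) / coefficient

# "25.4 millimeters converts to inch/inches": 2 inches expressed in millimeters.
two_inches_in_mm = to_base(2, coefficient=25.4)

# "kelvin + 273.15 converts to celsius": 0 celsius expressed in kelvin.
zero_celsius_in_kelvin = to_base(0, offset=273.15)

print(two_inches_in_mm, zero_celsius_in_kelvin)
```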

Facts and Object Instance populations

All the statements described above are CQL definitions that define the model.  A model can also contain one or more data set populations containing instances of objects and facts.  The anonymous population contains things that always exist, such as required reference data. Other populations (each prefixed by its name and a colon) can be used to provide example data, for example to specify test case scenarios. Populations are not checked for compliance with constraints.

A value instance is defined by stating the name of the value type followed by a lexical representation of the value.  Similarly, a simply-identified entity instance is defined by stating the name of the entity type followed by the value of its identifying role. For example:

Quantity 20;
Product 'Super98';

A fact is instantiated by writing a reading with values after each of the roles.  For example:

Person 'Jarryd Hayne' plays Sport 'gridiron' for Country 'USA';

For an entity having multiple identifying roles, an expression containing multiple facts must be provided, using “and”, or a comma or a contraction to combine the facts. Likewise, a fact instance is written in a syntax that combines the fact readings with the syntax for object instances. Every identifying role of an object must be bound to a value somewhere within the overall expression. Written in this way, a fact or object instance is a fully bound version of the supporting fact type or object type. A named fact type is instantiated by adding an “in which” clause in parentheses. The contraction is explained afterwards.

example:
Year 2015 is of some Supply Period that is in Month 'June'
and Production Forecast 
    (in which Refinery 'Geelong' in that Supply Period will make Product 'Super98' in Quantity 20)
predicts Cost 110.9 per kL;

The example above follows the first occurrence of Supply Period with a right contraction, where the right-most player of the first reading is elided from the start of the second. The un-contracted form of this part would be:

Year 2015 is of some Supply Period, that Supply Period is in Month 'June';

A left contraction (where the left-hand player in the first binary is elided from the second) for this part might be:

Supply Period is in Year 2015 and is in Month 'June';

Queries

The statements described above define the model and provide reference and seed data sets for the model.  In this section we describe how to query the data model. Query expressions may also be used within complex constraints, and are used in the definition of derived fact types.

A query in CQL is written in a similar syntax to a compound fact definition followed by a question mark “?”, except that some roles may be left without a binding to any value (the quantifier “which” may be used to emphasise this). The query searches for possible values that satisfy the overall expression. If a value is provided for every object role (i.e. it is fully bound), then the query will return true or false (either the specified facts exist or they do not).

For example:

Year 2015 is of some Supply Period that is in Month 'June'
and Production Forecast (in which Refinery 'Geelong' in that Supply Period will make which Product in which Quantity)?

The result may be verbalised as:
that Refinery in that Supply Period will make Product 'Super98' in Quantity 20;
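
The difference between a fully bound query and one with unbound roles can be sketched as pattern matching over a fact population. The Python below is an illustration only, not how afgen evaluates queries: the function name ask, the dictionary encoding of facts, and the use of None for an unbound (“which”) role are all assumptions for this example.

```python
def ask(facts, pattern):
    """pattern maps each role name to a value, or to None for an unbound role."""
    matches = [
        fact for fact in facts
        if all(value is None or fact[role] == value for role, value in pattern.items())
    ]
    fully_bound = all(value is not None for value in pattern.values())
    # A fully bound query answers true or false; otherwise return the matching facts.
    return bool(matches) if fully_bound else matches

# Illustrative population of Production Forecast facts.
forecasts = [
    {'Refinery': 'Geelong', 'Supply Period': (2015, 'June'),
     'Product': 'Super98', 'Quantity': 20},
]

open_query = {'Refinery': 'Geelong', 'Supply Period': (2015, 'June'),
              'Product': None, 'Quantity': None}
bound_query = dict(open_query, Product='Super98', Quantity=20)

print(ask(forecasts, open_query))
print(ask(forecasts, bound_query))
```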

Derived Fact Types

A derived fact type combines a normal fact type definition (without quantifiers) with a query using the “where” keyword. The role players of the fact type are projected from inside the query. The full syntax is not presented here. CQL also supports derived subtypes, where the fact that an instance of a supertype must also be an instance of a subtype is derived. Finally, as mentioned previously, most constraints can contain queries:

some Person is susceptible to cancer where that Person smokes or has been exposed to Asbestos;
Driver is a kind of Person where that Person holds some Driving License or drives some Car;
some Booking has some allocated Seat
    only if that Booking is for some Session that is at some Cinema that contains that Seat;
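
The view-like nature of a derived fact type can be pictured in Python as a function that recomputes its population from the base facts each time it is asked. This is a sketch of the idea only; the function name and the sample populations are assumptions for this example and do not reflect afgen's implementation.

```python
def susceptible_to_cancer(smokes, exposed_to_asbestos):
    """Derive the population on demand from the two base fact populations."""
    return smokes | exposed_to_asbestos

# Illustrative base populations for "Person smokes" and
# "Person has been exposed to Asbestos".
smokes = {'Ann', 'Raj'}
exposed_to_asbestos = {'Raj', 'Mei'}

derived = susceptible_to_cancer(smokes, exposed_to_asbestos)
print(sorted(derived))
```

Because nothing is stored for the derived fact type, adding a person to either base population changes the result of the next call.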

This description provides only limited coverage of the operation of queries, which is a large subject in itself; fuller coverage is left to a later article. Not all the provided query syntax is currently functional, and even where the syntax is accepted, not all the implementation is complete.

A full example

The example snippets in the description above have largely concerned a simple Oil Supply model.  Here is the description of the model:

A model of the supply and demand for refined oil. A populated database can be used to optimise profitability by minimising transport costs, maximise supply by allowing substitution of compatible products (with compatibility depending on season) and also to predict shortages.

Here is the full CQL for this model (note that this uses a terse form of the grammar, which omits “each” and some other unnecessary keywords):

vocabulary OilSupply;

/*
 * Value Types
 */
Cost is written as Money;
Month Nr is written as Signed Integer(32);
Product Name is written as String;
Quantity is written as Unsigned Integer(32);
Refinery Name is written as String(80);
Region Name is written as String;
Season is written as String(6) restricted to {'Autumn', 'Spring', 'Summer', 'Winter'};
Transport Method is written as String restricted to {'Rail', 'Road', 'Sea'};
Year Nr is written as Signed Integer(32);

/*
 * Entity Types
 */
Month [static] is identified by its Nr restricted to {1..12};
Month is in one Season;

Product is independent identified by its Name;

Refinery is independent identified by its Name;

Region is independent identified by its Name;
Transport Route is where
 Transport Method transportation is available from Refinery to Region,
 Transport Method transportation is available to Region from Refinery;
Transport Route incurs at most one Cost per kl;

Year is identified by its Nr;

Acceptable Substitution is where
 Product may be substituted by alternate-Product in Season [acyclic, intransitive],
 alternate-Product is an acceptable substitute for Product in Season;

Supply Period [separate, static] is identified by Year and Month where
 Supply Period is in one Year,
 Supply Period is in one Month;

Production Forecast is where
 Refinery in Supply Period will make Product in one Quantity,
 Refinery will make Product in Supply Period in Quantity,
 Refinery will make Quantity of Product in Supply Period;
Production Forecast predicts at most one Cost;

Regional Demand is where
 Region in Supply Period will need Product in one Quantity,
 Region will need Product in Supply Period in Quantity,
 Region will need Quantity of Product in Supply Period;

Downloading and Using the CQL Compiler

CQL was originally developed as part of the Active Facts project, whose principal developer is Clifford Heath.

Within this project Clifford developed the Active Facts generator afgen, which reads a CQL or ORM data model and can generate any of a range of application and systems development outputs, including:

  • Normalised SQL schema for efficient storage of data
  • Ruby API, for application development based on the input data model
  • Data Vault SQL schema, for the development of a Data Vault data warehousing based on the input data model
  • Supporting documentation, including HTML pages that build a business glossary for the terms defined in the input data model.

The flexible nature of the generator infrastructure means you can refine or develop new generators easily.

Here are some useful links for downloading, installing and using the afgen CQL compiler:

afgen has been released as an open source project under an MIT License, and you can install it using the Ruby command gem install activefacts.  The afgen source code and CQL examples can also be found on GitHub at

Further Resources

References

[1] Halpin T. & Morgan T. 2008, Information Modeling and Relational Databases, 2nd edition, Morgan Kaufmann.

[2] Halpin T. 2015, Object-Role Modeling Fundamentals, Technics Publications.

Graeme Port