Data Services:CQL



CQL is the object-oriented query language used by caGrid Data Services to express queries against a data source.

Design

A CQL query is an XML document conforming to a well-defined schema with the namespace URI http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery.

Parts of the CQL schema

  • CQLQuery
    • A simple wrapper element at the head of every CQL query document. It contains the Target element, optionally followed by a QueryModifier element.
  • Target
    • The Target element is of the type Object, and describes the data type which the query will return.
  • QueryModifier
    • An optional element modifying the returned result set. This modifier has a required boolean attribute ‘countOnly’; when it is true, the data service returns only the number of results the query would produce. The modifier optionally allows a choice between a list of AttributeNames elements and a single DistinctAttribute element. When a list of attribute names is specified, sets of tuples are returned containing the attribute names and corresponding values for each object instance matched by the query. When a distinct attribute is specified, only unique values of that attribute are placed in the returned attribute sets.
      Figure: The CQL Schema Diagram
  • Object
    • The Object element contains the required attribute ‘name.’ This attribute’s value defines the caDSR class of the object. When the Object is the top level target of a CQL query, it identifies the data type that will be returned by the caGrid Data Service. The Object allows for a choice between three child elements. The possible child elements are Attribute, Association, and Group. Objects may have at most one of these child elements.
  • Attribute
    • An Attribute type in CQL describes a restriction on an attribute of an Object. The Attribute carries three XML attributes, which together define the restriction. The attribute ‘name’ gives the name of the attribute to be restricted. The attribute ‘value’ gives the value to compare against. The attribute ‘predicate’ describes what type of restriction the Attribute defines; the schema declares it optional, with a default of “EQUAL_TO”. Allowable predicates are defined by the schema’s simple type ‘Predicate’, an enumeration. The predicate values are generally self-descriptive: “EQUAL_TO”, “NOT_EQUAL_TO”, “LIKE”, “LESS_THAN”, “LESS_THAN_EQUAL_TO”, “GREATER_THAN”, and “GREATER_THAN_EQUAL_TO”. Two additional predicates, “IS_NOT_NULL” and “IS_NULL”, check only for the presence or absence, respectively, of an attribute value and do not restrict the value itself. The ‘value’ attribute is therefore ignored with these predicates, although the schema still requires it to be present.
  • Association
    • An association describes a related Object, which defines the associated Object’s restrictions for the query, as well as the relationship from one object to another. Specifically, it defines the relationship down the object model tree. The Association complex type is an extension of Object. The Association has a single, optional attribute named ‘roleName.’ This attribute identifies which associated object field the Association is defining. For example, a person may have more than one address, perhaps business and home. To perform a restriction against the home address, a query must specify the home address role name for the associated object. If the query omits the role name, such a query becomes ambiguous, as there is more than one field of Person which has a type of Address. In this case, the data service will throw a MalformedQueryException explaining that the requested association is ambiguous. In the case of an object where there is only one field of a given type, the roleName attribute may be omitted, and the data service will resolve the correct name as the query is processed.
  • Group
    • Groups define logical combinations of two or more conditions, and operate against the Object to which they are attached. Groups must have two or more children, which may be any mixture of type Attribute, Association, or Group. Groups also have a required attribute named ‘logicRelation’, whose type is the schema’s simple type LogicalOperator. This type is an enumeration of the values “AND” and “OR”. The operator is applied to all children in the group. The “AND” operator requires that all conditions in the group be true for the group to evaluate as true. The “OR” operator requires that at least one condition in the group evaluate as true.
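
As a small illustration of the null-checking predicates described above, the following sketch asks for all Gene objects whose symbol is present. It reuses the Gene class and symbol attribute from the examples on this page; the empty ‘value’ is an assumption made here only to satisfy the schema, since the value is ignored for this predicate.

```xml
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<!-- 'value' is required by the schema but ignored for IS_NOT_NULL;
		     the empty string is only a placeholder -->
		<Attribute name="symbol" value="" predicate="IS_NOT_NULL"/>
	</Target>
</CQLQuery>
```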

Examples

Return all objects of a given type

The simplest CQL query defines only one Object. The data service will return all objects in its underlying data source of the given type.

This query will return all Gene objects in the data source:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene"/>
</CQLQuery>

Count all objects of a given type

A variant of the same simple CQL query defines only the target object and a query modifier to count the number of results. Such a query may be useful in situations where the returned result set is potentially very large, and knowing the size of this set may determine handling ahead of actually retrieving results.

This query will return the number of all Gene objects in the data source:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene"/>
	<QueryModifier countOnly="true"/>
</CQLQuery>

Objects with a single attribute restriction

A simple extension to the above query is to ask for all Objects with a single Attribute restriction. Here, a LIKE restriction is used. Note that the syntax for LIKE queries mimics that of SQL.

This query will return all Genes with a symbol starting with ‘BRCA’:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<Attribute name="symbol" value="BRCA%" predicate="LIKE"/>
	</Target>
</CQLQuery>
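
The attribute restriction can also be combined with the count modifier from the previous example. The sketch below should, per the schema, be a valid query asking only for the number of Genes whose symbol starts with ‘BRCA’; it introduces no names beyond those already used on this page.

```xml
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<!-- restriction: symbol starts with BRCA -->
		<Attribute name="symbol" value="BRCA%" predicate="LIKE"/>
	</Target>
	<!-- return only the number of matching objects -->
	<QueryModifier countOnly="true"/>
</CQLQuery>
```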

Distinct attributes of an object

A query modifier can limit the results to the distinct values of a single attribute instead of returning whole objects.

This query will return distinct symbol values for all Genes with a symbol starting with ‘BRCA’:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<Attribute name="symbol" value="BRCA%" predicate="LIKE"/>
	</Target>
	<QueryModifier countOnly="false">
		<DistinctAttribute>symbol</DistinctAttribute>
	</QueryModifier>
</CQLQuery>

Multiple attributes of a target object

The remaining query modifier option returns several named attributes of the target object instead of whole objects.

This query will return both id and symbol values for all Genes with a symbol starting with ‘BRCA’:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<Attribute name="symbol" value="BRCA%" predicate="LIKE"/>
	</Target>
	<QueryModifier countOnly="false">
		<AttributeNames>symbol</AttributeNames>
		<AttributeNames>id</AttributeNames>
	</QueryModifier>
</CQLQuery>
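
For orientation, here is a sketch of what a result document for such an attribute query might look like, based only on the CQL results schema listed later on this page. The symbol and id values are placeholders chosen for illustration, not actual data from any service.

```xml
<CQLQueryResultCollection xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet"
		targetClassname="gov.nih.nci.cabio.domain.Gene">
	<!-- one AttributeResult per matching object; values are placeholders -->
	<AttributeResult>
		<Attribute name="symbol" value="BRCA1"/>
		<Attribute name="id" value="1"/>
	</AttributeResult>
	<AttributeResult>
		<Attribute name="symbol" value="BRCA2"/>
		<Attribute name="id" value="2"/>
	</AttributeResult>
</CQLQueryResultCollection>
```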

Objects with a single association

A more involved CQL query defines an Object and an Association to another Object. The data service will return all objects of the requested data type and meeting the restriction on their association.

This query will return all Genes with an associated Taxon whose id is equal to 6:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
			<Attribute name="id" value="6" predicate="EQUAL_TO"/>
		</Association>
	</Target>
</CQLQuery>
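
The ‘roleName’ attribute is optional in the schema. Assuming Gene has only one field of type Taxon (an assumption not verified here against the caBIO model), the same query could be sketched without it, leaving the data service to resolve the role name:

```xml
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<!-- roleName omitted: only unambiguous if Gene has a single Taxon field -->
		<Association name="gov.nih.nci.cabio.domain.Taxon">
			<Attribute name="id" value="6" predicate="EQUAL_TO"/>
		</Association>
	</Target>
</CQLQuery>
```

If more than one field of Gene had the type Taxon, the data service would be expected to reject this form as ambiguous.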

Groups

Groups are required when more than one restriction on an Object is needed. In this example, a group is made with an AND logical operator, meaning that all conditions defined within the Group must be met.

This query will return all Genes that have a symbol beginning with ‘BRCA’ and an associated Taxon with an id equal to 6:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<Group logicRelation="AND">
			<Attribute name="symbol" value="BRCA%" predicate="LIKE"/>
			<Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
				<Attribute name="id" value="6" predicate="EQUAL_TO"/>
			</Association>
		</Group>
	</Target>
</CQLQuery>
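
A Group may also hold two restrictions on the same attribute, for example to express a range with the comparison predicates. The following is a minimal sketch; the bounds 100 and 200 are arbitrary, and it assumes the data service compares id values numerically.

```xml
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<Group logicRelation="AND">
			<!-- lower bound, inclusive (arbitrary illustrative value) -->
			<Attribute name="id" value="100" predicate="GREATER_THAN_EQUAL_TO"/>
			<!-- upper bound, exclusive (arbitrary illustrative value) -->
			<Attribute name="id" value="200" predicate="LESS_THAN"/>
		</Group>
	</Target>
</CQLQuery>
```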

Nested groups

Nested groups can be used to build up sophisticated compound queries. The groups may have differing logical relations to express restrictions on various portions of the object.

This query will return all Genes that have an associated Taxon with a scientificName of ‘Mus musculus’ and a symbol starting with either ‘BRCA’ or ‘ICR’:

<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<Target name="gov.nih.nci.cabio.domain.Gene">
		<Group logicRelation="AND">
			<Association roleName="taxon" name="gov.nih.nci.cabio.domain.Taxon">
				<Attribute name="scientificName" value="Mus musculus" predicate="EQUAL_TO"/>
			</Association>
			<Group logicRelation="OR">
				<Attribute name="symbol" value="BRCA%" predicate="LIKE"/>
				<Attribute name="symbol" value="ICR%" predicate="LIKE"/>
			</Group>
		</Group>
	</Target>
</CQLQuery>

Schemas

http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery

The CQL querying schema.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:cql="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery" targetNamespace="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery" elementFormDefault="qualified" attributeFormDefault="unqualified">
	<xsd:complexType name="Object">
		<xsd:annotation>
			<xsd:documentation>Object used as search criteria or target definition</xsd:documentation>
		</xsd:annotation>
		<xsd:choice>
			<xsd:element name="Attribute" type="cql:Attribute" minOccurs="0"/>
			<xsd:element name="Association" type="cql:Association" minOccurs="0"/>
			<xsd:element name="Group" type="cql:Group" minOccurs="0"/>
		</xsd:choice>
		<xsd:attribute name="name" type="xsd:string" use="required"/>
	</xsd:complexType>
	<xsd:complexType name="Association">
		<xsd:annotation>
			<xsd:documentation>Association to another Object</xsd:documentation>
		</xsd:annotation>
		<xsd:complexContent>
			<xsd:extension base="cql:Object">
				<xsd:attribute name="roleName" type="xsd:string" use="optional"/>
			</xsd:extension>
		</xsd:complexContent>
	</xsd:complexType>
	<xsd:complexType name="Attribute">
		<xsd:annotation>
			<xsd:documentation>Object Property element used as search criteria</xsd:documentation>
		</xsd:annotation>
		<xsd:attribute name="name" type="xsd:string" use="required"/>
		<xsd:attribute name="predicate" type="cql:Predicate" use="optional" default="EQUAL_TO"/>
		<xsd:attribute name="value" type="xsd:string" use="required"/>
	</xsd:complexType>
	<xsd:complexType name="Group">
		<xsd:annotation>
			<xsd:documentation>Binary joint</xsd:documentation>
		</xsd:annotation>
		<xsd:choice minOccurs="2" maxOccurs="unbounded">
			<xsd:element name="Association" type="cql:Association" maxOccurs="unbounded"/>
			<xsd:element name="Attribute" type="cql:Attribute" maxOccurs="unbounded"/>
			<xsd:element name="Group" type="cql:Group" maxOccurs="unbounded"/>
		</xsd:choice>
		<xsd:attribute name="logicRelation" type="cql:LogicalOperator" use="required"/>
	</xsd:complexType>
	<xsd:simpleType name="Predicate">
		<xsd:annotation>
			<xsd:documentation>Extensible predicate type for object properties</xsd:documentation>
		</xsd:annotation>
		<xsd:restriction base="xsd:string">
			<xsd:enumeration value="EQUAL_TO" id="equal_to"/>
			<xsd:enumeration value="NOT_EQUAL_TO" id="not_equal_to"/>
			<xsd:enumeration value="LIKE" id="like"/>
			<xsd:enumeration value="IS_NULL" id="is_null"/>
			<xsd:enumeration value="IS_NOT_NULL" id="is_not_null"/>
			<xsd:enumeration value="LESS_THAN" id="less_than"/>
			<xsd:enumeration value="LESS_THAN_EQUAL_TO" id="less_than_equal_to"/>
			<xsd:enumeration value="GREATER_THAN" id="greater_than"/>
			<xsd:enumeration value="GREATER_THAN_EQUAL_TO" id="greater_than_equal_to"/>
		</xsd:restriction>
	</xsd:simpleType>
	<xsd:simpleType name="LogicalOperator">
		<xsd:annotation>
			<xsd:documentation>Logical operators</xsd:documentation>
		</xsd:annotation>
		<xsd:restriction base="xsd:string">
			<xsd:enumeration value="AND"/>
			<xsd:enumeration value="OR"/>
		</xsd:restriction>
	</xsd:simpleType>
	<xsd:element name="CQLQuery">
		<xsd:annotation>
			<xsd:documentation>Top level of CQL queries</xsd:documentation>
		</xsd:annotation>
		<xsd:complexType>
			<xsd:sequence>
				<xsd:element name="Target" type="cql:Object">
					<xsd:annotation>
						<xsd:documentation>Defines the target data type of a CQL query</xsd:documentation>
					</xsd:annotation>
				</xsd:element>
				<xsd:element name="QueryModifier" type="cql:QueryModifier" minOccurs="0">
					<xsd:annotation>
						<xsd:documentation>Optionally modifies the returned results of the query</xsd:documentation>
					</xsd:annotation>
				</xsd:element>
			</xsd:sequence>
		</xsd:complexType>
	</xsd:element>
	<xsd:complexType name="QueryModifier">
		<xsd:annotation>
			<xsd:documentation>Modifies the returned data from the query</xsd:documentation>
		</xsd:annotation>
		<xsd:choice minOccurs="0">
			<xsd:element name="AttributeNames" type="xsd:string" maxOccurs="unbounded"/>
			<xsd:element name="DistinctAttribute" type="xsd:string"/>
		</xsd:choice>
		<xsd:attribute name="countOnly" type="xsd:boolean" use="required"/>
	</xsd:complexType>
</xsd:schema>

http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet

The CQL results schema.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:res="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet" targetNamespace="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet" elementFormDefault="qualified" attributeFormDefault="unqualified">
	<xsd:complexType name="CQLQueryResults">
		<xsd:annotation>
			<xsd:documentation>Results from a CQL query executed against a caGrid data service</xsd:documentation>
		</xsd:annotation>
		<xsd:choice>
			<xsd:sequence>
				<xsd:element name="ObjectResult" type="res:CQLObjectResult" minOccurs="0" maxOccurs="unbounded"/>
			</xsd:sequence>
			<xsd:sequence>
				<xsd:element name="IdentifierResult" type="res:CQLIdentifierResult" minOccurs="0" maxOccurs="unbounded"/>
			</xsd:sequence>
			<xsd:sequence>
				<xsd:element name="AttributeResult" type="res:CQLAttributeResult" minOccurs="0" maxOccurs="unbounded"/>
			</xsd:sequence>
			<xsd:element name="CountResult" type="res:CQLCountResult"/>
		</xsd:choice>
		<xsd:attribute name="targetClassname" type="xsd:string" use="required"/>
	</xsd:complexType>
	<xsd:complexType name="CQLResult">
		<xsd:annotation>
			<xsd:documentation>Single result from a CQL query</xsd:documentation>
		</xsd:annotation>
	</xsd:complexType>
	<xsd:complexType name="CQLObjectResult">
		<xsd:annotation>
			<xsd:documentation>Result object</xsd:documentation>
		</xsd:annotation>
		<xsd:complexContent>
			<xsd:extension base="res:CQLResult">
				<xsd:sequence>
					<xsd:any processContents="lax"/>
				</xsd:sequence>
			</xsd:extension>
		</xsd:complexContent>
	</xsd:complexType>
	<xsd:complexType name="CQLIdentifierResult">
		<xsd:annotation>
			<xsd:documentation>Grid Identifier to an object</xsd:documentation>
		</xsd:annotation>
		<xsd:complexContent>
			<xsd:extension base="res:CQLResult">
				<xsd:sequence>
					<xsd:element name="Identifier" type="res:TBDIdentifier"/>
				</xsd:sequence>
			</xsd:extension>
		</xsd:complexContent>
	</xsd:complexType>
	<xsd:complexType name="CQLAttributeResult">
		<xsd:annotation>
			<xsd:documentation>Result Attribute</xsd:documentation>
		</xsd:annotation>
		<xsd:complexContent>
			<xsd:extension base="res:CQLResult">
				<xsd:sequence>
					<xsd:element name="Attribute" type="res:TargetAttribute" maxOccurs="unbounded"/>
				</xsd:sequence>
			</xsd:extension>
		</xsd:complexContent>
	</xsd:complexType>
	<xsd:complexType name="TBDIdentifier">
		<xsd:annotation>
			<xsd:documentation>caGrid identifier, as yet TBD</xsd:documentation>
		</xsd:annotation>
	</xsd:complexType>
	<xsd:complexType name="TargetAttribute">
		<xsd:annotation>
			<xsd:documentation>an attribute (name and value) of an Object instance</xsd:documentation>
		</xsd:annotation>
		<xsd:attribute name="name" type="xsd:string" use="required"/>
		<xsd:attribute name="value" type="xsd:anySimpleType" use="required"/>
	</xsd:complexType>
	<xsd:element name="CQLQueryResultCollection" type="res:CQLQueryResults"/>
	<xsd:complexType name="CQLCountResult">
		<xsd:annotation>
			<xsd:documentation>Result of a count query</xsd:documentation>
		</xsd:annotation>
		<xsd:complexContent>
			<xsd:extension base="res:CQLResult">
				<xsd:attribute name="count" type="xsd:long" use="required"/>
			</xsd:extension>
		</xsd:complexContent>
	</xsd:complexType>
</xsd:schema>
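
To make the results schema more concrete, this is a sketch of a result document that a count query on Gene might produce, derived only from the schema above. The count value is a placeholder, not a real result.

```xml
<CQLQueryResultCollection xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet"
		targetClassname="gov.nih.nci.cabio.domain.Gene">
	<!-- a count query yields exactly one CountResult; 3 is a placeholder -->
	<CountResult count="3"/>
</CQLQueryResultCollection>
```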

Caveats

  • CQL does not permit querying for attributes with values that are XML schema complex types.
    • Only values that can be represented as XML schema simple types are allowed.
  • CQL Attribute Results cannot contain attribute values which are XML schema complex types.
    • Only values that can be represented as XML schema simple types are allowed.
  • CQL does not provide a facility for returning object instances other than the targeted data type.
    • This includes subclasses of the targeted data type. These cannot be returned because their XML representation will differ from that of the requested object, which violates the expected results schema.
  • CQL cannot return populated associations on instances of the targeted data type.
    • This has some implications when dealing with uni-directional associations. For example, assume two classes, Person (with attribute name) and Address (with attribute street), and a uni-directional association from Person to Address (Person->Address).
    • With CQL you can ask: "Give me the name of the Person at '123 Main St'". Write the query with target Person and, as the criterion, an Association to Address where Address.street = '123 Main St'.
    • But you cannot ask: "Give me the Address of 'Scott'". Address would need to be the target, and there is no way to express constraints on the Person, because no association leads from Address to Person.
    • If the association is bi-directional, both queries are possible. For the second, express the query with target Address and, as the criterion, an Association to Person where Person.name = 'Scott'. In effect, the criteria are inverted.
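
The inverted query from the Person and Address example could be sketched as follows. The package name example.domain and the role name person are hypothetical, introduced only for illustration, and the query is only meaningful if the Address to Person association actually exists in the model.

```xml
<CQLQuery xmlns="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery">
	<!-- hypothetical class names; Address is the target, so Addresses are returned -->
	<Target name="example.domain.Address">
		<!-- requires an Address->Person association (bi-directional case) -->
		<Association roleName="person" name="example.domain.Person">
			<Attribute name="name" value="Scott" predicate="EQUAL_TO"/>
		</Association>
	</Target>
</CQLQuery>
```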