Are ontologies copyrightable?


Ontologies are one of those neither-fish-nor-fowl creatures that really bring the greyness of copyright as applied to software into blurred focus.

  1. Do ontologies constitute copyrightable subject matter?
  2. Scope of Protection: Is an ontology data or software?
  3. Do specialized ontologies constitute derivative works of upper ontologies?
  4. What effect do GNU GPL restrictions have on ontologies?

Do ontologies constitute copyrightable subject matter?

Copyright protects the expression of an idea in an original work fixed in tangible form. Copyright does not protect any idea, procedure, process, system, method of operation, concept, principle or discovery, or any expression that has merged with one of these. Copyright also does not protect scenes a faire.

Ontology is the branch of metaphysics concerned with the nature and relations of being, and an ontology is a particular theory about the nature of being or the kinds of existents. From a software perspective, an ontology is an expression of a theory about the nature and relations of existents, codified in first order predicate logic. The expression of those ideas is composed of a collection of definitions and axioms that collectively constitute an original work fixed in an ASCII text file.
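To make "a collection of definitions and axioms" concrete, here is a minimal sketch in Python. It assumes a deliberately tiny fragment of first order logic (axioms of the form "for all x, P(x) implies Q(x)"); the terms, the definitions and the function name `entailed_predicates` are hypothetical and are not taken from any real ontology.

```python
# Hypothetical definitions: term -> human-readable definition.
DEFINITIONS = {
    "Dentist": "a person licensed to practice dentistry",
    "Person": "a human agent",
    "Agent": "an existent that can act",
}

# Hypothetical axioms, each read as: for all x, premise(x) implies conclusion(x).
AXIOMS = [
    ("Dentist", "Person"),
    ("Person", "Agent"),
]


def entailed_predicates(predicate, axioms):
    """Return every predicate that follows from `predicate` under the axioms."""
    known = {predicate}
    changed = True
    while changed:  # repeat until no axiom adds anything new
        changed = False
        for premise, conclusion in axioms:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known
```

Under these assumptions, `entailed_predicates("Dentist", AXIOMS)` yields the set containing `Dentist`, `Person` and `Agent`. The definitions are the descriptive part of the work, and the axioms are the part that lets software draw conclusions from it.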

As far as non-copyrightable subject matter is concerned, discoveries and scenes a faire are not relevant to our discussion. However, there is an issue with the merger doctrine with regard to ideas and processes. Fortunately, there is precedent with ontology’s lowly cousin, the taxonomy, that gives us guidance in this regard.

A taxonomy is an orderly classification of a subject according to its relationships. The Seventh Circuit specifically addressed the copyrightability of a taxonomy. The ADA published a taxonomy of dental procedures, and Delta Dental Association subsequently published a derivative work that included most of the numbers and short descriptions from the ADA’s taxonomy. Delta Dental did not dispute that a substantial amount of its taxonomy was copied from the ADA’s taxonomy. The issue before the court was whether a taxonomy is copyrightable subject matter.

Delta challenged copyrightability on originality and systems arguments. The threshold of originality for a literary work is very low, and it is met by the numbering system employed and the descriptions given. The more substantive challenge involves the argument that taxonomies are systems. The court held that a taxonomy is not a system. A taxonomy may be used as part of a system, e.g., a system of recording dental procedures in a dental office, but this does not preclude protection for the taxonomy. The ADA cannot preclude a dentist from using its taxonomy to record dental procedures, as to do so would provide protection for a system, but the ADA can prevent a party from copying the taxonomy. Delta did not use the taxonomy as part of a ‘system’; it copied the taxonomy and made a derivative work of it.

Scope of Protection: Is an ontology data or software?

It is necessary to make a determination of where along the data-software spectrum an ontology lies as that will determine the scope of protection afforded, which will indicate the appropriate infringement analysis.

A computer program, or software, is defined as a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result.

The line between software (or computer program) and the data to be manipulated by the software is sometimes hard to discern. For example, knowledge bases, the databases used in artificial intelligence programs, are not mere lists or records of facts. The knowledge base itself includes structured rules and relationships needed for making decisions.

The fog gets thicker as object-oriented programming and component-based architectures further blur the distinction between program and data. These architectures invest more of the intelligence of a system into the organization of the data, while at the same time making the data more operational within the system. So in our spectrum, raw data contains only data, databases contain data and loosely coupled methods (stored procedures, triggers, etc.), taxonomies contain data and structure, ontologies contain data, structure and rules, knowledge bases contain data, structure and rules, objects contain data and tightly coupled methods, components likewise contain data and tightly coupled methods, and applications generally contain objects and components. Thus, as you move along the spectrum, it becomes more difficult to distinguish the data from the algorithm.
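The spectrum in the preceding paragraph can be written down as an ordered list. This is only an illustrative model: the ordering and the contents follow the sentence above, while the names `SPECTRUM`, `position` and `contents` are hypothetical.

```python
# Illustrative model of the data-application spectrum, ordered from the
# data end to the application end. Contents follow the prose description.
SPECTRUM = [
    ("data", {"data"}),
    ("database", {"data", "loosely coupled methods"}),
    ("taxonomy", {"data", "structure"}),
    ("ontology", {"data", "structure", "rules"}),
    ("knowledge base", {"data", "structure", "rules"}),
    ("object", {"data", "tightly coupled methods"}),
    ("component", {"data", "tightly coupled methods"}),
    ("application", {"objects", "components"}),
]


def position(kind):
    """Return the zero-based position of `kind` on the spectrum."""
    for index, (name, _contents) in enumerate(SPECTRUM):
        if name == kind:
            return index
    raise ValueError(f"unknown kind: {kind}")


def contents(kind):
    """Return what an artifact of this kind contains, per the text."""
    return SPECTRUM[position(kind)][1]
```

In this model a taxonomy and an ontology differ only by the presence of rules, which is the point at which data starts to behave like an algorithm.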

For purposes of determining the scope of protection for any given software code, one could conclude that the appropriate method is simply to slot the purported code on the data-application spectrum described above; its position indicates how thin or thick the protection is.


If the protection is thin, then there is no need to perform any analysis pertaining to a derivative work. Since you would only be looking at literal infringement, you would simply determine whether or not there was copying. If the protection is thick, then you would look at literal infringement as well as non-literal infringement, in which case, by virtue of the Abstraction-Filtration-Comparison test, you would necessarily need to perform a derivative work analysis.
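The rule in this paragraph can be restated as a small lookup. This is a simplified sketch of the reasoning in the text, not a statement of how any court structures its analysis; the function name `required_analyses` and the string labels are hypothetical.

```python
def required_analyses(protection):
    """Map the thickness of protection to the analyses described in the text."""
    if protection == "thin":
        # Thin protection: only ask whether there was literal copying.
        return ["literal infringement"]
    if protection == "thick":
        # Thick protection: non-literal elements are also in play, so the
        # Abstraction-Filtration-Comparison test and a derivative work
        # analysis become necessary.
        return [
            "literal infringement",
            "non-literal infringement (Abstraction-Filtration-Comparison)",
            "derivative work",
        ]
    raise ValueError(f"unknown protection level: {protection}")
```

For example, `required_analyses("thin")` contains no derivative work step, which mirrors the claim that thin protection makes that analysis unnecessary.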

On the data end of the spectrum, data in a list format (such as a phonebook) does not meet the threshold of copyrightable subject matter. A database can be copyrightable subject matter, but any allegedly infringing code would only be subject to a literal infringement analysis. The protection for a taxonomy is thin. In the Delta Dental case, Delta admitted that a substantial amount of its taxonomy was copied from the ADA taxonomy. The scope of protection for an ontology would be thicker than that for a taxonomy, but thinner than that for a knowledge base.

Do specialized ontologies constitute derivative works of upper ontologies?

Literal copying of a significant portion of source code is not always sufficient to establish that a second work is a derivative work of an original program. Conversely, a second work can be a derivative work of an original program even though none of the literal source code of the original program has been copied. This is the case because copyright protection does not always extend to all portions of a program’s code, while at the same time it can extend beyond the literal code of a program to its non-literal aspects, such as its architecture, structure, sequence, organization, operation modules, and user interface.

The Copyright Act is of little, if any, help in determining the definition of a derivative work of software. However, the applicable provisions do provide some, albeit cursory, guidance. Section 101 of the Copyright Act sets forth the following definitions:

  • A computer program is a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result.
  • A derivative work is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgement, condensation, or any other form in which a work may be recast, transformed, or adapted.
  • A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a derivative work.

On the data-application spectrum, ontologies lean towards data, and consequently are afforded thinner protection than more method-rich implementations. This points to literal copying as the test for infringement on the basis of derivation. Furthermore, there may be a case for considering only the literally copied portion as infringing. This makes a strong case for the position that specialized ontologies do not constitute derivative works of upper level ontologies, but rather stand on their own with reference to an API-like layer of upper facts.
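The difference between referencing an upper ontology and literally copying it can be illustrated with a short sketch. Everything here is hypothetical: the two toy ontologies, the `parent` field and the function names are invented for illustration, and a verbatim text comparison is only a stand-in for a literal copying analysis, not a legal test.

```python
# Hypothetical upper ontology: term -> definition text.
UPPER = {
    "Entity": "anything that exists",
    "Process": "an entity that unfolds over time",
    "Agent": "an entity that can act",
}

# Hypothetical specialized ontology. Each term names an upper term as its
# parent (the API-like reference) but supplies its own definition text.
SPECIALIZED = {
    "DentalProcedure": {
        "parent": "Process",
        "definition": "a process performed on teeth by a dentist",
    },
    "Dentist": {
        "parent": "Agent",
        "definition": "an agent licensed to practice dentistry",
    },
}


def literally_copied(upper, specialized):
    """Specialized terms whose definition text appears verbatim in the upper ontology."""
    upper_texts = set(upper.values())
    return {
        term
        for term, entry in specialized.items()
        if entry["definition"] in upper_texts
    }


def referenced_upper_terms(upper, specialized):
    """Upper terms that the specialized ontology points to by name only."""
    return {
        entry["parent"]
        for entry in specialized.values()
        if entry["parent"] in upper
    }
```

In this toy case the specialized ontology references `Process` and `Agent` by name while `literally_copied` returns an empty set, which is the situation the paragraph above describes as standing on its own against an API-like layer.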

What effect do GNU GPL restrictions have on ontologies?

Section 0 (it's a Unix thing) of the GPL defines "a work based on the Program", the phrase used in provisions such as "when you distribute the same sections as part of a whole which is a work based on the Program", as follows: a "work based on the Program" means either the Program or any derivative work under copyright law.

The GPL does appear to rely on the definition of derivative work under copyright law for drawing the line between those works that are covered and those that are not.

The derivative work question also arises, at least partially, for an exact copy of the original work, because in some cases distributing an exact copy would not implicate copyright law at all: the original work may contain no copyrightable subject matter. In that case, no compliance with the license is necessary to redistribute the original work.

Distribution of an unmodified original work may be seen by some as a different analysis from distribution of a derivative work. However, at the end of the day, courts will inevitably follow the filtration test and reduce the original work to its copyright protectable elements. This has led some observers to note that what one is actually distributing as an exact copy of the [purported] original work is in fact a derivative work of the protectable portion of the original work. This results in a derivative work analysis because the court will not be comparing what is re-distributed as a whole work (unless the entire work constituted copyrightable subject matter), but rather the portion of the work that constitutes copyrightable subject matter.

Consequently, using the previous pegging of ontologies on our data-application spectrum, and the definition of derivative work under the GPL, it seems that lower level ontologies that reference upper level ontologies are not derivative works and therefore lower level ontologies will not be constrained by the requirements and restrictions of the GPL.