Creating Good Data Definitions
- Good Data Definitions: Why do we care?
Good definitions are well worth the effort. In fact, they are central to the effort of data governance. A good definition, once developed, provides a clear picture of what the data asset is (and, by extension, what it isn’t). A good definition precludes the kinds of contradictions and ambiguities that create problems for the interpretation and organization of data, especially large amounts of data across numerous databases.
- Creating a well-written data definition
A well-written data definition should explicitly describe and explain the meaning of the business term or data element. Because the definition provides the context in which business is conducted, each data definition should have certain components and characteristics.
- Characteristics of Good Data Definitions
Because the context in which data is used is a key factor in defining data elements, each data element definition should have the following characteristics:
- Unique - A definition should be unique and distinguishable from every other data element definition.
- Clear – A data definition should be precise, concise, and unambiguous. The definition should be clear enough to allow only one possible interpretation.
- Singular - The data definition should be expressed in the singular.
- Positive - The definition should state what the data element is and limit any emphasis on what it is not.
- Specific Concept - The definition should include the essential meaning or primary characteristics of the concept.
- Defined with Commonly Understood Abbreviations - The definition should use abbreviations only when necessary, and any abbreviation used must be commonly understood.
- Primary Definition - The definition should not contain any embedded definitions or underlying concepts of other data elements.
- Expressed without rationale, functional usage, domain information or procedural information - The definition should not include statements about why and how a data element is used.
- Defined without circular reasoning – The definition should not be defined in terms of another data element.
- Good Data Definition Structure
Because the meaning of a term often shifts with context, it is important to describe the term clearly and provide enough information to limit misinterpretation. A well-written definition should incorporate at least two of the following components as part of the definition text:
- Broader Term – a general class to which a term belongs; often this is implied. To better explain this concept, consider an “IS A” relationship. For example, “A school is an organization.”
- Distinguishing Characteristics – the pertinent attributes with specific values of the term. To better explain this concept, consider a “HAS A” relationship. For example, “A school has an academic program.”
- Function Qualifier – how the term being defined is used; this usually involves verbs. To better explain this concept, consider a “USED FOR” relationship. For example, “A school is used for educating students.”
- An example of a good definition using the above components is:
- “A school is a learning organization that has one or more academic programs used for educating students.”
- Broader Term: learning organization
- Distinguishing Characteristic: one or more academic programs
- Function Qualifier: used for educating
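The three components above can be sketched in code. The following is a minimal Python illustration, not part of the source material: the `DefinitionParts` class and its field names are hypothetical, and it assumes each component is supplied as a ready-made phrase (including its article). It composes the definition text and enforces the "at least two components" guideline.

```python
from dataclasses import dataclass


@dataclass
class DefinitionParts:
    """Hypothetical container for the components of a definition."""
    term: str                 # e.g. "A school"
    broader_term: str = ""    # "IS A" phrase, e.g. "a learning organization"
    distinguishing: str = ""  # "HAS A" phrase, e.g. "one or more academic programs"
    function: str = ""        # "USED FOR" phrase, e.g. "educating students"

    def component_count(self) -> int:
        # Count how many of the three components were actually provided.
        return sum(bool(p) for p in (self.broader_term, self.distinguishing, self.function))

    def compose(self) -> str:
        # The guideline asks for at least two of the three components.
        if self.component_count() < 2:
            raise ValueError("a definition needs at least two components")
        text = self.term
        if self.broader_term:
            text += f" is {self.broader_term}"
        if self.distinguishing:
            text += (" that has " if self.broader_term else " has ") + self.distinguishing
        if self.function:
            text += f" used for {self.function}"
        return text + "."


school = DefinitionParts(
    term="A school",
    broader_term="a learning organization",
    distinguishing="one or more academic programs",
    function="educating students",
)
print(school.compose())
```

With the school example, `compose()` reproduces the sample definition given above. Keeping the components separate also makes it easy to see which one is missing when a definition reads as too thin.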
- Good Data Definition Habits
Good data definition habits ensure a better understanding of data content and the differences between data from different parts of the organization. The following are good data definition habits:
- Seek out unique characteristic(s) - When writing a definition, consider the asset to be defined and ask yourself: what makes this asset distinct from other, similar assets? This characteristic or set of characteristics should be central to your definition.
- Write true and relevant definitions - If a definition is false, it is worse than useless: it is also misleading. Errors happen, sometimes through typos and sometimes because of a lack of information. In both cases, false definitions can be caught by asking several people to review a definition before it is approved. This is why definition approval and data quality management are central to data governance!
- Use phrases like “is a,” “has a,” and “used for” - Definitions are stronger when they identify the type or class of the asset (“is a”), provide central identifying characteristics of the asset (“has a”), and note the general function of the asset (“used for”).
- Data Definition Habits to Avoid
- Nested definitions - When writing a definition, do not include nested definitions (definitions within a definition). These add unnecessary complexity to the definition.
- Lists - The purpose of a definition is to pinpoint the meaning of a concept, not to provide every possible example of that concept. Lists of examples should be avoided, as should lists of descriptors, unless they are key to a specific and unambiguous definition.
- Using a synonym as a definition - A definition should describe an asset. Synonyms do not describe the characteristics of an asset, but only provide another name for that asset. This is not a definition.
- Circular definitions - A definition should not include the name of the asset in the definition. It is also important, when developing a business semantics glossary or other repository of definitions, that no two assets be defined in such a way that each refers to the other for its definition. Instead, look for the unique characteristics of the asset, and use those to define it.
- Obscure or overly technical language - Definitions that rely on technical language or make assumptions about a reader’s knowledge base should be avoided. Wherever possible, replace technical language or jargon with simple explanations.
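Some of these habits can be checked mechanically. The sketch below is a hypothetical Python helper, not a method described in the source. It flags the two forms of circular definition mentioned above: a definition that uses its own term, and two terms that each refer to the other. It assumes the glossary is a simple mapping from term to definition text, that the text is stored without a leading "A <term> is", and that a naive whole-word match is good enough (so plurals such as "students" are not detected).

```python
import re


def mentions(term: str, text: str) -> bool:
    """Return True if `term` appears in `text` as a whole word or phrase."""
    return re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None


def find_circular(glossary: dict[str, str]) -> list[str]:
    """List self-referencing and mutually referencing definitions."""
    problems = []
    terms = sorted(glossary)
    # A term used inside its own definition.
    for term in terms:
        if mentions(term, glossary[term]):
            problems.append(f"'{term}' is used in its own definition")
    # Two terms whose definitions each mention the other.
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if mentions(b, glossary[a]) and mentions(a, glossary[b]):
                problems.append(f"'{a}' and '{b}' define each other")
    return problems


# Illustrative entries only; these definitions are made up for the example.
glossary = {
    "student": "a person enrolled in a school",
    "school": "an organization that educates each student",
    "enrollment": "the enrollment of a person in an academic program",
}

for problem in find_circular(glossary):
    print(problem)
```

A check like this only surfaces candidates for review. Deciding how to rewrite a flagged definition still requires finding the unique characteristics of the asset.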
- Adding metadata to Data Definitions
In the creation of good data definitions, additional metadata should be incorporated into the entry for clarity.
- Related Term – a term that has relevance to the term being defined but is not a synonym.
- Synonyms – terms that mean nearly the same as the term being defined.
- Descriptive Example – an instance of the term as it is seen in everyday life.
- Possible Values – a list of possible values for the term.
- Calculation – defines how the term is derived. For example, the calculation for the term Cost Ratio is Actual Cost / Planned Cost.
- Source – where the definition of the term came from (originated).
- Approval – information about when the term definition was approved, by whom, etc.
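As an illustration, the metadata fields listed above could be carried alongside a definition in a single record. The following Python sketch is an assumption about how such an entry might be modeled, not a prescribed schema: the `GlossaryEntry` class, its field names, and the sample definition text are hypothetical. The `cost_ratio` function implements the Calculation example (actual cost divided by planned cost).

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """Hypothetical glossary entry: a definition plus its supporting metadata."""
    term: str
    definition: str
    related_terms: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)
    descriptive_example: str = ""
    possible_values: list[str] = field(default_factory=list)
    calculation: str = ""
    source: str = ""
    approval: str = ""


def cost_ratio(actual_cost: float, planned_cost: float) -> float:
    """Derive the ratio as actual cost divided by planned cost."""
    if planned_cost == 0:
        raise ValueError("planned cost must be non-zero")
    return actual_cost / planned_cost


entry = GlossaryEntry(
    term="Cost Ratio",
    # Sample wording only; not taken from the source.
    definition="A measure that compares the actual cost to the planned cost.",
    related_terms=["Actual Cost", "Planned Cost"],
    calculation="Actual Cost / Planned Cost",
)

print(entry.term, "=", entry.calculation)
print(cost_ratio(150.0, 100.0))
```

Fields that do not apply to a term, such as Possible Values for a calculated measure, are simply left empty.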
- Data Definitions Technical Lineage
By adding metadata to a definition, you are able to gain technical insights, including the data lineage.
Portions of the material in this document are adapted from “Defining the data: Constructing a well-written data element definition,” originally retrieved from https://www.pesc.org/