By Jonathan D. Gold, MD MHA MSc FAMIA FHIMSS
April 1, 2024
Challenge yourself:
What is the importance of concept-to-concept mapping for pre-coordinated phrases?
Why is 'certainty' an important part of speech?
When is a 'tethered' concept required? What is it tethered to?
What is the difference between 'syntax' and 'semantic validity'?
Temporal Objects can be divided into three major groupings or subdomains: 1) Parts of Speech, 2) Pre-Coordinated Phrases, and 3) Foundational Detail.
Temporal phrases may be immediately interpretable, as pre-coordinated phrases are, or they may be deconstructed into parts of speech that are connected by rules of syntax to produce an interpretable meaning. As stated previously, not all interpretable phrases can be plotted on a timeline, though most significant ones can. Semantic validity requires that parsing the temporal phrase result in a plottable timeframe.
Parts of Speech Subdomain[1]
Certainty
Certainty describes how likely it is that an event occurred at a specific time, or that it occurred at all. (Examples: “I’m pretty sure my heart attack happened in 1989.” “I may have had the mumps as a child.”)
Tense
Tense designates an event as past, present, or future. It also allows for the extension of a past event into the present or even future, or a present event into the future. (Examples: “Appendectomy last year.” “Began chemotherapy three months ago which patient continues to receive once every four weeks.”)
Value
Value represents the number, the period of the day, day of the week, or month of the year. It also includes comparative terms or phrases (called “modifiers”). (Examples: “Five days of intermittent coughing.” “Every Monday, awakens with a migraine.” “Chills more than 3 times a week.”)
Measurement
Measurement is the type of unit associated with the value; it may be a time unit or a phase. (Examples: “CT scheduled four days from now.” “Bleeding postoperatively.”)
Duration
Duration describes either the moment at which an event occurred or the span of time over which it occurred. (Examples: “Momentary lapse of consciousness.” “She smoked a pack a day for twenty-five years.”)
Recurrency
Recurrency pattern designates whether events are regularly recurrent, variably recurrent, or non-recurrent. (Examples: “Recurrent chills, fever, malaise every three days.” “Irregular menstruation cycles.”). These patterns may be subsumed within the pre-coordinated intervals.
Frequency
Frequency defines the number of occurrences per period or units per period. (Examples: “Three times a week.” “Eight hours a day…”)
Mode
Mode depicts the stress of the time description—the key emphasis may be sequential or an event’s priority.[2] Additionally, mode contains prepositions and conjunctions that serve to define the context of a phrase. (Examples: “Two years prior to the lung cancer diagnosis, the patient quit smoking.” “Immediate need for kidney donor.”)
Pre-Coordinated Phrases Subdomain
Pre-coordinated phrases may combine value + measurement, value + time-date format, or other common expressions to simplify NLP concepts for dates[3], ages, common time intervals, tensed intervals[4], and observable narratives. Observable narratives incorporate observable phrases associated with dates, ages, milestones, and times, e.g., “Gestational age” and “Date of Birth”. (Examples of pre-coordination: “Loss of consciousness for 10 minutes after choking on food”, “15yo adolescent with rash from today”, “two months ago”, “Date of onset: 12/13/2015”).
Unlinked Concepts
Commonly, dates that include a year’s value (e.g., “5/19/2020”, “May 2020”, “2020”) do not require any metadata input to be plotted on a timeline. These unlinked concepts differ from other pre-coordinated phrases. Unlinked concepts are usually mapped to additional concepts (concept-to-concept maps) that contain specific dates, including a point in time and upper and lower date delimiters.
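One way to picture the concept-to-concept map for an unlinked month-year date is the minimal sketch below. It assumes, as in the “May 2020” example in the Examples section, that the point in time is the 15th of the month and that the delimiters are the first and last days of that month; the function name `month_concept_map` and its dictionary keys are hypothetical.

```python
import calendar
from datetime import date


def month_concept_map(year: int, month: int) -> dict:
    """Hypothetical concept-to-concept map for an unlinked month-year date.

    Assumes the point in time is mid-month (the 15th) and the delimiters are
    the first and last calendar days of the month.
    """
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return {
        "concept": f"{month:02d}/{year}",
        "point_in_time": date(year, month, 15),
        "delimiter_lower": date(year, month, 1),         # earliest likely date
        "delimiter_upper": date(year, month, last_day),  # latest likely date
    }


m = month_concept_map(2020, 5)  # "May 2020"
print(m["concept"], m["point_in_time"], m["delimiter_lower"], m["delimiter_upper"])
# 05/2020 2020-05-15 2020-05-01 2020-05-31
```

Because no metadata is consulted, the same input always yields the same map, which is what distinguishes an unlinked concept from a tethered one.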
Tethered Concepts
Tethered concepts require the input of metadata in order to be plotted on a timeline; typically this input is the ‘date stamp of entry’ (‘dse’) or some landmark event.[5] For example, the phrase “last May” depends upon when the entry was written (i.e., it is tethered to it). In this example, a ‘dse’ from December 2020 would point to May 2020, whereas one from April 2020 would point to May 2019. Similarly, the age “74 years old” suggests that the person is that age on the day of the entry (or when an event occurred); therefore, the person was born at least 74, but fewer than 75, years before the entry or event. Like unlinked concepts, tethered concepts utilize concept-to-concept maps. Unlike unlinked concepts, however, tethered concept-to-concept maps require an intermediate step, known as ‘transformation’, which incorporates the metadata ‘dse’ (or the referenced event date, like birthdate) into a concept-to-concept formula to arrive at plottable dates.
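The ‘transformation’ step for “last May” can be sketched as a small function of the dse. This is an illustration under an assumed rule (that “last <month>” means the most recent such month that has already ended by the dse); the function name `transform_last_month` is hypothetical, and returning the first of the month is only a convenient month marker.

```python
from datetime import date


def transform_last_month(month: int, dse: date) -> date:
    """Transform a tethered 'last <month>' phrase into a plottable month.

    Hypothetical rule: 'last <month>' is the most recent such month that has
    already ended by the date stamp of entry (dse).
    """
    # A named month earlier in the calendar than the dse's month falls in the
    # same year; the dse's own month or a later one falls in the previous year.
    year = dse.year if month < dse.month else dse.year - 1
    return date(year, month, 1)  # first of the month marks the resolved month


print(transform_last_month(5, date(2020, 12, 14)))  # 2020-05-01, i.e., May 2020
print(transform_last_month(5, date(2020, 4, 20)))   # 2019-05-01, i.e., May 2019
```

The two calls reproduce the December 2020 and April 2020 cases described above; once transformed, the result can be handed to the same concept-to-concept map used for unlinked month-year dates.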
Foundation Subdomain
The Foundation subdomain contains additional supporting groups of concepts. These concepts, such as ‘Calculation Concepts’, provide mathematical expressions, points in time, delimiters, and syntactic context. They are not parts of speech; rather, they help convert text into points on a health timeline:
Syntax Rules
Syntax rules determine whether a phrase contains the parts of speech that NLP needs in order to plot an event on a timeline.[6] Developing syntax rules requires close attention to the common sentence structures of temporal statements. Initial construction associates the simplest, most elemental phrases: for example, a phrase using only two parts of speech (“last year” parsed as ‘Tense (Past) + Measurement (Unit_year)’). Additional rules require increasing numbers of parts of speech and more complex structures. Since word order plays a key role in how we interpret a phrase, it, too, requires conventions. Machine learning tools and large language models will be instrumental in defining syntax rules.
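A syntax rule can be thought of as an ordered sequence of part-of-speech tags. The sketch below is a toy illustration, not the curated rule set described in the footnote: the tiny `LEXICON`, the `SYNTAX_RULES` table, and the rule names are all hypothetical. Because the lookup key is an ordered tuple, it also shows why word order needs its own conventions.

```python
from typing import Optional

# Hypothetical lexicon: tags each token with a (part of speech, subtype) pair.
LEXICON = {
    "last": ("Tense", "Past"),
    "ago": ("Tense", "Past"),
    "two": ("Value", "Cardinal number"),
    "year": ("Measurement", "Unit_year"),
    "weeks": ("Measurement", "Unit_week"),
}

# Hypothetical syntax rules: an ordered tag sequence maps to a rule name.
# Because the key is ordered, "year last" will not match the rule for "last year".
SYNTAX_RULES = {
    (("Tense", "Past"), ("Measurement", "Unit_year")): "tense+measurement",
    (("Value", "Cardinal number"), ("Measurement", "Unit_week"), ("Tense", "Past")): "value+measurement+tense",
}


def match_syntax_rule(phrase: str) -> Optional[str]:
    """Tag every token, then look up the exact tag sequence; None means no rule applies."""
    tags = []
    for token in phrase.lower().split():
        if token not in LEXICON:
            return None  # an untagged token means no rule can be satisfied
        tags.append(LEXICON[token])
    return SYNTAX_RULES.get(tuple(tags))


print(match_syntax_rule("last year"))      # tense+measurement
print(match_syntax_rule("year last"))      # None
print(match_syntax_rule("two weeks ago"))  # value+measurement+tense
```

A real rule set would also need to handle word proximity and optional tokens; exact tuple lookup is used here only to keep the idea visible.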
Calculation
Calculation concepts include those that are mathematical expressions, points in time and delimiters.
Mathematical Expression
This group of concepts aids in mapping pre-coordinated groups to the health timeline. Mathematical expressions use concept-to-concept associations that include the components needed to calculate when an event occurred. Expressions include: 1) formulae relative to dse (date stamp of entry), dob (date of birth), and other clearly identifiable dates, 2) conversion formulae that convert measurements longer than a day into days, for use with temporal values when interpreting ‘value + measurement’ phrases, and 3) recurrence formulae.[7]
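The dse-relative formula and the week conversion from footnote [7] (‘dse –’ combined with ‘measure conversion week: x 7d’) can be evaluated with ordinary date arithmetic. In this sketch the name `dse_minus`, the conversion table, and the example dse are assumptions; only units with a fixed length in days are included, because months and years would need their own conventions.

```python
from datetime import date, timedelta

# Hypothetical conversion formulae, e.g. 'measure conversion week: x 7d'.
# Only units with a fixed length in days are included here.
MEASURE_CONVERSION_DAYS = {"day": 1, "week": 7}


def dse_minus(value: int, unit: str, dse: date) -> date:
    """Evaluate 'dse - (value x conversion)' for phrases such as '5 weeks ago'."""
    offset_days = value * MEASURE_CONVERSION_DAYS[unit]
    return dse - timedelta(days=offset_days)


dse = date(2020, 12, 14)  # illustrative date stamp of entry
print(dse_minus(5, "week", dse))  # 2020-11-09, i.e., dse - 35d ("5 weeks ago")
print(dse_minus(4, "week", dse))  # 2020-11-16, i.e., dse - 28d ("4-week complaint of")
```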
Point in Time
'Point in Time' calls out the 'exact' date when an event occurred or will occur. Most of these are associated with specific dates, but they may also appear as a number of days (e.g., 'point in time: 10d' denotes "10 days" in a mathematical formula).
Delimiters
‘Delimiters’ call out two types of boundaries: 1) the earliest an event is likely to occur ('lower delimiter'), and 2) the latest an event is likely to occur ('upper delimiter'). Like ‘Point in Time’, most delimiters are associated with specific dates, but they may also appear as a number of days. Values, specifically cardinal numbers, may also have a point reference and delimiters. For example, a cardinal number like ‘16’ will have a point reference of 16 and delimiters equal to half the unit’s value (i.e., ‘± 0.5’), so that the value ‘16’ has a lower delimiter of ‘15.5’ and an upper delimiter of ‘16.5’.
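The half-unit rule for cardinal numbers is simple enough to state as a function. The name `cardinal_point_and_delimiters` and the `unit_days` parameter are illustrative assumptions: with the default of 1 it reproduces the ‘16’ example, and with `unit_days=7` the same rule gives the ±3.5d that appears for “past two weeks” in the parsing discussion.

```python
def cardinal_point_and_delimiters(value: float, unit_days: float = 1.0) -> tuple:
    """Return (lower delimiter, point reference, upper delimiter) for a cardinal value.

    The delimiters sit half a unit on either side of the point. With unit_days=1
    the result is in the value's own units; otherwise it is expressed in days.
    """
    point = value * unit_days
    half_unit = 0.5 * unit_days
    return point - half_unit, point, point + half_unit


print(cardinal_point_and_delimiters(16))    # (15.5, 16.0, 16.5)
print(cardinal_point_and_delimiters(2, 7))  # (10.5, 14, 17.5), i.e., 14d +/- 3.5d
```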
Time-Date Format
Time and date formats vary, as does the granularity with which a time or date is captured. (Examples: “January 6, 1950”, “6.1.1950”, “1950-01-06”, “Jan-1950”). Date concepts use the format mm/dd/yyyy; date synonyms may appear in a variety of recognized formats, but each is associated with a concept in that format.
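Mapping date synonyms onto concepts can be sketched with Python's `datetime.strptime`. The format list below is an assumption (including reading “6.1.1950” as day.month.year), as is emitting mm/yyyy or yyyy for coarser granularity, following the “05/2020” concept in the Examples section. Month names are parsed under the default C locale, so they are English.

```python
from datetime import datetime
from typing import Optional

# Hypothetical recognized synonym formats, each paired with the granularity of
# the concept it maps to. "%d.%m.%Y" assumes day.month.year ordering.
FORMATS = [
    ("%B %d, %Y", "day"),  # January 6, 1950
    ("%d.%m.%Y", "day"),   # 6.1.1950
    ("%Y-%m-%d", "day"),   # 1950-01-06
    ("%m/%d/%Y", "day"),   # 01/06/1950
    ("%b-%Y", "month"),    # Jan-1950
    ("%B %Y", "month"),    # May 2020
    ("%Y", "year"),        # 2020
]


def to_concept(text: str) -> Optional[str]:
    """Map a date synonym to a concept string: mm/dd/yyyy, mm/yyyy, or yyyy."""
    for fmt, granularity in FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # not this format; try the next one
        if granularity == "day":
            return parsed.strftime("%m/%d/%Y")
        if granularity == "month":
            return parsed.strftime("%m/%Y")
        return parsed.strftime("%Y")
    return None  # unrecognized: a candidate for the errata log


for synonym in ["January 6, 1950", "6.1.1950", "1950-01-06", "Jan-1950"]:
    print(synonym, "->", to_concept(synonym))
```

Keeping granularity alongside the format avoids inventing a day that the source never stated.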
Context
Concept-to-concept mapping connects concepts to formulae that in turn allow them to be plotted on the health timeline. Pre-coordinated, fully specified dates (month/day/year) can usually be plotted directly. Less specific dates (month/day) require additional context. For these, the NLP application may need to infer the year from nearby words that imply the tense of the phrase (compare "last 5/16 brain CT performed" with "on 5/16 he will undergo a brain CT").
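Inferring the missing year from tense might look like the following sketch. The rule (past tense selects the most recent such date on or before the dse, future tense the next one on or after it) and the name `infer_month_day` are assumptions, and February 29 in a non-leap year is deliberately not handled.

```python
from datetime import date


def infer_month_day(month: int, day: int, dse: date, tense: str) -> date:
    """Resolve a month/day date with no year, relative to the date stamp of entry.

    Hypothetical rule: 'past' picks the most recent such date on or before the
    dse; 'future' picks the next such date on or after it. Feb 29 is not handled.
    """
    candidate = date(dse.year, month, day)
    if tense == "past" and candidate > dse:
        candidate = date(dse.year - 1, month, day)
    elif tense == "future" and candidate < dse:
        candidate = date(dse.year + 1, month, day)
    return candidate


dse = date(2020, 12, 14)  # illustrative date stamp of entry
print(infer_month_day(5, 16, dse, "past"))    # "last 5/16 brain CT performed" -> 2020-05-16
print(infer_month_day(5, 16, dse, "future"))  # "on 5/16 he will undergo..."   -> 2021-05-16
```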

Semantic Validity
Usually, if all parts of speech in a statement obey the syntactic rules and lead to a plottable timeframe for an event, the statement can be considered semantically valid. Syntax may allow a grammatically acceptable statement that cannot be added to a health timeline;[8] because our intention is to identify plottable data points, however, the key determinant of semantic validity is that the statement be plottable. Therefore, the endpoint of natural language processing for a temporal phrase is a specific date[9] and a range (reasonable lower and upper limits for an event) indicating when the event most likely occurred or will occur. To this end, when a pre-coordinated concept cannot be identified, the parts of speech are joined and, where possible, associated with a pre-coordinated concept through ‘normalization’.
Errata Log
In addition to the patient’s health timeline, an errata log lists all phrases that appear to contain temporal information but cannot be plotted on the timeline. The log includes the following fields:
1. Data origin metadata
a. source type (note, lab report, medication entry, etc.)
b. origin facility
c. record date
d. record ID (if available)
2. NLP processing
a. text reviewed (the identified phrase plus the 4 words before and after it)
b. error message/category[10]
c. NLP process date
d. NLP process facility
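An errata-log row could be modeled as a plain record, as in this sketch. The class name `ErrataEntry`, its field names, the `context_window` helper, and the sample values are hypothetical; the fields simply mirror the list above, and the phrase “some time ago” is the article's own example of an interpretable but non-plottable statement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ErrataEntry:
    """One errata-log row for a temporal phrase that could not be plotted."""
    # 1. Data origin metadata
    source_type: str          # note, lab report, medication entry, etc.
    origin_facility: str
    record_date: date
    record_id: Optional[str]  # if available
    # 2. NLP processing
    text_reviewed: str        # phrase plus up to 4 words on either side
    error_category: str       # e.g., syntax, semantic validity, missing metadata
    nlp_process_date: date
    nlp_process_facility: str


def context_window(tokens: list, start: int, end: int, pad: int = 4) -> str:
    """Return tokens[start:end] plus up to `pad` words before and after."""
    return " ".join(tokens[max(0, start - pad): end + pad])


tokens = ("the patient reports that headaches occurred some time ago "
          "and resolved without treatment since then").split()
entry = ErrataEntry(
    source_type="note", origin_facility="Facility A", record_date=date(2020, 12, 14),
    record_id=None, text_reviewed=context_window(tokens, 6, 9),  # "some time ago"
    error_category="semantic validity", nlp_process_date=date(2021, 1, 5),
    nlp_process_facility="Facility A",
)
print(entry.text_reviewed)
```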
Parsing of Temporal Phrases
Depending on how explicit the reference lexicals used to deconstruct it are, a phrase may be parsed in multiple ways. Consider, for example, the phrase “during the past two weeks”, which can be parsed in three different ways:
1. (during the) + “past two weeks”
2. (during the) + “past” + “two weeks”
3. (during the) + “past” + “two” + “weeks”
The syntax structures for parsing the phrase are, respectively,
1. (Duration) + Pre-coordinated (Tensed Interval)
2. (Duration) + Tense (Past) + Pre-coordinated (Interval)
3. (Duration) + Tense (Past) + Value (Cardinal number) + Measurement (Unit)
Each of these options may be used, and each should lead to the same semantic interpretation. Hence, rules are required that define which component parts of speech may combine to produce an alternative part of speech.[11] The preferred parsing opts for the pre-coordinated concept, in this case the Pre-coordinated (Tensed Interval) “past two weeks”. When a phrase has no pre-coordinated lexical, the aim of parsing is to associate its parsed components with one another and, where possible, link them to a pre-coordinated concept.
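The combination rules in footnote [11] can be applied repeatedly until nothing more combines, so that parsing options 3 and 2 reduce to option 1. In this sketch the `REDUCTIONS` table and `normalize_tags` are hypothetical; rules are matched on adjacent tags in word order, so both “past two weeks” and “two weeks ago” are listed explicitly.

```python
# Hypothetical reduction rules based on footnote [11]: an adjacent pair of parts
# of speech (in word order) that combines into a pre-coordinated concept.
REDUCTIONS = [
    (("Value", "Measurement"), "Interval"),      # "two" + "weeks"
    (("Tense", "Interval"), "Tensed Interval"),  # "past" + "two weeks"
    (("Interval", "Tense"), "Tensed Interval"),  # "two weeks" + "ago"
]


def normalize_tags(tags: list) -> list:
    """Apply reduction rules until no adjacent pair can be combined any further."""
    tags = list(tags)  # work on a copy
    changed = True
    while changed:
        changed = False
        for pair, result in REDUCTIONS:
            for i in range(len(tags) - 1):
                if (tags[i], tags[i + 1]) == pair:
                    tags[i:i + 2] = [result]  # replace the pair with its combined concept
                    changed = True
                    break  # indices shifted; the while loop will rescan
    return tags


# Parsing options 3 and 2 both reduce to the structure of option 1.
print(normalize_tags(["Duration", "Tense", "Value", "Measurement"]))  # ['Duration', 'Tensed Interval']
print(normalize_tags(["Duration", "Tense", "Interval"]))              # ['Duration', 'Tensed Interval']
```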
After syntactic phrases are associated with pre-coordinated concepts, other temporal parts of speech near them are used as descriptors, as in “possibly during the past two weeks”. This phrase can be parsed as “possibly” [certainty], “during” [duration], and “past two weeks” [pre-coordinated tensed interval]. While the health timeline will show the locus as dse – 14d +/- 3.5d, the two modifiers “possibly” and “during” will be attached to the event’s temporal descriptors.
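The split between what is plotted and what is attached can be sketched as a record with a locus, two delimiters, and a separate descriptor map. `TimelineEvent`, `past_tensed_interval`, and the example dse are hypothetical; `datetime` is used rather than `date` so that the half-day in 3.5d is not truncated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class TimelineEvent:
    """Hypothetical timeline record: a locus, its delimiters, and attached descriptors."""
    point: datetime
    lower: datetime  # earliest the event is likely to have occurred
    upper: datetime  # latest the event is likely to have occurred
    descriptors: dict = field(default_factory=dict)


def past_tensed_interval(value: int, unit_days: float, dse: datetime, **descriptors: str) -> TimelineEvent:
    """Plot 'past <value> <unit>' as dse - value x unit, +/- half a unit."""
    point = dse - timedelta(days=value * unit_days)
    half_unit = timedelta(days=unit_days / 2)
    return TimelineEvent(point, point - half_unit, point + half_unit, dict(descriptors))


# "possibly during the past two weeks", with an illustrative dse of Dec 14, 2020
event = past_tensed_interval(2, 7, datetime(2020, 12, 14), certainty="possibly", duration="during")
print(event.point, event.lower, event.upper, event.descriptors)
```

The modifiers do not move the locus; they travel with the event so that a reader of the timeline can see how tentative the placement is.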
In general, natural language processing should first search for immediately interpretable pre-coordinated phrases, then for the remaining pre-coordinated groups, and then for other syntactic groups. The preferred search order is as follows:
1. First wave (immediately interpretable pre-coordinated)
a. Time/date
b. Tensed interval
c. Age
2. Second wave (remaining pre-coordinated)
a. Interval
b. Observable narrative
3. Third wave
a. Value
b. Measurement
c. Tense
d. Recurrency
e. Frequency
4. Fourth wave
a. Duration
b. Mode
c. Certainty
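The wave order can be sketched as a scan in which text claimed by an earlier wave is not re-tagged by a later one. The regular expressions in `WAVES` are a hypothetical handful of entries per category, far from a real lexicon, and `tag_phrase` is an illustrative name.

```python
import re

# Hypothetical wave-ordered patterns; only a few illustrative entries per category.
WAVES = [
    ("first", [("Time/date", r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
               ("Tensed interval", r"\bpast two weeks\b"),
               ("Age", r"\b\d{1,3}\s*(?:yo|years? old)\b")]),
    ("second", [("Interval", r"\btwo weeks\b")]),
    ("third", [("Value", r"\btwo\b"),
               ("Measurement", r"\bweeks?\b"),
               ("Tense", r"\b(?:past|ago|last)\b")]),
    ("fourth", [("Duration", r"\bduring\b"),
                ("Mode", r"\bprior to\b"),
                ("Certainty", r"\bpossibly\b")]),
]


def tag_phrase(text: str) -> list:
    """Scan wave by wave; characters claimed by an earlier match are skipped later."""
    claimed = [False] * len(text)
    found = []
    for wave, patterns in WAVES:
        for category, pattern in patterns:
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                if any(claimed[m.start():m.end()]):
                    continue  # already covered by a higher-priority concept
                claimed[m.start():m.end()] = [True] * (m.end() - m.start())
                found.append((wave, category, m.group()))
    return found


print(tag_phrase("possibly during the past two weeks"))
```

Here “past two weeks” is claimed in the first wave, so its components are never tagged separately, while “during” and “possibly” survive to the fourth wave as descriptors.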
Below is a schematic representation of the decision tree showing how temporal statements are parsed.
Examples
Interpretable: “Some time ago the patient experienced headaches.”
Plottable: “Headaches beginning in May 2020.”
Errata: Notes to accompany the patient health timeline or a log of non-plottable statements
Pre-Coordinated: “May 8, 2020”, “in two weeks”
Unlinked: “May 2020”
Tethered: “last May”
Transformation: (requires metadata)—date stamp of entry (“dse”) of Dec. 14, 2020, transforms “last May” into “May 2020”
Concept: “05/2020”
Concept-to-Concept Maps: “point in time: 5/15/2020”, “measure delimiter month: 15d”, “delimiter (lower range): 5/1/2020”, “delimiter (upper range): 5/31/2020”
Parsable: “one and a half years ago”
Syntax: Value + Measurement + Tense
Semantic Validity: yes; (example of syntax using value + measurement + tense without semantic validity: “first years ago”)
Normalization: (using ontology) converts parts of speech to tethered pre-coordinated concepts
Concept-to-Concept Maps: “point in time: dse - 549d”, “measure delimiter year: 182.5d”, “delimiter (lower range): dse - 731.5d”, “delimiter (upper range): dse - 366.5d”
Conclusion
To summarize, determination of a temporal phrase's meaning (whether through pre-coordination or built from parts of speech) requires conforming to recognized syntax and being semantically valid. Many temporal phrases can be associated with concept-to-concept links that allow for the introduction of mathematical formulae. These formulae when associated with metadata can help derive dates and date ranges for events.
[1] Dr. Yuval Shahar, Chair of the Medical Informatics Research Center at Ben-Gurion University in Israel, has proposed the IDAN model for users to query for medical terms using a knowledge-based temporal abstraction method. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.109.8042. Shahar has built an ontology of temporality called Knowledge Based Temporal Abstraction (KBTA) [Shahar Y. A framework for knowledge-based temporal abstraction. Artificial Intelligence 90 (1997) 79-133] which supports the Temporal Abstraction Knowledge (TAK) language. Available from: https://www.sciencedirect.com/science/article/pii/S0004370296000252 . For a simpler approach modified from the Shahar model, we might only consider key temporal objects as parts of speech.
[2] Sequential attributes are described by Allen’s Interval Algebra. See James F. Allen: Maintaining knowledge about temporal intervals. Communications of the ACM 26(11), November 1983, pp. 832–843. ACM Press. ISSN 0001-0782. https://en.wikipedia.org/wiki/Allen's_interval_algebra
[3] Due to the large number of dates and their links to other defining concepts (through concept-to-concept mapping), a proposed operational approach is to use date masking which would allow the interpretation of dates (day.month.year or month/day/year or year-month-day, month/year, year) and associate the correct point in time and delimiter dates based upon a set of rules. The limitation to this approach is the loss of associated lexicals for date concepts (including dates that mix textual months—either complete words or abbreviations with or without periods).
[4] Intervals and tensed intervals larger than one day may be associated with up to four additional concepts: point in time, measure delimiter, delimiter lower range and delimiter upper range.
[5] Reference may be made to a second event rather than the ‘dse’, for example, “7 months after being diagnosed with diabetes mellitus…”
[6] Rule construction is currently incomplete; however, it is grounded in the curation of tens of thousands of NLP phrases from de-identified, plottable clinical text. Curation parses plottable text into parts of speech and determines the requisite components, word order, word proximity, rules, etc.
[7] An example of a mathematical expression concept is 'measure conversion week: x 7d', which is concept-to-concept mapped to the Temporal Object concept 'week(s)'. As another example, if the phrase "5 weeks ago" were not mapped to a pre-coordinated phrase, it could be parsed into "5" + "weeks" + "ago". In the constructed formula, "5" would equal the number '5', "weeks" would equal 'x 7d', and "ago" would equal 'dse -'. This would map to “dse – 5 x 7d” (i.e., date stamp of entry minus 35 days). A second example is “4-week complaint of”, which can be parsed into “4” and “week complaint of”. The two concept-to-concept mathematical expressions for “week complaint of” are ‘measure conversion week: x 7d’ and ‘calculation: dse –’. In other words, the application multiplies the number 4 by the measure conversion formula x 7d and includes the result in the calculation dse –, i.e., ‘dse –’ (‘4’ ‘x 7d’), which is the same as ‘dse – 28d’.
[8] Like the example above, “…was some time ago.”
[9] An approximation of an “exact” date for an event.
[10] For example, syntax, semantic validity, missing metadata [if needed for transformation], missing value, ambiguous occurrence or data, duplicate or copy forward, etc.
[11] For example, Value (Cardinal number) + Measurement (Unit_hour) ∈ Pre-coordinated (Interval), i.e., a number value and a measurement unit are elements of a pre-coordinated interval; likewise, Pre-coordinated (Interval) + Tense (Past) ∈ Pre-coordinated (Tensed Interval).
