With the growing power of large language models (LLMs), structured healthcare data in formats like FHIR (Fast Healthcare Interoperability Resources) is becoming increasingly vital. FHIR is the HL7 standard for health data exchange, and it is widely adopted in U.S. clinical systems thanks to its interoperability and detailed data structure. In the context of AI, especially Retrieval-Augmented Generation (RAG) systems, FHIR data allows healthcare solutions to provide personalized insights about a patient's lab results, diagnoses, and medical history.
At the ML team, we explored the potential of integrating FHIR data into RAG systems through different strategies, which we'll share in this post. We'll also cover:
RAG systems can leverage FHIR data to address patient-specific queries effectively. By processing structured data, they generate precise answers that complement the insights provided by specialists, reducing the ambiguity often present in traditional interpretations. For instance, patients can ask, "What do my glucose levels mean?" and receive a tailored response based on their health records, providing clarity on diagnostics and treatment plans.
By combining RAG and FHIR, users can get explanations for complex terms, compare previous results with current ones, and better understand their diagnoses and treatments. The combination acts as a complement to clinical care that makes the data easier to understand and improves communication between patients and healthcare providers.
Despite the potential of combining FHIR with RAG models, there are specific technical challenges:
To tackle these challenges, several strategies have been tested to improve data retrieval and make FHIR data more accessible for AI systems:
Flattening Resources: This strategy involves converting complex FHIR JSON data into simpler, plain text phrases. The goal of flattening is to remove unnecessary layers of structure (like nested fields) and make the information easier to search.
Example: Suppose you have an observation resource in JSON format:
{
  "resourceType": "Observation",
  "code": {
    "coding": [{
      "code": "8302-2",
      "display": "Body Height"
    }]
  },
  "valueQuantity": {
    "value": 123.6,
    "unit": "cm"
  }
}
After flattening, it would be simplified to something like:
"Resource type is Observation. Code coding 0 code is 8302-2. Code coding 0 display is Body Height. Value quantity value is 123.6. Value quantity unit is cm."
This makes it much easier for a search engine to locate relevant concepts without navigating through nested fields.
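As a rough illustration, flattening can be sketched as a recursive walk over the JSON that turns each leaf value into a short phrase built from its path. The helper names below (`split_camel`, `flatten`, `flatten_resource`) are our own for this example and are not part of any FHIR library; the phrasing rules are an assumption modeled on the example sentence.

```python
import re


def split_camel(name):
    """Turn a camelCase key such as 'valueQuantity' into 'value quantity'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).lower()


def flatten(node, path=()):
    """Yield one plain-text phrase per leaf value in a nested JSON structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, path + (split_camel(key),))
    elif isinstance(node, list):
        # List positions become part of the path, e.g. "coding 0".
        for index, item in enumerate(node):
            yield from flatten(item, path + (str(index),))
    else:
        phrase = " ".join(path)
        yield f"{phrase[:1].upper()}{phrase[1:]} is {node}."


def flatten_resource(resource):
    """Join all leaf phrases of a FHIR resource into a single string."""
    return " ".join(flatten(resource))


observation = {
    "resourceType": "Observation",
    "code": {"coding": [{"code": "8302-2", "display": "Body Height"}]},
    "valueQuantity": {"value": 123.6, "unit": "cm"},
}

print(flatten_resource(observation))
```

For the example Observation, this prints a sentence of the same form as the one shown above. A production version would likely need to skip or shorten noisy fields (such as long URLs in `system` entries), which this sketch does not attempt.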
Resource as String in Chunks: In this method, FHIR resources are broken into smaller, text-based chunks. A chunk is a section of data that can be processed individually, usually a fragment of the original resource. By segmenting the resource, the RAG model can focus on relevant data sections rather than processing the entire resource at once.
Example: If you break the JSON into chunks, you might have:
"{"resourceType": "Observation", "code": {"coding": [{"code": "8302-2", "display": "Body Height"}]}}"
"{"valueQuantity": {"value": 123.6, "unit": "cm"}}"
Each chunk is then treated as an independent unit, which makes it easier for the model to retrieve precise information.
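One possible way to produce chunks like these is to group top-level fields greedily until a character budget is reached. This is only a sketch: the grouping rule, the `chunk_resource` name, and the `max_chars` value of 120 are assumptions chosen for this example, not the exact procedure used in our experiments.

```python
import json


def chunk_resource(resource, max_chars=120):
    """Split a resource into JSON strings, grouping top-level fields greedily.

    A new chunk is started whenever adding the next field would push the
    serialized chunk over max_chars. A single oversized field still becomes
    its own chunk; this sketch does not split inside a field.
    """
    chunks = []
    current = {}
    for key, value in resource.items():
        candidate = {**current, key: value}
        if current and len(json.dumps(candidate)) > max_chars:
            chunks.append(json.dumps(current))
            current = {key: value}
        else:
            current = candidate
    if current:
        chunks.append(json.dumps(current))
    return chunks


observation = {
    "resourceType": "Observation",
    "code": {"coding": [{"code": "8302-2", "display": "Body Height"}]},
    "valueQuantity": {"value": 123.6, "unit": "cm"},
}

for chunk in chunk_resource(observation):
    print(chunk)
```

Keeping each chunk as valid JSON means it can still be parsed later, at the cost of losing context such as the resource type in every chunk after the first.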
Summarization: Using LLMs, each FHIR resource can be summarized into concise blocks of text (e.g., 800 characters) that capture its key elements. Summaries help improve retrieval precision by focusing on the most important information.
For example, for the FHIR resource we are using, the summary could be:
“The resource is a medical observation indicating the body height of an individual. The observation type is identified as 'Body Height' using the code '8302-2.' The measured height is recorded as 123.6 centimeters (cm).”
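A summarization step along these lines can be sketched without committing to a particular LLM provider. In the example below, `complete` is a hypothetical stand-in for whatever function sends a prompt to your model and returns its text; the prompt wording is also an assumption. Only the 800-character figure comes from the description above.

```python
import json
from typing import Callable

MAX_SUMMARY_CHARS = 800  # example length mentioned in the post


def build_summary_prompt(resource: dict, max_chars: int = MAX_SUMMARY_CHARS) -> str:
    """Build an instruction asking the model for a short plain-English summary."""
    return (
        "Summarize the following FHIR resource in plain English, "
        f"in at most {max_chars} characters. Mention the resource type, "
        "codes, values, and units.\n\n"
        f"{json.dumps(resource, indent=2)}"
    )


def summarize_resource(
    resource: dict,
    complete: Callable[[str], str],  # hypothetical LLM call: prompt in, text out
    max_chars: int = MAX_SUMMARY_CHARS,
) -> str:
    """Ask the model for a summary and enforce the length cap as a safeguard."""
    summary = complete(build_summary_prompt(resource, max_chars)).strip()
    return summary[:max_chars]


observation = {
    "resourceType": "Observation",
    "code": {"coding": [{"code": "8302-2", "display": "Body Height"}]},
    "valueQuantity": {"value": 123.6, "unit": "cm"},
}

# Stand-in model for demonstration only; replace with a real LLM client call.
def fake_complete(prompt: str) -> str:
    return "The resource records a body height of 123.6 cm (code 8302-2)."

print(summarize_resource(observation, fake_complete))
```

Passing the model call in as a parameter keeps the sketch testable and makes the hard truncation explicit, since a model may not respect the requested length on its own.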
We evaluated these strategies using Elasticsearch, combining boosted text matching with cosine similarity to retrieve relevant FHIR data. The key metrics used were:
In some experiments, ID and Date were included in the queries. These represent unique identifiers such as resource IDs or dates of observations, which are key fields in FHIR resources. Including them helps reduce search ambiguity, as they provide precise markers that help the system target the correct resource more effectively.
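To make the retrieval setup more concrete, here is a minimal sketch of an Elasticsearch query body that combines a boosted text match with cosine similarity. It is one plausible arrangement, not necessarily the exact query we ran: the field names `text` and `embedding`, the boost value, and the way the two scores are added together are all assumptions. It also assumes `embedding` is mapped as a `dense_vector` field and that the query vector comes from the same embedding model used at indexing time.

```python
def build_hybrid_query(question, query_vector, text_boost=2.0, size=5):
    """Build a query body mixing boosted text relevance with vector similarity.

    The inner bool query matches every document and adds a boosted text score
    when the question terms appear in the (hypothetical) 'text' field. The
    script then adds cosine similarity against the 'embedding' field.
    """
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "must": [{"match_all": {}}],
                        "should": [
                            {
                                "match": {
                                    "text": {
                                        "query": question,
                                        "boost": text_boost,
                                    }
                                }
                            }
                        ],
                    }
                },
                "script": {
                    # "+ 1.0" keeps the final score non-negative.
                    "source": (
                        "cosineSimilarity(params.query_vector, 'embedding')"
                        " + 1.0 + _score"
                    ),
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


body = build_hybrid_query("What do my glucose levels mean?", [0.1, 0.2, 0.3])
print(body["query"]["script_score"]["script"]["source"])
```

When IDs or dates are included, the simplest option is to append them to the `question` text so they take part in the text match; adding them as separate filter clauses is another option this sketch leaves out.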
Here are the findings:
Future work could focus on:
By optimizing the retrieval and structuring of FHIR data, RAG systems in healthcare can provide clear explanations and address patient-specific questions, making complex medical information easier to understand.
These systems serve as a valuable complement to healthcare providers by helping patients interpret their lab results, diagnoses, and treatment options. This enhanced understanding not only improves communication between patients and specialists but also empowers patients to make informed decisions about their health.
Future advancements, such as integrating knowledge graphs and refining data descriptions, will further expand the capabilities of AI in healthcare, fostering a more informed and engaged patient experience.
We're passionate about pushing the boundaries of healthcare technology, constantly exploring ways to merge the latest research with practical solutions for our clients. If you're intrigued by this topic, or AI in healthcare, we'd love to connect. Whether you have questions, ideas to share, or are considering implementing similar solutions, don't hesitate to reach out!