Beyond speech recognition to speech understanding: Podcast interview with M*Modal’s Juergen Fritsch (transcript)
Posted Apr 26 2010 12:32pm
This is a transcript of my recent podcast interview with M*Modal’s Chief Scientist, Juergen Fritsch.
David Williams: This is David Williams, co-founder of MedPharma Partners and author of the Health Business Blog. I’m speaking today with Juergen Fritsch, Chief Scientist at M*Modal. Thanks for your time today.
Juergen Fritsch: Thanks for your time. I appreciate it.
Williams: We’ve just been through a very interesting demo that unfortunately the blog listeners and readers won’t be able to observe. Tell me about speech understanding and how it differs from speech recognition.
Fritsch: Speech recognition to me is the literal translation of the spoken word to text. Speech understanding goes beyond that, into understanding what the intent of the communication is and understanding what the meaning is behind the dictation, especially in the health care domain.
When physicians document, they are very used to human transcriptionists interpreting the dictation. They are typically pretty narrative and very goal driven. They don’t expect somebody to transcribe exactly what they speak, but rather to really interpret and understand the meaning they want to convey and turn that into a document. And that’s really what speech understanding is about: not literally transcribing, but capturing the essential clinical facts and the meaning that the physician wants to convey.
The implication is that as physicians adopt speech recognition technology they are finding themselves having to change their pattern of dictation and their style of dictation just to accommodate the technology. Speech understanding is a very different approach. We’re trying to accommodate physicians’ style by adapting the technology to the physicians’ way of speaking rather than the other way around. It’s a paradigm shift in how to turn voice into documents.
Williams: I am familiar with speech recognition and how people are using it with EHRs. There is a good amount of satisfaction with using that relative to typing. What feels different for a physician if they’re using speech understanding instead of speech recognition when they’re working with an electronic health record?
Fritsch: Usability improves. With speech recognition, there is already improved usability as opposed to typing and selecting information from drop-down menus and other things that have traditionally been used with the EMR system. But physicians find themselves going through a lengthy training period with speech recognition systems. They have to train a profile for their voice and improve recognition accuracy that way. They still have to navigate around different aspects of the applications by clicking or by issuing commands. So there is an improvement in usability, but it’s not where it should be in our view.
Speech understanding takes it one level further where it’s about allowing physicians to narrate the clinical facts in a very conversational manner. Rather than clicking on the allergy field and saying “Tylenol” or “penicillin” and then clicking on adverse reaction and saying “severe reaction,” we allow physicians to just narrate the fact that their patient has a severe allergy to penicillin. That should be all that is needed for the system to pick out the intents and the components of that dictation and populate the fields in the EMR. In that sense, speech understanding takes it to a much more user-friendly level where physicians can continue to narrate their findings the same way they have done with transcription services but still see that they are populated in a structured way in the EMR.
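To make the penicillin example above concrete, here is a toy sketch (my own illustration, not M*Modal’s actual pipeline, which would use far richer language models and medical vocabularies) of turning one narrated sentence into the kind of structured allergy fields an EMR expects:

```python
import re

# Toy illustration only: extract a structured allergy fact from a
# conversational sentence, mimicking the "narrate, then populate
# EMR fields" idea described in the interview.
ALLERGY_PATTERN = re.compile(
    r"allerg(?:y|ic)\s+to\s+(?P<substance>\w+)",
    re.IGNORECASE,
)

def extract_allergy(narrative):
    """Return a field/value dict for an EMR allergy entry, or None."""
    match = ALLERGY_PATTERN.search(narrative)
    if match is None:
        return None
    return {
        "allergen": match.group("substance").lower(),
        # Crude severity cue; a real system would classify reactions.
        "severity": "severe" if "severe" in narrative.lower() else "unspecified",
    }

print(extract_allergy("The patient has a severe allergy to penicillin."))
```

The point of the sketch is the contrast the interview draws: instead of the physician clicking into an allergy field and an adverse-reaction field separately, a single narrated sentence carries both pieces of information, and the system pulls them apart.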
Williams: What about a physician who is not using an EMR? Are they still able to make use of your system?
Fritsch: Absolutely. That’s a good point. We’re not strictly about EMRs; we are about document-centered reporting in general, allowing physicians to create a narrative account of an encounter, which is then structured and coded in a meaningful way.
That can happen on the telephone. They can pick up the telephone and call into the service. It could happen on a mobile device, an iPhone or a Blackberry device. It could happen on a desktop outside of or away from an EMR system. That’s really not too different in the sense of usability from the physicians’ perspective.
It’s all about allowing them to dictate and narrate the encounter without having to change their behavior, yet still get structured and encoded documents. To that extent we have adopted HL7’s Clinical Document Architecture for representing both the narrative and any structured and coded information that we can derive from it. It is the perfect vehicle: physicians can narrate and review the narrative while the system captures the essential clinical facts at the same time, and they can then review and validate the correctness of those facts. And since it’s a standard, it enables interoperability with all kinds of clinical systems out there.
Williams: How is the technology actually delivered? Is it a piece of software? Is it a service?
Fritsch: Typically we deliver this as a Software as a Service offering, a cloud-based computing offering. We’re not typically selling to the end user market; we work with partners in various health care verticals who take our cloud-based web services and embed them in their own offerings. So you might see an EMR company embedding our services behind the scenes, using us to give their users a more user-friendly documentation experience.
You might see us with radiology vendors and their systems. However, you will not necessarily see a local deployment and installation of software, but rather a ubiquitous software service model where it doesn’t matter whether you’re at your PC or away with your iPhone, calling in from your car. You will be able to use the same technology with the same user profiles and the same quality of recognition.
Williams: You mentioned working through partners. Do you have specific partners that are public?
Fritsch: Absolutely. In the radiology space we partner with radiology information systems and PACS vendors. We’re partnering with many of the big players, including GE Healthcare, which has embedded our services in its physician-reporting product. In the EMR space we’re likewise partnering with a variety of big players, including Allscripts.
Our technology is also traditionally embedded with a lot of the transcription service providers, the companies that provide transcription services to hospitals. Actually, eight out of ten such companies are using our services to improve their productivity in creating these documents from speech. You won’t see us directly there, but you will see the end result, which is a structured document.
Williams: Earlier in the conversation you used the term “meaningful” and so I have to ask you about whether there is a direct fit with Meaningful Use under the HITECH Act.
Fritsch: That’s a loaded question. The term “meaningful use” is overused and misinterpreted in many ways. There is not a single vendor who won’t use the term these days. We have actually been using that term for a long, long time, way before the government started using it.
We use the term “meaningful documents.” These structured and encoded documents fuse narrative with clinical facts, so in that respect they are a conduit to what the government means by meaningful use. We enable compliance with meaningful use rules and follow the government’s outlines of what a physician should document. Once you have structured and encoded information, you can set rules on top of it to verify whether it’s complete, accurate, and compliant with all these rules.
So we do provide the infrastructure and the technology to drive compliance with meaningful use as the government defines it, but it also goes beyond that. We can access the content of the narrative and drill down in a computable way into the contents of a narrative dictation. That’s really unlocking content that has not been available in the past. In the past, in order to get any kind of decision support or compliance analysis for different rules, you had to create structured information. You had to go into EMR-style systems and enter information into database tables.
Now with speech understanding and the technology that comes with it, it’s actually possible to unlock the narrative, drill down to the spoken word, and find evidence of clinical encounters and clinical conditions that you would not otherwise find.
Williams: What kind of training is required for an end user to be able to use your technology?
Fritsch: Very little. Our focus is on not changing the physician’s behavior. Out of the box, most physicians who use our technology find it very useful in the sense that they don’t have to adjust their style. They don’t have to learn how to dictate.
There is a little bit of training to learn the interface. But they don’t have to go through the training and dictation enrollment period that you have to go through with many speech recognition systems. You start using it out of the box, and it gets better as you use it.
Williams: I’ve been speaking today with Juergen Fritsch. He is Chief Scientist at M*Modal. Juergen, thank you so much for your time.