Speech recognition – fact or fiction?

01 June 2016, at 1:00am

Adam Bernstein suggests there are money and time savings to be made, and talks to a number of software providers who explain how the technology works.

KUBRICK’S MASTERPIECE, 2001: A SPACE ODYSSEY, features a number of technologies that have moved from film to workplace. One, speech recognition, is now gaining traction and, while we have some way to go before we have sentient computers, the latest developments in speech recognition now make it suitable for many environments including veterinary practices – for both secretarial and vet use alike.

In simple terms, speech recognition systems are computer-based applications that accept a source of audio and turn that into text with (in the practice context) the aid of a medical-specific vocabulary. 

Sarah Fisher, responsible for healthcare regional marketing at Nuance Communications, says the technology allows practice staff to dictate directly into patient records and other clinical documentation “to document animal care in their own words and at the point of care... [while] spending less time typing and clicking”.

Indeed she reckons, with Nuance’s software at least, that staff can navigate systems using voice commands up to three times faster than most people can type or click with a mouse. 

There are, says Dr Andrew Whiteley, managing director of Lexacom, two different ways of converting speech to text: “The first system uses a first person (foreground) method, whereby words appear on the screen as they are dictated, and the second uses a deferred (background) method whereby a document is dictated and the audio is sent to a server. The server turns that document into text and returns it.”

Foreground is typically used for data entry, while background is often used for creating letters and lengthier documents. Both are a marked improvement on tape transcription, which can take days to turn around.

Clearly anything that can be used to reduce the burden of data entry into clinical systems is to be welcomed, which means there is great potential for speech recognition.

In particular, the technology can reduce administration time for vets while improving accuracy of captured information within clinical systems. For secretaries, background systems can be used to reduce the time and effort involved in creating referral letters and other correspondence.

John Bendall, director of UK Operations at Crescendo Systems, makes the point that savings follow from either approach because of reduced keyboard activity and this improved accuracy. He adds that systems aren’t dumb and that “any corrections made by the user to recognised text are returned to the system so it can learn new words”.

“The benefits are obvious,” says Whiteley. “It can reduce the turnaround times of letters, improving referrals, and can also reduce the cost of secretarial support while smoothing out fluctuations in typing demand.” For Fisher the saving is clear: “Time saved can equate to up to 30 to 60 minutes per day, per user.”

A few considerations

There are no drawbacks as such, but it’s important to understand that there is no “one size fits all” approach to speech recognition and so the technology needs to be used with common sense.

Because it is sound-actuated, speech recognition will not work well in noisy or disruptive environments. Further, systems need both high-quality recording devices and an author who dictates in a specific way, being explicit about punctuation for instance. There may also be some accuracy issues – busy vets may not check the words as they appear to ensure the system has transcribed them correctly.

It’s also worth noting that there may be privacy and confidentiality issues – clients and staff alike may overhear – as details are spoken instead of typed. This is less of an issue for veterinary practices than for GP surgeries, but privacy issues do remain.

One obvious question that follows is how systems cope with regional accents. Well, reckons Whiteley. He says systems are configured to learn and adapt to such things: “Speech recognition is based on the probability of a word being that word, rather than another. In medicine, there are lots of complicated words; unique sounding words, for instance the probability of ‘feline lymphoma’ being anything [else] is very small.”

He notes, however, that saying “tree” could lead a system to think it’s “three”, or “thee”, etc. The advice is that time and patience are required during the learning process, which can be likened to teaching Siri on the iPhone to recognise a given user’s voice.
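Whiteley’s point about probabilities can be illustrated with a toy sketch. The Python below is not how Lexacom’s or any other vendor’s engine works; it simply combines, for each candidate word, an invented score for how well the word matches the audio with an invented score for how likely the word is in context, and picks the highest. The function name and every number are illustrative assumptions.

```python
def pick_word(acoustic_scores, context_prior):
    """Return the most probable word and the normalised probabilities.

    acoustic_scores: how well each candidate matches the audio (assumed values).
    context_prior:   how likely each candidate is given the surrounding words
                     (assumed values); unseen words get a tiny default.
    """
    combined = {
        word: score * context_prior.get(word, 1e-6)
        for word, score in acoustic_scores.items()
    }
    total = sum(combined.values())
    probabilities = {word: value / total for word, value in combined.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


# "tree", "three" and "thee" sound alike, so the audio alone is ambiguous.
heard = {"tree": 0.40, "three": 0.35, "thee": 0.25}

# Hypothetical context: the previous words were "give", so a number is likely.
after_give = {"tree": 0.01, "three": 0.98, "thee": 0.01}

# Hypothetical context with no useful information at all.
no_context = {"tree": 1 / 3, "three": 1 / 3, "thee": 1 / 3}

best_with_context, _ = pick_word(heard, after_give)
best_without_context, _ = pick_word(heard, no_context)
print(best_with_context)     # three
print(best_without_context)  # tree
```

With no context the slightly better acoustic match wins; with context the more plausible word does. Corrections fed back by the user would, in this simplified picture, shift the scores over time.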

Fisher says it’s important to recognise that, in most situations, success follows from understanding the working practices of the clinic and from creating standard templates to speed up processes.

What to look for

The technology isn’t cheap but it can offer good value and savings over time. Here Fisher says that Nuance estimates – for a single user – its “software will pay for the investment in software, the microphone and half a day of training and set-up in less than six weeks” (the company is assuming a vet “costs” £70 per hour and works five days per week). And on top of that come savings from reduced administration, improved quality of notes, and an improved client experience.

Standalone speech recognition systems can be bought for £800-£1,000 per user plus the cost of training. Nuance, for example, charges £995 for the software and its partners may charge between £350 and £500 for training and set-up, while Crescendo’s charges vary from £899 for a one-off licence to as low as £700 for larger volumes.
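For readers who want to test the payback claim against their own numbers, the sum can be sketched in a few lines of Python. This is a rough illustration, not Nuance’s own calculation: it uses the £70 hourly cost, the five-day week and the 30 to 60 minutes saved per day quoted in this article, takes £995 plus the upper £500 training figure as the outlay, and leaves out the microphone because no price is given for it. The function name is hypothetical.

```python
def payback_weeks(upfront_cost_gbp, minutes_saved_per_day,
                  hourly_cost_gbp=70.0, days_per_week=5):
    """Weeks needed for the value of time saved to cover the up-front cost."""
    weekly_saving = (minutes_saved_per_day / 60.0) * hourly_cost_gbp * days_per_week
    if weekly_saving <= 0:
        raise ValueError("time saved and hourly cost must be positive")
    return upfront_cost_gbp / weekly_saving


# Outlay: £995 software + £500 training and set-up (microphone excluded, price not stated).
outlay = 995 + 500

print(f"{payback_weeks(outlay, 60):.1f} weeks")  # 4.3 weeks at 60 minutes saved per day
print(f"{payback_weeks(outlay, 30):.1f} weeks")  # 8.5 weeks at 30 minutes saved per day
```

On these assumptions the “less than six weeks” figure holds when the saving is towards the top of the quoted range; at 30 minutes a day the payback takes longer.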

For some, this large up-front cost, combined with uncertainty over what success a system may bring, may be off-putting. There is an alternative – practices can subscribe on a monthly basis. Lexacom, for example, charges £20 per user per month for an embedded application which, says Whiteley, “fully integrates with all primary care clinical systems”.

This option may well be helpful to practices that don’t want a large single bill before understanding whether speech recognition is suitable for them. However, they will also need to subscribe to other products from Lexacom for the service to work. (Nuance also offers subscription access, with prices set by its resellers, and Crescendo is in the process of launching a subscription model too.)
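A practice weighing the two routes can estimate how many months of subscription would add up to the one-off price. The Python sketch below uses only the figures quoted in this article (£20 per user per month against £995, or £1,495 with the upper training charge). The cost of the other Lexacom products is not stated, so it appears as a hypothetical parameter that defaults to zero; the function name is also an assumption.

```python
def breakeven_months(one_off_cost_gbp, monthly_fee_gbp, extra_monthly_gbp=0.0):
    """Months of subscription that cost the same as buying outright.

    extra_monthly_gbp is a placeholder for any additional subscriptions
    the service depends on; its real value is not given in the article.
    """
    total_monthly = monthly_fee_gbp + extra_monthly_gbp
    if total_monthly <= 0:
        raise ValueError("monthly cost must be positive")
    return one_off_cost_gbp / total_monthly


print(breakeven_months(995, 20))        # software only
print(breakeven_months(995 + 500, 20))  # software plus upper training charge
```

Any additional monthly charges shorten the break-even period, so the comparison is only as good as the figures a practice obtains from its supplier.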

Bendall raises another cost consideration: a system must allow for both foreground and background speech recognition to eliminate the need to pay for two licences.

Overall, it’s the outright cost or subscription obligation that makes it important that practices try before they buy and also ensure they have buy-in from potential users. Further, practices need to look at what problems they are trying to solve.

As Whiteley puts it: “If they are buying the technology to reduce the workload for a secretarial team, transferring the burden to the vet is unlikely to help.” In this scenario, he recommends a background system.

For these reasons practices should look for a system that allows the use of digital dictation, outsourced transcription and speech recognition so that there are a variety of solutions for each user.

Fisher agrees on this: “Some may prefer to dictate with their client listening during a consultation; some may wish to summarise the consultation after the face-to-face.” She thinks the software used should support whatever work practice the individual prefers.

It also makes sense to have a system that, if required, integrates well with other practice systems and which is virtually training-free. Bendall suggests a system must be capable of handling “roaming profiles” so vets can move between consultation rooms and use their personal profile for the best accuracy.

Also, systems should be capable of creating voice shortcuts so individuals can perform commands by voice such as launching a specific template or opening a client record.

Adoption of the technology is slow, but growing as it overcomes earlier (failed) attempts to use it. Says Fisher: “Many medical professionals may have tried earlier versions of speech software years ago and may not have been impressed with the results. However, modern speech recognition solutions take minutes to get set up and get going.

“The latest speech recognition solutions designed specifically for medical situations – combined with today’s more powerful PCs – boast performance and capabilities that far exceed their predecessors.”

Whatever practices do, they shouldn’t necessarily choose the cheapest option and certainly not one without a medical dictionary built in. Bendall illustrates this: he says Dragon Medical in a surgery is over 30% more accurate than Dragon Professional, and that “choosing non-medical versions of speech recognition means the users have to add drug names and terminology on an ongoing basis, which takes time and can be very frustrating”.