But doing the same thing in reverse has proved a harder nut to crack.
This is surprising. Short phrases can be shouted into the search box on your phone or spoken into a translation app and recognised immediately, but longer passages have remained the province of the stenographer, teams of whom can now be hired online.
However, a few apps are beginning to change the pattern. The best of them can record an entire meeting, transcribing and punctuating everything that is said.
The granddaddy of voice recognition software is Dragon NaturallySpeaking, a computerised dictation taker which I first used in the 1990s. It was a cumbersome process: the program first had to be “trained” to understand my voice, and mine alone.
NaturallySpeaking is still available, with versions starting at £140, but compared to phone apps like Otter, it’s showing its age.
Available for Apple and Android phones, Otter can transcribe live or recorded speech and upload the resulting text to the cloud, from where you can access it on any device. Around six hours a month can be stored for free, after which the service costs $3 a month for students and teachers, or $10 a month for everyone else.
I tested it with a recording of an interview that had taken place several weeks earlier in a noisy environment, and was surprised at how well it did at telling one speaker from another and recognising the construction of sentences. But it’s not perfect, and the text is always going to require a careful edit before it can be circulated.
Otter is not the only such app, but it is one of the few that work without training and in situations where lots of people speak at once. Just Press Record, for Apple devices only, is a credible alternative at around £5, and Reason8, which is free and also runs on Android phones, looks good on paper. It allows delegates to join a meeting remotely, although I couldn’t get it to work on my phone.
There is, though, a more obvious solution, and it’s already installed on all your devices. Among YouTube’s lesser-known functions is the ability to create subtitles for every piece of video you throw at it. It does this automatically; all you need do is wait half an hour or so.
That means you can record a conversation using the built-in recording app on your phone, then upload the file and let YouTube’s speech recognition engine do the rest. The subtitles can then be saved as a plain text file.
YouTube doesn’t accept audio-only files, so you will need to convert your recordings to video by adding a picture of some sort. Websites such as tunestotube.com, which combine an audio file with a still image, are the easiest way to do this.
YouTube’s subtitle engine is basic: it doesn’t punctuate text and it can’t tell different speakers apart. It’s also prone to producing some surprising results. On my test recording, the word “vain”, as in “I’m not a vain woman”, was transcribed as “sane”, and other passages were rendered as gibberish.
But used with caution, it’s a tremendous time saver – and one that has arrived not before time.