Making computers smarter, and helping deaf people too
A friend of mine is very hard of hearing — not quite deaf enough to fully belong to the deaf community, but sufficiently deaf that participating in a conversation is terribly hard work for her. She does her best to put together what she can hear with what she can lip read and what she can extrapolate, and then she asks her conversational partners to repeat themselves as often as she can bear. I was shocked to hear just how exhausting and isolating it is for her.
One of the young researchers here is developing a solution that could make a big difference for people like her, as well as the fully deaf — and even for journalists. In particular, Walter Lasecki of the University of Rochester (together with his advisor Jeffrey P. Bigham) is creating a system to transcribe conversations in real time, with no advance planning, for a fraction of the cost of a skilled human transcriber.
Lasecki’s basic idea is to crowdsource the problem, using Amazon’s Mechanical Turk (or another service) to get six or seven people to simultaneously transcribe bits of the conversation. His software then stitches together the transcriptions using their overlaps to get a single coherent, accurate transcript. Ordinarily, real-time transcription requires a highly skilled transcriber, who might charge $150 to $300 an hour; Lasecki’s system harnesses the abilities of ordinary folks instead.
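To make the stitching idea concrete, here is a minimal sketch of merging partial transcripts by their overlapping words. It is an illustration only, not Lasecki's actual algorithm: it assumes the fragments arrive in time order and that workers typed the shared words identically, and the names `overlap_length` and `stitch` are hypothetical. A real system would also have to cope with typos, timing information, and workers who disagree.

```python
def overlap_length(left, right):
    """Return the length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, 0, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def stitch(fragments):
    """Greedily merge time-ordered text fragments, dropping words repeated in the overlap.

    Assumption for this sketch: each fragment is a plain string and overlapping
    words match exactly after lowercasing.
    """
    merged = []
    for fragment in fragments:
        words = fragment.lower().split()
        shared = overlap_length(merged, words)
        merged.extend(words[shared:])  # append only the words not already present
    return " ".join(merged)


# Three hypothetical workers each caught a different, overlapping slice of one sentence.
fragments = [
    "the quick brown fox",
    "brown fox jumps over",
    "jumps over the lazy dog",
]
print(stitch(fragments))  # the quick brown fox jumps over the lazy dog
```

The greedy approach keeps the sketch short. It shows why overlap matters: without shared words between neighboring fragments, there is nothing to anchor one worker's piece to the next.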
Lasecki pointed out that one of the big advantages of his system is that it eliminates scheduling hassles. Universities are required to provide “reasonable accommodation” for students with disabilities, which includes providing sign language interpreters. But usually only a few interpreters are available, so a student who needs assistance at the last minute and hasn’t scheduled it at least 24 to 48 hours in advance may well be out of luck. Lasecki’s system, by contrast, is always just a cell phone app away, and he aims for it to cost no more than $50/hour.
Furthermore, for someone like my friend, who is hard of hearing but not quite deaf, sign language interpretation can be confusing and difficult. American Sign Language is truly a language, with its own syntax and grammar, not simply a transcription of English into motions. So catching some of the English and simultaneously watching the sign language interpretation requires a strange bifurcation of one’s mind.
Serving people like her sets a demanding requirement for Lasecki’s transcription system: it has to work very, very quickly. If the system produces a transcription with a 15-second delay, a hard-of-hearing person who catches every other word can’t hold what they’ve heard in mind long enough to fill in the gaps from the transcription. They’ll be forced to ignore what they hear entirely and rely fully on the transcription, and then they won’t be able to connect the words with the speaker’s facial expressions and gestures. So he has aimed to have his system produce a transcription within five seconds. He’s currently at just under four.
Lasecki came to this research through his desire to make computer interfaces smarter. In particular, computers like precisely prescribed tasks, but humans need them to be able to do things that aren’t so carefully spelled out. For example, when humans understand language, we often don’t catch every word that people say, but we fill in the missing words to create something coherent. Computers aren’t good at that kind of thing. When he started his PhD research with Jeffrey Bigham, he adopted Bigham’s approach to this issue: If humans can do this stuff so well, and computers suck at it, then let’s harness the power of people to do it! Bigham’s team has created crowdsourced systems to do all kinds of things: steer robots, identify objects, answer questions expressed in ordinary natural language, and, in this case, transcribe conversations. Many of these capabilities are especially useful for people with disabilities of one kind or another. The mobility impaired, for example, can use a steerable robot to retrieve things, and the blind can use an object identification system to tell them whether a tube contains eye drops or super glue. “I never had any exposure to these communities before grad school, and now I’m neck deep in them,” Lasecki says. “My broad interest is still in intelligent interfaces, and supporting these communities is one of the great uses of them.”
As he and I talked, I took notes, getting down perhaps three-quarters of what he said. “It’s too bad I don’t have a data plan in Germany!” Lasecki said. “I could show you the system and spare you the trouble.”