An alternative to touchscreens? In-car voice control is finally good


Over the past decade or so, cars have become pretty complicated machines, often with complex user interfaces. Mostly, the industry has added touch to the near-ubiquitous infotainment screen—it makes manufacturing simpler and cheaper and UI design more flexible, even if there’s plenty of evidence that touchscreen interfaces increase driver distraction.

But as I’ve been discovering in several new cars recently, there may be a better way to tell our cars what to do—literally telling them what to do, out loud. After years of being, frankly, quite rubbish, voice control in cars has finally gotten really good. In some makes, at least. Imagine it: a car that understands your accent, lets you interrupt its prompts, and actually does what you ask rather than spitting back a “Sorry Dave, I can’t do that.”

You don’t actually have to imagine it if you’ve used a recent BMW with iDrive 8 or a Mercedes-Benz with MBUX—admittedly, a rather small sample population. In these cars, some of which are also pretty decent EVs, you really can dispense with poking the touchscreen for most functions while you’re driving.

You can be general with your commands—if you tell the car “I’m cold,” it will bump up the cabin temperature, for example. Or you can be specific—telling the car to “set the front temperature to 75 degrees” or “turn on the seat heater to level 2” is, to me at least, a lot easier than remembering which segment of a touchscreen I’m supposed to poke.

The voice recognition is even good enough to understand me when I tell it to navigate to a specific address, to the point that I actually use the native navigation systems if I’m driving a modern BMW or Mercedes rather than relying on CarPlay like everyone else. Having passengers in the car doesn’t pose many problems, either—something you can’t say about BMW’s gesture control when the front-seat passenger talks with their hands.

Some of that credit should probably be directed at Cerence, which supplies (among other things) the voice assistant to both BMW and Mercedes (as well as BYD, Renault, VinFast, and others, Cerence told Ars). Because much of the software runs on the car, it has access to functions that cars using Google’s Android Automotive OS don’t. What’s more, Google’s once-heralded voice assistant feels like it has gotten worse at understanding speech over the past 12 months or so, for reasons I have yet to fathom.

Current BMWs have very good voice recognition and natural language processing, but they also include a physical jog dial, so you neither have to talk nor touch a screen if you don’t want to.

My enthusiasm for talking to cars appears to put me in a minority. Despite a generation of nerds growing up with the adventures of KITT and Michael Knight, it seems like no one else wants to talk to their cars. Some of that is an exposure problem—as mentioned earlier, good voice control systems are not widely distributed yet.

But even among my colleagues who test the same cars for other outlets, I’m mostly greeted with skepticism when I praise good voice interfaces.

A 5,000-lb car is not the same as a smartphone

“I think part of it is just that there’s something inherently social about language. For thousands of years it has developed as an inherently social system. So I think there is something in human beings that is hesitant to talk to something that is not another sentient being,” said Betty Birner, a professor of linguistics and cognitive science at Northern Illinois University.

“We’ll talk to our dogs, but we may not want to talk to our toaster. So I think that’s a little part of it. That we use language to communicate and we have a notion of what communication means, and it means another mind. Right? My mind in communication with yours,” she told me.

“The other thing I mean, the obvious thing, is your car can kill you. Your toaster—I mean it could kill you, but you’ve really got to work at it. With a car there’s a real danger, so you need to really, really trust the artificial intelligence there, and I think people don’t understand how far artificial intelligence and natural language processing have gotten, and they’re not going to trust it with their life. Which, you know, is understandable,” Birner said.


Sean Gallagher and an AI expert break down our crazy machine-learning adventure

We’ve spent the past few weeks burning copious amounts of AWS compute time trying to invent an algorithm to parse Ars’ front-page story headlines to predict which ones will win an A/B test—and we learned a lot. One of the lessons is that we—and by “we,” I mainly mean “me,” since this odyssey was more or less my idea—should probably have picked a less, shall we say, ambitious project for our initial outing into the machine-learning wilderness. Now, a little older and a little wiser, it’s time to reflect on the project and discuss what went right, what went somewhat less than right, and how we’d do this differently next time.

Our readers had tons of incredibly useful comments, too, especially as we got into the meaty part of the project—comments that we’d love to get into as we discuss the way things shook out. The vagaries of the edit cycle meant that the stories were being posted quite a bit after they were written, so we didn’t have a chance to incorporate a lot of reader feedback as we went, but it’s pretty clear that Ars has some top-shelf AI/ML experts reading our stories (and probably groaning out loud every time we went down a bit of a blind alley). This is a great opportunity for you to jump into the conversation and help us understand how we can improve for next time—or, even better, to help us pick smarter projects if we do an experiment like this again!

Our chat kicks off on Wednesday, July 28, at 1:00 pm Eastern Time (that’s 10:00 am Pacific Time and 17:00 UTC). Our three-person panel will consist of Ars Infosec Editor Emeritus Sean Gallagher and me, along with Amazon Senior Principal Technical Evangelist (and AWS expert) Julien Simon. If you’d like to register so that you can ask questions, use this link here; if you just want to watch, the discussion will be streamed on the Ars Twitter account and archived as an embedded video on this story’s page. Register and join in or check back here after the event to watch!


Ars AI headline experiment finale—we came, we saw, we used a lot of compute time

We may have bitten off more than we could chew, folks.

An Amazon engineer told me that when he heard what I was trying to do with Ars headlines, the first thing he thought was that we had chosen a deceptively hard problem. He warned that I needed to be careful about properly setting my expectations. If this was a real business problem… well, the best thing he could do was suggest reframing the problem from “good or bad headline” to something less concrete.

That statement was the most family-friendly and concise way of framing the outcome of my four-week, part-time crash course in machine learning. As of this moment, my PyTorch kernels aren’t so much torches as they are dumpster fires. The accuracy has improved slightly, thanks to professional intervention, but I am nowhere near deploying a working solution. Today, as I am allegedly on vacation visiting my parents for the first time in over a year, I sat on a couch in their living room working on this project and accidentally launched a model training job locally on the Dell laptop I brought—with a 2.4 GHz Intel Core i3-7100U CPU—instead of in the SageMaker copy of the same Jupyter notebook. The Dell locked up so hard I had to pull the battery out to reboot it.

But hey, if the machine isn’t necessarily learning, at least I am. We’re almost at the end, but if this were a classroom assignment, my grade on the transcript would probably be an “Incomplete.”

The gang tries some machine learning

To recap: I was given the pairs of headlines used for Ars articles over the past five years with data on the A/B test winners and their relative click rates. Then I was asked to use Amazon Web Services’ SageMaker to create a machine-learning algorithm to predict the winner in future pairs of headlines. I ended up going down some ML blind alleys before consulting various Amazon sources for some much-needed help.
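
Framed as a machine-learning problem, each headline pair becomes one labeled example for a binary classifier. Below is a minimal sketch of that framing; the file name and column names (headline_a, headline_b, winner) are hypothetical placeholders, not the actual Ars data set.

```python
# A minimal sketch of framing a headline A/B test as binary classification.
# The CSV file and its columns are hypothetical, not the real Ars data.
import pandas as pd

pairs = pd.read_csv("headline_pairs.csv")  # hypothetical columns: headline_a, headline_b, winner

# One row per pair: the model sees both headlines and predicts which one won.
pairs["text"] = pairs["headline_a"] + " [SEP] " + pairs["headline_b"]
pairs["label"] = (pairs["winner"] == "b").astype(int)  # 0 = headline A won, 1 = B won

# Hold out a slice for testing.
train = pairs.sample(frac=0.8, random_state=42)
test = pairs.drop(train.index)
print(len(train), "training pairs,", len(test), "test pairs")
```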

Most of the pieces are in place to finish this project. We (more accurately, my “call a friend at AWS” lifeline) had some success with different modeling approaches, though the accuracy rating (just north of 70 percent) was not as definitive as one would like. I’ve got enough to work with to produce (with some additional elbow grease) a deployed model and code to run predictions on pairs of headlines if I crib their notes and use the algorithms created as a result.

But I’ve got to be honest: my efforts to reproduce that work both on my own local server and on SageMaker have fallen flat. In the process of fumbling my way through the intricacies of SageMaker (including forgetting to shut down notebooks, running automated learning processes that I was later advised were for “enterprise customers,” and other miscues), I’ve burned through more AWS budget than I would be comfortable spending on an unfunded adventure. And while I understand intellectually how to deploy the models that have resulted from all this futzing around, I am still debugging the actual execution of that deployment.
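
For reference, getting a trained PyTorch artifact running as a SageMaker real-time endpoint generally follows the pattern below. This is a hedged sketch using the SageMaker Python SDK, not the project's actual deployment code; the S3 artifact path, inference script, and instance type are placeholders.

```python
# Sketch of deploying a trained model artifact as a SageMaker endpoint.
# The S3 path, inference script, and instance type are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

role = sagemaker.get_execution_role()  # works inside a SageMaker notebook

model = PyTorchModel(
    model_data="s3://example-bucket/headline-model/model.tar.gz",  # placeholder artifact
    role=role,
    framework_version="1.8.1",   # illustrative versions
    py_version="py36",
    entry_point="inference.py",  # hypothetical script defining model_fn/predict_fn
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # CPU instance for inference
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

print(predictor.predict({"headline_a": "Example headline A", "headline_b": "Example headline B"}))

# Tear the endpoint down when finished, or the meter keeps running.
predictor.delete_endpoint()
```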

If nothing else, this project has become a very interesting lesson in all the ways machine-learning projects (and the people behind them) can fail. And failure this time began with the data itself—or even with the question we chose to ask with it.

I may still get a working solution out of this effort. But in the meantime, I’m going to share the data set on my GitHub that I worked with to provide a more interactive component to this adventure. If you’re able to get better results, be sure to join us next week to taunt me in the live wrap-up to this series. (More details on that at the end.)

Modeler’s glue

After several iterations of tuning the SqueezeBert model we used in our redirected attempt to train for headlines, the resulting model was consistently hitting 66 percent accuracy in testing—somewhat short of the above-70-percent accuracy suggested earlier.

This included efforts to reduce the size of the steps taken between learning cycles to adjust inputs—the “learning rate” hyperparameter that is used to avoid overfitting or underfitting of the model. We reduced the learning rate substantially, because when you have a small amount of data (as we do here) and the learning rate is set too high, it will basically make larger assumptions in terms of the structure and syntax of the data set. Reducing that forces the model to adjust those leaps to little baby steps. Our original learning rate was set to 2×10⁻⁵ (2E-5); we ratcheted that down to 1×10⁻⁵ (1E-5).
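
If the fine-tuning is done with Hugging Face's Trainer API (an assumption on my part, since the article doesn't show the actual training code), the learning rate is just one field of the training configuration:

```python
# Sketch of dialing the learning rate down when fine-tuning SqueezeBERT on a
# small data set. The Trainer-based setup and other hyperparameters are
# assumptions, not the project's actual configuration.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("squeezebert/squeezebert-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "squeezebert/squeezebert-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="headline-model",
    learning_rate=1e-5,                # ratcheted down from 2e-5
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

# These arguments, plus tokenized headline pairs, get handed to a
# transformers.Trainer to run the actual fine-tuning.
print(args.learning_rate)
```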

We also tried a much larger model that had been pre-trained on a vast amount of text, called DeBERTa (Decoding-enhanced BERT with Disentangled Attention). DeBERTa is a very sophisticated model: 48 Transformer layers with 1.5 billion parameters.

DeBERTa is so fancy, it has outperformed humans on natural-language understanding tasks in the SuperGLUE benchmark—the first model to do so.
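
Swapping in the bigger model is mostly a matter of pointing at a different pre-trained checkpoint. A minimal sketch, assuming the Hugging Face transformers library and the microsoft/deberta-v2-xxlarge checkpoint (the 48-layer, roughly 1.5-billion-parameter variant); which exact DeBERTa checkpoint the project used is my assumption.

```python
# Sketch of loading a large pre-trained DeBERTa variant for two-way
# classification. The checkpoint choice is an assumption.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "microsoft/deberta-v2-xxlarge"  # ~1.5B parameters, 48 layers
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Fine-tuning only swaps in a fresh two-label head; the pre-trained weights
# are what make the model so capable (and the deployment artifact so heavy).
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.1f} billion parameters")
```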

The resulting deployment package is also pretty hefty: 2.9 gigabytes. With all that additional machine-learning heft, we got back up to 72 percent accuracy. Considering that DeBERTa is supposedly better than a human when it comes to spotting meaning within text, this accuracy is, as a famous nuclear power plant operator once said, “not great, not terrible.”

Deployment death spiral

On top of that, the clock was ticking. I needed to try to get a version of my own up and running to test out with real data.

An attempt at a local deployment did not go well, particularly from a performance perspective. Without a good GPU available, the PyTorch jobs running the model and the endpoint literally brought my system to a halt.
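
A quick sanity check before trying this at home: if PyTorch can't find a CUDA device, everything runs on the CPU, which is how a big transformer grinds a modest machine to a halt. Purely illustrative:

```python
# Check whether a CUDA GPU is available before running inference locally.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")
if device.type == "cpu":
    print("No GPU found; a large transformer model will likely swamp this machine.")
```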

So, I returned to trying to deploy on SageMaker. I attempted to run the smaller SqueezeBert modeling job on SageMaker on my own, but it quickly got more complicated. Training requires PyTorch, the Python machine-learning framework, as well as a collection of other modules. But when I imported the various required Python modules into my SageMaker PyTorch kernel, the versions didn’t match up cleanly, despite updates.

As a result, parts of the code that worked on my local server failed, and my efforts became mired in a morass of dependency entanglement. It turned out to be a problem with a version of the NumPy library, except when I forced a reinstall (pip uninstall numpy, then pip install numpy --no-cache-dir), the version was the same, and the error persisted. I finally got it fixed, but then I was met with another error that hard-stopped me from running the training job and instructed me to contact customer service:
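
One way to see what the kernel is actually loading is to print the installed versions against the ones the code expects. A small sketch; the pinned versions below are placeholders, not the project's real requirements.

```python
# Compare installed package versions against expected pins (placeholders).
import importlib.metadata as metadata

expected = {"numpy": "1.19.5", "torch": "1.8.1", "transformers": "4.6.1"}  # hypothetical pins

for package, wanted in expected.items():
    installed = metadata.version(package)
    status = "OK" if installed == wanted else f"MISMATCH (expected {wanted})"
    print(f"{package:<14} {installed:<10} {status}")
```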

ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit 'ml.p3.2xlarge for training job usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.
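
That error comes from the CreateTrainingJob request the SageMaker SDK makes on your behalf when an estimator asks for an instance type the account has a zero quota for. A hedged sketch of the kind of call that triggers it; the training script, hyperparameters, and S3 path are placeholders.

```python
# Sketch of launching a SageMaker training job; the ml.p3.2xlarge request is
# what the account-level quota rejects. Script, role, and paths are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # works inside a SageMaker notebook

estimator = PyTorch(
    entry_point="train_headlines.py",   # hypothetical training script
    role=role,
    framework_version="1.8.1",          # illustrative versions
    py_version="py36",
    instance_count=1,
    instance_type="ml.p3.2xlarge",      # GPU instance; needs a quota above zero
    hyperparameters={"epochs": 3, "learning_rate": 1e-5},
    sagemaker_session=session,
)

estimator.fit({"train": "s3://example-bucket/headlines/train"})  # placeholder S3 path
```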

To complete this effort, I needed to get Amazon to up my quota—not something I had anticipated when I started plugging away. It’s an easy fix, but troubleshooting the module conflicts ate up most of a day. And the clock ran out on me as I was attempting to sidestep the problem by deploying the pre-built model my expert help had provided as a SageMaker endpoint.

This effort is now in extra time. This is where I would have been discussing how the model did in testing against recent headline pairs—if I ever got the model to that point. If I can ultimately make it, I’ll put the outcome in the comments and in a note on my GitHub page.
