Google AIY Voice Kit Review

AIY Voice Kit box

Google is selling a DIY smart speaker kit in the US through Target stores. They call it an AIY Voice Kit with the sub-heading of a “Do-it-yourself intelligent speaker.” Is it a kit that lives up to what the box promises for only $50? Let’s find out, together.

I’m not super fond of Google. They’re great at search, but they really make their money today by selling advertising space on websites. In my opinion, their “don’t be evil” motto has shifted as their priorities changed. There’s always the upfront cost of a product ($50), but with any smart speaker device there’s also the intangible cost of allowing a company to listen to, and process, whatever it can hear.

Ideally a smart speaker would only listen after a physical input. Most smart speakers also have a wake word that summons the device to interpret your speech and do something with it. This kit has an arcade button on top for physical input, if that’s your game.

The do-it-yourself aspect is mostly fun. You assemble the included bare speaker, wires, cardboard, arcade button, and a Raspberry Pi Zero WH with Google’s Voice Bonnet add-on board to make the smart speaker. It’s not very difficult to put this kit together and the instructions are clear, but it is missing two things you’ll need, and one critical component of the setup requires other tools or devices.

You’ll need a very small flathead screwdriver to connect the speaker cables to the terminals. I happen to have the right screwdriver, but these terminal screws are incredibly tiny. Your regular household tools aren’t going to work with them.

It only takes about an hour before you’re putting the included Micro SD card in and powering the speaker up. Or you would be, if there were a power supply included. You get a USB cable in the box, but no power supply.

Why not include the power supply and the screwdriver in the box? The screwdriver is almost understandable, because you could own one already if you’re into technology. The power supply is simply necessary for the device to function, and it makes no sense to me that it isn’t included in a general-purpose kit.

Wires

There’s one other small issue with the connections inside the kit. The wires to connect the arcade button are not friendly to the color blind. I am only mildly color blind, but I can’t differentiate between some colors with red and green in them. The arcade button wires are blue, green, grey, black, red, and orange. I had a hard time picking out the green from the grey and the red from the orange.

Okay, you’ve got the kit assembled, and you’ve found a power supply to turn it on.

How do you connect to the box so that you can get it on your home WiFi?

The Raspberry Pi Zero WH included with the kit has USB and HDMI, but the ports are micro USB and mini HDMI, so you need adapters and a hub to connect a keyboard, mouse, and display. The other option, and this is what I chose, is to use an app that is only available for Android devices to get the DIY smart speaker onto WiFi and find out its IP address so you can connect to it via SSH.

Once you get that IP address, and learn SSH and the Linux shell, you’re in business with a shell prompt at a Linux terminal running a variant of Raspbian that Google’s engineers modified to support their Voice Bonnet.
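If you want to confirm that the address the app reports is actually answering before you fight with an SSH client, a quick check like this can help. It is only a sketch using Python’s standard library; the function name `ssh_port_open` is my own, and it assumes the kit’s image runs an SSH server on the standard port 22.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing answers at that address and port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call it with whatever address the app showed you, for example `ssh_port_open("192.168.1.50")`, where that address is only a placeholder. If it returns True, `ssh pi@192.168.1.50` is the next thing to try; `pi` is the usual default user on Raspbian, and I am assuming the modified image keeps it.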

Finally, you’ve got a smart speaker, right?

This is the real thing that kills this project: it doesn’t include any kind of hot-word, or wake-word, detection. Just like hotkeys, hotwords like “Hey, Siri,” and “Okay, Google” tell our phones and other smart speakers to start listening. Ideally the processing for these prompts happens on the device so they’re not just uploading everything you say to Apple’s, Google’s, or Amazon’s servers.

This AIY smart speaker box promises, on the back, a “…smart device that understands and responds when you speak.” I don’t think that is truthful. It is not at all a smart speaker that listens when you speak. You have to press the arcade button before the included Python code will fetch Google’s assistant to start listening and interpreting your words into a reply. It’s an infuriating experience to have to press that button, especially whenever Google’s assistant demands interaction.
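To make the difference concrete, here is a toy model of the two behaviors. This is not the kit’s code and it does not use Google’s AIY library; `push_to_talk`, `wake_word`, and the event strings are all made up for illustration. Each function takes a list of things that happen near the device and returns what would actually reach the assistant.

```python
from typing import Iterable, List


def push_to_talk(events: Iterable[str]) -> List[str]:
    """Model the kit as shipped: one button press arms one utterance."""
    armed = False
    heard = []
    for event in events:
        if event == "press":
            armed = True
        elif armed:
            heard.append(event)
            armed = False  # the next reply needs a fresh press
        # Speech while not armed is simply dropped.
    return heard


def wake_word(events: Iterable[str], hotword: str = "ok google") -> List[str]:
    """Model a hot-word speaker: the spoken hotword arms one utterance."""
    armed = False
    heard = []
    for event in events:
        if event == "press":
            continue  # this model has no button
        if armed:
            heard.append(event)
            armed = False
        elif event.lower() == hotword:
            armed = True
    return heard
```

With `push_to_talk(["what time is it", "press", "what time is it"])` only the second attempt gets through, which is the experience the box doesn’t warn you about. With `wake_word(["ok google", "what time is it"])` your hands never leave your lap.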

Google’s assistant can play a MadLibs game with you. Just like the real game, you supply the nouns, verbs, and adjectives and the assistant fills in a virtual MadLibs sheet to make a silly story. Unlike the real game, you have to press the stupid button each time the assistant needs the next word.

The times when I’d press the button there was no guarantee the assistant would listen. Many times it would just ignore me and I’d have to press the button again. I ended up pressing the button about 25 times to get 18 words into the MadLibs game. I don’t think I will ever do that again.

This built-in Python-based assistant code was just slow to react and frustrating to interact with.

It was also incredibly limited compared to other assistants. Even the iOS version of Google’s assistant is easier to use. This smart speaker version of Google’s assistant can’t even access your calendar or other information tied to your Google account.

So, overall it’s a pretty disappointing device as shipped by Google. But this is a DIY thing, right? Well, I haven’t found much of an active development community around it. The forums for Google’s “AIY” projects are sparsely populated. The best use I’ve gotten out of the device was to load free software onto it that made the assembled device into a genuinely useful AirPlay speaker.

Some of the replies from Google engineers on these forums indicate that more functionality could come to the device soon, but I don’t think they have any plans to add hot-word detection.

The most surprising thing I’ve found on that forum is that there was an older version of this project that included hot-word detection. This was possible when version 1 was based on the more capable Raspberry Pi 3 single-board computer. Apparently this is version 2 of their voice kit.

I don’t understand a lot of the choices Google made here, but the most important questions are these: Why did they drop the hot-word detection? Why don’t they mention anywhere on the box that you need an Android device or a bunch of adapters so that you can get this device on the network?

Maybe parents buying this kit for teenagers (the box lists it as appropriate for ages 14 and up) were concerned about it listening to them all the time. That’s the only reason I can think of as to why Google decided to drop the smartest feature of a smart speaker. Otherwise it’d just be down to cost: the Raspberry Pi Zero WH is about $10, and the Raspberry Pi 3 is about $35.

When I first saw this project in the store I knew there had to be some limitations to hit that $50 price point, and it still came in under even my wildly low expectations. I don’t think most people would be happy with the device as a “smart speaker.” Years ago, when you assembled a transistor radio kit, you ended up with a radio. What you end up with here is a very versatile Linux computer kit with microphones and a speaker that could be incredibly useful in the right hands. I turned it into an AirPlay speaker without having to write any code at all. I haven’t even remarked on the quality of the sound yet: it’s fine in general, but turn it up loud and you’re gonna get distortion. Without hot-word detection, though, this kit is just too dumb to be called smart.

1 out of 5 HomePods for the Google AIY Voice Kit