Introduction: Voice Assistant Head Gear
This is a project where I went out of my way to design a head gear to assist me in some daily tasks. The head gear is voice activated, and some basic functions have been programmed, such as translating text, reading out documents, playing YouTube videos, and so on. It houses a Raspberry Pi 4B with 4 GB of RAM and an Arduino Nano. The 3D model for this project was designed in Fusion 360. The Raspberry Pi uses the VOSK offline speech recognition toolkit with an English language model to recognize speech. It's worth noting that to run the VOSK speech recognition toolkit you need, at minimum, Raspbian OS running on a Raspberry Pi 3.
Apart from the voice commands, the head gear has a headset and a 5 inch 800 x 480 screen with an HDMI interface. The screen can lower down and retract back up on voice command. The Vosk toolkit can be installed using pip (pip3 install vosk) and can be programmed in Python as well as many other languages. In this project I used Python, as it is easy to get up and running fast.
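Before getting into the build, it is worth verifying that Vosk and your microphone work together. Below is a minimal test sketch, assuming you have installed vosk and pyaudio via pip and unzipped the small English model (vosk-model-small-en-us-0.15, the same one used later in this project) next to the script; it simply prints whatever the recognizer hears.

# Minimal Vosk test: prints recognized speech from the microphone.
# Assumes: pip3 install vosk pyaudio, and the small English model
# unzipped in the same folder as this script.
from vosk import Model, KaldiRecognizer
import pyaudio

model = Model("vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)

mic = pyaudio.PyAudio()
stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000,
                  input=True, frames_per_buffer=8192)
stream.start_stream()

print("Listening... press Ctrl+C to stop")
while True:
    data = stream.read(4096)
    if recognizer.AcceptWaveform(data):
        print(recognizer.Result())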
Supplies
For this project you will need the following items, though you may swap any item for a similar one according to availability and price.
- 1 x Raspberry Pi 4B - 4GB
- 1 x 3D Printer and PLA filament
- 1 x Small USB Microphone
- 1 x Generic Headset (any headset will do; Bluetooth connectivity would be a plus, but since mine was unable to connect via Bluetooth I opted for the audio jack)
- 1 x 5 inch HDMI interface screen
- 1 x Arduino Nano
- Several M2.5 Hex Screws and Nuts
- Several Washers (5 mm outer diameter)
- 1 x ON/OFF Switch
- 1 x 5v cooling fan (40 x 40 mm)
- 2 x LED strips with 3 LEDs each (I used green)
- 1 x 40cm HDMI cable (micro to normal)
- Some 22 AWG wires
- 1 x 7.4 V LiPo Battery
- 1 x Power Bank and a cable with Type C and Type A
- 1 x Micro USB Cable with open-end wires (you can strip the wires of a normal cable if you have one lying around)
- 1 x 25kg/cm Digital Servo Motor
- 1 x Radial Bearing (8x14x4 mm - Inner x Outer x Height)
- 1 x 10 cm USB Type A to Type C cable (for the Arduino Nano to Raspberry Pi connection)
- 1 x Voltage Regulator
- 1 x Nylon strap to keep the head gear in place
- 2 x Aluminum angle brackets (10 cm length)
- Some flexible wire guides to house the HDMI and screen power cables (optional)
- 1 x Aluminum plate (8 x 5 x 2 cm - Length x Width x Thickness)
Step 1: Build Design - Part 1
The head gear was designed and printed in parts and finally assembled. The main arc-like structure that goes over the head is the base on which I started to build. This arc was dimensioned according to my head measurements, and the rest of the components thereafter. Once the arc structure was 3D printed, it was attached to the headset at the midpoint. I drilled four 3 mm holes in the middle section of the headset and then placed the aluminum plate in between, sandwiching the arc structure and the headset. This was done to increase the structural strength. It's important, when you design the arc structure, to leave just enough room on the sides so that the headset can stretch outwards when worn.
Next up was the design of the movable arms, where one is connected to the servo and the other is hinged to the arc structure with a radial bearing in place. An aluminum rod (8 mm diameter, 60 mm length) was used to hinge the joint through the bearing. The two hinge arms are not connected at this point, but are joined by the structure that houses the 5 inch screen. Once the screen structure was designed and printed, all three parts were assembled and joined together with the hex screws, nuts, and washers.
The Raspberry Pi is powered externally by a power bank, unlike the Arduino and the other peripherals (LEDs, cooling fan, screen, and servo motor), which are powered by the 7.4 V LiPo battery and can be switched on and off with a switch.
Step 2: Build Design - Part 2
The top part of the arc structure is flat, with a base plate glued on top; all the electronics are housed in a casing on top of the base plate. It's worth noting that the placement of the components inside the electronics case depended a lot on achieving a good center of gravity and ergonomics. The Raspberry Pi was shifted back a bit to compensate for the weight of the hinged arms and the screen, which, when lowered, creates a moment about the hinge that would topple the head gear forwards. The LiPo battery was also moved accordingly to achieve a good center of gravity.
After packing the electronics in the compartment it was time to seal it up. The top panel covering the electronics housed a small cooling fan to provide sufficient cooling for the Raspberry Pi.
Step 3: Build Design - Part 3
The two LED strips (each with 3 LEDs) were fitted into slots on both sides of the head gear. The top panel where the electronics were placed needed added support, so I designed a small structure connecting the arc structure to the top plate on both sides, and the LED strips were slotted into these support structures.
The top of the right hinge arm has a magnet, and when retracted to 90 degrees in the upright position it comes into contact with another magnet fixed to the top panel, which holds the screen upright. The magnets are encased in plastic to provide the right amount of gap between them, so that in the upright position there is enough attraction to keep the screen in place, but not so much that the servo cannot separate them when the screen is lowered. You will need to find a balance by testing different materials to weaken or strengthen the magnetic pull, or by increasing or decreasing the distance between the magnets.
Once all the structural assemblies were complete, the HDMI and screen power cables were routed through a flexible guide in a way that doesn't interfere with the motion of the hinged arms. Finally, two aluminum brackets were bolted to the ends of both sides of the arc structure to stop the hinge arms from dropping below the line of vision of the person wearing it. When the screen is in the viewing position, the hinged arms rest on the aluminum brackets.
Step 4: Electronics
The servo moves the hinged arms to the desired positions, and the servo is detached (given no signal) when the arms are in the upright or lowered position. This means that in both resting positions the hinged arms are supported without any force from the servo: in the retracted position the magnets hold the arms in place, and when the screen is lowered, the arms rest on the aluminum brackets. This was necessary because the servo needs just a few hundred milliamps to move the arms to the desired positions, and the rest is taken care of without the need for a holding torque from the servo.
The Raspberry Pi communicates via UART (serial communication) with the Arduino Nano to control the LEDs, the cooling fan, power to the screen, and the servo motor. Apart from the servo motor, all the other peripherals are switched on and off electronically using 3 TIP31C transistors. Power to the Arduino and the components mentioned above is supplied by the 7.4 V LiPo battery, which also provides 5 V via a voltage regulator for the LEDs, screen, and cooling fan. The servo runs directly on 7.4 V.
Step 5: Arduino Programming
/*
  Arduino code for serial communication with the Raspberry Pi
  and control of the peripherals.
*/

// Include the Servo library
#include <Servo.h>

// Create a Servo object
Servo myservo;

// Pin definitions
const int led = 3;
const int fan = 6;
const int screen = 5;

// State variables
bool fan_on;
bool screen_down = false;
String command;
int pos = 70;
bool actuate = false;

void setup() {
  Serial.begin(9600);
  pinMode(led, OUTPUT);
  pinMode(screen, OUTPUT);
  pinMode(fan, OUTPUT);
  digitalWrite(led, HIGH);
}

void loop() {
  // Read serial data from the Raspberry Pi as newline-terminated strings
  if (Serial.available()) {
    command = Serial.readStringUntil('\n');
    command.trim();
    if (command.equals("fanon")) {
      fan_on = true;
    }
    else if (command.equals("fanoff")) {
      fan_on = false;
    }
    else if (command.equals("screendown")) {
      screen_down = true;
      actuate = true;
    }
    else if (command.equals("screenup")) {
      screen_down = false;
      actuate = true;
    }
  }

  if (fan_on) {
    digitalWrite(fan, HIGH);
  }
  else {
    digitalWrite(fan, LOW);
  }

  // actuate is true when a screen motion command has been issued;
  // after moving the screen the servo is detached
  if (actuate) {
    if (screen_down) {
      myservo.attach(9);
      delay(100);
      for (pos = 160; pos >= 50; pos -= 1) {
        myservo.write(pos);
        delay(15);
      }
      delay(500);
      myservo.detach();
      digitalWrite(screen, HIGH);
    }
    if (!screen_down) {
      digitalWrite(screen, LOW);
      myservo.attach(9);
      delay(100);
      for (pos = 50; pos <= 169; pos += 1) {
        myservo.write(pos);
        delay(15);
      }
      delay(500);
      myservo.detach();
    }
    actuate = false;
  }
}
Step 6: Raspberry Pi Programming
# Raspberry Pi code issuing commands via serial to the Arduino

# Import the serial and time modules
import serial
import time

if __name__ == '__main__':
    # Set up the serial connection and flush the buffer of any data before
    # transmission. The rest of the code just sends the command typed by the
    # user in the console to the Arduino, which controls the peripherals.
    ser = serial.Serial('/dev/ttyUSB0', 9600, timeout=1)
    ser.flush()
    command = input('type command: ')
    if command == "fanon":
        ser.write(b"fanon\n")
        print("fan on")
    elif command == "fanoff":
        ser.write(b"fanoff\n")
    elif command == "screenon":
        ser.write(b"screenon\n")
    elif command == "screenoff":
        ser.write(b"screenoff\n")
    elif command == "screenup":
        ser.write(b"screenup\n")
    elif command == "screendown":
        ser.write(b"screendown\n")
Step 7: The Setup
Basically, I have set up a main Python program which runs on startup of the Raspberry Pi and autonomously launches several smaller Python scripts to do certain tasks based on voice commands from the person wearing the head gear. As of now it is able to traverse the file system on voice commands and perform tasks directed by the user, such as reading a PDF or a text document in the reading list folder and translating text from pictures or files. In addition, I have used the pyautogui Python library to navigate the browser and perform operations such as searching YouTube for a video; this plays the first video that matches the search criteria dictated by the user.
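For reference, one simple way to run the main script on startup (just one option; I am assuming the stock Raspberry Pi OS desktop here, since the script drives GUI applications and needs the desktop session) is to append a line to the LXDE autostart file at ~/.config/lxsession/LXDE-pi/autostart, where main.py is a placeholder for whatever you named the main script:

@python3 /home/console/Documents/main.py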
The script below uses the pyautogui library to navigate the OS and run the console_control.py script from Step 6, which issues the commands to control the peripherals over the serial connection to the Arduino Nano.
import pyautogui
import sys
import time
command = sys.argv[1]
# print(command)
# Open the terminal
pyautogui.hotkey('win')
pyautogui.write('term')
pyautogui.hotkey('enter')
time.sleep(1)
# Type in the command to run the console_control.py script
pyautogui.write('python3 /home/console/Documents/console_control.py')
pyautogui.hotkey('enter')
time.sleep(1)
# Enter the command to send to the Arduino
pyautogui.write(command)
time.sleep(1)
pyautogui.hotkey('enter')
time.sleep(2)
# Clean exit
pyautogui.write('exit')
time.sleep(1)
pyautogui.hotkey('enter')
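For a quick test, run the script with a command as an argument, for example python3 /home/console/Documents/automate.py fanon (automate.py being the name the main script in Step 7 expects); it should open a terminal, start console_control.py, send the fanon command to the Arduino, and close the terminal again.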
The next script translates text from an image; it is also run from the main program as a sub-script. In order for the spoken responses to work you need to install espeak and pyttsx3 (pyttsx3 via pip; espeak comes from your distribution's package manager). There are some other options you can explore, but I would caution that some are OS specific; I found these two work well on Linux, although the voice may sound a bit weird. For translating text you can use the deep_translator library, which works with many translation engines such as Google, Bing, etc. Finally, you need to install pytesseract to extract text from images. It may sound daunting, but the installations are very straightforward and you will be up and running in no time.
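For reference, these are the install commands I would expect on Raspberry Pi OS (the apt package names are my assumption for a stock Raspbian; the chi_sim language data is needed for the Chinese example below):

sudo apt-get install espeak tesseract-ocr tesseract-ocr-chi-sim
pip3 install pyttsx3 deep-translator pytesseract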
from deep_translator import GoogleTranslator
from PIL import Image
import pytesseract
import pyttsx3
engine = pyttsx3.init("espeak")
engine.setProperty('voice', 'english_rp+f3')
engine.setProperty('rate', 160)
text = pytesseract.image_to_string(Image.open('/home/console/Pictures/chinesetext.png'), lang="chi_sim")
translated = GoogleTranslator(source='zh-CN', target='en').translate(text)
print(translated)
engine.say(translated)
engine.runAndWait()
Another script is used to fetch data from Wikipedia if a page exists for the search query dictated by the user. You need the Python native os module and the wikipediaapi library for this. Below you can see the calls to wikipediaapi that fetch the page summary, which gets stored in a text file that can then be narrated back to the user.
import wikipediaapi
import os

wiki_wiki = wikipediaapi.Wikipedia('en')
page_py = wiki_wiki.page('Autodesk')

if page_py.exists():
    print("Page Exists")
    print(page_py.title)
    text = page_py.summary
    with open("Autodesk.txt", "w") as f:
        f.write(text)
else:
    print("Page not found!")
The next script searches for videos on YouTube, which I often do, especially for music videos while I am multitasking. For this script we need the webbrowser module to open the browser, pyautogui to mimic key presses and mouse clicks, the sys module to get the argument passed when running the script (which will be the YouTube search query), the time module, and the youtubesearchpython module for searching for the video.
import webbrowser
import pyautogui
import sys
from time import sleep
from youtubesearchpython import VideosSearch
# Join all command line arguments so multi-word search queries work,
# since the main script passes the dictated words unquoted
query = " ".join(sys.argv[1:])
search = VideosSearch(query, limit=1)
url = search.result()["result"][0]["link"]
print(url)
webbrowser.open(url)
sleep(2)
# Click at a fixed position on the page (screen-specific coordinates,
# tuned on my 5 inch 800 x 480 display; adjust them for yours)
pyautogui.moveTo(762, 153)
sleep(2)
pyautogui.click()
# Park the pointer out of the way while the video loads
pyautogui.moveTo(800, 5)
sleep(8)
# 'f' toggles fullscreen in the YouTube player
pyautogui.hotkey('f')
sleep(2)
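The main script in the next step invokes this as python3 /home/console/Documents/youtubesearch.py followed by the dictated search words, so the query arrives as command line arguments.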
These are the scripts I have come up with so far, but this is only the beginning: you can expand the automation as much as you like, as long as you keep your main script clean and organized. Now for the fun stuff! Let's jump into the main script, where you can customize the feedback and responses from your virtual assistant.
from vosk import Model, KaldiRecognizer
import pyaudio
import pyttsx3
import os
from time import sleep
from PIL import Image
from deep_translator import GoogleTranslator
import pytesseract
from PyPDF2 import PdfReader
model = Model(r"/home/console/Downloads/vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)
mic = pyaudio.PyAudio()
engine = pyttsx3.init("espeak")
engine.setProperty('voice', 'english_rp+f3')
engine.setProperty('rate', 160)
listening = False
active_mode = False
youtube_active = False
translate_text_image = False
read_file = False
wiki_search_active = False
navigate_file_system = False
def get_command():
    listening = True
    stream = mic.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
    while listening:
        stream.start_stream()
        try:
            data = stream.read(4096)
            if recognizer.AcceptWaveform(data):
                result = recognizer.Result()
                # Strip the JSON wrapper {"text" : "..."} around the transcript
                response = result[14:-3]
                listening = False
                stream.close()
                return response
        except OSError:
            pass
while True:
    print("Waiting for command...")
    command = get_command()
    if command == "":
        pass
    elif command == "console":
        active_mode = True
        engine.say("Hello Hanoon.")
        engine.runAndWait()
    elif command == "sleep":
        active_mode = False
        engine.say("Alright, bye bye.")
        engine.runAndWait()
        listening = False
        break
    if active_mode:
        command = get_command()
        if command == "screen down":
            engine.say("Okay. I will lower the screen")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py screendown')
            active_mode = False
        elif command == "screen up":
            engine.say("Okay. I will retract the screen")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py screenup')
            active_mode = False
        elif command == "fan on":
            engine.say("Okay. I will enable cooling fan")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py fanon')
            active_mode = False
        elif command == "fan off":
            engine.say("Okay. I will disable cooling fan")
            engine.runAndWait()
            os.system('python3 /home/console/Documents/automate.py fanoff')
            active_mode = False
        elif command == "who are you":
            engine.say("My name is console, I was born on May 10 2023. I was created to assist Hanoon with his daily tasks")
            engine.runAndWait()
            active_mode = False
        elif command == "search you tube":
            engine.say("What would you like me to search on youtube?")
            engine.runAndWait()
            youtube_active = True
            active_mode = False
        elif command == "exit browser":
            os.system('killall -9 "chromium-browser"')
            active_mode = False
        elif command == "translate text":
            engine.say("Would you like to translate from an image or a file")
            engine.runAndWait()
            response = get_command()
            if response == "image" or response == "file":
                translate_text_image = True
            active_mode = False
        elif command == "read file":
            engine.say("I am ready to read from a file for you")
            engine.runAndWait()
            read_file = True
            active_mode = False
        elif command == "":
            active_mode = False
        else:
            engine.say("I don't understand that yet!")
            engine.runAndWait()
            active_mode = False
    if youtube_active:
        youtube_search_query = get_command()
        engine.say("do you want me to search.")
        engine.say(youtube_search_query)
        engine.runAndWait()
        command = get_command()
        if command == "yes":
            engine.say("Searching first video on youtube for.")
            engine.say(youtube_search_query)
            engine.runAndWait()
            os.system("python3 /home/console/Documents/youtubesearch.py {}".format(youtube_search_query))
            youtube_active = False
            active_mode = False
        elif command == "no":
            engine.say("Sorry, could you please repeat the search query")
            engine.runAndWait()
            active_mode = False
        elif command == "stop search":
            engine.say("Well, Alright. Aborting you tube search")
            engine.runAndWait()
            youtube_active = False
            active_mode = False
    if translate_text_image:
        engine.say("Shall I open the pictures folder for you to choose an image to translate?")
        engine.runAndWait()
        command = get_command()
        if command == "yes":
            os.system('pcmanfm /home/console/Pictures/')
            engine.say("Please tell me which image you want to select to translate")
            engine.runAndWait()
            newdir = []
            for i in os.listdir('/home/console/Pictures/'):
                newdir.append(i.split(".")[0])
            response = get_command()
            os.system('killall -9 pcmanfm')
            if response in newdir:
                ind = newdir.index(response)
                image_name = os.listdir('/home/console/Pictures/')[ind]
                os.system('gpicview /home/console/Pictures/{} &'.format(image_name))
                text = pytesseract.image_to_string(Image.open('/home/console/Pictures/{}'.format(image_name)), lang="chi_sim")
                translated = GoogleTranslator(source='zh-CN', target='en').translate(text)
                engine.say(translated)
                engine.runAndWait()
                os.system('killall -9 gpicview')
                translate_text_image = False
            elif response == "stop translate":
                translate_text_image = False
            else:
                engine.say("sorry {} is not in the folder. Please repeat the name of the image file".format(response))
                engine.runAndWait()
                translate_text_image = False
        elif command == "no":
            engine.say("Oh i see. my program is still in construction for this feature at the moment")
            engine.runAndWait()
            translate_text_image = False
    if read_file:
        engine.say("Would you like me to open the folder, reading list for you")
        engine.runAndWait()
        response = get_command()
        if response == "yes":
            os.system('pcmanfm /home/console/Documents/ReadingList/')
            engine.say("Which file would you like me to read from the reading list")
            engine.runAndWait()
            newdir = []
            for i in os.listdir('/home/console/Documents/ReadingList/'):
                newdir.append(i.split(".")[0])
            command = get_command()
            os.system('killall -9 pcmanfm')
            if command in newdir:
                ind = newdir.index(command)
                file_name = os.listdir('/home/console/Documents/ReadingList')[ind]
                reader = PdfReader('/home/console/Documents/ReadingList/{}'.format(file_name))
                engine.say('{} document has {} pages'.format(file_name, len(reader.pages)))
                page = reader.pages[0]
                text = page.extract_text()
                engine.say(text)
                engine.runAndWait()
                read_file = False
        elif response == "no":
            engine.say("Oh i see. my program is still in construction for this feature at the moment")
            engine.runAndWait()
            read_file = False
        elif response == "stop reading":
            engine.say("Very well, feel free to let me know if i need to change my reading speed")
            engine.runAndWait()
            read_file = False
As you might notice, the first lines import the Vosk recognizer along with pyttsx3 (backed by espeak) so the virtual assistant can respond. The other modules were discussed above, except for PyPDF2, which extracts text from PDF documents that can then be fed to the speech engine for narration. At this point you also need to download the model you will be using and pass the absolute path to its location on the system. The next few lines adjust the voice by tweaking properties such as the 'rate' at which the engine narrates.
Going into the code, we set some boolean variables for what we want to do and initially set them to False. Based on the user input we switch these to True, which enables a specific function for a specific task to run. Make sure to reset the state back to False after performing the operation, or the program will go haywire. The get_command() function runs repeatedly once a task is completed so the user can move on to another task. Most of the task functions hand off to those mini scripts, which keeps the main code clean and easy to manage. You might also notice that I have left responses to some commands as "I don't understand that yet!" and "Program is still in construction" to further expand those areas.
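Stripped of the details, the control flow boils down to the skeleton below (get_command() and run_task() stand in for the real functions; the point is that every flag gets reset so the loop always returns to listening):

active_mode = False
task_active = False

while True:
    command = get_command()          # blocks until speech is recognized
    if command == "console":         # the wake word
        active_mode = True
    if active_mode:
        command = get_command()
        if command == "some task":   # a recognized task command
            task_active = True
        active_mode = False          # always drop back to idle
    if task_active:
        run_task()                   # hand off to one of the mini scripts
        task_active = False          # reset, or the task repeats forever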
Step 8: Conclusion
After having completed the project, it's clear that there is much that can be done on both the design and the program to improve or further develop. From a design perspective I would have preferred to keep the head gear more compact, but there were factors to consider: for example, the screen needs a certain minimum distance from the eyes to avoid discomfort when viewing. For this reason I had to position the screen a bit farther out, which also increased the tipping force on the head gear; this was somewhat balanced by placing the components further back.
Another option considered was attaching a UPS module to power the Raspberry Pi, but after looking at some modules available on the market that can provide a steady 3 A, the idea was dropped due to the added weight. So an external power supply such as a power bank is needed to power it.
On the programming front it can be further developed with more functions. The Raspberry Pi is a very good choice for an application like this, as it has the computing capacity that a microcontroller lacks, which brings me to why I used an Arduino for controlling the peripherals instead of the Raspberry Pi's GPIOs: a dedicated piece of hardware is best for precisely timed operations, and that is where the Arduino excels.
Having documented all this, I hope someone benefits from my post and finds it informative. I welcome your opinions and ideas. Thanks!
Get the 3D Model Files
Participated in the Wear It Contest
7 Comments
7 weeks ago
Great! It makes me think of exploration suits in science-fiction movies.
Your weight problem could be resolved if you turn the screen horizontally.
Reply 6 weeks ago
Thanks for your idea. Sounds good, and I guess that would require a mechanism with 2 degrees of freedom for retracting the screen. It's something worth considering. 😀
Reply 6 weeks ago
Yes, one solution could be 2 rotations: first to move the screen back, second to move it to the side or the back of the head. The other solution is to put it on a robotic arm.
Your development of Google vocal command is also really interesting!
Regards.
Reply 6 weeks ago
It might just work if I am able to squeeze in the motors and retraction mechanism on one side. Also, perhaps I could get a smaller screen such as the MHS 3.5 inch display, but then again viewing it might not be that easy on the eyes. Appreciate your ideas. 🤩
Reply 6 weeks ago
You're right, viewing a small screen means holding it farther away, like a mobile phone.
Perhaps you could use two screens side by side; try an angle between them.
Reply 6 weeks ago
That's an interesting idea to have two small screens, but I am not sure if the Raspberry Pi would be able to interface with two of them. I've tried one small screen and it takes up a bunch of GPIO pins, since the smaller screens do not support an HDMI interface.
Reply 6 weeks ago
Then perhaps two mirrors and one screen. I think you could get a depth effect if you put an angle between the two mirrors.