Creating Your own Graphically Interfaced Voice assistant for your car using Raspberry pi and Python
As we know Python is a suitable language for script writers and developers. Let’s write a script for Personal Voice Assistant using Python. The query for the assistant can be manipulated as per the user’s need.
The implemented assistant can open up the application (if it’s installed in the system), search Google, Wikipedia and YouTube about the query, calculate any mathematical question, etc by just giving the voice command. We can process the data as per the need or can add the functionality, depends upon how we code things.
We are using Google speech recognition API and google text to speech for voice input and output respectively.
Also, for calculating mathematical expression WolframAlpha API can be used.
Playsound Package is used to play the saved mp3 sound from the system.
We are also going to use a Raspberry pi 3b, 4b, or the Pi zero as the brains of the operations.
If you already haven't installed pycharm, you can go and install it on your raspberry pi. Once you fully install it, set it up and get going.
The next step is to install all the required packages as shown.
Python external Package Requirements:
-> gTTS — Google Text To Speech, for converting the given text to speech
-> speech_recognition — for recognising the voice command and converting to text
-> selenium — for web based work from browser
-> wolframalpha — for calculation given by user
-> playsound — for playing the saved audio file.
-> pyaudio — for voice engine in python
Well, let’s get started with code. We will divide each function as a single code for easy understanding.
Here’s the main function, with get_audio()
and assistant_speaks
function. get_audio()
function is created to get the audio from user using microphone, the phrase limit is set to 5 seconds (you can change it). Assistant speaks function is created to provide the output according to the processed data.
# importing speech recognition package from google api
import
speech_recognition as sr
import
playsound # to play saved mp3 file
from
gtts import
gTTS # google text to speech
import
os # to save/open files
import
wolframalpha # to calculate strings into formula
from
selenium import
webdriver # to control browser operations
num =
1
def
assistant_speaks(output):
global
num
# num to rename every audio file
# with different name to remove ambiguity
num +=
1
print("PerSon : ", output)
toSpeak =
gTTS(text =
output, lang ='en', slow =
False)
# saving the audio file given by google text to speech
file
=
str(num)+".mp3
toSpeak.save(file)
# playsound package is used to play the same file.
playsound.playsound(file, True)
os.remove(file)
def
get_audio():
rObject =
sr.Recognizer()
audio =
''
with sr.Microphone() as source:
print("Speak...")
# recording the audio using speech recognition
audio =
rObject.listen(source, phrase_time_limit =
5)
print("Stop.") # limit 5 secs
try:
text =
rObject.recognize_google(audio, language ='en-US')
print("You : ", text)
return
text
except:
assistant_speaks("Could not understand your audio, PLease try again !")
return
0
# Driver Code
if
__name__ ==
"__main__":
assistant_speaks("What's your name, Human?")
name ='Human'
name =
get_audio()
assistant_speaks("Hello, "
+
name +
'.')
while(1):
assistant_speaks("What can i do for you?")
text =
get_audio().lower()
if
text ==
0:
continue
if
"exit"
in
str(text) or
"bye"
in
str(text) or
"sleep"
in
str(text):
assistant_speaks("Ok bye, "+
name+'.')
break
# calling process text to process the query
process_text(text)
So, we have got an idea here how we are giving voice to the machine and take input from user. The next step and the main step is how you want to process your input. This is just basic code, there is a lot of other algorithms(NLP) can be used to process the text in a proper manner. We have made it static.
Also,
Wolframalpha api
has been used to calculate the calculations part.
def
process_text(input):
try:
if
'search'
in
input
or
'play'
in
input:
# a basic web crawler using selenium
search_web(input)
return
elif
"who are you"
in
input
or
"define yourself"
in
input:
speak =
'''Hello, I am Person. Your personal Assistant.
I am here to make your life easier. You can command me to perform
various tasks such as calculating sums or opening applications etcetra'''
assistant_speaks(speak)
return
elif
"who made you"
in
input
or
"created you"
in
input:
speak =
"I have been created by Archit Sakri."
assistant_speaks(speak)
return
elif
"medium.com"
in
input:# just
speak =
""" medium.com is the Best Online site to publish and read papers."""
assistant_speaks(speak)
return
elif
"calculate"
in
input.lower():
# write your wolframalpha app_id here
app_id =
"WOLFRAMALPHA_APP_ID"
client =
wolframalpha.Client(app_id)
indx =
input.lower().split().index('calculate')
query =
input.split()[indx +
1:]
res =
client.query(' '.join(query))
answer =
next(res.results).text
assistant_speaks("The answer is "
+
answer)
return
elif
'open'
in
input:
# another function to open
# different application availaible
open_application(input.lower())
return
else:
assistant_speaks("I can search the web for you, Do you want to continue?")
ans =
get_audio()
if
'yes'
in
str(ans) or
'yeah'
in
str(ans):
search_web(input)
else:
return
except
:
assistant_speaks("I don't understand, I can search the web for you, Do you want to continue?")
ans =
get_audio()
if
'yes'
in
str(ans) or
'yeah'
in
str(ans):
search_web(input)
Now we have processed the input, it’s time for action!
There are two functions included that is search_web
and open_application
.
search_web is just a web crawler which uses selenium package to process. It can search google, wikipedia and can open YouTube. You just have to say include the name and it will open it in the Firefox browser. For other browsers, you need to install a proper browser package in selenium. Here we are using webdriver for Firefox.
open_application is just a function uses os package to open the application present in the system.
def
search_web(input):
driver =
webdriver.Firefox()
driver.implicitly_wait(1)
driver.maximize_window()
if
'youtube'
in
input.lower():
assistant_speaks("Opening in youtube")
indx =
input.lower().split().index('youtube')
query =
input.split()[indx +
1:]
driver.get("http://www.youtube.com/results?search_query ="
+
'+'.join(query))
return
elif
'wikipedia'
in
input.lower():
assistant_speaks("Opening Wikipedia")
indx =
input.lower().split().index('wikipedia')
query =
input.split()[indx +
1:]
driver.get("https://en.wikipedia.org/wiki/"
+
'_'.join(query))
return
else:
if
'google'
in
input:
indx =
input.lower().split().index('google')
query =
input.split()[indx +
1:]
driver.get("https://www.google.com/search?q ="
+
'+'.join(query))
elif
'search'
in
input:
indx =
input.lower().split().index('google')
query =
input.split()[indx +
1:]
driver.get("https://www.google.com/search?q ="
+
'+'.join(query))
else:
driver.get("https://www.google.com/search?q ="
+
'+'.join(input.split()))
return
# function used to open application
# present inside the system.
def
open_application(input):
if
"chrome"
in
input:
assistant_speaks("Google Chrome")
os.startfile('C:\Program Files (x86)\Google\Chrome\Application\chrome.exe')
return
elif
"firefox"
in
input
or
"mozilla"
in
input:
assistant_speaks("Opening Mozilla Firefox")
os.startfile('C:\Program Files\Mozilla Firefox\\firefox.exe')
return
elif
"word"
in
input:
assistant_speaks("Opening Microsoft Word")
os.startfile('C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\\Word 2013.lnk')
return
elif
"excel"
in
input:
assistant_speaks("Opening Microsoft Excel")
os.startfile('C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Microsoft Office 2013\\Excel 2013.lnk')
return
else:
assistant_speaks("Application not available")
return
Once you compiled it, you should get this, and then you are ready to go.
The second stage to this is when you unlock your car and the car is turned on, the program should run by itself.
To do this:
- Write your program and note down its location. We’ll be using a program called py_test.py and save it at /home/pi/Desktop/pyprog
- Now open crontab. You may need to open crontab in root (add sudo before the command!).
sudo crontab -e
3.Add a new entry at the very bottom with @reboot to specify that you want to run the command at boot, followed by the command. Here we want to run the python program and save the output in log.txt, so our entry is
@reboot sudo python /home/pi/Desktop/pyprog/pytest.py
/home/pi/Desktop/pyprog/log.txt
4.Now save the file and exit.
When you restart the pi, the command will be run and we will get the output log file.
If you are having trouble check your bootlog:
grep cron /var/log/syslog
After this, configure your bluetooth by:
- From the Raspberry Pi desktop, open a new Terminal window.
- Type sudo bluetoothctl then press enter and input the administrator password (the default password is raspberry).
- Next, enter agent on and press enter. Then type default-agent and press enter.
- Type scan on and press enter one more time. The unique addresses of all the Bluetooth devices around the Raspberry Pi will appear and look something like an alphanumeric XX:XX:XX:XX:XX:XX. If you make the device you want to pair discoverable (or put it into pairing mode), the device nickname may appear to the right of the address. If not, you will have to do a little trial and error or waiting to find the correct device.
- To pair the device, type pair [device Bluetooth address]. The command will look something like pair XX:XX:XX:XX:XX:XX.
When you are pairing your car speaker system, you will need to enter a six-digit string of numbers. You will see that the device has been paired, but it may not have connected. To connect the device, type connect XX:XX:XX:XX:XX:XX.
Graphical Interface
With this voice assistant, having a GUI(Graphical User Interface) is very good but not necessary. But for some functions, you would want a GUI. Say for example you ask the assistant to open youtube or If you want to be navigated to a particular destination, how will we do it?
You have to connect Raspberry Pi with screen and open the command line to In order to check the IP. Enter ifconfig or hostname -I. Keep the IP!
ifconfig or hostname -I
- First we need to set up a SSH client
SSH (Secure Socket Shell) client needs to be installed on your device. For example, Serverauditor is available both for Android and iOS devices.
2.VNC Server Installation and setting
You can do this either on your tablet as now your SSH is available. Of course, you can do this on your Raspberry Pi.
3.Install and Set up VNC Viewer app on your phone
Now you need to install VNC viewer app on your device. Many similar apps are available. The most famous one is the app by Real VNC. Once you install a VNC app, you need to set it up. You will need to enter Raspberry Pi IP followed by “:1” which is a display number. Also you will required VNC server password that you create in the previous step.
Now, we are finally done, and we can install this system in our car.
Here are some of the examples and output, which can help you understand how the above processing works.
1. Say "Search google medium.com"
2. Say "Play Youtube your favourite song"
3. Say "Wikipedia Ronaldo"
4. Say "Open Microsoft Word"
5. Say "Calculate anything you want"
In all the above cases, it will give do what is told. If the assistant can’t understand what is told it will ask you to google search it. For the thing which assistant can’t do is handled by this assistant.
Below are some screenshots for the talk between human and the assistant through my SSH(Secure Shell).
Well, that’s it. The above functionality can be coded in many ways, this is a basic implementation.
If you liked this, go to my youtube channel and please subscribe through this link :