I don't need to introduce ChatGPT anymore. But did you know OpenAI has a forgotten little brother, especially after the release of Sora?
Whisper is “an open-sourced neural net that approaches human level robustness and accuracy on English speech recognition”.
Some cool folks even built a Python wrapper around the open-sourced model for easy, free, and local use.
from pywhispercpp.examples.assistant import Assistant

def commands_callback(model_output):
    # Called with the transcribed text whenever speech is detected
    print(f"user said: {model_output}")
    # TODO: sentiment analysis

my_assistant = Assistant(
    commands_callback=commands_callback,
    n_threads=8)
my_assistant.start()
And just like that, we have speech-to-text working.
Now that we have the text, we need to determine the emotion of the words. Are they positive, negative, neutral or …?
A tool I have been wanting to play around with for a while now is Hugging Face.
Hugging Face allows you and me, as mere mortals, to use very sophisticated open-source machine learning models.
In our case we will use a text-classification model to determine the emotions of the words.
from pywhispercpp.examples.assistant import Assistant
from transformers import pipeline

model = "j-hartmann/emotion-english-distilroberta-base"
classifier = pipeline("text-classification", model=model, return_all_scores=True)

def commands_callback(model_output):
    print(f"user said: {model_output}")
    print("feels like:")
    # Print the score for each emotion the model detects
    for sentiment in classifier(model_output)[0]:
        print(f"{sentiment['label']}: {sentiment['score']}")
    # TODO: stream results

my_assistant = Assistant(
    commands_callback=commands_callback,
    n_threads=8)
my_assistant.start()
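Out of the box, this model scores each piece of text on seven emotions: anger, disgust, fear, joy, neutral, sadness and surprise.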
Like magic 🪄, reasonably accurate sentiment analysis with just a few lines of code.
In architecture rooms this would be a hot topic: multiple sessions to discuss the upsides and downsides of different streaming protocols, calculate throughput needs, determine latency requirements, and write up reliability specifications.
But we are bodging things together over here; we just need to send some data from one tool to another.
At the university we have set up an MQTT broker to do just that.
Even though UDP messaging would have been a better fit for the job, we used MQTT as it was already there, configured, and known to work.
from pywhispercpp.examples.assistant import Assistant
from transformers import pipeline
import paho.mqtt.client as mqtt
import random

# Use a random suffix so multiple clients can connect at the same time
client = mqtt.Client("talking-to-water" + str(random.randint(0, 1000)))
client.username_pw_set("*****", "*****")
client.connect("*****")
client.loop_start()

model = "j-hartmann/emotion-english-distilroberta-base"
classifier = pipeline("text-classification", model=model, return_all_scores=True)

def commands_callback(model_output):
    print(f"user said: {model_output}")
    print("feels like:")
    for sentiment in classifier(model_output)[0]:
        print(f"{sentiment['label']}: {sentiment['score']}")
        # Publish each emotion score to its own topic
        client.publish(f"AuraMotions/{sentiment['label']}", sentiment['score'])

my_assistant = Assistant(
    commands_callback=commands_callback,
    n_threads=8)
my_assistant.start()
And that's it.
We can now talk to water.
From here on out, the students could use the emotions sent via MQTT to create any (generated) visual representation they need.
An example of generative visuals in TouchDesigner using the MQTT messages from the sentiment analysis tool.
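On the receiving end, any MQTT client that subscribes to the AuraMotions topics will do. A minimal sketch of what the listening side could look like, assuming the same paho-mqtt library and the redacted broker credentials from above (the client id and callback are hypothetical):

import paho.mqtt.client as mqtt

# Called for every sentiment score published by the analysis tool
def on_message(client, userdata, message):
    emotion = message.topic.split("/")[-1]  # e.g. "AuraMotions/joy" -> "joy"
    score = float(message.payload.decode())
    print(f"{emotion}: {score}")  # drive your visuals from here

client = mqtt.Client("aura-motions-visuals")  # hypothetical client id
client.username_pw_set("*****", "*****")
client.on_message = on_message
client.connect("*****")
client.subscribe("AuraMotions/#")  # wildcard: all emotion topics
client.loop_forever()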
By standing on the shoulders of giants, we can bodge together our wildest imaginations.
Thank you random strangers on the internet ❤️.
Before I go down the rabbit hole and lose you: you can experience the project yourself or use it as the basis for your next bodge project!
🐇
Awesome, you made it this far.
Before you begin the next part, have a little break.
While the code presented above worked on my machine, and probably works on your machine with some technical knowledge, the bodged solution is not without its flaws.
The project runs fine when all tools and dependencies are available, but it broke down when the students tried to run it on their own machines.
Either (the correct version of) Python was not installed, or dependencies like ffmpeg were not available on the students' machines.
The infamous "works on my machine".
Mismatching versions and missing dependencies are a common, and solved, problem in software development.
We make a Dockerfile and ship the project that way.
FROM python:3.11.7
# Install the system dependencies needed to make the container work
RUN apt update && apt install -y ffmpeg alsa-utils pulseaudio pulseaudio-utils libportaudio2 libasound-dev nano && apt clean
# Install the required packages
WORKDIR /usr/src/app
RUN pip install --upgrade pip
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Clone the pywhispercpp repository
RUN git clone --recurse-submodules https://github.com/abdeladim-s/pywhispercpp.git
# Build and install pywhispercpp
WORKDIR /usr/src/app/pywhispercpp
# "python -m build" needs the build package (may already be in requirements.txt)
RUN pip install build
RUN python -m build --wheel
RUN pip install dist/pywhispercpp-*.whl
# Copy the project files into the container
WORKDIR /usr/src/app
COPY main.py ./
COPY src ./
# Run the project
CMD ["python3", "-u", "main.py"]
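With the Dockerfile in place, building the image is a single command (assuming the same tag as the prebuilt image used later on):
docker build -t xiduzo/whisper-sentiment-analysis:latest .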
Voilà, packaging the whole project neatly in a Docker container will solve all our problems, right?
Right?!
New tool, new issue
While Docker solved the dependency problem, it introduced a new one: audio was not being captured by the container, at least not on macOS machines.
We don't have the luxury of running docker with --device /dev/snd as we would on a Linux machine.
After some googling I found a tool called PulseAudio, which could "[…] transfer audio to a different machine […]".
This could be to a machine on the other side of the room, building, city, world or to a docker container running on the same machine.
To make installing PulseAudio as easy as possible for the students, I wrote a small bash script.
#!/bin/bash
# Install Homebrew if it is not installed yet
if ! command -v brew &> /dev/null; then
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
fi

brew install pulseaudio

# Find the installed version so we can locate the configuration directory
pulseaudio_version=$(pulseaudio --version | awk '{print $2}')
file="/opt/homebrew/Cellar/pulseaudio/$pulseaudio_version/etc/pulse/default.pa.d"

if ! test -e "$file"; then
  touch "$file"
fi

# Append our PulseAudio configuration to the default configuration
cat .config/pulse/pulseaudio.conf >> "$file"

brew services restart pulseaudio
sleep 5
pulseaudio --check -v # Make sure everything is working
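The script appends a snippet from .config/pulse/pulseaudio.conf in the repository to PulseAudio's default configuration. I won't reproduce the actual file here, but a minimal version that accepts audio clients over the network could look something like this (my assumption, not the project's exact config):

# Allow clients, like our docker container, to connect over TCP
load-module module-native-protocol-tcp auth-anonymous=1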
So finally, the students (and you) can run the project with two simple commands:
./install-pulseaudio-for-mac.sh
docker run --net=host --privileged -e PULSE_SERVER=<HOST_IP> xiduzo/whisper-sentiment-analysis:latest
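On macOS, the <HOST_IP> for PULSE_SERVER can usually be found with ipconfig getifaddr en0 (assuming en0 is your active network interface).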
The concept and execution of the project were done by the following students from the Master Digital Design program.