ml5.soundClassifier() lets you detect voice and speech commands using pre-trained models

ml5.soundClassifier() allows you to classify audio. With the right pre-trained model, you can detect whether a certain noise was made (e.g. a clapping sound or a whistle) or a certain word was said (e.g. Up, Down, Yes, No). At this moment, ml5.soundClassifier() works with your own custom pre-trained speech commands model or with "SpeechCommands18w", which can recognize the ten digits from "zero" to "nine", "up", "down", "left", "right", "go", "stop", "yes", and "no", as well as the additional categories of "unknown word" and "background noise".
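
Because the word list above includes the four directions, one small use is steering something by voice. The sketch below maps a recognized label to a movement on a grid. The names `MOVES` and `applyCommand` and the coordinate convention are hypothetical, and it assumes labels arrive as the lowercase strings quoted above.

```javascript
// Hypothetical mapping from spoken direction labels to grid movements.
// Screen-style coordinates are assumed: "up" decreases y.
const MOVES = {
  up: { dx: 0, dy: -1 },
  down: { dx: 0, dy: 1 },
  left: { dx: -1, dy: 0 },
  right: { dx: 1, dy: 0 },
};

// Returns a new position after applying the command.
// Labels that are not one of the four directions leave the position unchanged.
function applyCommand(position, label) {
  const known = Object.prototype.hasOwnProperty.call(MOVES, label);
  if (!known) return position;
  const move = MOVES[label];
  return { x: position.x + move.dx, y: position.y + move.dy };
}
```

A classification callback could pass the recognized label to `applyCommand` and redraw with the returned position.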

For more information read here


// Options for the SpeechCommands18w model; the default probabilityThreshold is 0
const options = { probabilityThreshold: 0.7 };
const classifier = ml5.soundClassifier('SpeechCommands18w', options, modelReady);

function modelReady() {
  // classify the sound coming from the microphone
  classifier.classify(gotResult);
}

function gotResult(error, result) {
  if (error) {
    console.log(error);
    return;
  }
  // log the result
  console.log(result);
}

ml5.soundClassifier(?model, ?options, ?callback)

By default the soundClassifier will start the default microphone.


  • model - Optional. Model name or URL path to a model.json.
  • options - Optional. An object describing the model's accuracy and performance. The available parameters are:
      { probabilityThreshold: 0.7 } // the default probabilityThreshold is 0
  • callback - Optional. A function to run once the model has been loaded.
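
As a minimal sketch of the call signature above, the helper below wraps initialization so that the library object is passed in explicitly. The name `createSpeechClassifier` and its parameters are hypothetical and not part of ml5. In a browser page, `lib` would be the global `ml5` object. The fallback threshold of 0 mirrors the default mentioned above.

```javascript
// Hypothetical helper, not part of ml5: wraps
// ml5.soundClassifier(model, options, callback).
// `lib` is expected to be the global `ml5` object in a browser page.
function createSpeechClassifier(lib, probabilityThreshold = 0, onReady = () => {}) {
  const options = { probabilityThreshold };
  // Argument order: model name, then options, then the callback
  // that runs once the model has been loaded.
  return lib.soundClassifier('SpeechCommands18w', options, onReady);
}
```

In a page that has loaded ml5, this could be called as `createSpeechClassifier(ml5, 0.7, modelReady)`.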



The model



Returns an array of results, each with a "label" and a "confidence".

  • callback - A function to handle the results of ".classify()". It receives an error (if any) and the array of results. The audio itself comes from the microphone, so no input element is passed.
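
To illustrate the return value described above, here is a minimal sketch of a callback that picks the most confident entry. It assumes each array element is an object with `label` and `confidence` fields. The helper `mostConfident`, the callback name `gotResult`, and the sample data are hypothetical.

```javascript
// Hypothetical helper: returns the entry with the highest confidence,
// or null when there is nothing to choose from.
function mostConfident(results) {
  if (!Array.isArray(results) || results.length === 0) return null;
  return results.reduce((best, r) => (r.confidence > best.confidence ? r : best));
}

// Error-first callback, following the (error, result) shape used in this document.
function gotResult(error, results) {
  if (error) {
    console.error(error);
    return null;
  }
  const top = mostConfident(results);
  if (top) console.log(`${top.label} (${top.confidence.toFixed(2)})`);
  return top;
}

// Made-up sample data, for illustration only.
const sample = [
  { label: 'down', confidence: 0.06 },
  { label: 'up', confidence: 0.91 },
  { label: 'no', confidence: 0.03 },
];
gotResult(null, sample); // logs "up (0.91)"
```

Scanning for the maximum avoids relying on any particular ordering of the array.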