Handtrack.js: Hand Detection and Tracking in the Browser with TensorFlow.js

A screenshot from a demo built using Handtrack.js. Try the demo here.

A while ago, I was really blown away by results from an experiment using the Tensorflow Object Detection API to track hands in an image. I made the trained model and source code available, and since then it has been used to prototype some rather interesting use cases (a tool to help kids spell, extensions to predict sign language, hand ping pong, etc.). However, while many individuals wanted to experiment with the trained model, a large number still had issues setting up Tensorflow (installation, TF version issues, exporting graphs, etc.). Luckily, Tensorflow.js addresses several of these installation/distribution issues, as it is optimized to run in the standardized environment of browsers. To this end, I created Handtrack.js as a library that lets developers quickly prototype hand/gesture interactions powered by a trained hand detection model.

Runtime: 22 FPS on a MacBook Pro 2018 (2.2 GHz), Chrome browser; 13 FPS on a MacBook Pro 2014 (2.2 GHz).
Here is an example interface built using Handtrack.js to track hands from a webcam feed. Try the demo here.

The goal of the library is to abstract away the steps associated with loading the model files, provide helpful functions, and allow a user to detect hands in an image without any ML experience. You do not need to train a model (though you can if you want). You do not need to export any frozen graphs or saved models. You can just get started by including handtrack.js in your web application (details below) and calling the library methods.

An interactive demo built using Handtrack.js is here, and the source code on GitHub is here. Prefer tinkering in Codepen? There is a handtrack.js example pen you can modify.

victordibia/handtrack.js
A library for prototyping realtime hand detection (bounding box), directly in the browser. (github.com)

How do I use it in a web application?

You can use handtrack.js simply by including the library URL in a script tag or by importing it from npm using build tools.

Using a script tag

The Handtrack.js minified JS file is currently hosted using jsdelivr, a free open source CDN that lets you include any npm package in your web application.

<script src="https://cdn.jsdelivr.net/npm/handtrackjs/dist/handtrack.min.js"> </script>

Once the above script tag has been added to your HTML page, you can reference handtrack.js using the handTrack variable as follows.

const img = document.getElementById('img');
handTrack.load().then(model => {
  model.detect(img).then(predictions => {
    console.log('Predictions: ', predictions); // bbox predictions
  });
});

The snippet above prints bounding box predictions for an image passed in via an img tag. By submitting frames from a video or camera feed, you can then "track" hands in each frame (you will need to keep state for each hand as it moves across frames).

A demo interface using handtrack.js to track hands in an image. You can use the `renderPredictions()` method to draw the detected bounding boxes and the source image onto a canvas object.
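
As a minimal sketch of how that drawing step could be wired up (assuming an img element with id "img" and a canvas element with id "canvas" on the page; these ids are just examples):

const img = document.getElementById('img');
const canvas = document.getElementById('canvas');
const context = canvas.getContext('2d');

handTrack.load().then(model => {
  model.detect(img).then(predictions => {
    // Draw the source image and the detected bounding boxes onto the canvas.
    model.renderPredictions(predictions, canvas, context, img);
  });
});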

Using NPM

You can install handtrack.js as an npm package using the following:

npm install --save handtrackjs

An example of how you can import and use it in a React app is given below.

import * as handTrack from 'handtrackjs';

const img = document.getElementById('img');

// Load the model.
handTrack.load().then(model => {
  console.log("model loaded");
  // Detect hands in the image.
  model.detect(img).then(predictions => {
    console.log('Predictions: ', predictions);
  });
});

You can vary the confidence threshold (predictions below this value are discarded). Note: the model tends to work best in well-lit conditions. The reader is encouraged to experiment with the confidence threshold to accommodate various lighting conditions; for example, a dimly lit scene will work better with a lower confidence threshold.
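
As a minimal sketch (using the setModelParameters() method described in the API section below; the threshold value here is only an assumed starting point), you could lower the threshold at runtime for a dimly lit scene:

const img = document.getElementById('img');

handTrack.load().then(model => {
  // Lower the confidence threshold so faint detections are not discarded
  // (0.6 is an arbitrary example value; tune it for your lighting).
  // This assumes partial parameter objects are merged with existing settings.
  model.setModelParameters({ scoreThreshold: 0.6 });
  model.detect(img).then(predictions => {
    console.log('Predictions: ', predictions);
  });
});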

Handtrack.js API

Several methods are provided. The two main ones are load(), which loads a hand detection model, and detect(), which returns predictions.

load() accepts optional model parameters that allow you to control the performance of the model. This method loads a pretrained hand detection model in the web model format (also hosted via jsdelivr).

detect() accepts an input source parameter (an HTML img, video, or canvas object) and returns bounding box predictions on the location of hands in the image.

const modelParams = {
  flipHorizontal: true,   // flip e.g. for video
  imageScaleFactor: 0.7,  // reduce input image size
  maxNumBoxes: 20,        // maximum number of boxes to detect
  iouThreshold: 0.5,      // IoU threshold for non-max suppression
  scoreThreshold: 0.79,   // confidence threshold for predictions
}

const img = document.getElementById('img');
handTrack.load(modelParams).then(model => {
  model.detect(img).then(predictions => {
    console.log('Predictions: ', predictions);
  });
});

Prediction results are of the form:

[{
  bbox: [x, y, width, height],
  class: "hand",
  score: 0.8380282521247864
}, {
  bbox: [x, y, width, height],
  class: "hand",
  score: 0.74644153267145157
}]

Other helper methods are also provided:

  • model.getFPS(): get the FPS, calculated as the number of detections per second.
  • model.renderPredictions(predictions, canvas, context, mediasource): draw bounding boxes (and the input mediasource image) on the specified canvas.
  • model.getModelParameters(): returns model parameters.
  • model.setModelParameters(modelParams): updates model parameters.
  • dispose(): delete the model instance.
  • startVideo(video): start a camera video stream on the given video element. Returns a promise that can be used to check whether the user granted video permission (a usage sketch follows after this list).
  • stopVideo(video): stop the video stream.
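
As a usage sketch (assuming a video element with id "myvideo" on the page, that startVideo is exposed on the handTrack object, and a modest 100 ms detection interval chosen only to keep the UI responsive):

const video = document.getElementById('myvideo');

handTrack.load().then(model => {
  handTrack.startVideo(video).then(status => {
    // The resolved status indicates whether the user granted camera access.
    if (!status) {
      console.log('Please enable video access');
      return;
    }
    // Run detection a few times per second rather than on every frame,
    // so prediction work does not block the UI thread.
    setInterval(() => {
      model.detect(video).then(predictions => {
        console.log('Predictions: ', predictions);
      });
    }, 100);
  });
});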

Library Size and Model Size

  • Library size: 810 KB, mainly because it is bundled with the tensorflow.js library (there are some open issues where the latest versions break the library).
  • Models: 18.5 MB. This causes an initial wait when the page loads. TF.js web models are typically split into multiple files (in this case, four 4.2 MB files and one 1.7 MB file).

How It Works

Underneath, Handtrack.js uses the Tensorflow.js library, a flexible and intuitive API for building and training models from scratch in the browser. It provides a low-level JavaScript linear algebra library and a high-level layers API.

Creating the Handtrack.js Library

Steps in creating a Tensorflow.js-based JavaScript library.

Data Assembly

The data used in this project is primarily from the Egohands dataset. It consists of 4800 images of the human hand with bounding box annotations, in various settings (indoor, outdoor), captured using a Google Glass device.

Model Training

A model was trained to detect hands using the Tensorflow Object Detection API. For this project, a Single Shot MultiBox Detector (SSD) was used with the MobileNetV2 architecture. Results from the trained model were then exported as a savedmodel. More details on how the model was trained can be found here and on the Tensorflow Object Detection API GitHub repo.

Model Conversion

Tensorflow.js provides a model conversion tool that allows you to convert a savedmodel trained in Tensorflow Python to the Tensorflow.js webmodel format that can be loaded in the browser. This process is mainly about mapping operations in Tensorflow Python to their equivalent implementations in Tensorflow.js. It makes sense to inspect the saved model graph to understand what is being exported. Finally, I followed the suggestion by the authors of the Tensorflow coco-ssd example [2] and removed the post-processing part of the object detection model graph during conversion. This optimization effectively doubled the speed of the detection/prediction operation in the browser.
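
For reference, a conversion command in the spirit of the coco-ssd example might look like the sketch below. The exact flags depend on the version of the tensorflowjs converter in use, and the output node names shown are the ones the coco-ssd example strips post-processing at, given here only as an illustration, not necessarily the nodes used for this model:

tensorflowjs_converter \
  --input_format=tf_saved_model \
  --output_node_names='Postprocessor/ExpandDims_1,Postprocessor/Slice' \
  --saved_model_tags=serve \
  ./saved_model \
  ./web_model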

Обертка библиотеки и хостинг

The library was modeled after the tensorflowjs coco-ssd example (but not written in TypeScript). It consists of a main class with methods to load the model and detect hands in an image, and a set of other helpful functions, e.g. startVideo, stopVideo, getFPS(), renderPredictions(), getModelParameters(), setModelParameters(), etc. A full description of the methods is on GitHub.

The source file is then bundled using rollup.js and published (with the webmodel files) on npm. This is particularly valuable, as jsdelivr automatically provides a CDN for npm packages. (Hosting the files on other CDNs might be faster, and the reader is encouraged to try other options.) At the moment, handtrackjs is bundled with tensorflowjs (v0.13.5), mainly because, as at the time of writing this library, there were version issues where tfjs (v0.15) had datatype errors loading image/video tags as tensors. As new versions fix this issue, it will be updated.
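
A minimal rollup configuration for a library bundled this way might look like the sketch below; the entry file name and plugin choices are assumptions for illustration, not the actual handtrack.js build configuration:

// rollup.config.js (illustrative sketch only)
import resolve from 'rollup-plugin-node-resolve';
import commonjs from 'rollup-plugin-commonjs';

export default {
  input: 'src/index.js',          // assumed library entry point
  output: {
    file: 'dist/handtrack.min.js',
    format: 'umd',                // usable via a <script> tag and via bundlers
    name: 'handTrack'             // the global variable exposed in the browser
  },
  plugins: [resolve(), commonjs()]
};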

When Should I Use Handtrack.js

If you are interested in prototyping gesture based (body as input) interactive experiences, Handtrack.js can be useful. The user does not need to attach any additional sensors or hardware but can immediately take advantage of engagement benefits that result from gesture based/body-as-input interactions.

A prototype of a simple body-as-input interaction built with Handtrack.js, where the user paints on a canvas using the tracked location of their hand. In this interaction, the maxNumBoxes model parameter is set to 1 to ensure that only one hand is tracked.

Some (not all) relevant scenarios are listed below:

  • When mouse motion can be mapped to hand motion for control purposes.
  • When an overlap of the hand and other objects can represent meaningful interaction signals (e.g. a touch or selection event for an object; a small sketch is given below).
  • Scenarios where human hand motion can be a proxy for activity recognition (e.g. automatically tracking movement activity from a video or images of individuals playing chess, or tracking a person's golf swing). Or simply counting how many humans are present in an image or video frame.
  • Interactive art installations. Could be a fun set of controls for interactive art installations.
  • Teaching others about ML/AI. The handtrack.js library provides a valuable interface to demonstrate how changes in the model parameters (confidence threshold, IoU threshold, image size, etc.) can affect detection results.
  • When you want an accessible demonstration that anyone can easily run or try out with minimal setup.
Body as input on a large display. Handtrack.js results (applied to a webcam feed) can be mapped to the controls in a game.
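
As a minimal sketch of the overlap idea (assuming a video element with id "myvideo"; the target region coordinates are arbitrary example values in video-frame pixels):

const video = document.getElementById('myvideo');
// Treat a hand overlapping a fixed target region of the frame as a selection.
const target = { x: 200, y: 100, width: 120, height: 120 };

function overlaps(bbox, region) {
  const [x, y, w, h] = bbox;
  return x < region.x + region.width && x + w > region.x &&
         y < region.y + region.height && y + h > region.y;
}

handTrack.load().then(model => {
  model.detect(video).then(predictions => {
    predictions.forEach(p => {
      if (overlaps(p.bbox, target)) {
        console.log('Hand over target: treat as a touch/selection event');
      }
    });
  });
});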

Limitations

  • Browsers are single threaded: this means care must be taken to ensure prediction operations do not block the UI thread. Each prediction can take between 50 and 150 ms, which becomes noticeable to a user. For example, when integrating Handtrack.js in an application where the entire screen is rendered many times per second (e.g. in a game), I found it useful to reduce the number of predictions requested per second.
  • Hands are tracked on a frame-by-frame basis: if you are interested in identifying hands across frames, you will need to write additional code to infer the IDs of detected hands as they enter, move across, and leave consecutive frames. Keeping track of the location of each prediction (and the Euclidean distance between predictions) across frames can help; a minimal sketch follows this list.
  • Incorrect predictions: there will be the occasional incorrect prediction (sometimes a face is detected as a hand). I found that each camera and lighting condition needed different settings for the model parameters (especially the confidence threshold) to get good detections. More importantly, this can be improved with additional data.
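
As a minimal sketch of the frame-to-frame identity idea (not part of the library; the distance threshold is an arbitrary example value), you could match each new detection to the nearest previously seen detection by the Euclidean distance between bounding box centers:

// Naive nearest-neighbour ID assignment across frames (illustrative only;
// it does not handle two hands swapping positions or crossing paths).
// Usage: model.detect(video).then(predictions => assignIds(predictions));
let tracks = [];       // previously seen hands: [{ id, cx, cy }]
let nextId = 0;
const MAX_DIST = 80;   // assumed pixel threshold for "same hand"

function assignIds(predictions) {
  const updated = [];
  predictions.forEach(p => {
    const [x, y, w, h] = p.bbox;
    const cx = x + w / 2;
    const cy = y + h / 2;
    // Find the closest existing track by Euclidean distance between centers.
    let best = null;
    let bestDist = Infinity;
    tracks.forEach(t => {
      const d = Math.hypot(t.cx - cx, t.cy - cy);
      if (d < bestDist) { best = t; bestDist = d; }
    });
    const id = best && bestDist < MAX_DIST ? best.id : nextId++;
    p.id = id;                    // attach the inferred identity
    updated.push({ id, cx, cy });
  });
  tracks = updated;               // hands not seen this frame are dropped
  return predictions;
}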

I really look forward to how others who use or extend this project solve some of these limitations.

What's Next?

Handtrack.js represents really early steps with respect to the overall potential of enabling new forms of human-computer interaction with AI in the browser. Already, there have been excellent ideas such as posenet for human pose detection, and handsfree.js for facial expression detection in the browser.

Above all, the reader is invited to imagine. Imagine interesting use cases where knowing the location of a users hand can make for more engaging interactions.

In the meantime, I will be spending more time on the following:

  • Better hand model: creating a robust benchmark to evaluate the underlying hand model, and collecting additional data that improves accuracy and robustness metrics.
  • Additional vocabulary: as I worked through building the samples, one thing that became apparent is the limited vocabulary of this interaction method. There is clearly a need to support at least one more state, perhaps a fist and an open hand. This will mean re-labelling the dataset (or some semi-supervised approaches).
  • Additional model quantization: right now, we are using the fastest model in terms of the size and accuracy tradeoff of the architecture, MobileNetV2 SSD. Are there further optimizations that can make things even faster? Any ideas or contributions here are welcome.

If you would like to discuss this in more detail, feel free to reach out on Twitter, GitHub or LinkedIn. Many thanks to Kesa Oluwafunmilola who helped with proofreading this article.

References

  • [1] Sandler, Mark, et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. arxiv.org/abs/1801.04…
  • [2] Tensorflow.js Coco-ssd example.
    This library uses code and guidance from the Tensorflow.js coco-ssd example, which provides a library for object detection trained on the MSCOCO dataset. The optimization suggested in the repo (stripping out a post-processing layer) was really helpful (2x speedup).