Zero-Shot Image Classification Using Transformers.js

Upload an image and enter candidate labels to classify it.

How Zero-Shot Classification Works

CLIP performs zero-shot classification by mapping both images and text descriptions into the same vector space.

It learns this shared space by training on a large collection of image–caption pairs, so that an image and the sentence describing it end up near each other while mismatched pairs are pushed apart.
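To make the training idea concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss on toy data. The 2-d vectors, the helper names, and the temperature of 0.07 are all assumptions for illustration. Real training uses learned encoders and large batches, and nothing here is taken from the Transformers.js API.

```typescript
// Toy illustration of a CLIP-style symmetric contrastive loss.
// All vectors are made up; real embeddings come from trained encoders.

type Vec = number[];

function normalize(v: Vec): Vec {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map((x) => x / norm);
}

function dot(a: Vec, b: Vec): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Cross-entropy of each row against its diagonal entry, averaged over rows.
function diagonalCrossEntropy(logits: number[][]): number {
  let total = 0;
  logits.forEach((row, i) => {
    const max = Math.max(...row);
    const logSumExp =
      max + Math.log(row.reduce((s, x) => s + Math.exp(x - max), 0));
    total += logSumExp - row[i];
  });
  return total / logits.length;
}

// temperature = 0.07 is an assumed value, not something the page specifies.
function contrastiveLoss(images: Vec[], texts: Vec[], temperature = 0.07): number {
  const img = images.map(normalize);
  const txt = texts.map(normalize);
  // logits[i][j] = scaled cosine similarity between image i and caption j.
  const logits = img.map((a) => txt.map((b) => dot(a, b) / temperature));
  const transposed = logits[0].map((_, j) => logits.map((row) => row[j]));
  // Average the image-to-text and text-to-image directions.
  return (diagonalCrossEntropy(logits) + diagonalCrossEntropy(transposed)) / 2;
}

// Image i is meant to match caption i.
const images: Vec[] = [[1, 0.1], [0.1, 1]];
const matchedCaptions: Vec[] = [[0.9, 0.2], [0.2, 0.9]];
const swappedCaptions: Vec[] = [matchedCaptions[1], matchedCaptions[0]];

const alignedLoss = contrastiveLoss(images, matchedCaptions);
const misalignedLoss = contrastiveLoss(images, swappedCaptions);
console.log({ alignedLoss, misalignedLoss });
```

The loss is lower when each image sits closest to its own caption, which is the pressure that shapes the shared space during training.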

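At inference time, each candidate label is turned into a short prompt such as "a photo of a cat" and embedded with the text encoder. The image is embedded with the image encoder, and the label whose prompt embedding has the highest similarity to the image embedding is the prediction.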
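The comparison step can be sketched with precomputed embeddings. This is a minimal illustration, not the demo's actual code: the 3-d vectors are invented stand-ins for real CLIP outputs, and `toPrompt`, `classifyZeroShot`, and the `logitScale` of 100 are hypothetical names and values chosen for this example.

```typescript
// Toy zero-shot classification over precomputed embeddings.
// The 3-d vectors below are invented stand-ins for real CLIP outputs.

type Embedding = number[];

interface Prediction {
  label: string;
  score: number;
}

function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}

function softmax(values: number[]): number[] {
  const max = Math.max(...values); // subtract the max for numerical stability
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((s, x) => s + x, 0);
  return exps.map((x) => x / sum);
}

// Hypothetical prompt template; the exact wording is a free choice.
const toPrompt = (label: string): string => `a photo of a ${label}`;

function classifyZeroShot(
  imageEmbedding: Embedding,
  promptEmbeddings: Record<string, Embedding>,
  labels: string[],
  logitScale = 100, // assumed scale applied to similarities before softmax
): Prediction[] {
  const similarities = labels.map((label) => {
    const embedding = promptEmbeddings[toPrompt(label)];
    if (!embedding) throw new Error(`No embedding for prompt: ${toPrompt(label)}`);
    return cosineSimilarity(imageEmbedding, embedding);
  });
  const scores = softmax(similarities.map((s) => s * logitScale));
  // Pair each label with its probability and sort best-first.
  return labels
    .map((label, i) => ({ label, score: scores[i] }))
    .sort((a, b) => b.score - a.score);
}

// Made-up text embeddings, keyed by prompt.
const fakePromptEmbeddings: Record<string, Embedding> = {
  "a photo of a cat": [0.9, 0.1, 0.2],
  "a photo of a dog": [0.2, 0.9, 0.1],
  "a photo of a car": [0.1, 0.2, 0.9],
};
// Made-up image embedding that lies closest to the "cat" prompt.
const fakeImageEmbedding: Embedding = [0.8, 0.2, 0.1];

const predictions = classifyZeroShot(
  fakeImageEmbedding,
  fakePromptEmbeddings,
  ["cat", "dog", "car"],
);
console.log(predictions);
```

Because the labels are only text prompts, the candidate set can change on every request without retraining, which is what makes the approach zero-shot.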


Get instant classification results