Upload an image and enter labels to classify it
CLIP performs zero-shot classification by mapping both images and text descriptions into the same vector space.
It learns this shared space by training on many image–caption pairs, so that images and the sentences describing them end up near each other.
At test time, each class label is turned into a short prompt such as "a photo of a cat", embedded with the text encoder, and compared to the image embedding; the class whose prompt has the highest similarity is the prediction.
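For reference, this is roughly what the pipeline looks like with the Hugging Face transformers CLIP API; it is a minimal sketch, and the model checkpoint, image path, and label list below are placeholder examples.

```python
# Minimal sketch: zero-shot classification with CLIP via Hugging Face transformers.
# Assumes `pip install transformers torch pillow`; paths and labels are examples.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")           # the uploaded image
labels = ["cat", "dog", "car"]              # the labels the user enters
prompts = [f"a photo of a {label}" for label in labels]

# Embed the prompts and the image together; CLIP scores every prompt against the image.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores; softmax turns them into
# probabilities over the entered labels.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in sorted(zip(labels, probs.tolist()), key=lambda x: -x[1]):
    print(f"{label}: {p:.3f}")
```

The label with the highest probability is shown as the classification result.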
Get instant classification results