Upload an image and enter labels to classify it
CLIP performs zero-shot classification by mapping both images and text descriptions into the same vector space.
It learns this shared space by training on many image–caption pairs, so that images and the sentences describing them end up near each other.
At test time, each class label is turned into a short prompt such as "a photo of a cat", embedded with the text encoder, and compared to the image embedding; the class whose prompt has the highest similarity is the prediction.
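For reference, this is roughly what the pipeline looks like with the Hugging Face transformers CLIP API; it is a minimal sketch, and the model checkpoint, image path, and label list below are placeholder examples.

```python
# Minimal sketch: zero-shot classification with CLIP via Hugging Face transformers.
# Assumes `pip install transformers torch pillow`; paths and labels are examples.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")           # the uploaded image
labels = ["cat", "dog", "car"]              # the labels the user enters
prompts = [f"a photo of a {label}" for label in labels]

# Embed the prompts and the image together; CLIP scores every prompt against the image.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the image-text similarity scores; softmax turns them into
# probabilities over the entered labels.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in sorted(zip(labels, probs.tolist()), key=lambda x: -x[1]):
    print(f"{label}: {p:.3f}")
```

The label with the highest probability is shown as the classification result.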
Get instant classification results