Description
CLIP (Contrastive Language-Image Pre-Training) can predict the most relevant text snippet for a given image without being directly optimized for that task (zero-shot capability).
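Zero-shot prediction in CLIP comes down to comparing embeddings. Each candidate label is written as a text prompt, the image and the prompts are embedded into a shared space, and the prompt with the highest scaled cosine similarity to the image wins. The sketch below shows only that scoring step in plain Python. The three-dimensional embeddings are hypothetical toy vectors standing in for the outputs of CLIP's image and text encoders. The fixed logit scale of 100.0 is an assumption made for illustration (CLIP learns this temperature during training). The helper names are hypothetical and are not part of any CLIP library.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(
    image_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    logit_scale: float = 100.0,  # assumed fixed temperature; CLIP learns this value
) -> List[float]:
    """Return a softmax distribution over text prompts for one image embedding."""
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        t_unit = l2_normalize(t)
        cosine = sum(a * b for a, b in zip(img, t_unit))
        logits.append(logit_scale * cosine)
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy embeddings; real ones would come from CLIP's encoders.
prompts = ["a photo of a dog", "a photo of a cat", "a diagram"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = max(range(len(prompts)), key=probs.__getitem__)
print(prompts[best])  # prints "a photo of a dog"
```

Because the candidate labels are supplied only at inference time as text, the same scoring works for any label set without retraining. That is what makes the prediction zero-shot.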