
Imagen: Text-to-Image Diffusion Models
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons ...
Imagen Editor & EditBench
Editing Flow. The input to Imagen Editor is a masked image and a text prompt; the output is an image with the unmasked areas untouched and the masked areas filled in. The edits are faithful to the input text prompts while remaining consistent with the input images.
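The input/output contract described above (unmasked pixels preserved, masked pixels replaced) can be illustrated with a small compositing sketch. This is not Imagen Editor's actual method: the function name `composite`, the nested-list image representation, and the convention that a mask value of 1 marks a region to fill are all assumptions made for illustration, and `generated` stands in for whatever the model would produce.

```python
def composite(original, generated, mask):
    """Return an image that keeps `original` where mask is 0
    and takes `generated` where mask is 1 (hypothetical convention)."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


# Tiny 2x2 grayscale example: only the top-right pixel is masked.
original = [[10, 20], [30, 40]]
generated = [[1, 2], [3, 4]]   # placeholder for model output
mask = [[0, 1], [0, 0]]

result = composite(original, generated, mask)
print(result)
```

Only the masked pixel changes; every unmasked pixel is copied through from the original, which is the property the description emphasizes.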
Imagen Video
Imagen Video generates high-resolution videos with Cascaded Diffusion Models. The first step is to take an input text prompt and encode it into textual embeddings with a T5 text encoder. A base Video Diffusion Model then generates a 16-frame video at 40×24 resolution and 3 frames per second; this is then followed by multiple Temporal Super-Resolution (TSR) and Spatial Super …
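To make the cascade's bookkeeping concrete, the sketch below tracks how frame count, resolution, and frame rate would change across stages. Only the base shape (16 frames, 40×24, 3 fps) comes from the text. The names `VideoShape`, `temporal_sr`, and `spatial_sr`, the reading of 40×24 as width×height, and the factor of 2 in the example are hypothetical; the text does not state the actual number of stages or their factors. The functions model tensor shapes only, not the diffusion models themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    frames: int
    width: int
    height: int
    fps: int

    @property
    def duration_s(self) -> float:
        # Clip length in seconds.
        return self.frames / self.fps


def temporal_sr(v: VideoShape, factor: int) -> VideoShape:
    """Temporal super-resolution: more frames at a higher frame rate,
    so the clip duration stays the same (assumed behavior)."""
    return VideoShape(v.frames * factor, v.width, v.height, v.fps * factor)


def spatial_sr(v: VideoShape, factor: int) -> VideoShape:
    """Spatial super-resolution: larger frames, same frame count and rate."""
    return VideoShape(v.frames, v.width * factor, v.height * factor, v.fps)


# Base model output as described in the text.
base = VideoShape(frames=16, width=40, height=24, fps=3)

# One illustrative stage of each kind with a hypothetical factor of 2.
after_tsr = temporal_sr(base, 2)
after_ssr = spatial_sr(after_tsr, 2)
print(base.duration_s, after_tsr, after_ssr)
```

Under these assumptions the base clip lasts 16 / 3 seconds, and a temporal stage leaves that duration unchanged while increasing smoothness.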