POV:Frank (d/acc) 🎩 💜
@scalinglaw.eth
CVPR paper: Instruct-Imagen: Image Generation with Multi-modal Instruction

Innovations:
- Multi-modal instruction for image generation: a new format that uses natural language to combine different modalities (text, edge, style, subject, etc.) and express complex generation intents in a uniform way.
- Two-stage training approach for Instruct-Imagen:
  a) Retrieval-augmented training: adapts a pre-trained text-to-image model to handle multi-modal inputs, using retrieved similar (image, text) pairs.
  b) Multi-modal instruction-tuning: fine-tunes the adapted model on diverse image generation tasks paired with multi-modal instructions.
- Unified model architecture that handles various image generation tasks through multi-modal instructions, without task-specific designs.
- Zero-shot generalization to unseen and more complex image generation tasks.
- Adaptability to new tasks through fine-tuning on small datasets.

source: https://arxiv.org/abs/2401.01952
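To make the retrieval-augmented training step a bit more concrete, here is a toy sketch of how a target (image, text) pair might be grouped with its most similar neighbors by cosine similarity. Everything here is an assumption for illustration: the post does not say which encoder, similarity measure, or number of neighbors the paper uses. The corpus, the 3-d embeddings, and the names `retrieve_neighbors` and `build_training_example` are all hypothetical, and no actual image model is involved.

```python
import math
from typing import List, Tuple

# Hypothetical (image_id, caption, embedding) triple. A real system would get
# embeddings from a pretrained encoder; these are hand-written toy vectors.
Pair = Tuple[str, str, List[float]]

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def retrieve_neighbors(query: List[float], corpus: List[Pair], k: int,
                       exclude_id: str = "") -> List[Pair]:
    """Return the k pairs most similar to the query embedding, skipping the
    target itself so it is never used as its own context."""
    candidates = [p for p in corpus if p[0] != exclude_id]
    candidates.sort(key=lambda p: cosine(query, p[2]), reverse=True)
    return candidates[:k]

def build_training_example(target: Pair, corpus: List[Pair], k: int = 2) -> dict:
    """Attach retrieved neighbors to a target pair as extra context.
    The record layout is hypothetical, not the paper's actual format."""
    image_id, caption, emb = target
    neighbors = retrieve_neighbors(emb, corpus, k, exclude_id=image_id)
    return {
        "target_image": image_id,
        "text": caption,
        "context": [(n[0], n[1]) for n in neighbors],
    }

corpus: List[Pair] = [
    ("img_0", "a red sports car on a highway", [0.9, 0.1, 0.0]),
    ("img_1", "a red convertible parked on a street", [0.8, 0.2, 0.1]),
    ("img_2", "a watercolor painting of a forest", [0.0, 0.9, 0.4]),
    ("img_3", "a blue sedan in a parking lot", [0.7, 0.1, 0.3]),
]

example = build_training_example(corpus[0], corpus, k=2)
print(example["context"])  # the two car images, most similar first
```

In this toy setup the two other car images are retrieved as context for `img_0`, while the forest painting is left out. That is the intuition behind "retrieved similar (image, text) pairs": the model sees related examples alongside the target during the adaptation stage.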