Atif Quamar
Reinforcement Learning
Learning Modal-Mixed Chain-of-Thought Reasoning with Latent Embeddings
Modal-mixed chain-of-thought lets a vision-language model (VLM) interleave text with compact latent visual “sketches”. The approach uses a diffusion-based latent decoder and SFT+RL training to boost vision-intensive reasoning while adding only modest inference overhead.