Recent posts
🤖🤖🤖🌲🌲🌲

Fixed vision encoders like DINO have driven impressive progress in more learnable representations for generative modeling - but there is no universal variant across modalities, and they do not scale with the generative model. We introduce our self-supervised framework, Self-Flow, which builds learnability directly into flow models, working in a unified and scalable way across image, video and audio. Particularly excited about the gains on video-action prediction: beyond the overall success rate improving substantially, more complex tasks - like "Open and Place" - see some of the clearest gains. So many interesting research questions to explore to make 🤖 go brrr. Super glad to be working with my amazing colleagues @hila_chefer, Dominik, @dustin_podell, Vikash, @Vinh_Suhi, Antonio and @robrombach - as well as the whole @bfl_ml team! arxiv: https://t.co/eP7ip58Tff, project page: https://t.co/GNShpBMEQ1


New paper out! We present a training method for multimodal generative models, called Self-Flow, which combines classic flow matching and representation learning. Why? Unlike most representation alignment methods, our new approach does not require external, pretrained models and thus scales gracefully to joint multimodal training on images, videos and audio. How? It combines per-timestep flow matching with dual-timestep representation learning, improving the models' internal representations. This approach outperforms prior methods and shows promising scaling behavior in multimodal pretraining. It also enables downstream applications such as action prediction for embodied AI. webpage+paper: https://t.co/qzGQGj8JYk code: https://t.co/edhfdVEqSf Credit to @hila_chefer, @pess_r, Dominik, @dustin_podell, Vikash, @Vinh_Suhi and Antonio. If you enjoy doing open research like this, come and join BFL! We are actively hiring🌲

Hey FLUX, create a gif using my profile pic. make no mistakes bitte

BFL Skills: FLUX packaged into a single install command for agents. Install once. Your coding agent handles the rest - model selection, prompting, API integration. All built in. Sub-second generation and editing with [klein]. Highest quality with [max]. Text rendering with [flex]. Works in Claude Code, Cursor, and other IDEs. > npx skills add black-forest-labs/skills



