# [NeurIPS 2025] Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
We present a learning framework that aligns text-to-image diffusion models with human preferences through inverse reinforcement learning and a balance of offline and online training.
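The training code is not bundled with this checkpoint. As a minimal sketch of the kind of listwise objective that ranking-based preference optimization builds on, the snippet below implements a Plackett-Luce negative log-likelihood over reward scores. This is a generic illustration only; `plackett_luce_loss` is a hypothetical helper, and this is not necessarily the exact loss used in the paper.

```python
import torch

def plackett_luce_loss(scores: torch.Tensor) -> torch.Tensor:
    """Listwise ranking loss (Plackett-Luce NLL) over reward scores.

    `scores` has shape (batch, k), ordered from the most-preferred to the
    least-preferred sample in each ranking. Illustrative only.
    """
    # At each rank position, the chosen item competes against all items
    # ranked at or below it. logcumsumexp over the reversed scores gives
    # log-sum-exp of the "remaining" candidates at every position.
    rev = torch.flip(scores, dims=[-1])
    log_denom = torch.flip(torch.logcumsumexp(rev, dim=-1), dims=[-1])
    # NLL of the observed ranking: sum over positions, mean over the batch.
    return (log_denom - scores).sum(dim=-1).mean()
```

Given rewards for k samples per prompt sorted best-first, minimizing this loss pushes the score model to respect the observed ranking.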
## Usage
```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Load the preference-optimized UNet released with the paper.
unet = UNet2DConditionModel.from_pretrained(
    "ylwu/diffusion-dro-sd1.5",
    subfolder="unet",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Plug the fine-tuned UNet into the base Stable Diffusion v1.5 pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    unet=unet,
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A new artwork depicting Pikachu as a superhero fighting villains with dramatic lightning"
image = pipe(prompt).images[0]
image.save("example.png")
```
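For reproducible samples, you can pass a seeded `torch.Generator` to the pipeline. This is standard `diffusers` usage rather than anything specific to this checkpoint, and the step count and guidance scale below are just example values:

```python
# Fix the random seed so repeated runs produce the same image.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt,
    num_inference_steps=50,  # example value; the pipeline default also works
    guidance_scale=7.5,      # example value
    generator=generator,
).images[0]
image.save("example_seeded.png")
```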
## Citation

```bibtex
@misc{wu2025rankingbasedpreferenceoptimizationdiffusion,
  title={Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback},
  author={Yi-Lun Wu and Bo-Kai Ruan and Chiang Tseng and Hong-Han Shuai},
  year={2025},
  eprint={2510.18353},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.18353},
}
```
## License
The model is licensed under the CreativeML Open RAIL-M License.