# [NeurIPS 2025] Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
We present a learning framework that aligns text-to-image diffusion models with human preferences through inverse reinforcement learning and a balance of offline and online training.
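The training code is not bundled with this checkpoint. As a minimal sketch of the kind of listwise objective that ranking-based preference optimization builds on, the snippet below implements a Plackett-Luce negative log-likelihood over reward scores. This is a generic illustration only; `plackett_luce_loss` is a hypothetical helper, and this is not necessarily the exact loss used in the paper.

```python
import torch

def plackett_luce_loss(scores: torch.Tensor) -> torch.Tensor:
    """Listwise ranking loss (Plackett-Luce NLL) over reward scores.

    `scores` has shape (batch, k), ordered from the most-preferred to the
    least-preferred sample in each ranking. Illustrative only.
    """
    # At each rank position, the chosen item competes against all items
    # ranked at or below it. logcumsumexp over the reversed scores gives
    # log-sum-exp of the "remaining" candidates at every position.
    rev = torch.flip(scores, dims=[-1])
    log_denom = torch.flip(torch.logcumsumexp(rev, dim=-1), dims=[-1])
    # NLL of the observed ranking: sum over positions, mean over the batch.
    return (log_denom - scores).sum(dim=-1).mean()
```

Given rewards for k samples per prompt sorted best-first, minimizing this loss pushes the score model to respect the observed ranking.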
## Usage
```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

# Load the preference-optimized UNet released with the paper.
unet = UNet2DConditionModel.from_pretrained(
    "ylwu/diffusion-dro-sd1.5",
    subfolder="unet",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Plug the fine-tuned UNet into the base Stable Diffusion v1.5 pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    unet=unet,
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A new artwork depicting Pikachu as a superhero fighting villains with dramatic lightning"
image = pipe(prompt).images[0]
image.save("example.png")
```
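For reproducible samples, you can pass a seeded `torch.Generator` to the pipeline. This is standard `diffusers` usage rather than anything specific to this checkpoint, and the step count and guidance scale below are just example values:

```python
# Fix the random seed so repeated runs produce the same image.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt,
    num_inference_steps=50,  # example value; the pipeline default also works
    guidance_scale=7.5,      # example value
    generator=generator,
).images[0]
image.save("example_seeded.png")
```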
## Citation

```bibtex
@misc{wu2025rankingbasedpreferenceoptimizationdiffusion,
  title={Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback},
  author={Yi-Lun Wu and Bo-Kai Ruan and Chiang Tseng and Hong-Han Shuai},
  year={2025},
  eprint={2510.18353},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.18353},
}
```
## License
The model is licensed under the CreativeML Open RAIL-M License.