Building on recent successes of adversarial and data-centric challenges for classification models, we propose a new competition for discovering failure modes in generative text-to-image models. These models, such as DALL-E 2, Stable Diffusion and Midjourney, have reached large audiences in the past year owing to their impressive and flexible generation capabilities. With this increasing public visibility and wider adoption of text-to-image models, it is pertinent to understand the nature of the images they produce and how unsafe, biased or violent outputs could inflict harm on end users at scale. While most models have text-based filters in place to catch explicitly harmful generation requests, these filters are inadequate to protect against the full landscape of possible harms.
It is the responsibility of the research and developer community to ensure that acceptable safeguards are in place to curtail harmful outputs of text-to-image models. Such safeguards should be the baseline for further advancing the state of the art in this area, ultimately protecting end users from exposure to unethical, harmful, biased or otherwise unsafe content.