Why is this challenge important?

Building on recent successes of adversarial and data-centric challenges for classification models, we introduce a new competition for discovering failure modes in generative text-to-image models. These models, such as DALL-E 2, Stable Diffusion, and Midjourney, have reached large audiences in the past year owing to their impressive and flexible generation abilities. With this increasing public visibility and wider adoption of text-to-image models, it is pertinent to understand the nature of the images they produce and how unsafe, biased, or violent outputs could inflict harm on end users at scale. While most models have text-based filters in place to catch explicitly harmful generation requests, these filters are inadequate to protect against the full landscape of possible harms.

It is the responsibility of the research and developer community to ensure that there are acceptable safeguards in place to curtail harmful outputs of text-to-image models. This should be the baseline for further advancing the competitive state-of-the-art in this area, ultimately protecting end users from exposure to unethical, harmful, biased or otherwise unsafe content.

What is this challenge about?

We propose a data-centric AI competition to engage the research community in jointly discovering a diverse set of insightful long-tail problems for text-to-image models, and thus help identify current blind spots in harmful image production (i.e., unknown unknowns). We focus explicitly on prompt-image pairs that currently slip through the cracks of safety filters -- either via deliberately subversive prompts that circumvent the text-based filters in place, or through seemingly benign requests that nevertheless trigger unsafe outputs. By focusing on unsafe generations paired with seemingly safe prompts, our challenge zeros in on cases that (i) are most challenging to catch via text-prompt filtering alone and (ii) have the potential to be the most harmful to end users.

This challenge is a timely response to identify and mitigate safety concerns in a structured and systematic manner, covering both the discovery of new failure modes and the confirmation of existing ones. We propose a way to scale the process of seeding unique and diverse adversarial examples while, at the same time, understanding the space of desirable generations when dealing with sensitive characteristics and topics.

This competition is the result of a collaboration between six different organizations to jointly produce a shared resource for use and reuse by the wider research and development community.

Where can I see examples?

We've created a set of examples that you can view here. Note that this page includes generated images that some may find upsetting.

As the challenge progresses, we will also update a page of examples of successful submissions from other users, so that members of the community can take inspiration from each other's progress.

Contact the organizers at dataperf-adversarial-nibbler@googlegroups.com or join our Slack channel at adversarial-nibbler.slack.com.