New vision task! ReVersion: Relation customization in image generation

New task: Relation Inversion

This year, diffusion models and related personalization methods such as DreamBooth, Textual Inversion, and Custom Diffusion have become increasingly popular. These methods extract the concept of a specific object from images and add it to a pre-trained text-to-image diffusion model, so that people can generate customized images of objects they care about, such as specific anime characters, sculptures at home, or water cups.

Existing customization methods mainly focus on capturing the appearance of objects. However, the visual world has another important pillar besides appearance: the intricate relations between objects. So far, no work has explored how to extract a specific relation from images and apply it to generation. To this end, we propose a new task: Relation Inversion.


As shown in the figure above, the input is several reference images that share a common relation, such as “object A is contained in object B”. The goal of Relation Inversion is to find a relation prompt that describes this interaction and to use it to generate new scenes in which the objects interact in the same way, for example Spider-Man placed inside a basket.


ReVersion Framework

As a first attempt to address the Relation Inversion problem, we propose the ReVersion framework:


Compared with the existing Appearance Inversion task, the difficulty of Relation Inversion lies in telling the model that what we want to extract is the relatively abstract concept of a relation, rather than visually salient aspects such as the appearance of an object.

We propose a relation-focal importance sampling strategy that encourages more attention to high-level relations, and we design relation-steering contrastive learning to steer the prompt toward relations rather than the appearance of objects. See the paper for more details.
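To make the two ideas a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: we assume that "importance sampling" means drawing diffusion timesteps from a distribution skewed toward larger (noisier) steps with a cosine-shaped weight, and that the contrastive term is an InfoNCE-style loss that pulls the relation embedding toward a set of "relation-like" word embeddings and away from "appearance-like" ones. The function names, the parameter `alpha`, the temperature, and the toy embeddings are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def timestep_weights(num_steps: int, alpha: float = 0.5) -> List[float]:
    """Sampling probabilities over timesteps 0..num_steps-1.

    Assumed shape: 1 - alpha * cos(pi * t / num_steps), normalized to sum
    to 1, so larger t (more noise) is sampled more often when alpha > 0.
    """
    raw = [1.0 - alpha * math.cos(math.pi * t / num_steps) for t in range(num_steps)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_timestep(num_steps: int, alpha: float = 0.5, rng=random) -> int:
    """Draw one timestep according to the skewed distribution above."""
    weights = timestep_weights(num_steps, alpha)
    return rng.choices(range(num_steps), weights=weights, k=1)[0]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def steering_contrastive_loss(
    relation: Sequence[float],
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """InfoNCE-style loss (an assumed form): low when the relation embedding
    is similar to the positives and dissimilar to the negatives."""
    pos = sum(math.exp(cosine(relation, p) / temperature) for p in positives)
    neg = sum(math.exp(cosine(relation, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings, purely illustrative.
relation_like = [[1.0, 0.1]]    # stands in for relation words
appearance_like = [[0.0, 1.0]]  # stands in for appearance words

loss_near_relation = steering_contrastive_loss([1.0, 0.0], relation_like, appearance_like)
loss_near_appearance = steering_contrastive_loss([0.0, 1.0], relation_like, appearance_like)
```

In this toy setup the loss is smaller when the relation embedding sits near the relation-like vector than when it sits near the appearance-like one, which is the steering behaviour described in the text. Which words serve as positives and negatives, and the exact sampling distribution, are specified in the paper.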

ReVersion Benchmark

We collect and release the ReVersion Benchmark:

It covers a rich variety of relations, and each relation comes with multiple exemplar images and manually annotated text descriptions. We also provide a large number of inference templates for common relations. You can use these templates to test whether a learned relation prompt is accurate, or combine them to generate interesting interactive scenes.
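As a rough illustration of how inference templates can be used, the sketch below fills a few templates with object names and a learned relation token. The token name `<R>`, the template strings, and the helper `build_prompts` are hypothetical and are not taken from the benchmark; the real templates ship with the benchmark itself.

```python
from typing import List, Sequence

# Hypothetical placeholder for the learned relation prompt.
RELATION_TOKEN = "<R>"

# Hypothetical templates: {a} and {b} are objects, {r} is the relation prompt.
TEMPLATES = [
    "{a} {r} {b}",
    "a photo of {a} {r} {b}",
    "{a} {r} {b}, watercolor painting",
]


def build_prompts(
    subjects: Sequence[str],
    targets: Sequence[str],
    templates: Sequence[str] = TEMPLATES,
    relation: str = RELATION_TOKEN,
) -> List[str]:
    """Return every template filled with every (subject, target) pair."""
    return [
        template.format(a=a, r=relation, b=b)
        for template in templates
        for a in subjects
        for b in targets
    ]


prompts = build_prompts(["Spider-Man", "a rabbit"], ["a basket"])
for prompt in prompts:
    print(prompt)
```

Each resulting string would then be passed to the text-to-image model that holds the learned relation embedding. Checking whether the generated objects actually interact as intended is the test of the relation prompt described above.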

Results

  • Rich and diverse relationships

We can invert a rich variety of relations and apply them to new objects.


  • Various backgrounds and styles

The inverted relation can also connect objects in a specific way across different styles and background scenes.


  • The same relation, rich and diverse object combinations

