New task: Relation Inversion
Diffusion models and related personalization methods have become increasingly popular this year, for example DreamBooth, Textual Inversion, and Custom Diffusion. These methods extract the concept of a specific object from a few images and inject it into a pre-trained text-to-image diffusion model, so that users can generate customized images of objects they care about, such as a particular anime character, a sculpture at home, or a favorite cup.
Existing customization methods focus mainly on capturing the appearance of objects. Beyond appearance, however, another important pillar of the visual world is the rich web of relations between objects. So far, no work has explored how to extract a specific relation from images and apply it to generation tasks. To this end, we propose a new task: Relation Inversion.
As shown in the figure above, given several reference images that share a common relation, such as "object A is contained in object B", the goal of Relation Inversion is to find a relation prompt that describes this interaction and to use it to generate new scenes in which objects interact according to the same relation, for example putting Spider-Man inside a basket.
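To make the task concrete, the relation is represented as a learnable placeholder token in the text prompt. The snippet below is purely illustrative; the token name <R> follows the paper's notation, and the example sentences are hypothetical:

# Illustrative only: the relation is learned as a special text token "<R>".
# Exemplar prompts describe the reference images that share the relation;
# an inference prompt reuses the learned token with new objects.
relation_token = "<R>"

exemplar_prompts = [
    f"a cat {relation_token} a basket",
    f"a bottle of wine {relation_token} a basket",
]

# After optimizing the embedding of "<R>" on the exemplars, the same token
# can be composed with unseen entities at inference time:
inference_prompt = f"Spider-Man {relation_token} a basket"
print(inference_prompt)  # Spider-Man <R> a basket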
Reply "ReVersion" in the backend of the CVer WeChat public account to download the PDF and code of this paper.
●Paper: arxiv.org/abs/2303.13495
●Code: github.com/ziqihuagg/ReVersion
●Homepage: ziqihuangg.github.io/projects/reversion.html
●Video: www.youtube.com/watch?v=pkal3yjyyKQ
●Demo: huggingface.co/spaces/Ziqi/ReVersion
ReVersion Framework
As a first attempt to address the Relation Inversion problem, we propose the ReVersion framework:
Compared with existing appearance inversion tasks, the difficulty of Relation Inversion lies in telling the model that what we want to extract is the relatively abstract concept of a relation, rather than visually salient aspects such as object appearance.
We propose a relation-focal importance sampling strategy that encourages attention to high-level relations, and we design relation-steering contrastive learning to guide the relation prompt toward relations rather than object appearance. See the paper for more details.
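The PyTorch sketch below illustrates the two ideas. It is a minimal sketch under our own assumptions (the exact weighting function, temperature, and loss form in the paper may differ): timesteps are sampled with a density that grows with t, since high-level layout and relations are decided at the noisier steps, and an InfoNCE-style loss pulls the learnable relation embedding toward a basis set of preposition embeddings while pushing it away from appearance-describing words.

import torch
import torch.nn.functional as F

def sample_relation_focal_timesteps(batch_size, num_train_timesteps=1000, device="cpu"):
    # Importance-sample diffusion timesteps, favoring larger (noisier) t.
    # The cosine-shaped weighting is one increasing choice, not necessarily
    # the exact density used in the paper.
    t = torch.arange(num_train_timesteps, device=device, dtype=torch.float32)
    weights = 1.0 - torch.cos(torch.pi * t / num_train_timesteps)  # grows with t
    probs = weights / weights.sum()
    return torch.multinomial(probs, batch_size, replacement=True)

def relation_steering_loss(relation_emb, positive_embs, negative_embs, temperature=0.07):
    # InfoNCE-style contrastive loss: pull the relation embedding toward
    # preposition embeddings (positives), push it away from appearance-related
    # word embeddings (negatives).
    anchor = F.normalize(relation_emb, dim=-1)   # (d,)
    pos = F.normalize(positive_embs, dim=-1)     # (P, d)
    neg = F.normalize(negative_embs, dim=-1)     # (N, d)
    pos_logits = pos @ anchor / temperature      # (P,)
    neg_logits = neg @ anchor / temperature      # (N,)
    # Each positive is contrasted against all negatives.
    all_logits = torch.cat(
        [pos_logits.unsqueeze(1), neg_logits.unsqueeze(0).expand(pos_logits.shape[0], -1)],
        dim=1,
    )
    return (torch.logsumexp(all_logits, dim=1) - pos_logits).mean()

During training, the sampled timesteps would replace uniform timestep sampling in the standard denoising loss, and the steering loss would be added as an auxiliary term with a weighting coefficient.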
ReVersion Benchmark
We collect and provide ReVersion Benchmark:
https://github.com/ziqihuangg/ReVersion#the-reversion-benchmark
It covers a rich variety of relations; each relation comes with multiple exemplar images and manually annotated text descriptions. We also provide a large number of inference templates for common relations. These templates can be used to test whether a learned relation prompt is accurate, and they can be combined to generate interesting interactive scenes.
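For example, once a relation prompt has been learned, it can be dropped into such templates for generation. Below is a minimal sketch with Hugging Face diffusers, assuming the learned relation is exported as a textual-inversion-style embedding registered under the token <R>; the checkpoint path and template strings are placeholders, not the repo's exact API:

import torch
from diffusers import StableDiffusionPipeline

# Placeholder paths and tokens: substitute the actual ReVersion output.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("path/to/learned_relation_embedding", token="<R>")

# Fill an inference template with new objects to test the learned relation.
template = "{} <R> {}"
prompt = template.format("Spider-Man", "a basket")
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("spiderman_in_basket.png")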
Results
Rich and diverse relationships
We can invert a rich variety of relations and apply them to new objects.
Various backgrounds and styles
The learned relation can also connect objects of different styles, and in different background scenes, in the specified way.
The same relation, rich and diverse object combinations