“GitHub Project Circle 01” An open source project for building AI digital humans that synchronizes audio and video automatically!

Starting this week, Xiaoquan is launching the “GitHub Project Selection Weekly” series. Each weekly article shares fun, practical, and cutting-edge open source projects that Xiaoquan has come across or that well-known people in the industry have recommended, so that we can all learn and improve together.

This week’s GitHub project picks cover audio and video synchronization, a multi-agent framework, a 3D object generation model, a presentation tool built for developers, web page packaging, and a framework for building your own CRUD applications.

1. Video-ReTalking

Video-ReTalking uses AI to synchronize the mouth movements of a person in a video with an input voice track. Put simply, you provide any talking-head video and an audio file, and in the generated video the person’s lips move in sync with the audio.

Beyond lip sync, it can also change the expression of the person in the video to match the voice. The whole process is automatic and requires no user intervention. Many bloggers have used this project to build their own AI digital humans. Isn’t that interesting?

Number of Stars: 2.5K +

Online experience: https://colab.research.google.com/github/vinthony/video-retalking/blob/main/quick_demo.ipynb

Project address: https://github.com/OpenTalker/video-retalking

To achieve this goal, the system is broken down into three consecutive tasks:

  • Face video generation with canonical expressions;

  • Audio-driven lip sync;

  • Facial enhancement to improve photo realism.

Given a talking-head video, the expression editing network first modifies the expression in each frame according to a single expression template, producing a video with a canonical expression.

This video is then input to the lip-sync network along with the given audio to generate a lip-sync video.

Finally, the photorealism of the synthesized faces is improved through an identity-aware face enhancement network and post-processing.

All three steps above use a learning-based approach, and all modules can be processed in a sequential pipeline without any user intervention.
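The sequential structure described above can be sketched in a few lines of Python. This is only an illustration of how the three stages chain together: the `Frame` class and the functions `normalize_expression`, `lip_sync`, `enhance_faces`, and `retalk` are hypothetical placeholders that record which stage touched each frame. They are not the project's actual code or API, and no neural network is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Placeholder for one video frame (hypothetical, for illustration only)."""
    index: int
    applied: list = field(default_factory=list)  # stages applied so far


def normalize_expression(frames, template="neutral"):
    # Stage 1: edit every frame toward the same expression template.
    for frame in frames:
        frame.applied.append(f"expression:{template}")
    return frames


def lip_sync(frames, audio_chunks):
    # Stage 2: drive the mouth region of each frame with its audio chunk.
    for frame, chunk in zip(frames, audio_chunks):
        frame.applied.append(f"lip_sync:{chunk}")
    return frames


def enhance_faces(frames):
    # Stage 3: identity-aware enhancement and post-processing.
    for frame in frames:
        frame.applied.append("enhance")
    return frames


def retalk(frames, audio_chunks):
    """Run the three stages in order, with no user input in between."""
    frames = normalize_expression(frames)
    frames = lip_sync(frames, audio_chunks)
    return enhance_faces(frames)


video = [Frame(i) for i in range(3)]
audio = ["a0", "a1", "a2"]
result = retalk(video, audio)
print(result[0].applied)
```

Each stage consumes the output of the previous one, which is why the whole pipeline can run end to end without manual steps.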

2. AutoGen

AutoGen is a multi-agent framework developed by a team at Microsoft. It makes it easy to set up a group of cooperating agents that carry out a task, and it is very powerful.

Number of Stars: 14.2K +

Project address: https://github.com/microsoft/autogen

For example, suppose we want to write a crawler that fetches and saves the images on a web page. If you give this task to ChatGPT, it will return a block of code right away, but the code usually has problems such as runtime errors or missing dependencies. You then have to go back and forth with ChatGPT repeatedly to fix the program.

Of course, we could also write a complex prompt and have ChatGPT call its code execution plug-in, correcting the program whenever an error occurs. (Example source: Barret Li Jing)

But if the task is handed to AutoGen, it becomes much simpler and takes only a few lines of code:

  • Define an Assistant Agent whose task is to solve the problem

  • Define a UserProxy Agent, whose task is to ask questions on behalf of people and execute the program locally.

Neither agent needs a hand-written prompt. When we hand the crawler task to the UserProxy, it understands the task and asks the Assistant what to do. The Assistant tells the UserProxy the steps, and the UserProxy then installs the dependencies locally as instructed, creates the file, and executes the code. If an error occurs during execution, it sends a detailed error report back to the Assistant, and the loop repeats until a final result is obtained. When the task ends, you will see that the target images have been saved to the local disk.
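The write, execute, and report-back loop described above can be illustrated with a small self-contained simulation. Note that this sketch does not use AutoGen's API and calls no language model: `scripted_assistant`, `run_locally`, and `solve` are hypothetical stand-ins, the "assistant" simply returns a buggy program first and a corrected one after it sees the error report, and the task is a toy one rather than the crawler.

```python
import contextlib
import io
import traceback


def scripted_assistant(history):
    """Stand-in for the Assistant agent: propose code, revise after an error."""
    if not any("NameError" in message for message in history):
        # First attempt deliberately contains a bug (misspelled variable).
        return "total = sum(range(5))\nprint(totl)"
    return "total = sum(range(5))\nprint(total)"


def run_locally(code):
    """Stand-in for the UserProxy agent: execute the code, capture the result."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # toy executor; fine for this trusted demo string
    except Exception:
        return False, traceback.format_exc()
    return True, buffer.getvalue()


def solve(task, max_rounds=3):
    """Loop until the code runs cleanly or the round limit is reached."""
    history = [task]
    for round_number in range(1, max_rounds + 1):
        code = scripted_assistant(history)
        ok, report = run_locally(code)
        if ok:
            return round_number, report.strip()
        history.append(report)  # the error report goes back to the assistant
    raise RuntimeError("no working program within the round limit")


rounds, output = solve("Print the sum of the numbers 0 to 4")
print(rounds, output)
```

In the real framework, the assistant's replies come from a language model and the proxy runs the code in a working directory or container, but the control flow follows the same pattern: propose, execute, feed the error back, and repeat.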

3. Shap-E

Shap-E is a model open sourced by OpenAI (the developer of ChatGPT) that generates 3D objects conditioned on text or images. Users can run it on their own computer, with no OpenAI API key and, once the model is set up, no internet connection required.

Number of Stars: 10.5K +

Project address: https://github.com/openai/shap-e

One of the project’s examples is a 3D model generated from the prompt “An airplane that looks like a banana”. It is also simple to use: from a local copy of the repository, install the package:

pip install -e .

This project is a very helpful tool for 3D designers!

4. Slidev

Slidev is a presentation tool built for developers. With Slidev, we can make great-looking slides using familiar Markdown. It also supports HTML and Vue components, and can produce pixel-perfect layouts.

Number of Stars: 28.5K +

Official address: cn.sli.dev

Project address: https://github.com/slidevjs/slidev

For an ordinary programmer who is used to coding but not to giving presentations, making slides is really difficult. Whether in PowerPoint or WPS, it is hard to know where to start. Slidev is just the right tool for this: working from Markdown files, it is efficient, powerful, and pleasantly surprising.
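To give a feel for the Markdown-based workflow, here is a minimal Python sketch that assembles a small deck as a single Markdown string. It relies on two Slidev conventions: slides are separated by a line containing `---`, and a front matter block at the top of the file can set options such as `title`. The helper `build_deck` and the slide contents are made up for illustration.

```python
def build_deck(title, slides):
    """Join slide bodies into one Markdown document using `---` separators."""
    frontmatter = f"---\ntitle: {title}\n---\n"
    body = "\n\n---\n\n".join(slide.strip() for slide in slides)
    return f"{frontmatter}\n{body}\n"


slides = [
    "# Weekly GitHub picks\n\nSix open source projects",
    "# Video-ReTalking\n\n- Audio-driven lip sync",
    "# AutoGen\n\n- Multi-agent framework",
]

deck = build_deck("GitHub picks", slides)
print(deck)
```

Saving the resulting text as `slides.md`, the file name Slidev looks for by default, gives a three-slide deck that can be opened with the Slidev CLI. In practice you would usually write the Markdown file by hand; the script only shows how little structure the format needs.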

5. Pake

Pake is a lightweight tool written in Rust that packages web pages into desktop applications. It is a good fit for anyone who wants to ship a web application as a desktop program: a web application normally runs only inside a browser, while Pake wraps it so that it can be used as a standalone application.

Number of Stars: 18.9K +

Project address: https://github.com/tw93/Pake

Compared with traditional Electron packaging, applications packaged by Pake are about 20 times smaller, and they perform better and feel smoother than applications built on JS frameworks.

Beyond simply wrapping a web page, Pake also supports shortcut key pass-through, immersive windows, drag and drop, style rewriting, and ad removal, so the look of the product can be deeply customized.

Pake supports Mac, Windows and Linux systems, and can package the same web page into a desktop application for distribution on multiple platforms.

6. Refine

refine is an open source, headless, React-based web application framework designed to help developers build CRUD (create, read, update, delete) applications freely, without being locked into any constraints.

Number of Stars: 15.7K +

Official address: refine.dev

Project address: https://github.com/refinedev/refine

It offers flexibility and customizability, allowing developers to build CRUD applications according to their own needs. Refine gives developers more freedom and shortens the development cycle while still providing rich functionality.

It has the following key features and core benefits:

  • Built-in hooks and components simplify development and eliminate repetitive tasks.

  • Industry-standard solutions for key concerns such as authentication, access control, routing, and network communication.

  • Featuring a headless design, it offers unlimited styling customization options and works perfectly with custom designs or UI frameworks like TailwindCSS.

  • Popular UI libraries such as Ant Design, Material UI, Mantine and Chakra UI can be integrated to quickly start projects.

In addition, refine has the following features:

  • Zero configuration, set up in under a minute

  • Connectors for 15+ backend services

  • Supports Next.js or Remix for server rendering (SSR)

Those are the six open source projects in this week’s GitHub selection. Interested readers can go to GitHub to download and try them! If you like this issue, please click “Follow” and we will regularly share high-quality open source projects.

Final words

Thank you for accompanying us all the way, building the world with code and exploring a journey full of unknowns and wonders together. If you are interested in Python programming skills, fun and practical open source projects, industry news, and other technical information, don’t miss the content coming in future issues! Click “Follow” to make your learning journey more colorful. We will grow and move forward together!
