Text Template Technical Solution in Video Editing Scenario

Author | Lok’tar ogar

Overview

Drawing on the development of text templates in the Duka Editing app, this article shares a technical solution for rendering static text templates in video editing scenarios. As a superset of a rich text rendering solution, the approach can be extended to other scenarios that require complex rich text rendering.

The full text is 6,745 words; expected reading time is 17 minutes.

Sneak peek

Text template effect display:
image
image
△ Text templates applied in Duka Editing

01 Background

One of the core competitive strengths of a video creation tool is a rich material library: video clips, audio, stickers, and so on. Text templates are an integral part of it. A text template provides rich text editing, letting users add well-styled text to their videos and increasing the diversity of available materials. Preset styles also make it easier for users to pick a template that suits them, saving selection time and improving the overall experience.
Early versions of Duka did not offer text templates as a material type. To strengthen the product's competitiveness and increase material penetration, we carried out the research and development described here and launched text template materials. These templates not only meet user needs but also provide creative inspiration and ideas. At the same time, we keep updating and optimizing the material library to ensure users get the latest, highest-quality resources.
The graphic and text styles a text template needs to present are relatively complex. The features supported by Duka text templates are listed below:
image

02 Overall Design

On top of the material platform we had already built, we added a new text template type and provided material editing, preview, configuration, and release functions on the platform. Because production and preview are combined, an operator can preview an adjustment immediately on the same page, match Duka's font library directly, and swap image resources in place. This production method is highly reusable: starting from one text template, changing the background image and adding a stroke yields another template. After a template is published and its effect image exported, it enters the review queue; once approved, it can be configured and released.
image
So far, we have launched 361 sets of text templates and completed the full pipeline of [Material Production] – [Material Platform Preview] – [Material Delivery and Client Loading] – [Client Rendering].
image

03 Function Implementation

3.1 Material production

The mainstream material format in the video editing industry currently consists of resource files plus configuration files (description files). Resource files include image resources and font files, while the configuration file describes the text template's typesetting attributes and rendering parameters. The advantage of this approach is that the production side only needs to describe a feature through specific fields for it to appear on the rendering side. It is highly flexible, lets features iterate from simple to complex according to the needs of a specific scenario, and has a relatively low implementation cost. The drawback is that the production format is custom, which carries some design and learning cost.
Another production route uses professional design software; take Photoshop (PS) as an example. The PSD file format is maturely documented, including its data structures, so graphic and text attributes can be parsed directly from a PSD file for rendering. The advantage is generality: designers face almost no learning cost. The drawback is that some features we need cannot be expressed simply in a PSD. A multi-layer shadow, for instance, is produced by stacking several text boxes; when the text content changes, all of those boxes must change together, so they have to be treated as a group, and the logic becomes more complicated. Described in a configuration file, the same effect is just multi-layer drawing, with no extra group-handling logic.
Weighing business ROI against the feasibility of shipping in the short term, we adopted the first method, drew on the material production standards of the Butter Camera team, and designed a JSON structure for describing typesetting attributes and rendering parameters.

3.2 Client-side Rendering

In the video editing scenario, text processing has two parts: text layout and text drawing. For layout, iOS uses the low-level CoreText framework, while Android can obtain the results of FreeType's underlying font processing through FontMetrics and related APIs. Whether a piece of text is typeset as a whole or each glyph's position is computed separately, the overall processing cost is about the same.
For text drawing, a balance must be struck between performance overhead and development cost. In the end, iOS adopted the QuartzCore framework and Android used Canvas. This way, during preview the text is rendered directly in the view, supporting real-time editing and previewing; when the video needs to be exported, the result is processed as a sticker and composited into the video. Taking iOS as an example, the fancy-text ("huazi") component architecture is as follows:
image

3.3 Description file design

As mentioned above, we use a JSON file to describe the typesetting attributes and rendering parameters of a text template. After the resources are delivered to the client, the client parses the corresponding parameters to typeset the template and present the final effect. The description file involves the following content:

(1) Text layout attributes
image

  • baseline: the virtual line on which glyphs sit
  • ascent: the recommended distance from the highest point of the glyph to the baseline
  • descent: the recommended distance from the lowest point of the glyph to the baseline
  • leading: the line gap, i.e. the distance between the descent of one line and the ascent of the next
  • advance width: the distance from a glyph's origin to the next glyph's origin
  • left-side bearing: the distance from the origin to the leftmost edge of the glyph
  • right-side bearing: the distance from the rightmost edge of the glyph to the next glyph's origin
  • bounding box: the smallest rectangle that contains the glyph
  • x-height: the recommended distance from the top of a lowercase x to the baseline
  • cap height: the recommended distance from the top of an uppercase letter such as H to the baseline
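The relationships between these metrics can be modeled in a few lines. Below is a minimal, platform-neutral sketch; the class and field names are illustrative, not part of Duka's actual API:

```python
from dataclasses import dataclass

@dataclass
class FontMetrics:
    ascent: float    # baseline to recommended glyph top
    descent: float   # baseline to recommended glyph bottom
    leading: float   # gap between one line's descent and the next line's ascent

    def line_height(self) -> float:
        # Height of a single line of text.
        return self.ascent + self.descent

    def baseline_to_baseline(self) -> float:
        # Vertical distance between consecutive baselines.
        return self.ascent + self.descent + self.leading

m = FontMetrics(ascent=18.0, descent=6.0, leading=4.0)
print(m.line_height())           # 24.0
print(m.baseline_to_baseline())  # 28.0
```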

(2) Combination of text objects
The figure below shows an example of combining two text drawing areas:
image
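As a rough sketch of how a description file could group two overlapping text areas, for example to build the multi-layer shadow discussed earlier, here is a hypothetical descriptor. Every field name is invented for illustration and does not reflect the real Duka schema:

```python
# Hypothetical descriptor sketch (field names are illustrative; the actual
# Duka text template schema is not public).
template = {
    "version": 1,
    "texts": [
        {
            "id": "title",
            "content": "Hello",
            "font": "some-font.ttf",
            "fontSizeFixed": False,    # non-fixed size: pre-typeset first
            "offset": [0.0, 0.0],      # unified top-left origin convention
        },
        {
            "id": "title_shadow",
            "content": "Hello",
            "group": "title",          # edits to "title" propagate here
            "offset": [0.02, 0.02],    # slight offset produces the shadow
        },
    ],
    "images": [{"id": "bg", "file": "bg.png", "zIndex": 0}],
}

def texts_in_group(doc, group_id):
    """Collect the group leader and all member text objects."""
    return [t for t in doc["texts"]
            if t["id"] == group_id or t.get("group") == group_id]

# Editing the "title" text must update both layers at once.
print([t["id"] for t in texts_in_group(template, "title")])
```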

3.4 Typesetting and drawing process

In our text templates, typesetting and drawing are inseparable and must be interleaved in the code. Drawing proceeds layer by layer from the bottom layer to the top, but some drawing steps are expensive, so to avoid blocking the main thread we draw asynchronously: time-consuming steps run on a background thread where they do not interfere with normal use. The text layout is also computed during the asynchronous pass, so later drawing steps can fetch layout information quickly, improving drawing efficiency. Overall, asynchronous drawing keeps typesetting and drawing smooth without disturbing the user.
image
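The bottom-to-top pass can be sketched as follows. This is an illustrative, simplified model (all names are invented), not the production drawing code; layout is computed once and reused by every layer:

```python
# Sketch of the layer-by-layer pass: typeset once, then draw every layer
# bottom-to-top using the cached layout.
def render(layers, layout_engine, canvas):
    layout = layout_engine(layers)  # typeset once, reuse for each layer
    for layer in sorted(layers, key=lambda l: l["zIndex"]):
        canvas.append((layer["id"], layout[layer["id"]]))
    return canvas

layers = [
    {"id": "text",   "zIndex": 2},
    {"id": "shadow", "zIndex": 1},
    {"id": "bg",     "zIndex": 0},
]
layout = lambda ls: {l["id"]: (0, 0) for l in ls}  # placeholder layout pass
print([name for name, _ in render(layers, layout, [])])  # ['bg', 'shadow', 'text']
```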

04 Difficulties and challenges

1. Alignment of multi-terminal effects

Our project supports rendering on web, iOS, and Android. A general cross-platform solution would need OpenGL rendering at the bottom layer, which staffing constraints at the time made infeasible in the short term. We therefore adopted independent rendering per platform, with each platform maintaining its own implementation. This brings a problem: rendering results can differ across platforms.
To solve this, we need to ensure consistency across platforms. Since it is hard to eliminate the differences at the technical level, we achieved consistency by unifying rules and standards. When designing the JSON format, we fixed the conventions every platform must follow, such as whether a text decoration's initial position is aligned to the text's top-left corner or its center, and where the coordinate origin sits. We also unified the units used by each parameter. This way, whichever platform renders, the results are consistent and the user experience stays uniform.
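As a sketch of what "unifying rules and units" means in practice, assume the hypothetical conventions that lengths are stored as ratios of the font size and that positions use a top-left origin:

```python
def resolve_px(ratio, font_size_px):
    # Convention (illustrative): lengths in the descriptor are stored as
    # ratios of the font size, so every platform resolves them identically.
    return ratio * font_size_px

def to_top_left(center_x, center_y, w, h):
    # Convention (illustrative): the unified origin is the top-left corner;
    # a platform that lays out from the center converts before rendering.
    return (center_x - w / 2, center_y - h / 2)

# A 0.05 stroke ratio at a 40 px font size is 2 px on every platform.
print(resolve_px(0.05, 40))
print(to_top_left(50, 50, 20, 10))
```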

2. Text pre-typesetting

In text templates, we divide font sizes into two types: fixed and non-fixed. With a fixed size, layout calculation and drawing can proceed directly. With a non-fixed size, a pre-typesetting pass must compute the right font size for the current text content. Some solutions use binary search: set a large initial size and converge within the range from 0 to that size. This wastes time unnecessarily. Combining the basic rules and constraints of text layout, we can build an algorithm with time complexity close to O(1): compute the maximum character height, then the minimum character height, then the character height implied by the longest line, then the character height implied by the line count, take the final character height, and derive the font size from it. On iOS, CoreText occasionally clips text that does not fit, so using the computed font size directly can cut off part of the text. The computed value is therefore treated as an estimate, and the size is decreased by 1 step by step until the text fills the path.

        CGFloat ascent, descent;
        UIFont *font = [self.calFont fontWithSize:size];
        CTFontRef fontRefMeasure = (__bridge CTFontRef)font;
        [attrString addAttribute:(id)kCTFontAttributeName value:(__bridge id)fontRefMeasure range:NSMakeRange(0, attrString.length)];
        CTLineRef line = CTLineCreateWithAttributedString((__bridge CFAttributedStringRef)attrString);
        CTLineGetTypographicBounds(line, &ascent, &descent, NULL);
        CFRelease(line);

        // calculate max character height
        CGFloat calFontHeight = MIN(height, width);
        self.maxFontHeight = calFontHeight;

        // calculate min character height
        CGFloat maxLine = self.document.maxLine * BDTZBigFontDataOriginScale;
        if (maxLine <= 0) {
            maxLine = 1;
        }
        calFontHeight = [self itemWidth] / (maxLine + (maxLine - 1) * (self.leadingRatio * BDTZBigFontDataOriginScale - 1));
        self.minFontHeight = MIN(self.maxFontHeight, calFontHeight);

        // find the longest line
        int64_t n = 0;
        NSArray *strArray = [self.document.content componentsSeparatedByString:@"\n"];
        NSString *measureStr = self.document.content;
        // Multi-line handling: the loop count equals the line count, which is small (usually 1-10 lines)
        for (NSString *str in strArray) {
            if (str.length > n) {
                n = str.length;
                measureStr = str;
            }
        }
        CGFloat fontWidthRatioOrigin = (self.document.fontWidthRatio * BDTZBigFontDataOriginScale);
        CGFloat trackingRatio = (self.document.trackingRatio * BDTZBigFontDataOriginScale) * (ascent + descent) / ascent;
        // fall back to the measured aspect ratio of a reference CJK glyph when no width ratio is configured
        CGRect rect = [@"我" boundingRectWithSize:CGSizeMake(CGFLOAT_MAX, CGFLOAT_MAX) options:NSStringDrawingUsesLineFragmentOrigin attributes:@{NSFontAttributeName:self.calFont} context:nil];
        CGFloat fontWidthRatio = fontWidthRatioOrigin > 0 ? fontWidthRatioOrigin * (ascent + descent) / ascent : rect.size.width / rect.size.height;
        CGFloat fontHeight = width / (n * fontWidthRatio + n * trackingRatio);

        if (strArray.count > 1) {
            // calculate character height according to the line count
            calFontHeight = [self itemWidth] / (strArray.count + (strArray.count - 1) * (self.leadingRatio * BDTZBigFontDataOriginScale - 1));
            // take the smaller of the two heights
            fontHeight = MIN(fontHeight, calFontHeight);
        }

        // clamp to [minFontHeight, maxFontHeight]
        if (fontHeight > self.maxFontHeight) {
            fontHeight = self.maxFontHeight;
        } else if (fontHeight < self.minFontHeight) {
            fontHeight = self.minFontHeight;
        }

        CGFloat calSize = fontHeight;
        calFontHeight = [self calculateFontHeightSize:calSize];
        calSize = floorf(calSize / (calFontHeight * (ascent + descent) / ascent) * calSize);

        // Treat calSize as an estimate: decrease it by 1 step by step,
        // re-measuring against the frame, until the text fills the path
        // (that refinement loop is omitted here).

        if (calSize <= 0) {
            return calSize;
        }
        calFontHeight = [self calculateFontHeightSize:calSize];
        self.fontHeight = calFontHeight * (ascent + descent) / ascent;

        self.font = [self.calFont fontWithSize:calSize];

3. Drawing performance

Real-time preview of a text template requires frequent redrawing, which burdens the CPU and can cause stutter. To solve this, we must draw asynchronously. Concretely, we create a serial queue that stores the text state after each user operation. Whenever the user makes an edit, the current state is enqueued to wait for the background thread; when the previous state finishes drawing, the next pending state is dequeued, until all queued states are drawn. This keeps the main thread responsive while every modification the user makes is eventually rendered.
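The serial draw queue described above can be sketched with a worker thread. Here Python's `queue` and `threading` stand in for the platform's serial dispatch queue, and a string stands in for the expensive draw:

```python
import queue
import threading

draw_queue = queue.Queue()
results = []

def draw_worker():
    while True:
        state = draw_queue.get()          # blocks until a state is queued
        if state is None:                 # sentinel: stop the worker
            break
        results.append(f"drawn:{state}")  # stand-in for the expensive draw
        draw_queue.task_done()

worker = threading.Thread(target=draw_worker, daemon=True)
worker.start()

# The main thread only enqueues states; it never blocks on drawing.
for edit in ["Hello", "Hello w", "Hello world"]:
    draw_queue.put(edit)

draw_queue.put(None)
worker.join()
print(results)  # every edit is drawn, in order
```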
Beyond asynchronous drawing, a caching mechanism can further improve rendering performance. When the user edits a template, the text view re-lays-out and redraws; redrawing the entire template every time wastes CPU and hurts the experience. We can therefore cache the drawn template views: when the user changes the text content, only the changed part is redrawn rather than the whole view. This improves rendering performance while reducing resource consumption and improving system responsiveness.
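A minimal sketch of per-layer caching, using `functools.lru_cache` as a stand-in for a real bitmap cache (the function and cache key are illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=64)
def render_layer(text, style):
    # Stand-in for an expensive per-layer rasterization; cached so that
    # editing one layer does not force unchanged layers to be redrawn.
    return f"bitmap({text},{style})"

render_layer("Hello", "shadow")
render_layer("Hello", "fill")
render_layer("Hello", "shadow")         # served from cache, no redraw
print(render_layer.cache_info().hits)   # 1
```

A production cache would key on everything that affects pixels (content, font, style, scale) and evict under memory pressure.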

4. Memory optimization

Our text templates are mainly used in video editing scenarios, where users zoom templates in or out as needed. With a purely vector-based redraw, memory usage becomes very high once the user zooms in far enough. Moreover, users typically add many materials in the editor, such as stickers, effects, and subtitles, each taking considerable memory; after a period of use, memory can climb to the OOM threshold and crash the application. We therefore cap a single text template at under 20 MB, and for each video resolution we compute the width and height thresholds the template needs to achieve the expected clarity, balancing sharpness against memory usage. Each template has its own balance parameters. Although this is only a memory optimization detail, it played a large role in controlling material memory usage and the online OOM rate.
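To illustrate the 20 MB cap: an RGBA bitmap costs 4 bytes per pixel, so the render size can be scaled down until the bitmap fits the budget. This is a sketch assuming a simple uniform-scale policy; the article's per-template balance parameters are not specified, so the numbers here are illustrative:

```python
import math

BYTES_PER_PIXEL = 4                # RGBA bitmap
BUDGET_BYTES = 20 * 1024 * 1024    # per-template cap from the article

def capped_size(width_px, height_px):
    """Scale a template's render size down so its bitmap stays under budget."""
    needed = width_px * height_px * BYTES_PER_PIXEL
    if needed <= BUDGET_BYTES:
        return width_px, height_px
    scale = math.sqrt(BUDGET_BYTES / needed)
    return int(width_px * scale), int(height_px * scale)

# A 4000x4000 render (64 MB) is scaled down to fit the 20 MB cap.
w, h = capped_size(4000, 4000)
print(w, h, w * h * BYTES_PER_PIXEL <= BUDGET_BYTES)
```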

05 Conclusion

In the world of video editing, rich text rendering is a fairly complex process. For client rendering, there is no one-size-fits-all solution, only the solution best suited to a specific scene. Designing and implementing a text template rendering solution involves many details, and it also demands an in-depth understanding of the file formats of mainstream design software such as PS and Figma.
Our team's solution for static text templates covers the more common rich text rendering scenarios, and the overall approach to text layout and drawing is broadly transferable. In this article, we introduced the basic concepts and rich text features to help readers better understand it.
Even with the solution we provide, many details must still be considered during implementation: font size, color, alignment, character spacing, line spacing, and other factors all determine whether the rendered rich text achieves the expected effect.
Achieving the best rich text rendering therefore takes substantial investment in design and implementation. Only with a deep understanding of rich text's characteristics and design principles can we deliver a high-quality video editing experience.

–END–
