[Front-end monitoring] Important indicators of front-end performance

For website monitoring, front and back ends are responsible for different tasks.

The server side is mainly responsible for analysis and alarm:

  1. Log receiving and processing: build a back-end application that exposes a log-reporting interface to the collection SDK
  2. Data publishing: the back end accepts the logs and turns them into data that real-time stream processing can consume, via DataHub/SLS/Kafka
  3. Log processing: based on Flink/Spark/Storm
  4. Monitoring and alarming: implemented through front-end applications or bots

Common services: application server + DataHub + Flink + Hologres (RDS)

The front end is mainly responsible for two parts: log collection and log reporting.

Log collection is divided into two levels: page stability and page fluency.

Page Stability

The most important part is front-end exceptions, including JS errors, interface (API) exceptions, and resource exceptions. These are covered in the article "Front-end exception capture and processing", and the code there can be used directly.

There are also page white screens and page crashes.

Page white screen

| Scheme | Implementation | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Based on the Native container | 3 s after the page has loaded, the page is still full-screen white pixels | Does not rely on JS; no data points are lost | Depends on the container |
| Based on PaintTiming | No first-paint within 3 s after the page has loaded | Simple to implement, does not depend on the container, almost no impact on performance | Poor compatibility |
| Based on MutationObserver | Check whether any node changes within 3 s after the page has loaded | Simple to implement, does not depend on the container | No node changes does not necessarily mean a white screen; styles may still be involved |
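
As a rough illustration of the PaintTiming scheme above, the sketch below checks for a first-paint entry after a delay. The function name detectWhiteScreen, the callback shape, and the injected perf and schedule parameters are assumptions made here so the logic can also run outside a browser; they are not part of any SDK.

```javascript
// Hypothetical helper: calls onResult(true) when no 'first-paint' entry
// exists `timeoutMs` after it is invoked (intended to be invoked from the
// window load event). `perf` is the Performance object to query.
function detectWhiteScreen(perf, onResult, timeoutMs = 3000, schedule = setTimeout) {
  schedule(() => {
    const painted = perf
      .getEntriesByType('paint')
      .some((entry) => entry.name === 'first-paint');
    onResult(!painted);
  }, timeoutMs);
}

// Browser usage (reportWhiteScreen is a placeholder for your own reporter):
// window.addEventListener('load', () => {
//   detectWhiteScreen(performance, (isWhite) => {
//     if (isWhite) reportWhiteScreen();
//   });
// });
```

In browsers without Paint Timing support the entry list is always empty, so a real SDK would probably check for support before trusting a positive result.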

Page crash

| Scheme | Implementation | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Based on the Native container | Monitor the WebView process status and send a crash log | Does not rely on JS; no data points are lost | Depends on the container |
| Based on Service Worker | The HTML request enters the SW, which marks the page as starting to load; the page sends a heartbeat to the SW at regular intervals; if no heartbeat is received for a period of time, the page is considered crashed | Does not depend on the container | Poor compatibility; the SW is highly intrusive and risky; heartbeat detection cannot run after the page is paused |
| Based on the page-leave status in LocalStorage | Mark "loading started" when the page loads; on pagehide/beforeunload, mark the page as left; on the next visit, check whether the previous exit was normal | Does not depend on the container; better compatibility | Reporting lags behind, so it cannot serve real-time monitoring and alarming |
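
The LocalStorage scheme above can be sketched roughly as follows. The key name page_leave_status and both function names are made up for this example, and storage is passed in as a parameter (localStorage in a browser) so the logic is easy to try in isolation.

```javascript
const STATUS_KEY = 'page_leave_status'; // hypothetical storage key

// Call once when the page starts loading. Returns true when the previous
// visit recorded "loading" but never recorded a normal exit, which
// suggests that visit ended in a crash.
function checkPreviousExit(storage) {
  const previous = storage.getItem(STATUS_KEY);
  storage.setItem(STATUS_KEY, 'loading');
  return previous === 'loading';
}

// Call from the pagehide/beforeunload handler to record a normal exit.
function markNormalExit(storage) {
  storage.setItem(STATUS_KEY, 'left');
}

// Browser usage (reportCrash is a placeholder for your own reporter):
// if (checkPreviousExit(localStorage)) reportCrash();
// window.addEventListener('pagehide', () => markNormalExit(localStorage));
```

Note that a single shared key gives false positives when the same page is open in several tabs; a per-tab key would be needed to handle that case.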

Navigator.sendBeacon is worth trying for reporting: tracking data is not lost when the page unloads and unloading is not delayed, but there are compatibility issues.
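
Because of those compatibility issues, a reporter usually needs a fallback. The sketch below is one possible shape, not a reference implementation: sendLog, its return values, and the injected nav and fallback parameters are assumptions for illustration, and the URL in the usage comment is a placeholder.

```javascript
// Sends one log record. `nav` is the Navigator object (pass `navigator`
// in a browser); `fallback(endpoint, body)` is used when sendBeacon is
// missing or refuses the data.
function sendLog(nav, endpoint, payload, fallback) {
  const body = JSON.stringify(payload);
  if (nav && typeof nav.sendBeacon === 'function') {
    // sendBeacon returns false when the browser cannot queue the data
    if (nav.sendBeacon(endpoint, body)) return 'beacon';
  }
  fallback(endpoint, body);
  return 'fallback';
}

// Browser usage with a fetch keepalive fallback:
// sendLog(navigator, 'https://example.com/log', { type: 'crash' }, (url, body) => {
//   fetch(url, { method: 'POST', body, keepalive: true });
// });
```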

Page fluency

Loading performance


| Indicator | Collection method |
| --- | --- |
| load | NavigationTiming |
| FCP | PerformancePaintTiming |
| LCP | LargestContentfulPaint |
| FID (first input delay) | PerformanceEventTiming |
| TTI (time to interactive) | No API; detected with Lighthouse |
| TBT (total blocking time) | No API; detected with Lighthouse |
| CLS (cumulative layout shift) | Layout Instability API |

When content is lazy-loaded, these indicators cannot reflect the real loading performance. For now, that case can only be covered by manual, intrusive instrumentation in the business code.

First Paint (FP)

The time from when the browser responds to the URL entered by the user to when it first starts to display content, that is, the moment pixels are first drawn to the screen. This includes any user-defined background painting.

First Contentful Paint (FCP)

The time from when the browser responds to the URL entered by the user to when it renders the first piece of DOM content to the screen. The first drawing of text, images (including background images), non-white canvas, or SVG counts as FCP.

Time to Interactive (TTI)

The point at which a web page first becomes fully interactive (it can respond to button clicks, text input, and so on). At this point the main thread has become "smooth": no main-thread task exceeds 50 milliseconds. In a typical management system, TTI is a very important indicator.

First Meaningful Paint (FMP)

The point at which the "main content" of the page begins to appear on the screen. It used to be the main metric for measuring the user's loading experience. In essence, an algorithm guesses which point in time is likely to be FMP, and even in the best case its accuracy is only 77%. This indicator was dropped in Lighthouse 6.0 and replaced by LCP.

Largest Contentful Paint (LCP)

The point at which the largest visible "content" element in the viewport begins to appear on the screen. LCP is not perfect and is also prone to errors. It stops capturing after user interaction, so it may record the wrong result. A carousel that occupies a large part of the page also causes problems, because LCP keeps being updated as the slides change.

// Initial time measurement point of the browser visit (part of the deprecated performance.timing)
performance.timing.navigationStart
// Milliseconds from the start of navigation to now
performance.now()
// Each key time point, including navigationStart above
performance.timing
// Timing entries, including one for each resource request
performance.getEntries()

// Calculate the page white screen time
performance.getEntriesByType("paint")
// Or: record the current time just before the body tag and subtract performance.timing.navigationStart
// White screen time = the time when the page starts displaying - the time when the request starts

// FCP: first-screen content drawing (approximations based on performance.timing)
const fcp_time_by_interactive = performance.timing.domInteractive - performance.timing.fetchStart
const fcp_time_by_loadEventEnd = performance.timing.loadEventEnd - performance.timing.navigationStart // End time of the first-screen content rendering - the time the request started
// Or: record the current time just before the element that needs to be displayed and subtract performance.timing.navigationStart

// TTI: time to interactive (a rough approximation)
const tti = performance.timing.domContentLoadedEventEnd - performance.timing.navigationStart

// LCP: largest contentful paint
// Use PerformanceObserver directly to capture LCP
const observer = new PerformanceObserver((entryList) => {
  const entries = entryList.getEntries();
  // The last entry is the most recent LCP candidate
  const lastEntry = entries[entries.length - 1];
  const lcp = lastEntry.renderTime || lastEntry.loadTime;
  console.log('LCP:', lcp);
});
observer.observe({ type: 'largest-contentful-paint', buffered: true });

// Or use a library to do it
// (web-vitals v3 and later export onLCP; older versions exported getLCP)
import { onLCP } from 'web-vitals';

// Measure and log the current LCP value,
// any time it's ready to be reported.
onLCP(console.log);
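
The indicator list above also names CLS, collected through the Layout Instability API, which the snippets so far do not cover. A minimal sketch, assuming a simple running total (the web-vitals library instead groups shifts into session windows), might look like this; createClsCollector is a name invented for the example.

```javascript
// Accumulates layout-shift scores into a running CLS total.
function createClsCollector() {
  let cls = 0;
  return {
    // Feed it the entries from a 'layout-shift' PerformanceObserver callback.
    add(entries) {
      for (const entry of entries) {
        // Shifts that follow recent user input are expected and not counted.
        if (!entry.hadRecentInput) cls += entry.value;
      }
      return cls;
    },
    get value() {
      return cls;
    },
  };
}

// Browser usage:
// const collector = createClsCollector();
// new PerformanceObserver((list) => collector.add(list.getEntries()))
//   .observe({ type: 'layout-shift', buffered: true });
```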

Custom Performance Collection

| Indicator | Collection method | Description |
| --- | --- | --- |
| UserTiming | User Timing API (PerformanceMark) | The business code calls performance.mark(...) when the first-screen rendering is complete, and the collection SDK collects and reports the mark through PerformanceObserver |
| ElementTiming | PerformanceElementTiming | The problem with UserTiming is that business developers cannot know when the browser actually renders the content after the first-screen HTML has been created. ElementTiming lets you detect that a specific text or image node has been rendered: just add the elementtiming attribute to that node. The rendering of a few representative nodes can then be treated as the sign that first-screen rendering is complete. The collection SDK collects and reports these entries through PerformanceObserver |
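
To make the two rows above more concrete, here is a small sketch of both sides: the business code placing a mark, and the SDK turning 'mark' and 'element' entries into records. The mark name first-screen-rendered, the function names, the record shape, and the report callback in the usage comment are all assumptions for illustration.

```javascript
// Business side: mark the moment the first screen is considered rendered.
// `perf` is the Performance object (pass `performance` in a browser).
function markFirstScreen(perf) {
  perf.mark('first-screen-rendered'); // hypothetical mark name
}

// SDK side: normalize a 'mark' or 'element' performance entry.
function toRecord(entry) {
  if (entry.entryType === 'element') {
    // identifier holds the value of the node's elementtiming attribute;
    // renderTime may be 0 for some cross-origin images, so fall back to loadTime
    return {
      kind: 'element',
      name: entry.identifier,
      time: entry.renderTime || entry.loadTime,
    };
  }
  return { kind: entry.entryType, name: entry.name, time: entry.startTime };
}

// Browser usage (report is a placeholder for your own reporter):
// <img src="hero.png" elementtiming="hero-image">
// const po = new PerformanceObserver((list) =>
//   list.getEntries().map(toRecord).forEach(report));
// po.observe({ type: 'element', buffered: true });
// po.observe({ type: 'mark', buffered: true });
```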

Response speed

The time from a user operation to the page's response, which is usually required to be under 100 ms.

// Based on PerformanceEventTiming, monitor the delay from any user input to the browser's response
const eventObserver = new PerformanceObserver((list) => {
  list.getEntries().forEach((entry) => {
    // Event type: entry.name
    // Overall time: entry.duration
    // Time spent in the event handlers: entry.processingEnd - entry.processingStart
    // Delay before the handlers start: entry.processingStart - entry.startTime
  })
})
eventObserver.observe({ type: 'event', buffered: true })

Animation fluency

Monitor whether the frame rate of any animation on the page is stable

Monitor the execution of each requestAnimationFrame during the animation running, and calculate:

  1. Frame rate = number of frames the animation ran / animation running time
  2. Frame drop rate = (number of frames expected at 60 FPS - number of frames actually run) / number of frames expected at 60 FPS

// At 60 frames per second, the interval between frames in milliseconds
const msInOneFrame = 1000 / 60
// e is the animationend event; e.elapsedTime is the animation running time in seconds
const expectedFrames = Math.floor(e.elapsedTime * 1000 / msInOneFrame)
// actualFrames is the number of requestAnimationFrame callbacks counted while the animation ran
const dropRate = (expectedFrames - actualFrames) / expectedFrames
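
Putting the two formulas together, a frame counter could be sketched as below. frameStats and countFrames are names chosen for this example, and raf and now are injected (requestAnimationFrame and performance.now in a browser) so the arithmetic can be checked without a real animation. The expected frame count is computed as elapsedMs * 60 / 1000 to avoid floating-point error from dividing by 1000 / 60.

```javascript
// frames: callbacks actually run; elapsedMs: animation running time in ms.
function frameStats(frames, elapsedMs) {
  const expectedFrames = Math.floor((elapsedMs * 60) / 1000);
  const fps = elapsedMs > 0 ? frames / (elapsedMs / 1000) : 0;
  const dropRate =
    expectedFrames > 0 ? Math.max(0, expectedFrames - frames) / expectedFrames : 0;
  return { fps, expectedFrames, dropRate };
}

// Counts requestAnimationFrame callbacks until the returned stop() is called.
function countFrames(raf, now) {
  let frames = 0;
  let running = true;
  const start = now();
  const tick = () => {
    if (!running) return;
    frames += 1;
    raf(tick);
  };
  raf(tick);
  return function stop() {
    running = false;
    return frameStats(frames, now() - start);
  };
}

// Browser usage (el is the animated element, report a placeholder):
// const stop = countFrames(requestAnimationFrame, () => performance.now());
// el.addEventListener('animationend', () => report(stop()));
```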

Log report

Reporting can be triggered from the exception hooks described in the article "Front-end exception capture and processing", as well as from the browser hooks mentioned above.
