[Front-end monitoring] Important indicators of front-end performance

For website monitoring, front and back ends are responsible for different tasks.

The server side is mainly responsible for analysis and alerting:

Log receiving: build a back-end application that exposes a log reporting interface to the collection SDK
Data publishing: the back end accepts logs and publishes them to a real-time stream for computation: DataHub/SLS/Kafka
Log processing: based on Flink/Spark/Storm
Monitoring and alerting: a front-end application or chat robot presents the monitoring data and raises alerts

Common services: application server + DataHub + Flink + Hologres (RDS)

The front end is mainly responsible for two parts: log collection and log reporting.

Log collection is divided into two levels: page stability and page fluency.

Page Stability

The most important part is front-end exceptions, including JS errors, interface (API) exceptions, and resource loading exceptions. These were covered in the article "Front-end exception capture and processing", and the code there can be reused directly.

There are also page white screens and page crashes.

Page white screen

Scheme	Implementation	Advantages	Disadvantages
Based on the Native container	3s after the page has loaded, the page is still entirely white pixels	Does not rely on JS; no data points are lost	Depends on the container
Based on PaintTiming	No first-paint entry within 3s after the page has loaded	Simple to implement, does not depend on the container, and has almost no performance impact	Poor compatibility
Based on MutationObserver	Whether any node changes within 3s after the page has loaded	Simple to implement and does not depend on the container	No node changes does not necessarily mean a white screen, since styles alone may render content
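
As an illustration of the PaintTiming scheme in the table, here is a minimal sketch that reports a suspected white screen when no `first-paint` entry exists 3s after load. The names `hasFirstPaint` and `checkWhiteScreen`, the shape of the reported log, and the injected `schedule`/`report` callbacks are assumptions made for this example, not part of any SDK.

```javascript
// Hypothetical helper: true if the paint entries contain a first-paint record.
function hasFirstPaint(paintEntries) {
  return paintEntries.some((entry) => entry.name === 'first-paint');
}

// Hypothetical helper: runs a white-screen check `delayMs` after it is called.
// `getPaintEntries`, `report` and `schedule` are injected so the logic can be
// exercised outside a browser.
function checkWhiteScreen({ getPaintEntries, report, schedule, delayMs = 3000 }) {
  schedule(() => {
    if (!hasFirstPaint(getPaintEntries())) {
      report({ type: 'white-screen', delayMs });
    }
  }, delayMs);
}

// Browser wiring (skipped when there is no window, for example under Node).
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => {
    checkWhiteScreen({
      getPaintEntries: () => performance.getEntriesByType('paint'),
      report: (log) => console.warn('possible white screen', log),
      schedule: (fn, ms) => setTimeout(fn, ms),
    });
  });
}
```

In a real SDK the `report` callback would send the log to the reporting interface instead of printing it.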

Page crash

Scheme	Implementation	Advantages	Disadvantages
Based on the Native container	Monitor the WebView process status and send a crash log	Does not rely on JS; no data points are lost	Depends on the container
Based on Service Worker	When the HTML request reaches the SW, mark the page as loading; the page then sends a heartbeat to the SW at regular intervals, and if no heartbeat arrives for a period of time the page is considered crashed	Does not depend on the container	Poor compatibility; SW is highly intrusive and risky; heartbeat detection does not work once the page is paused
Based on the page-leave status in LocalStorage	Mark "loading" when the page loads and mark "left" on `pagehide`/`beforeunload`; on the next visit, check whether the previous leave was normal	Does not depend on the container; better compatibility	Reporting lags behind, so it cannot drive real-time monitoring alerts
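
The LocalStorage scheme in the last row can be sketched roughly as follows. The storage key and both function names are hypothetical, and the sketch ignores the case of several tabs sharing the same key, which a real implementation would need to handle.

```javascript
const STATUS_KEY = 'page_leave_status'; // hypothetical storage key

// Call when the page starts loading. Returns true if the previous visit
// started loading but never recorded a normal leave (a suspected crash).
function markLoadingAndCheckPreviousVisit(storage) {
  const previous = storage.getItem(STATUS_KEY);
  storage.setItem(STATUS_KEY, 'loading');
  return previous === 'loading';
}

// Call from pagehide/beforeunload to record a normal leave.
function markLeft(storage) {
  storage.setItem(STATUS_KEY, 'left');
}

// Browser wiring (skipped when there is no window, for example under Node).
if (typeof window !== 'undefined') {
  if (markLoadingAndCheckPreviousVisit(window.localStorage)) {
    console.warn('the previous visit may have crashed');
  }
  window.addEventListener('pagehide', () => markLeft(window.localStorage));
}
```

Because the check only runs on the next visit, the crash is reported late, which is the disadvantage noted in the table.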

You can also try `navigator.sendBeacon` for reporting. It does not lose tracking data and does not delay page unloading, but it has compatibility issues.
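
One way to work around the compatibility gap is to fall back to `fetch` with `keepalive` when `sendBeacon` is missing or refuses the data. The sketch below is only an illustration: `reportLog`, the `env` parameter, and the `/log` URL in the usage comment are assumptions, not an existing API.

```javascript
// Hypothetical reporter: prefers sendBeacon and falls back to fetch.
// `env` carries the browser globals so the function can be tested elsewhere.
function reportLog(url, payload, env) {
  const body = JSON.stringify(payload);
  if (env.navigator && typeof env.navigator.sendBeacon === 'function') {
    // sendBeacon returns false when the browser refuses to queue the data.
    if (env.navigator.sendBeacon(url, body)) return 'beacon';
  }
  const fetchFn = env.fetch;
  if (typeof fetchFn === 'function') {
    // keepalive lets the request outlive the page that started it.
    fetchFn(url, { method: 'POST', body, keepalive: true }).catch(() => {});
    return 'fetch';
  }
  return 'dropped';
}

// Example usage in a browser (the URL is a placeholder):
// reportLog('/log', { type: 'js-error', message: 'x is undefined' }, { navigator, fetch });
```

Returning the transport that was used makes it easy to count how often the fallback path is taken.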

Page fluency

Loading performance

Indicator	Collection method
load	NavigationTiming
FCP (first contentful paint)	PerformancePaintTiming
LCP (largest contentful paint)	LargestContentfulPaint
FID (first input delay)	PerformanceEventTiming
TTI (time to interactive)	No API; measured with Lighthouse
TBT (total blocking time)	No API; measured with Lighthouse
CLS (cumulative layout shift)	Layout Instability API
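
For the last row, the sketch below shows one way to turn `layout-shift` entries from the Layout Instability API into a CLS value. It assumes the session-window definition of CLS (shifts less than 1s apart, window capped at 5s, largest window wins); `computeCLS` is a name chosen for this example.

```javascript
// Compute CLS from layout-shift entries sorted by startTime.
// A session window continues while shifts are less than 1s apart and the
// window is shorter than 5s; CLS is the largest window sum.
function computeCLS(layoutShiftEntries) {
  let maxWindow = 0;
  let windowValue = 0;
  let windowStart = null;
  let lastTime = 0;
  for (const entry of layoutShiftEntries) {
    if (entry.hadRecentInput) continue; // shifts caused by user input do not count
    const startsNewWindow =
      windowStart === null ||
      entry.startTime - lastTime >= 1000 ||
      entry.startTime - windowStart >= 5000;
    if (startsNewWindow) {
      windowStart = entry.startTime;
      windowValue = entry.value;
    } else {
      windowValue += entry.value;
    }
    lastTime = entry.startTime;
    maxWindow = Math.max(maxWindow, windowValue);
  }
  return maxWindow;
}

// Browser wiring (skipped when there is no window, for example under Node).
if (typeof window !== 'undefined' && typeof PerformanceObserver !== 'undefined') {
  const shifts = [];
  new PerformanceObserver((list) => {
    shifts.push(...list.getEntries());
    console.log('CLS so far:', computeCLS(shifts));
  }).observe({ type: 'layout-shift', buffered: true });
}
```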

With lazy loading, these indicators cannot reflect the real loading performance. For now, the only option is to instrument the code manually, which is intrusive.
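
A minimal sketch of such manual instrumentation is shown below: the business code records a mark once the lazily loaded content has been rendered. `markContentReady`, the mark name, and the `fetchList`/`renderList` functions in the usage comment are all hypothetical.

```javascript
// Hypothetical helper: call it from business code once lazily loaded content
// has been rendered. `perf` is the Performance object (injected for testing).
function markContentReady(perf, markName) {
  perf.mark(markName); // appears as a 'mark' entry that an SDK can observe
  return perf.now(); // milliseconds since the time origin
}

// Example usage in a browser (function and mark names are placeholders):
// fetchList().then(renderList).then(() => markContentReady(performance, 'list-rendered'));
```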

First Paint (FP)

The time from when the browser responds to the URL entered by the user to the moment pixels are first drawn to the screen. This includes any user-defined background painting.

First Contentful Paint (FCP)

The time from when the browser responds to the URL entered by the user to when it renders the first piece of DOM content to the screen. The first drawing of text, images (including background images), non-white canvas, or SVG counts as FCP.

Time to Interactive (TTI)

The point at which a web page first becomes fully interactive (it can respond to button clicks, text input into text boxes, and so on). By this point the main thread has become “smooth”: no main-thread task exceeds 50 milliseconds. For a typical admin (management) system, TTI is a very important indicator.
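
The 50 millisecond limit is also what the TBT metric is built on: every main-thread task longer than 50ms contributes its excess to the total. The sketch below is a simplified field approximation based on `longtask` entries, not the Lighthouse algorithm; `totalBlockingTime` and its window parameters are assumptions for this example.

```javascript
const LONG_TASK_THRESHOLD_MS = 50;

// Sum the blocking portion (time beyond 50 ms) of every long task that
// starts inside [startMs, endMs), for example between FCP and TTI.
function totalBlockingTime(longTasks, startMs, endMs) {
  return longTasks
    .filter((task) => task.startTime >= startMs && task.startTime < endMs)
    .reduce(
      (sum, task) => sum + Math.max(0, task.duration - LONG_TASK_THRESHOLD_MS),
      0
    );
}

// Browser wiring (skipped when there is no window, for example under Node).
if (typeof window !== 'undefined' && typeof PerformanceObserver !== 'undefined') {
  const tasks = [];
  new PerformanceObserver((list) => {
    tasks.push(...list.getEntries());
  }).observe({ type: 'longtask', buffered: true });
}
```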

First Meaningful Paint (FMP)

The point at which the “main content” of the page starts to appear on the screen. It used to be the main metric for measuring the user’s loading experience. In essence, an algorithm guesses which point in time may be the FMP, but even in the best case its accuracy is only 77%. This indicator was dropped in Lighthouse 6.0 and replaced by LCP.

Largest Contentful Paint (LCP)

The point at which the largest visible “content” element in the viewport starts to appear on the screen. LCP is not perfect and is also prone to errors: it stops being captured after user interaction, so it may report a wrong result, and a carousel that occupies a large part of the page causes problems because it keeps updating LCP.

// Time measurement starting point for the page visit
performance.timing.navigationStart
// Milliseconds elapsed from the time origin (navigation start) to now
performance.now()
// Key time points of the page load, including navigationStart above
// (deprecated in favor of PerformanceNavigationTiming, but still widely available)
performance.timing
// Timing statistics for each resource request
performance.getEntries()

// Calculate the page white screen time
performance.getEntriesByType("paint")
// Alternative: record the current time just before the body tag and subtract performance.timing.navigationStart
// White screen time = the time when the page starts displaying - the time when the request starts

// FCP (first contentful paint), approximated from navigation timing
const fcp_time_by_interactive = performance.timing.domInteractive - performance.timing.fetchStart
// The end time of the first screen content rendering - the time when the request starts
const fcp_time_by_loadEventEnd = performance.timing.loadEventEnd - performance.timing.navigationStart
// Alternative: record the current time just before the element that must be displayed and subtract performance.timing.navigationStart

// TTI (time to interactive), approximated from navigation timing
const tti = performance.timing.domContentLoadedEventEnd - performance.timing.navigationStart

// LCP (largest contentful paint)
// Use PerformanceObserver directly to capture LCP
const observer = new PerformanceObserver((entryList) => {
  const entries = entryList.getEntries();
  const lastEntry = entries[entries.length - 1];
  const lcp = lastEntry.renderTime || lastEntry.loadTime;
  console.log('LCP:', lcp);
});
observer.observe({ entryTypes: ['largest-contentful-paint'] });
// or use a library to do it (getLCP is the web-vitals v2 name; newer releases export onLCP instead)
import { getLCP } from 'web-vitals';

// Measure and log the current LCP value,
// any time it's ready to be reported.
getLCP(console.log);

Custom Performance Collection

Indicator	Collection method	Description
UserTiming	PerformanceMark/PerformanceMeasure	The business code calls `performance.mark(...)` when the first screen has finished rendering, and the collection SDK can collect and report the mark through `PerformanceObserver`
ElementTiming	PerformanceElementTiming	The problem with UserTiming is that business developers cannot know when the browser actually renders the content after the first-screen HTML has been created. ElementTiming lets users detect that a specific text or image has been rendered: just add the `elementtiming` attribute to a specific node, and the rendering of a few representative nodes can serve as the sign that the first screen has finished rendering. The collection SDK can collect and report it through `PerformanceObserver`
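
To make the ElementTiming row concrete, the sketch below treats the first screen as complete once every representative node has been rendered and takes the latest render time as the first-screen time. `firstScreenTime` and the identifiers `hero` and `title` are hypothetical; the entries are assumed to come from a `PerformanceObserver` observing the `element` entry type.

```javascript
// Markup assumed by this sketch:
//   <img src="banner.png" elementtiming="hero">
//   <h1 elementtiming="title">...</h1>

// Hypothetical helper: returns the time at which the last required element
// was rendered, or null while some required element has not rendered yet.
function firstScreenTime(elementEntries, requiredIds) {
  const times = new Map();
  for (const entry of elementEntries) {
    // renderTime may be 0 (for example, some cross-origin images), so fall back to loadTime.
    times.set(entry.identifier, entry.renderTime || entry.loadTime);
  }
  if (!requiredIds.every((id) => times.has(id))) return null;
  return Math.max(...requiredIds.map((id) => times.get(id)));
}

// Browser wiring (skipped when there is no window, for example under Node).
if (typeof window !== 'undefined' && typeof PerformanceObserver !== 'undefined') {
  const seen = [];
  new PerformanceObserver((list) => {
    seen.push(...list.getEntries());
    const time = firstScreenTime(seen, ['hero', 'title']);
    if (time !== null) console.log('first screen rendered at', time);
  }).observe({ type: 'element', buffered: true });
}
```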

Response speed

The time from a user operation to the page's response is usually required to be less than 100ms.

// Based on PerformanceEventTiming, monitor the delay from any user input to the browser's response
const eventObserver = new PerformanceObserver((list) => {
  list.getEntries().forEach(entry => {
    // Event name: entry.name
    // Overall duration: entry.duration
    // Time spent in event handlers: entry.processingEnd - entry.processingStart
  })
})
eventObserver.observe({ type: 'event', buffered: true })
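
Building on the 100ms requirement, the following sketch filters event timing entries down to the slow ones and splits each into input delay and handler time. `summarizeSlowEvents` and the shape of the returned objects are assumptions for this example.

```javascript
// Hypothetical helper: keep only interactions slower than `thresholdMs` and
// break each one down into input delay and event handler time.
function summarizeSlowEvents(entries, thresholdMs = 100) {
  return entries
    .filter((entry) => entry.duration > thresholdMs)
    .map((entry) => ({
      name: entry.name,
      duration: entry.duration,
      // Time the event waited before its handlers started running.
      inputDelay: entry.processingStart - entry.startTime,
      // Time spent inside the event handlers.
      handlerTime: entry.processingEnd - entry.processingStart,
    }));
}

// Example usage inside a PerformanceObserver callback:
// const slow = summarizeSlowEvents(list.getEntries());
```

A long input delay usually points at a busy main thread, while a long handler time points at the handler code itself.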

Animation fluency

Monitor whether the frame rate of any animation on the page is stable

Monitor each requestAnimationFrame callback while the animation is running, and calculate:

Frame rate = number of frames the animation ran / animation running time
Frame drop rate = (number of frames expected at 60 FPS - number of frames actually run) / number of frames expected at 60 FPS

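// Frame drop rate for one animation, assuming a 60 FPS target
// e: the animationend event (e.elapsedTime is in seconds)
// actualFrames: number of requestAnimationFrame callbacks counted while the animation ran
const targetFps = 60
const expectedFrames = Math.floor(e.elapsedTime * targetFps)
const error_rate = (expectedFrames - actualFrames) / expectedFrames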
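
The same calculation can be sketched as a self-contained function that works on the timestamps passed to each requestAnimationFrame callback. `frameStats` and its return shape are assumptions for this example.

```javascript
// Hypothetical helper: derive frame rate and frame drop rate from the
// timestamps (in ms) received by successive requestAnimationFrame callbacks.
function frameStats(frameTimestamps, targetFps = 60) {
  if (frameTimestamps.length < 2) return { fps: 0, dropRate: 0 };
  const frames = frameTimestamps.length - 1; // intervals between callbacks
  const elapsedMs = frameTimestamps[frameTimestamps.length - 1] - frameTimestamps[0];
  const expectedFrames = Math.floor((elapsedMs * targetFps) / 1000);
  return {
    fps: frames / (elapsedMs / 1000),
    dropRate:
      expectedFrames > 0
        ? Math.max(0, expectedFrames - frames) / expectedFrames
        : 0,
  };
}

// Example usage in a browser: collect timestamps while the animation runs.
// const stamps = [];
// function tick(now) { stamps.push(now); if (running) requestAnimationFrame(tick); }
// requestAnimationFrame(tick);
// element.addEventListener('animationend', () => console.log(frameStats(stamps)));
```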

Log report

Reporting can be triggered from the exception hooks described in the article "Front-end exception capture and processing", as well as from the browser hooks mentioned above.
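
As one possible arrangement, the hooks can push logs into a small queue that is flushed in batches, with a final flush when the page is hidden. This is a sketch under assumptions: `createLogQueue`, the batch size, and the `send` callback are hypothetical and not part of any SDK.

```javascript
// Hypothetical log queue: collects logs and hands them to `send` in batches.
function createLogQueue({ send, maxBatch = 10 }) {
  let buffer = [];

  // Send everything collected so far.
  function flush() {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    send(batch);
  }

  // Add one log and flush once the batch is full.
  function push(log) {
    buffer.push(log);
    if (buffer.length >= maxBatch) flush();
  }

  return { push, flush };
}

// Example usage in a browser (the URL is a placeholder):
// const queue = createLogQueue({ send: (batch) => navigator.sendBeacon('/log', JSON.stringify(batch)) });
// window.addEventListener('error', (e) => queue.push({ type: 'js-error', message: e.message }));
// document.addEventListener('visibilitychange', () => {
//   if (document.visibilityState === 'hidden') queue.flush();
// });
```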
