Service log performance tuning: the huge pitfalls caused by logs

Only those who have been severely beaten by online service problems understand how important logs are!

Who is in favor and who is against? If you feel the same way, congratulations on becoming a working adult :)

The importance of logs to a program is self-evident. Logging is lightweight, simple, and requires little thought. It can be found everywhere in program code and helps us troubleshoot and locate problems. However, these seemingly inconspicuous logs hide various "pitfalls". Used improperly, not only will they fail to help us, they will become service "killers".

This article mainly introduces the "pitfalls" caused by improper use of logs in production environments and how to avoid them. These problems are especially obvious in high-concurrency systems. It also provides a set of solutions that allow programs and logs to "coexist harmoniously".

A guide to avoiding the pitfalls

In this chapter, I will introduce the log problems we have encountered online in the past and analyze their root causes one by one.

Inconsistent log statement styles

// Format 1
log.debug("get user " + uid + " from DB is Empty!");

// Format 2
if (log.isDebugEnabled()) {
    log.debug("get user " + uid + " from DB is Empty!");
}

// Format 3
log.debug("get user {} from DB is Empty!", uid);

I believe everyone has seen these three styles, more or less, in project code. So what are the differences between them, and what impact do they have on performance?

If the DEBUG log level is turned off, the difference appears:

Format 1 still performs the string concatenation even though no log is output, which is pure waste.

The disadvantage of format 2 is that an extra check has to be added around every statement, which clutters the code and is not elegant at all.

Therefore, format 3 is recommended. The message is assembled only when the statement actually produces output. Once the corresponding log level is turned off, there is essentially no performance cost.
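To make the difference between format 1 and format 3 concrete, here is a minimal sketch using only the JDK. `MiniLogger` and `CountingUid` are hypothetical stand-ins written for this illustration, not part of SLF4J or Log4j; the placeholder handling is deliberately simplified. The sketch counts how often the argument's `toString()` runs while DEBUG is switched off.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LazyFormatDemo {

    /** Hypothetical stand-in for a logger whose DEBUG level can be switched off. */
    static final class MiniLogger {
        private final boolean debugEnabled;
        final StringBuilder out = new StringBuilder();

        MiniLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

        /** Format 1 style: the caller has already built the full message. */
        void debug(String message) {
            if (debugEnabled) out.append(message).append('\n');
        }

        /** Format 3 style: the message is assembled only if DEBUG is enabled. */
        void debug(String format, Object arg) {
            if (debugEnabled) {
                out.append(format.replace("{}", String.valueOf(arg))).append('\n');
            }
        }
    }

    /** An argument that counts how often it is converted to a string. */
    static final class CountingUid {
        static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();
        private final long id;

        CountingUid(long id) { this.id = id; }

        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet();
            return Long.toString(id);
        }
    }

    /** Returns the toString() call counts for {format 1, format 3} with DEBUG off. */
    static int[] run() {
        MiniLogger log = new MiniLogger(false);
        CountingUid uid = new CountingUid(42L);

        CountingUid.TO_STRING_CALLS.set(0);
        log.debug("get user " + uid + " from DB is Empty!"); // concatenated before the call
        int eager = CountingUid.TO_STRING_CALLS.get();

        CountingUid.TO_STRING_CALLS.set(0);
        log.debug("get user {} from DB is Empty!", uid);     // never formatted
        int lazy = CountingUid.TO_STRING_CALLS.get();

        return new int[] {eager, lazy};
    }

    public static void main(String[] args) {
        int[] calls = run();
        System.out.println("concatenation: " + calls[0] + ", placeholder: " + calls[1]);
    }
}
```

Note that the placeholder style only defers the formatting. Any expensive expression passed as an argument is still evaluated before the call, so it is worth keeping the arguments themselves cheap.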

Printing a large number of logs in production consumes performance

The more logs there are, the easier it is to string together a user request and determine where the problematic code is. With today's distributed systems and complex business logic, missing logs are a huge obstacle when programmers try to locate problems. So programmers who have suffered from production problems tend to log as much as possible during development.

To locate and fix future online problems as quickly as possible, programmers add as many key logs as they can during implementation. After going online, problems can indeed be located quickly, but a new challenge follows: as the business grows rapidly, user traffic keeps increasing and so does system pressure. At this point there is a large volume of INFO logs online, and especially during peak periods the heavy log disk writes consume service performance.

This then becomes a trade-off. With more logs, troubleshooting is easier, but service performance gets "eaten". With fewer logs, service stability is not affected, but troubleshooting becomes difficult and programmers suffer.

Question: Why does performance suffer when there are too many INFO logs (CPU usage is very high at this time)?

Root cause 1: Synchronous log printing makes disk I/O the bottleneck, causing a large number of threads to block

Imagine that all logs are output to the same log file while multiple threads write to it: the output would be interleaved and chaotic. The solution is to add a lock so that the log file output stays orderly. During peak periods, contention for this lock undoubtedly costs the most performance. When one thread holds the lock, the other threads can only block and wait, which seriously drags down user threads. The symptom is that upstream calls time out and users feel the system hang.

The following is the stack of a thread stuck writing the file:

Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
    - waiting to lock <0x000000063d668298> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
    at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
    at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
    at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
    at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
    at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
    .....

So is it enough to reduce the INFO logs online? Not quite: the volume of ERROR logs should not be underestimated either. If a large amount of abnormal data appears online, or a large number of downstream timeouts occur, a flood of ERROR logs is generated instantly. Disk I/O will still be saturated, causing user threads to block.

Question: Assuming you don't care about INFO logs for troubleshooting, is there no performance problem if you only print ERROR logs in production?

Root cause 2: Thread blocking caused by printing exception stacks under high concurrency

On one occasion, a large number of downstream timeouts occurred online, and the exceptions were all caught by our service. Fortunately, the disaster recovery design had anticipated this problem and implemented fallback logic, so there was no business impact. But then the server began to teach us a lesson: online monitoring started to alarm, and CPU usage climbed rapidly to 90%+. We urgently scaled out to stop the loss, then took one machine out of traffic and dumped its thread stacks.

After checking the dumped thread stacks, combined with flame graph analysis, we found that most of the threads were stuck at the following stack location:

Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
    - waiting to lock <0x000000064c514c88> (a java.lang.Object)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:205)
    at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:112)
    at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:112)
    at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:96)
    at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629)
    ...

The stack here is long. Most of the blocked threads are in java.lang.ClassLoader.loadClass, and when you look further down the stack, you find that they were all triggered by this line of code:

    at org.apache.logging.slf4j.Log4jLogger.error(Log4jLogger.java:319)

// The corresponding business code is
log.error("ds fetcher get error", e);

Ah, this… That's outrageous. Why would logging load a class? And why does loading a class block so many threads?

After some review and analysis, we came to the following conclusions:

1. When Log4j's Logger.error prints an exception stack, it needs to use the ClassLoader to load classes in order to print the location information of the classes in the stack.
2. ClassLoader loading is thread-safe. Although parallel loading improves the efficiency of loading different classes, multiple threads loading the same class still have to wait for each other synchronously. The risk of thread blocking increases further when the exception stacks printed by different threads are exactly the same. When the ClassLoader tries to load a class that cannot be loaded, efficiency drops sharply, which makes the blocking even worse.
3. Because reflective calls are slow, the JDK optimizes them by dynamically generating Java classes to perform the method call, replacing the original native call. The generated dynamic classes are loaded by DelegatingClassLoader and cannot be loaded by other ClassLoaders. When an exception stack contains such reflection-optimized dynamic classes, threads block very easily under high concurrency.

Combined with the stack above, it is now clear why the threads got stuck here:

A flood of requests hit downstream services that were timing out, so the timeout exception stack was printed very frequently. For each frame of the stack, the corresponding class, version, line number, and other information has to be obtained through reflection. loadClass has to wait synchronously: while one thread holds the lock, most other threads block and wait for the class to finish loading, which hurts performance.

To be fair, even if most threads are waiting for one thread's loadClass, they should only be stuck for a moment. Why does this error keep calling loadClass? Combining the conclusions above with an analysis of the program code, we found that the logic that calls the downstream service in this thread includes Groovy script execution, which generates classes dynamically. By the third conclusion above, log4j cannot load such dynamic classes through reflection under high concurrency, so every stack print attempts the load again and the process falls into an endless cycle. More and more threads can only pile up, waiting and blocked.

Best Practices

1. Remove unnecessary exception stack printing

For exceptions whose cause is obvious, don't print the stack, and save some performance. Anything + high concurrency takes on a whole different meaning :)

try {
    System.out.println(Integer.parseInt(number) + 100);
} catch (Exception e) {
    // Before improvement
    // log.error("parse int error: " + number, e);
    // After improvement
    log.error("parse int error: " + number);
}

If an exception occurs in Integer.parseInt, the cause must be that the incoming number is not a valid integer. In this case, printing the exception stack is completely unnecessary and can be removed.

2. Convert the stack information into a string and print it

import java.io.PrintWriter;
import java.io.StringWriter;

public static String stacktraceToString(Throwable throwable) {
    StringWriter stringWriter = new StringWriter();
    // Writes the plain stack trace text into the in-memory writer
    throwable.printStackTrace(new PrintWriter(stringWriter));
    return stringWriter.toString();
}

The stack information obtained through log.error is more complete: it includes the JDK version and classpath information, and for classes that come from a jar package it also prints the name and version of the jar. All of this is obtained by loading classes through reflection, which costs a great deal of performance.

Calling stacktraceToString converts the exception stack into a plain string. Compared with log.error, it lacks some of the version and jar metadata. You need to decide for yourself whether that information is necessary (for example, version information is still useful when troubleshooting class conflicts).

3. Disable reflection optimization

When Log4j prints stack information and the stack contains a dynamic proxy class generated by reflection optimization, that proxy class cannot be loaded by other ClassLoaders, and printing the stack at this point seriously affects execution efficiency. However, disabling reflection optimization also has a side effect: reflective calls themselves become slower.

4. Print logs asynchronously

In production environments, especially for services with high QPS, asynchronous printing must be enabled. Of course, with asynchronous printing there is a possibility of losing logs, for example when the server is forcibly "killed", so this too is a trade-off.
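The trade-off behind asynchronous printing can be sketched with the JDK alone. The class below is a hypothetical illustration, not how Log4j implements its asynchronous loggers: business threads hand lines to a bounded queue without blocking, a single background thread writes them to a sink, and lines are dropped when the queue is full. Anything still in the queue is lost if the process is killed before `close()` runs.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class AsyncLogSketch implements AutoCloseable {
    // Sentinel compared by identity, so no real log line can collide with it.
    private static final String STOP = new String("__STOP__");

    private final BlockingQueue<String> queue;
    private final Consumer<String> sink;       // e.g. a file writer in a real system
    private final Thread writer;
    private final AtomicLong dropped = new AtomicLong();

    AsyncLogSketch(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
        this.writer = new Thread(this::drain, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Called by business threads: never blocks, drops the line if the queue is full. */
    boolean log(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    long droppedCount() { return dropped.get(); }

    /** Only this single thread touches the sink, so writers never contend on a lock. */
    private void drain() {
        try {
            while (true) {
                String line = queue.take();
                if (line == STOP) return;
                sink.accept(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Graceful shutdown: everything queued before this call is written first. */
    @Override
    public void close() {
        try {
            queue.put(STOP);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> written = new CopyOnWriteArrayList<>();
        AsyncLogSketch log = new AsyncLogSketch(1024, written::add);
        for (int i = 0; i < 5; i++) log.log("request " + i);
        log.close();
        System.out.println(written.size() + " written, " + log.droppedCount() + " dropped");
    }
}
```

Dropping on a full queue is one possible policy, chosen here so that logging can never stall a user thread. Blocking the caller instead would keep every line but bring the original bottleneck back under load.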

5. Log output format

Compare two log output formats we have used:

// Format 1
[%d{yyyy/MM/dd HH:mm:ss.SSS}[%X{traceId}]%t[%p]%C{1}(%F:%M:%L)%msg%n

// Format 2
[%d{yy-MM-dd.HH:mm:ss.SSS}][%thread][%-5p%-22c{0}-]%m%n

The official documentation also gives clear performance guidance: using any of the following fields in the output pattern greatly reduces performance.

%C or %class, %F or %file, %l or %location, %L or %line, %M or %method

To obtain the function name and line number, log4j relies on the exception mechanism: it creates an exception object, captures the stack content attached to it, and then parses the function name and line number out of that stack content. The implementation also involves lock acquisition and parsing. Under high concurrency, the performance loss is easy to imagine.
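The mechanism described above can be reproduced in a few lines of plain Java. This is a simplified sketch of the general technique, not Log4j's actual source: capturing a stack trace forces the JVM to walk every frame of the calling thread, and that walk is what makes location fields costly on every log call. The class and method names are made up for the illustration.

```java
final class CallerLocationSketch {

    /** Returns "Class.method:line" for whoever called this method. */
    static String callerLocation() {
        // Creating the Throwable is the expensive step: the JVM records the whole stack.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        if (stack.length < 2) {
            return "unknown"; // stack traces can be disabled or truncated by JVM options
        }
        // stack[0] is callerLocation() itself, stack[1] is its direct caller.
        StackTraceElement caller = stack[1];
        return caller.getClassName() + "." + caller.getMethodName() + ":" + caller.getLineNumber();
    }

    /** Hypothetical business method that wants its own location in the log line. */
    static String handleRequest() {
        return callerLocation();
    }

    public static void main(String[] args) {
        System.out.println(handleRequest());
    }
}
```

The deeper the call stack (frameworks, proxies, interceptors), the more frames are walked for each log line, which is why dropping these fields from the pattern pays off most in exactly the services that log the most.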

The following pattern parameters affect performance. Please configure them as appropriate:

%C - the caller's class name (slow, not recommended)
%F - the file name of the caller (extremely slow, not recommended)
%l - the caller's function name, file name, and line number (strongly discouraged, very performance-consuming)
%L - the line number of the caller (extremely slow, not recommended)
%M - the caller's function name (extremely slow, not recommended)

Solution - Dynamic adjustment of log level

Project code needs to print a large number of INFO-level logs to support problem location and troubleshooting during testing. In the production environment, however, most of these INFO logs serve no purpose, and their sheer volume eats up CPU. The log level therefore needs to be adjustable dynamically, so that INFO logs can be viewed whenever they are needed and switched off when they are not, without impacting service performance.

Solution: Combine the features of Apollo and log4j2 to control the log level dynamically and at fine granularity through the API, either globally or for a single class. The advantage is that a change takes effect immediately. For production troubleshooting, you can open up the log level for a single class and close it again as soon as troubleshooting is done.

Due to the length of this article, I will not post the specific implementation code. The implementation is in fact very simple: it cleverly uses Apollo's dynamic notification mechanism to reset the log level. If you are interested, you can send me a private message or leave a comment, and I will write a separate article explaining in detail how to achieve this.
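This is not the author's Apollo and log4j2 implementation, which is not shown in this article. As a rough illustration of the idea only, the sketch below uses `java.util.logging` from the JDK so that it runs without dependencies. The `onConfigChange` callback stands in for a config-center change notification, and the `log.level.<logger name>` key convention is an assumption made up for this example. Note that `java.util.logging` names its levels `FINE`, `INFO`, `WARNING`, and so on, rather than `DEBUG`.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

final class DynamicLogLevelSketch {
    // Hypothetical key convention: log.level.<logger name> = <level name>
    private static final String PREFIX = "log.level.";

    // java.util.logging references loggers weakly, so keep strong references here.
    private static final Map<String, Logger> LOGGERS = new ConcurrentHashMap<>();

    /** Stand-in for a config-center change callback; receives only the changed keys. */
    static void onConfigChange(Map<String, String> changed) {
        for (Map.Entry<String, String> entry : changed.entrySet()) {
            if (!entry.getKey().startsWith(PREFIX)) {
                continue; // not a log level setting
            }
            String name = entry.getKey().substring(PREFIX.length());
            // "root" addresses the global logger, whose java.util.logging name is "".
            String loggerName = name.equals("root") ? "" : name;
            Level level;
            try {
                level = Level.parse(entry.getValue().trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException unknownLevel) {
                continue; // ignore a bad value rather than break logging
            }
            LOGGERS.computeIfAbsent(loggerName, Logger::getLogger).setLevel(level);
        }
    }

    public static void main(String[] args) {
        // Simulate the config center pushing a change for one class only.
        onConfigChange(Collections.singletonMap("log.level.com.example.OrderService", "FINE"));
        System.out.println(Logger.getLogger("com.example.OrderService").getLevel());
    }
}
```

Keying the setting by logger name is what gives the per-class granularity described above: one class can be made verbose during an investigation while the rest of the service keeps its quiet production level.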

Summary and Outlook

This article has walked through the common logging problems in everyday software services and the corresponding solutions. Remember: simple things + high concurrency = not simple! Treat production with respect!
