Only those who have been badly burned by production incidents truly understand how important logs are!
Who agrees, and who disagrees? If you feel the same way, congratulations: you are now a seasoned member of the working world :)
The importance of logs to a program is self-evident. Logging is lightweight, simple, and takes almost no thought. Log statements appear everywhere in program code and help us troubleshoot and locate problems. However, these seemingly inconspicuous logs hide various "pits". Used improperly, they not only fail to help us but become service "killers".
This article mainly introduces the "pits" caused by improper use of logs in production and how to avoid them. These problems are especially visible in high-concurrency systems. It also provides a set of solutions that allow programs and logs to "coexist harmoniously".
Pitfall guide
In this chapter, I will go through the logging problems I have encountered in production and analyze their root causes one by one.
Inconsistent log statement style
// Format 1
log.debug("get user " + uid + " from DB is Empty!");

// Format 2
if (log.isDebugEnabled()) {
    log.debug("get user " + uid + " from DB is Empty!");
}

// Format 3
log.debug("get user {} from DB is Empty!", uid);
I believe everyone has seen these three styles in project code at some point. So what are the differences between them, and how do they affect performance?
If you turn off the DEBUG log level, the difference appears:
Format 1 still performs the string concatenation even though no log is output, which is wasted work.
Format 2 avoids that cost, but it requires an extra guard condition, which adds boilerplate and is not elegant.
Therefore, format 3 is recommended. The message is assembled only when the log is actually written, so once the corresponding level is turned off the statement costs almost nothing.
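To make the difference concrete, here is a minimal sketch in plain Java. MiniLogger, Uid and the call counter are hypothetical stand-ins written for this illustration, not part of Log4j or SLF4J. They only show when the argument is turned into text while DEBUG is off.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a real logger; it only illustrates when message building happens.
final class LazyLogDemo {
    static final AtomicInteger toStringCalls = new AtomicInteger();

    // Argument whose toString() is counted so we can see whether it was rendered.
    static final class Uid {
        private final long value;
        Uid(long value) { this.value = value; }
        @Override public String toString() {
            toStringCalls.incrementAndGet();
            return Long.toString(value);
        }
    }

    static final class MiniLogger {
        private final boolean debugEnabled;
        final StringBuilder out = new StringBuilder();
        MiniLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

        // Format 1 style: the caller has already built the full string.
        void debug(String message) {
            if (debugEnabled) out.append(message).append('\n');
        }

        // Format 3 style: the {} placeholder is filled only when DEBUG is on.
        void debug(String template, Object arg) {
            if (!debugEnabled) return;
            out.append(template.replace("{}", String.valueOf(arg))).append('\n');
        }
    }

    public static void main(String[] args) {
        MiniLogger log = new MiniLogger(false); // DEBUG switched off
        Uid uid = new Uid(42L);

        log.debug("get user " + uid + " from DB is Empty!"); // concatenates anyway
        int afterFormat1 = toStringCalls.get();

        log.debug("get user {} from DB is Empty!", uid);     // skips rendering
        int afterFormat3 = toStringCalls.get();

        System.out.println("toString calls after format 1: " + afterFormat1);
        System.out.println("toString calls after format 3: " + afterFormat3);
    }
}
```

With DEBUG off, the counter is 1 after the format 1 call and still 1 after the format 3 call: only the concatenating style paid for rendering the argument.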
Printing a large number of logs in production consumes performance
Plentiful logs make it possible to string a user request together and pinpoint the problematic code. In today's distributed systems with complex business logic, a missing log is a huge obstacle when locating problems. So programmers who have suffered from production incidents tend to log as much as possible during development.
To be able to locate and fix future production problems quickly, programmers add as many key logs as they can during implementation. After launch, problems can indeed be located quickly, but a new challenge follows: as the business grows, traffic keeps increasing and the system comes under more and more pressure. By then there is a large volume of INFO logs in production, and during peak periods in particular, the heavy log writes to disk eat into service performance.
This becomes a trade-off. With more logs, troubleshooting is easier, but service performance gets "eaten". With fewer logs, service stability is unaffected, but troubleshooting becomes difficult and programmers suffer.
Question: why does performance suffer when there are too many INFO logs (with CPU usage very high at the same time)?
Root cause 1: synchronous log writes make disk I/O the bottleneck, leaving a large number of threads BLOCKED
If all logs go to the same file and multiple threads write to it at once, the output would be interleaved and chaotic. The solution is to take a lock so that log output stays orderly. During peak periods, contention for that lock becomes the biggest performance cost: while one thread holds the lock, the others can only block and wait, which seriously drags down user threads. The visible symptom is that upstream calls time out and users experience the service as stuck.
The following is the stack of a thread stuck writing the file:
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.logging.log4j.core.appender.OutputStreamManager.writeBytes(OutputStreamManager.java:352)
- waiting to lock <0x000000063d668298> (a org.apache.logging.log4j.core.appender.rolling.RollingFileManager)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.writeEncodedText(TextEncoderHelper.java:96)
at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeText(TextEncoderHelper.java:65)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:68)
at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:228)
.....
So is it enough to reduce INFO logs in production? Not quite: the volume of ERROR logs cannot be underestimated either. If a large amount of abnormal data appears in production, or a large number of downstream calls time out, a flood of ERROR logs is generated instantly. Disk I/O is then saturated all the same, and user threads block.
Question: assuming you do not need INFO logs for troubleshooting, is there no performance problem if production only prints ERROR logs?
Root cause 2: threads BLOCKED by printing exception stacks under high concurrency
At one point, a large number of downstream calls timed out in production, and the exceptions were all caught by our service. Fortunately, the disaster recovery design had anticipated this and implemented fallback logic, so there was no business impact. But then the server started to teach us a lesson. Production monitoring began to alarm: CPU usage climbed quickly, straight to 90%+. We urgently scaled out to stop the loss, then took one machine out of traffic and dumped its thread stacks.
After checking the dumped thread stacks, combined with flame graph analysis, we found that most threads were stuck at the following stack location:
Stack Trace is:
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
- waiting to lock <0x000000064c514c88> (a java.lang.Object)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.loadClass(ThrowableProxyHelper.java:205)
at org.apache.logging.log4j.core.impl.ThrowableProxyHelper.toExtendedStackTrace(ThrowableProxyHelper.java:112)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:112)
at org.apache.logging.log4j.core.impl.ThrowableProxy.<init>(ThrowableProxy.java:96)
at org.apache.logging.log4j.core.impl.Log4jLogEvent.getThrownProxy(Log4jLogEvent.java:629)
...
The stack is long. Most of the blocked threads are in java.lang.ClassLoader.loadClass, and looking further down the stack, they are all triggered by this line of code:
at org.apache.logging.slf4j.Log4jLogger.error(Log4jLogger.java:319)

// The corresponding business code is
log.error("ds fetcher get error", e);
Well, this is outrageous. Why would writing a log load a class? And why does loading a class block so many threads?
After some review and analysis, we came to the following conclusions:
1. When Log4j's Logger.error prints an exception stack, it needs to load classes through the ClassLoader in order to print location information for each class in the stack;
2. ClassLoader loading is thread-safe. Although parallel loading improves efficiency when different classes are loaded, threads that load the same class still have to wait for one another synchronously. When the exception stacks printed by different threads are exactly the same, the risk of threads blocking rises further. And when the ClassLoader is asked for a class that it cannot load, efficiency drops sharply, making the blocking even worse;
3. Because reflective calls are slow, the JDK optimizes them by dynamically generating Java classes to make the method calls, replacing the original native calls. These generated classes are loaded by DelegatingClassLoader and cannot be loaded by any other ClassLoader. When an exception stack contains such reflection-optimized dynamic classes, threads are very likely to block under high concurrency.
Combined with the stack above, it is now clear why threads get stuck here:
A burst of traffic causes downstream services to time out, so the timeout exception stack is printed frequently. For each frame of the stack, the class, version, line number and other information must be obtained through reflection, and loadClass has to wait synchronously. One thread holds the lock, so most threads block while waiting for the class to be loaded, which hurts performance.
To be fair, even if most threads are waiting for one thread to finish loadClass, they should only be stuck for a moment. So why does this error keep calling loadClass? Combining the conclusions above with an analysis of the program code, we found that the logic for calling the downstream service in these threads includes Groovy script execution, which generates classes dynamically. As the third conclusion above shows, Log4j cannot load such dynamic classes while printing the stack under high concurrency, so the load is attempted again on every print. It becomes an endless cycle: more and more threads can only join the queue and wait, blocked.
Best Practices
1. Remove unnecessary exception stack printing
For exceptions whose cause is obvious, do not print the stack, and save some performance. Anything combined with high concurrency takes on a different meaning :)
try {
    System.out.println(Integer.parseInt(number) + 100);
} catch (Exception e) {
    // Before improvement
    log.error("parse int error : " + number, e);
    // After improvement
    log.error("parse int error : " + number);
}
If Integer.parseInt throws, the cause must be that the number passed in is not a valid integer. In this case, printing the exception stack is completely unnecessary and can be removed.
2. Convert the stack information into a string and print it
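import java.io.PrintWriter;
import java.io.StringWriter;

// Renders the stack trace as plain text using the JDK's own printStackTrace
public static String stacktraceToString(Throwable throwable) {
    StringWriter stringWriter = new StringWriter();
    throwable.printStackTrace(new PrintWriter(stringWriter));
    return stringWriter.toString();
}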
The stack information produced by log.error is more complete. It includes the JDK version and classpath information, and for classes inside jar packages it also prints the jar's name and version. All of this is obtained by loading classes through reflection, which costs a great deal of performance.
Calling stacktraceToString converts the exception stack into a string. By comparison, it omits the version and jar metadata. You need to decide for yourself whether that information is worth printing (for example, it is still useful when troubleshooting class conflicts by version).
3. Disable reflection optimization
When Log4j prints stack information and the stack contains a dynamic class generated by reflection optimization, that class cannot be loaded by any other ClassLoader, and printing the stack seriously hurts execution efficiency. Disabling reflection optimization avoids this, but it has a side effect: reflective calls themselves become slower.
4. Print logs asynchronously
In production, especially for services with high QPS, asynchronous printing must be enabled. Of course, with asynchronous printing there is a chance of losing logs, for example when the server process is forcibly killed, so this is also a trade-off.
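The idea behind asynchronous printing can be sketched with a bounded queue and a single writer thread. This is a simplified illustration in plain Java, not how Log4j implements its asynchronous loggers. The class name, the in-memory "sink" that stands in for the log file, and the drop-when-full policy are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the idea behind asynchronous appenders; not Log4j code.
final class AsyncLogSketch implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final List<String> sink = new ArrayList<>(); // stands in for the log file
    private final AtomicLong dropped = new AtomicLong();
    private final Thread writer;
    private volatile boolean running = true;

    AsyncLogSketch(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        writer = new Thread(this::drain, "log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Called by business threads: does not wait for I/O or for queue space;
    // if the queue is full the line is dropped and counted.
    boolean log(String line) {
        boolean accepted = queue.offer(line);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    // Runs only on the writer thread, so business threads never contend for the slow write.
    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                String line = queue.poll(50, TimeUnit.MILLISECONDS);
                if (line != null) {
                    synchronized (sink) { sink.add(line); }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    long droppedCount() { return dropped.get(); }

    List<String> written() {
        synchronized (sink) { return new ArrayList<>(sink); }
    }

    // Graceful shutdown drains the queue; a forced kill would lose whatever is still queued.
    @Override public void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AsyncLogSketch log = new AsyncLogSketch(1024);
        for (int i = 0; i < 100; i++) log.log("request " + i + " handled");
        log.close();
        System.out.println("written=" + log.written().size() + " dropped=" + log.droppedCount());
    }
}
```

The sketch also makes the trade-off visible: lines still sitting in the queue exist only in memory, so they are lost if the process is killed before close() has drained them.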
5. Log output format
Let's look at the difference between two log output formats:
// Format 1
[%d{yyyy/MM/dd HH:mm:ss.SSS}[%X{traceId}] %t [%p] %C{1} (%F:%M:%L) %msg%n

// Format 2
[%d{yy-MM-dd.HH:mm:ss.SSS}] [%thread] [%-5p %-22c{0} -] %m%n
The official website also gives clear performance warnings: using any of the following fields in the output greatly reduces performance.
%C or %class, %F or %file, %l or %location, %L or %line, %M or %method
To obtain the method name and line number, log4j relies on the exception mechanism: it first throws an exception, then captures it and prints out its stack contents, and finally parses the line number out of that stack. The implementation also involves lock acquisition and parsing, so under high concurrency the performance loss is easy to imagine.
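The cost of location fields is easier to see in code. The sketch below is not Log4j's source; it only assumes the general Java 8 style technique of creating a Throwable and reading its stack trace to find the caller. The class and method names are hypothetical.

```java
// Sketch of how a logging framework can discover the caller's location:
// materialize a Throwable and inspect its stack trace. Names are hypothetical.
final class CallerLocationDemo {

    // Returns "Class.method(File:line)" for whoever called log().
    static String callerLocation() {
        // Capturing the stack walks every frame of the current thread: this is the expensive part.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] = callerLocation, stack[1] = log, stack[2] = the business caller
        StackTraceElement caller = stack[2];
        return caller.getClassName() + "." + caller.getMethodName()
                + "(" + caller.getFileName() + ":" + caller.getLineNumber() + ")";
    }

    // Every log call pays for a full stack capture just to fill in the location fields.
    static String log(String msg) {
        return callerLocation() + " " + msg;
    }

    static String businessMethod() {
        return log("order created");
    }

    public static void main(String[] args) {
        System.out.println(businessMethod());
    }
}
```

The stack capture happens on every single log call and grows with the depth of the call stack, which is why a pattern without location fields is so much cheaper.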
The following pattern parameters affect performance. Configure them as appropriate:
%C - the caller's class name (slow, not recommended)
%F - the file name of the caller (extremely slow, not recommended)
%l - the caller's function name, file name, and line number (strongly discouraged, very performance consuming)
%L - the line number of the caller (extremely slow, not recommended)
%M - the caller's function name (extremely slow, not recommended)
Solution - dynamic adjustment of log level
Project code needs to print a large number of INFO logs to support problem location and test troubleshooting. In production, however, most of these INFO logs are useless most of the time, and in large volumes they eat up CPU. The log level therefore needs to be adjustable dynamically, so that INFO logs can be turned on whenever they are needed and turned off when they are not, without affecting service performance.
Solution: combine Apollo with log4j2 features to control the log level dynamically and at fine granularity through the API, either globally or for a single class. The advantage is that changes take effect immediately. For production troubleshooting, you can open up the log level for a single class and close it again as soon as you are done.
Due to the length of this article, I will not post the specific implementation code. The implementation is actually very simple: it makes clever use of Apollo's dynamic notification mechanism to reset the log level. If you are interested, send me a private message or leave a comment, and I will write a dedicated article explaining in detail how to achieve this.
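As a rough illustration of the mechanism only, the sketch below uses java.util.logging as a stand-in for log4j2 and an ordinary method call as a stand-in for the change notification that a config center such as Apollo would deliver. The key convention "log.level.<loggerName>" and every name in the class are assumptions made for this example, not the author's implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: java.util.logging stands in for log4j2, and onConfigChange()
// stands in for the callback a config center such as Apollo would invoke.
final class DynamicLogLevelSketch {
    // Assumed key convention: "log.level.<loggerName>" -> level name, e.g. "FINE".
    static final String PREFIX = "log.level.";

    // Strong references keep java.util.logging from garbage-collecting configured loggers.
    private static final Map<String, Logger> CONFIGURED = new HashMap<>();

    static synchronized void onConfigChange(Map<String, String> changed) {
        for (Map.Entry<String, String> e : changed.entrySet()) {
            if (!e.getKey().startsWith(PREFIX)) continue;
            String loggerName = e.getKey().substring(PREFIX.length());
            Level level;
            try {
                level = Level.parse(e.getValue().trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException bad) {
                continue; // ignore typos instead of breaking logging in production
            }
            Logger logger = Logger.getLogger(loggerName);
            logger.setLevel(level); // takes effect for the next log call, no restart needed
            CONFIGURED.put(loggerName, logger);
        }
    }

    public static void main(String[] args) {
        Logger orderLog = Logger.getLogger("com.example.OrderService");
        orderLog.setLevel(Level.WARNING); // quiet by default

        Map<String, String> change = new HashMap<>();
        change.put(PREFIX + "com.example.OrderService", "fine"); // open one class for debugging
        onConfigChange(change);
        System.out.println(orderLog.isLoggable(Level.FINE));

        change.put(PREFIX + "com.example.OrderService", "WARNING"); // close it again
        onConfigChange(change);
        System.out.println(orderLog.isLoggable(Level.FINE));
    }
}
```

In a real log4j2 project, the body of the callback would presumably call log4j-core's Configurator.setLevel(loggerName, level) instead, with the callback registered as an Apollo config change listener. Treat that wiring as an assumption to verify against your own versions.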
Summary and Outlook
This article has walked through the common logging problems in everyday services and their corresponding solutions. Remember: simple things + high concurrency = not simple! Have respect for production!