Actual combat! Use Arthas to locate the timeout problem of the Spring Boot interface and let the application take off ~

Helloeveryone,IamBucaiChen~

Background

Thecompanyhasachannelsystemthatisspecificallyusedtointerfacewiththird-partychannels.Ithasnobusinesslogicandmainlydoesworksuchasmessageconversionandparameterverification,whichservesasalinkbetweenthepreviousandthefollowing.

Recently,Ihavebeenoptimizingtheresponsetimeoftheinterfaceandoptimizingthecode,butthetimestilldoesnotmeettherequirements;thereisastrangetime-consumingproblemofabout100ms.Aftertherequestprocessingtimeisprintedintheinterface,itisstilldifferentfromthecaller’sresponsetime.Thedifferenceisabout100ms.Forexample,theprogramrecords150ms,butthecaller’swaitingtimeisabout250ms.

Thedetailedpositioning&solutionprocessatthattimeisrecordedbelow(infact,thesolutionisverysimple,thekeyliesinhowtopositionandfindasolutiontotheproblem)

Locationprocess

Analyzingcode

Thechannelsystemisacommonspring-bootwebproject,usingintegratedtomcat.Afteranalyzingthecode,Ifoundthattherewasnothingspecial,nospecialfiltersorinterceptors,soIinitiallyruledoutthatitwasabusinesscodeproblem.

Analyzethecallingprocess

Afterthisproblemoccurred,wefirstconfirmedthecallingprocessoftheinterface.Sinceitisaninternaltest,therearefewercallprocesses.

Nginx-reverseproxy->channelsystem

Thecompanyisacloudserver,andthenetworkisalsothecloud’sintranet.Sincethecauseoftheproblemisnotclear,weusethetroubleshootingmethodtofirstconfirmwhetherthereisaproblemwiththeservernetwork.

FirstconfirmwhetherthereisaproblemwiththesendingendtoNginxHost:

[jboss@VM_0_139_centos~]$ping10.0.0.139
PING10.0.0.139(10.0.0.139)56(84)bytesofdata.
64bytesfrom10.0.0.139:icmp_seq=1ttl=64time=0.029ms
64bytesfrom10.0.0.139:icmp_seq=2ttl=64time=0.041ms
64bytesfrom10.0.0.139:icmp_seq=3ttl=64time=0.040ms
64bytesfrom10.0.0.139:icmp_seq=4ttl=64time=0.040ms

Judgingfromthepingresults,thereisnoproblemwiththedelayfromthesendingendtotheNginxhost.Next,checkthenetworkfromNginxtothechannelsystem.

#Sincethelogisok,Iwillcopytheabovelogdirectlyhere.
[jboss@VM_0_139_centos~]$ping10.0.0.139
PING10.0.0.139(10.0.0.139)56(84)bytesofdata.
64bytesfrom10.0.0.139:icmp_seq=1ttl=64time=0.029ms
64bytesfrom10.0.0.139:icmp_seq=2ttl=64time=0.041ms
64bytesfrom10.0.0.139:icmp_seq=3ttl=64time=0.040ms
64bytesfrom10.0.0.139:icmp_seq=4ttl=64time=0.040ms

Judgingfromthepingresults,thereisnoproblemwiththenetworkdelayfromNginxtothechannelsystemserver.

Sincethenetworkseemstobefine,youcancontinuethetroubleshootingmethod,cutoffNginx,andconnecttheclientdirectlytotheserverofthechannelsystemthroughtheloopbackaddress(localhost)toavoidgoingthroughthenetworkcard/dns,narrowthescopeoftheproblemandseeifitcanberepaired.Now(thisapplicationandaddressweresimulatedbymelater,andthetestwasanemptyinterface):

[jboss@VM_10_91_centostmp]$curl-w"@curl-time.txt"http://127.0.0.1:7744/send
success
http:200
dns:0.001s
redirect:0.000s
time_connect:0.001s
time_appconnect:0.000s
time_pretransfer:0.001s
time_starttransfer:0.073s
size_download:7bytes
speed_download:95.000B/s
----------
time_total:0.073stotalrequesttime

Judgingfromthecurllog,callinganemptyinterfacethroughtheloopbackaddressalsotakes73ms.Thisisstrange.Itskipsalltheintermediatecallingnodes(includingfilters&interceptors,etc.)anddirectlyrequeststheapplicationofanemptyinterface,whichtakes73ms.Pleaserequestagainandsee:

[jboss@VM_10_91_centostmp]$curl-w"@curl-time.txt"http://127.0.0.1:7744/send
success
http:200
dns:0.001s
redirect:0.000s
time_connect:0.001s
time_appconnect:0.000s
time_pretransfer:0.001s
time_starttransfer:0.003s
size_download:7bytes
speed_download:2611.000B/s
----------
time_total:0.003s

What’sevenweirderisthatthesecondrequesttimeisnormalandbecomes3ms.Afterconsultingtheinformation,Linuxcurlturnsonhttpkeep-alivebydefault.Evenifkeep-aliveisnotturnedon,itwillnottake70mseachtimetorehandshake.

Aftercontinuousanalysisandtesting,wefoundthatthetimeforcontinuousrequestswillbeveryshort,eachrequestonlytakesafewmilliseconds,butifyourequestagainafteraperiodoftime,itwilltakemorethan70ms.

Itisspeculatedfromthisphenomenonthatitmaybecausedbysomecachingmechanism.Continuousrequestsarefastbecauseofthecache.However,ittakesalongtimeafterthecachefails.

Sowheredoesthisproblemlie?Tomcatlayerorspring-webmvc?

Youcan’tlocatetheproblembyjustguessing,youstillhavetoactuallytestit.PutthecodeofthechannelsystemintothelocalIDEandstartthetesttoseeifitcanbereproduced.

However,afterimportingintothelocalIde,theproblemcannotbereproducedafterstartingintheIde,andthereisno70+msdelayproblem.NowIhaveaheadache.Itcannotbereproducedlocallyandcannotbedebugged.Sincetheproblemisnotinthebusinesscode,itcannotbedebuggedbyaddinglogs.

Atthistime,youcansacrificetheartifactArthas.

Arthasanalysisproblem

ArthasisAlibaba’sopensourceJavadiagnostictool,whichisverypopularamongdevelopers.Whenyouencounterthefollowingsimilarproblemsandareataloss,Arthascanhelpyousolvethem:

  • Whichjarpackageisthisclassloadedfrom?Whyarevariousclass-relatedExceptionsreported?

  • WhyisthecodeIchangednotexecuted?CoulditbethatIdidn’tcommit?Wrongbranch?

  • Ifyouencounteraproblemandcannotdebugitonline,canyouonlyaddlogsandre-releaseit?

  • Thereisaproblemwithauser’sdataprocessingonline,butitcannotbedebuggedonlineandcannotbereproducedoffline!

  • Isthereaglobalviewofhowthesystemisperforming?

  • Isthereanywaytomonitorthereal-timerunningstatusoftheJVM?

TheaboveistheofficialintroductionofArthas.ThistimeIonlyneedtouseoneofhissmallfunctionstrace.DynamicallycalculatemethodcallpathsandtimessoIcanlocatewheretimeisbeingspent.

  • Thetracemethodcallsthepathinternallyandoutputsthetimespentoneachnodeonthemethodpath.

  • Thetracecommandcanactivelysearchforclass-pattern/method-pattern

  • Correspondingmethodcallpath,renderandcountallperformanceoverheadontheentirecalllinkandtracethecalllink.

Nowthatwehavetheartifact,whatmethodshouldweusetotrackit?SinceIamnotveryfamiliarwiththeTomcatsourcecode,Icanonlystartwithspringmvc.First,let’stracetheentranceofspringmvc:

~

[jboss@VM_10_91_centostmp]$curl-w"@curl-time.txt"http://127.0.0.1:7744/send
success
http:200
dns:0.001s
redirect:0.000s
time_connect:0.001s
time_appconnect:0.000s
time_pretransfer:0.001s
time_starttransfer:0.115s
size_download:7bytes
speed_download:60.000B/s
----------
time_total:0.115s

Thiscalltook115msonthecallingside,butfromthearthastrace,springmvconlyconsumed18ms,sowheredidtheremaining97msgo?

Afterlocaltesting,theproblemofspringmvccanbeeliminated.Thelastandonlypossibleproblemistomcat.

However,Iamnotfamiliarwiththesourcecodeintomcat,andIdon’tevenknowtherequestentry.Itisdifficulttofindtheclassesthatrequiretraceintomcat…

Butitdoesn’tmatter.WiththeartifactArthas,youcanusethestackcommandtoreverselysearchthecallingpath,withorg.springframework.web.servlet.DispatcherServletastheparameter:

stackoutputsthecallingpathofthecurrentmethodbeingcalled

Manytimesweknowthatamethodisexecuted,buttherearemanypathsforthismethodtobeexecuted,oryousimplydon’tknowwherethismethodisexecuted.Atthistime,whatyouneedisthestackcommand.

Fromthestacklog,wecanintuitivelyseethecallstackofDispatchServlet.Sowithsuchalongpath,whichclassshouldbetraced(thetraceprocessofthefilterinspringmvcisskippedhere,andthetraceprocessisalsotracedduringtheactualtroubleshooting,butThisweirdtimeconsumptionisnotcausedbythisfilter)?

Veterandriverswithcertainexperiencecanprobablyguesswheretostartfromthename,thatisorg.apache.coyote.http11.Http11Processor.service.Judgingfromthename,http1.1processingdevice,thismaybeabetterentrypoint.Let’straceit:

Thereisatime-consumingpointof129msinthelog(thetimeislongerthanwhenarthasisnotturnedonbecauseoftheperformanceconsumptioncausedbyarthasitself,souseitwithcautionintheproductionenvironment).Thisistheproblempointtobefound.

Nowthattheproblempointhasbeenfound,howdoyoulocatewhatcausedtheproblemandhowdoyousolveit?

Let’scontinuetracinganddrilldowntospecificcodeblocksorcontent.Duetoperformanceconsiderations,tracewillnotdisplayallcallpaths.Ifthecallpathistoodeep,youcanonlymanuallygointothetrace.Theprincipleistotracethemethodthattakesthelongesttime:

Afteraboringmanualin-depthtrace…………

Foundapointworthpausingtothinkabout:

+---[min=0.004452ms,max=34.479307ms,total=74.206249ms,count=31]org.apache.catalina.webresources.TomcatJarInputStream:getNextJarEntry()#117

Thislineofcodeisloaded31times,takingatotalof74ms.Judgingfromthename,itshouldbethetimeittakeswhentomcatloadsthejarpackage,soisitthetimeittakestoload31jarpackages,orisitthetimeittakestoloadsomeofthejarpackages?Howlongdoesittakeforresourcestobeused31times?

ThecommentinthesourcecodeoftheTomcatJarInputStreamclassreads:

Thepurposeofthissub-classistoobtainreferencestotheJarEntryobjectsforMETA-INF/andMETA-INF/MANIFEST.MFthatareotherwiseswallowedbytheJarInputStreamimplementation.

ThegeneralmeaningistoobtaintheresourcesofMETA-INF/andMETA-INF/MANIFESTinthejarpackage.Thisisasubclass,andmorefunctionsareintheparentclassJarInputStream.

Infact,youcanprobablyguesstheproblemafterseeingthis.TomcatloadstheMETA-INF/,META-INF/MANIFESTresourcesinthejarpackage,whichcausestime-consuming.Asforwhycontinuousrequestsdonottaketime,itshouldbetomcat’scachingmechanism(Sourcecodeanalysisisintroducedbelow)

Don’trushtolocatetheproblem,trytofinallylocatethedetailsoftheproblemthroughArthas,andcontinuetomanuallyin-depthtrace

[arthas@24851]$traceorg.apache.catalina.webresources.TomcatJarInputStream*
PressQorCtrl+Ctoabort.
Affect(class-cnt:1,method-cnt:4)costin44ms.
`---ts=2019-09-1421:37:47;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418
`---[0.234952ms]org.apache.catalina.webresources.TomcatJarInputStream:createZipEntry()
+---[0.039455ms]java.util.jar.JarInputStream:createZipEntry()#43
`---[0.007827ms]java.lang.String:equals()#44

`---ts=2019-09-1421:37:47;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418
`---[0.050222ms]org.apache.catalina.webresources.TomcatJarInputStream:createZipEntry()
+---[0.001889ms]java.util.jar.JarInputStream:createZipEntry()#43
`---[0.001643ms]java.lang.String:equals()#46
#Thereareatotalof31tracelogshere,andtheresthavebeendeleted

Judgingfromthemethodname,itstillmeansloadingresources.Wehavealreadyreachedthejdksourcecode.Nowlet’stakealookatthesourcecodeoftheTomcatJarInputStreamclass:

/**
*Createsanew<code>JarEntry</code>(<code>ZipEntry</code>)forthe
*specifiedJARfileentryname.Themanifestattributesof
*thespecifiedJARfileentrynamewillbecopiedtothenew
*<CODE>JarEntry</CODE>.
*
*@paramnamethenameoftheJAR/ZIPfileentry
*@returnthe<code>JarEntry</code>objectjustcreated
*/
protectedZipEntrycreateZipEntry(Stringname){
JarEntrye=newJarEntry(name);
if(man!=null){
e.attr=man.getAttributes(name);
}
returne;
}

ThiscreateZipEntryhasanameparameter.Judgingfromthecomments,itisthejar/zipfilename.Ifyoucangetkeyinformationsuchasthefilename,youcandirectlylocatetheproblem;youcanstilluseArthasandusethewatchcommand.,dynamicmonitoringmethodcalldata

Thewatchmethodperformsdataobservation

Allowsyoutoeasilyobservethecallingstatusofthespecifiedmethod.Thescopethatcanbeobservedis:returnvalue,thrownexception,inputparameters,andthecorrespondingvariablescanbeviewedbywritingOGNLexpressions.

watchtheinputparametersofthismethod

NowIdirectlyseethespecificloadedresourcename,suchafamiliarname:swagger-ui,aforeignRESTinterfacedocumentationtool,andadomesticdeveloperhasmadeasetofspringmvcintegrationtoolsbasedonswagger-ui,whichcanbeusedthroughannotations.Itcanautomaticallygeneratetheinterfacedefinitionjsonfilerequiredbyswagger-ui.Itisrelativelyconvenienttouse,butitishighlyintrusive.

Afterdeletingthejarpackageofswagger,theproblemdisappearedaftertheweird70+ms.

<!--Deletethesetworeferencesinthepom.Thesetwopackagesarepackagedbydomesticdevelopers.swagger-uidoesnotprovideasupportpackageforjavaspring-mvc.swaggerisjustabrowser-sideui+editor-->
<dependency>
<groupId>io.springfox</groupId>
<artifactId>springfox-swagger2</artifactId>
<version>2.9.2</version>
</dependency>
<dependency>
<groupId>io.springfox</groupId>
<artifactId>springfox-swagger-ui</artifactId>
<version>2.9.2</version>
</dependency>

Sowhydoesswaggercausetherequesttobetime-consuming?Whydoesthestaticresourceinsideswaggerloadeverytimeitisrequested?

Infact,thisisabugoftomcat-embed.Let’sintroducethebugindetailbelow.

TomcatembedBugAnalysis&Solution

Thesourcecodeanalysisprocessistoolongandisnotthefocusofthisarticle,soIwillnotintroduceit.Let’sdirectlyintroducetheanalysisresults.

Bytheway,postacoreclassdiagramoftomcatprocessingrequests.

WhydoeseachrequestloadthestaticresourcesintheJarpackage

Thekeyliesintheorg.apache.catalina.mapper.Mapper#internalMapWrappermethod.Thereisaproblemwiththewayrequestsareprocessedinthisversion,causingstaticresourcestobeverifiedeverytime.

Whythereisnoproblemwithcontinuousrequests

BecauseTomcathasacacheforparsingsuchstaticresources,itwillfirstsearchfromthecache,andthenparseagainafterthecacheexpires.Pleaserefertoorg.apache.catalina.webresources.Cachefordetails.Thedefaultexpirationtimettlis5000ms.

Whydoesitnotreappearlocally

Infact,tobeprecise,itcannotbereproducedafterpackagingtheplug-inthroughspring-boot.Duetodifferentstartupmethods,tomcatusesdifferentclassestohandlestaticresources,sothereisnoproblem

Howtosolve

Justupgradethetomcat-embedversion

Theversionwherethebugcurrentlyappearsis:

spring-boot:2.0.2.RELEASE,thebuilt-intomcatembedversionis8.5.31

Thisproblemcanbesolvedbyupgradingtomcatembedversionto8.5.40+.Thenewversionhasbeenfixed.

Byreplacingspringbootpomproperties

Iftheprojectismaven,itinheritsspringboot,thatis,theparentisconfiguredasspringboot,orthespringbootpackageisimportedindependencyManagement.

<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.0.2.RELEASE</version>
<relativePath/><!--lookupparentfromrepository-->
</parent>

Justoverwritethepropertiesdirectlyinthepom:

<properties>
<tomcat.version>8.5.40</tomcat.version>
</properties>

Upgradespringbootversion

Thetomcatembedversioninspringboot2.1.0.RELEASEisalreadygreaterthan8.5.31,sodirectlyupgradingspringboottothisversionandabovecansolvethisproblem.

Onefinalword(don’tprostitutefornothing,pleasepayattention)

EveryarticlewrittenbyCheniscarefullywritten.Ifthisarticleishelpfulorinspiringtoyou,pleaselike,read,repost,andcollectit.Yoursupportismybiggestmotivationtopersevere!

Inaddition,Chen’sKnowledgePlanethasbeenopened.Itonlycosts199yuantojoin.Thevalueofplanetfeedbackishuge.Currently,ithasupdatedtheCodeYuanchronicdiseasecloudmanagementpracticalproject,theSpringfamilybucketpracticalseries,thebillion-leveldatasub-databaseandtablepractical,andtheDDDmicroServicepracticecolumn,Iwanttojoinabigfactory,Spring,Mybatisandotherframeworksourcecodes,22lecturesonarchitecturepractice,RocketMQ,etc…

Moreintroduction

Ifyouneedtojointheplanet,addChen’sWeChataccount:special_coder

Theknowledgepointsofthearticlematchtheofficialknowledgefiles,andyoucanfurtherlearnrelatedknowledge.JavaSkillTreeHomepageOverview137233peoplearelearningthesystem