Helloeveryone,IamBucaiChen~
Background
Thecompanyhasachannelsystemthatisspecificallyusedtointerfacewiththird-partychannels.Ithasnobusinesslogicandmainlydoesworksuchasmessageconversionandparameterverification,whichservesasalinkbetweenthepreviousandthefollowing.
Recently,Ihavebeenoptimizingtheresponsetimeoftheinterfaceandoptimizingthecode,butthetimestilldoesnotmeettherequirements;thereisastrangetime-consumingproblemofabout100ms.Aftertherequestprocessingtimeisprintedintheinterface,itisstilldifferentfromthecaller’sresponsetime.Thedifferenceisabout100ms.Forexample,theprogramrecords150ms,butthecaller’swaitingtimeisabout250ms.
Thedetailedpositioning&solutionprocessatthattimeisrecordedbelow(infact,thesolutionisverysimple,thekeyliesinhowtopositionandfindasolutiontotheproblem)
Locationprocess
Analyzingcode
Thechannelsystemisacommonspring-bootwebproject,usingintegratedtomcat.Afteranalyzingthecode,Ifoundthattherewasnothingspecial,nospecialfiltersorinterceptors,soIinitiallyruledoutthatitwasabusinesscodeproblem.
Analyzethecallingprocess
Afterthisproblemoccurred,wefirstconfirmedthecallingprocessoftheinterface.Sinceitisaninternaltest,therearefewercallprocesses.
Nginx-reverseproxy->channelsystem
Thecompanyisacloudserver,andthenetworkisalsothecloud’sintranet.Sincethecauseoftheproblemisnotclear,weusethetroubleshootingmethodtofirstconfirmwhetherthereisaproblemwiththeservernetwork.
FirstconfirmwhetherthereisaproblemwiththesendingendtoNginxHost:
[jboss@VM_0_139_centos~]$ping10.0.0.139 PING10.0.0.139(10.0.0.139)56(84)bytesofdata. 64bytesfrom10.0.0.139:icmp_seq=1ttl=64time=0.029ms 64bytesfrom10.0.0.139:icmp_seq=2ttl=64time=0.041ms 64bytesfrom10.0.0.139:icmp_seq=3ttl=64time=0.040ms 64bytesfrom10.0.0.139:icmp_seq=4ttl=64time=0.040ms
Judgingfromthepingresults,thereisnoproblemwiththedelayfromthesendingendtotheNginxhost.Next,checkthenetworkfromNginxtothechannelsystem.
#Sincethelogisok,Iwillcopytheabovelogdirectlyhere. [jboss@VM_0_139_centos~]$ping10.0.0.139 PING10.0.0.139(10.0.0.139)56(84)bytesofdata. 64bytesfrom10.0.0.139:icmp_seq=1ttl=64time=0.029ms 64bytesfrom10.0.0.139:icmp_seq=2ttl=64time=0.041ms 64bytesfrom10.0.0.139:icmp_seq=3ttl=64time=0.040ms 64bytesfrom10.0.0.139:icmp_seq=4ttl=64time=0.040ms
Judgingfromthepingresults,thereisnoproblemwiththenetworkdelayfromNginxtothechannelsystemserver.
Sincethenetworkseemstobefine,youcancontinuethetroubleshootingmethod,cutoffNginx,andconnecttheclientdirectlytotheserverofthechannelsystemthroughtheloopbackaddress(localhost)toavoidgoingthroughthenetworkcard/dns,narrowthescopeoftheproblemandseeifitcanberepaired.Now(thisapplicationandaddressweresimulatedbymelater,andthetestwasanemptyinterface):
[jboss@VM_10_91_centostmp]$curl-w"@curl-time.txt"http://127.0.0.1:7744/send success http:200 dns:0.001s redirect:0.000s time_connect:0.001s time_appconnect:0.000s time_pretransfer:0.001s time_starttransfer:0.073s size_download:7bytes speed_download:95.000B/s ---------- time_total:0.073stotalrequesttime
Judgingfromthecurllog,callinganemptyinterfacethroughtheloopbackaddressalsotakes73ms.Thisisstrange.Itskipsalltheintermediatecallingnodes(includingfilters&interceptors,etc.)anddirectlyrequeststheapplicationofanemptyinterface,whichtakes73ms.Pleaserequestagainandsee:
[jboss@VM_10_91_centostmp]$curl-w"@curl-time.txt"http://127.0.0.1:7744/send success http:200 dns:0.001s redirect:0.000s time_connect:0.001s time_appconnect:0.000s time_pretransfer:0.001s time_starttransfer:0.003s size_download:7bytes speed_download:2611.000B/s ---------- time_total:0.003s
What’sevenweirderisthatthesecondrequesttimeisnormalandbecomes3ms.Afterconsultingtheinformation,Linuxcurlturnsonhttpkeep-alivebydefault.Evenifkeep-aliveisnotturnedon,itwillnottake70mseachtimetorehandshake.
Aftercontinuousanalysisandtesting,wefoundthatthetimeforcontinuousrequestswillbeveryshort,eachrequestonlytakesafewmilliseconds,butifyourequestagainafteraperiodoftime,itwilltakemorethan70ms.
Itisspeculatedfromthisphenomenonthatitmaybecausedbysomecachingmechanism.Continuousrequestsarefastbecauseofthecache.However,ittakesalongtimeafterthecachefails.
Sowheredoesthisproblemlie?Tomcatlayerorspring-webmvc?
Youcan’tlocatetheproblembyjustguessing,youstillhavetoactuallytestit.PutthecodeofthechannelsystemintothelocalIDEandstartthetesttoseeifitcanbereproduced.
However,afterimportingintothelocalIde,theproblemcannotbereproducedafterstartingintheIde,andthereisno70+msdelayproblem.NowIhaveaheadache.Itcannotbereproducedlocallyandcannotbedebugged.Sincetheproblemisnotinthebusinesscode,itcannotbedebuggedbyaddinglogs.
Atthistime,youcansacrificetheartifactArthas.
Arthasanalysisproblem
ArthasisAlibaba’sopensourceJavadiagnostictool,whichisverypopularamongdevelopers.Whenyouencounterthefollowingsimilarproblemsandareataloss,Arthascanhelpyousolvethem:
-
Whichjarpackageisthisclassloadedfrom?Whyarevariousclass-relatedExceptionsreported?
-
WhyisthecodeIchangednotexecuted?CoulditbethatIdidn’tcommit?Wrongbranch?
-
Ifyouencounteraproblemandcannotdebugitonline,canyouonlyaddlogsandre-releaseit?
-
Thereisaproblemwithauser’sdataprocessingonline,butitcannotbedebuggedonlineandcannotbereproducedoffline!
-
Isthereaglobalviewofhowthesystemisperforming?
-
Isthereanywaytomonitorthereal-timerunningstatusoftheJVM?
TheaboveistheofficialintroductionofArthas.ThistimeIonlyneedtouseoneofhissmallfunctionstrace.DynamicallycalculatemethodcallpathsandtimessoIcanlocatewheretimeisbeingspent.
-
Thetracemethodcallsthepathinternallyandoutputsthetimespentoneachnodeonthemethodpath.
-
Thetracecommandcanactivelysearchforclass-pattern/method-pattern
-
Correspondingmethodcallpath,renderandcountallperformanceoverheadontheentirecalllinkandtracethecalllink.
Nowthatwehavetheartifact,whatmethodshouldweusetotrackit?SinceIamnotveryfamiliarwiththeTomcatsourcecode,Icanonlystartwithspringmvc.First,let’stracetheentranceofspringmvc:
~
[jboss@VM_10_91_centostmp]$curl-w"@curl-time.txt"http://127.0.0.1:7744/send success http:200 dns:0.001s redirect:0.000s time_connect:0.001s time_appconnect:0.000s time_pretransfer:0.001s time_starttransfer:0.115s size_download:7bytes speed_download:60.000B/s ---------- time_total:0.115s
Thiscalltook115msonthecallingside,butfromthearthastrace,springmvconlyconsumed18ms,sowheredidtheremaining97msgo?
Afterlocaltesting,theproblemofspringmvccanbeeliminated.Thelastandonlypossibleproblemistomcat.
However,Iamnotfamiliarwiththesourcecodeintomcat,andIdon’tevenknowtherequestentry.Itisdifficulttofindtheclassesthatrequiretraceintomcat…
Butitdoesn’tmatter.WiththeartifactArthas,youcanusethestackcommandtoreverselysearchthecallingpath,withorg.springframework.web.servlet.DispatcherServlet
astheparameter:
”
stackoutputsthecallingpathofthecurrentmethodbeingcalled
Manytimesweknowthatamethodisexecuted,buttherearemanypathsforthismethodtobeexecuted,oryousimplydon’tknowwherethismethodisexecuted.Atthistime,whatyouneedisthestackcommand.
”
Fromthestacklog,wecanintuitivelyseethecallstackofDispatchServlet.Sowithsuchalongpath,whichclassshouldbetraced(thetraceprocessofthefilterinspringmvcisskippedhere,andthetraceprocessisalsotracedduringtheactualtroubleshooting,butThisweirdtimeconsumptionisnotcausedbythisfilter)?
Veterandriverswithcertainexperiencecanprobablyguesswheretostartfromthename,thatisorg.apache.coyote.http11.Http11Processor.service
.Judgingfromthename,http1.1processingdevice,thismaybeabetterentrypoint.Let’straceit:
Thereisatime-consumingpointof129msinthelog(thetimeislongerthanwhenarthasisnotturnedonbecauseoftheperformanceconsumptioncausedbyarthasitself,souseitwithcautionintheproductionenvironment).Thisistheproblempointtobefound.
Nowthattheproblempointhasbeenfound,howdoyoulocatewhatcausedtheproblemandhowdoyousolveit?
Let’scontinuetracinganddrilldowntospecificcodeblocksorcontent.Duetoperformanceconsiderations,tracewillnotdisplayallcallpaths.Ifthecallpathistoodeep,youcanonlymanuallygointothetrace.Theprincipleistotracethemethodthattakesthelongesttime:
Afteraboringmanualin-depthtrace…………
Foundapointworthpausingtothinkabout:
+---[min=0.004452ms,max=34.479307ms,total=74.206249ms,count=31]org.apache.catalina.webresources.TomcatJarInputStream:getNextJarEntry()#117
Thislineofcodeisloaded31times,takingatotalof74ms.Judgingfromthename,itshouldbethetimeittakeswhentomcatloadsthejarpackage,soisitthetimeittakestoload31jarpackages,orisitthetimeittakestoloadsomeofthejarpackages?Howlongdoesittakeforresourcestobeused31times?
ThecommentinthesourcecodeoftheTomcatJarInputStreamclassreads:
Thepurposeofthissub-classistoobtainreferencestotheJarEntryobjectsforMETA-INF/andMETA-INF/MANIFEST.MFthatareotherwiseswallowedbytheJarInputStreamimplementation.
ThegeneralmeaningistoobtaintheresourcesofMETA-INF/andMETA-INF/MANIFESTinthejarpackage.Thisisasubclass,andmorefunctionsareintheparentclassJarInputStream.
Infact,youcanprobablyguesstheproblemafterseeingthis.TomcatloadstheMETA-INF/,META-INF/MANIFESTresourcesinthejarpackage,whichcausestime-consuming.Asforwhycontinuousrequestsdonottaketime,itshouldbetomcat’scachingmechanism(Sourcecodeanalysisisintroducedbelow)
Don’trushtolocatetheproblem,trytofinallylocatethedetailsoftheproblemthroughArthas,andcontinuetomanuallyin-depthtrace
[arthas@24851]$traceorg.apache.catalina.webresources.TomcatJarInputStream* PressQorCtrl+Ctoabort. Affect(class-cnt:1,method-cnt:4)costin44ms. `---ts=2019-09-1421:37:47;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418 `---[0.234952ms]org.apache.catalina.webresources.TomcatJarInputStream:createZipEntry() +---[0.039455ms]java.util.jar.JarInputStream:createZipEntry()#43 `---[0.007827ms]java.lang.String:equals()#44 `---ts=2019-09-1421:37:47;thread_name=http-nio-7744-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@20ad9418 `---[0.050222ms]org.apache.catalina.webresources.TomcatJarInputStream:createZipEntry() +---[0.001889ms]java.util.jar.JarInputStream:createZipEntry()#43 `---[0.001643ms]java.lang.String:equals()#46 #Thereareatotalof31tracelogshere,andtheresthavebeendeleted
Judgingfromthemethodname,itstillmeansloadingresources.Wehavealreadyreachedthejdksourcecode.Nowlet’stakealookatthesourcecodeoftheTomcatJarInputStream
class:
/** *Createsanew<code>JarEntry</code>(<code>ZipEntry</code>)forthe *specifiedJARfileentryname.Themanifestattributesof *thespecifiedJARfileentrynamewillbecopiedtothenew *<CODE>JarEntry</CODE>. * *@paramnamethenameoftheJAR/ZIPfileentry *@returnthe<code>JarEntry</code>objectjustcreated */ protectedZipEntrycreateZipEntry(Stringname){ JarEntrye=newJarEntry(name); if(man!=null){ e.attr=man.getAttributes(name); } returne; }
ThiscreateZipEntry
hasanameparameter.Judgingfromthecomments,itisthejar/zipfilename.Ifyoucangetkeyinformationsuchasthefilename,youcandirectlylocatetheproblem;youcanstilluseArthasandusethewatchcommand.,dynamicmonitoringmethodcalldata
Thewatchmethodperformsdataobservation
”
Allowsyoutoeasilyobservethecallingstatusofthespecifiedmethod.Thescopethatcanbeobservedis:returnvalue,thrownexception,inputparameters,andthecorrespondingvariablescanbeviewedbywritingOGNLexpressions.
”
watchtheinputparametersofthismethod
NowIdirectlyseethespecificloadedresourcename,suchafamiliarname:swagger-ui,aforeignRESTinterfacedocumentationtool,andadomesticdeveloperhasmadeasetofspringmvcintegrationtoolsbasedonswagger-ui,whichcanbeusedthroughannotations.Itcanautomaticallygeneratetheinterfacedefinitionjsonfilerequiredbyswagger-ui.Itisrelativelyconvenienttouse,butitishighlyintrusive.
Afterdeletingthejarpackageofswagger,theproblemdisappearedaftertheweird70+ms.
<!--Deletethesetworeferencesinthepom.Thesetwopackagesarepackagedbydomesticdevelopers.swagger-uidoesnotprovideasupportpackageforjavaspring-mvc.swaggerisjustabrowser-sideui+editor--> <dependency> <groupId>io.springfox</groupId> <artifactId>springfox-swagger2</artifactId> <version>2.9.2</version> </dependency> <dependency> <groupId>io.springfox</groupId> <artifactId>springfox-swagger-ui</artifactId> <version>2.9.2</version> </dependency>
Sowhydoesswaggercausetherequesttobetime-consuming?Whydoesthestaticresourceinsideswaggerloadeverytimeitisrequested?
Infact,thisisabugoftomcat-embed.Let’sintroducethebugindetailbelow.
TomcatembedBugAnalysis&Solution
Thesourcecodeanalysisprocessistoolongandisnotthefocusofthisarticle,soIwillnotintroduceit.Let’sdirectlyintroducetheanalysisresults.
Bytheway,postacoreclassdiagramoftomcatprocessingrequests.
WhydoeseachrequestloadthestaticresourcesintheJarpackage
Thekeyliesintheorg.apache.catalina.mapper.Mapper#internalMapWrapper
method.Thereisaproblemwiththewayrequestsareprocessedinthisversion,causingstaticresourcestobeverifiedeverytime.
Whythereisnoproblemwithcontinuousrequests
BecauseTomcathasacacheforparsingsuchstaticresources,itwillfirstsearchfromthecache,andthenparseagainafterthecacheexpires.Pleaserefertoorg.apache.catalina.webresources.Cache
fordetails.Thedefaultexpirationtimettlis5000ms.
Whydoesitnotreappearlocally
Infact,tobeprecise,itcannotbereproducedafterpackagingtheplug-inthroughspring-boot.Duetodifferentstartupmethods,tomcatusesdifferentclassestohandlestaticresources,sothereisnoproblem
Howtosolve
Justupgradethetomcat-embedversion
Theversionwherethebugcurrentlyappearsis:
spring-boot:2.0.2.RELEASE,thebuilt-intomcatembedversionis8.5.31
Thisproblemcanbesolvedbyupgradingtomcatembedversionto8.5.40+.Thenewversionhasbeenfixed.
Byreplacingspringbootpomproperties
Iftheprojectismaven,itinheritsspringboot,thatis,theparentisconfiguredasspringboot,orthespringbootpackageisimportedindependencyManagement.
<parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>2.0.2.RELEASE</version> <relativePath/><!--lookupparentfromrepository--> </parent>
Justoverwritethepropertiesdirectlyinthepom:
<properties> <tomcat.version>8.5.40</tomcat.version> </properties>
Upgradespringbootversion
Thetomcatembedversioninspringboot2.1.0.RELEASEisalreadygreaterthan8.5.31,sodirectlyupgradingspringboottothisversionandabovecansolvethisproblem.
Onefinalword(don’tprostitutefornothing,pleasepayattention)
EveryarticlewrittenbyCheniscarefullywritten.Ifthisarticleishelpfulorinspiringtoyou,pleaselike,read,repost,andcollectit.Yoursupportismybiggestmotivationtopersevere!
Inaddition,Chen’sKnowledgePlanethasbeenopened.Itonlycosts199yuantojoin.Thevalueofplanetfeedbackishuge.Currently,ithasupdatedtheCodeYuanchronicdiseasecloudmanagementpracticalproject,theSpringfamilybucketpracticalseries,thebillion-leveldatasub-databaseandtablepractical,andtheDDDmicroServicepracticecolumn,Iwanttojoinabigfactory,Spring,Mybatisandotherframeworksourcecodes,22lecturesonarchitecturepractice,RocketMQ,etc…
Moreintroduction
Ifyouneedtojointheplanet,addChen’sWeChataccount:special_coder
Theknowledgepointsofthearticlematchtheofficialknowledgefiles,andyoucanfurtherlearnrelatedknowledge.JavaSkillTreeHomepageOverview137233peoplearelearningthesystem