Laggy response after a quiet time


JumpStart
Hi all,

My app works brilliantly under load, but after a quiet period it can be very slow to respond. That leads our first user of the day to tap the same thing several times, and the next thing you know the CPU hits 100% and stays there. None of those requests returns a response, nor do any new requests. The Apache logs show every request timing out after 60 seconds, unanswered, and the health checkers start messaging the support staff.

Has anyone else experienced this kind of thing?

Perhaps it’s something to do with our infrastructure? We’re running Tapestry from an EAR in WildFly, in Docker, on an AWS EC2 instance. Apache HTTPD also runs in Docker on that same EC2 instance.

Any thoughts, please! It’s a crazy problem.

Cheers,

Geoff
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Laggy response after a quiet time

Mats Andersson-2
Do you have some scheduled services that could eat memory? I would
suspect that the garbage collector is using most of that CPU. Monitor
the GC and how memory is used over time. Maybe there is a mismatch
between the memory configurations in Docker and the JVM.
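
One way to start on the monitoring suggested above is to ask the JVM for its own view of heap limits and GC activity and compare that with the container's memory limit. The following is only a minimal sketch using the standard java.lang.management API; the class name GcSnapshot is made up for illustration, and it assumes the code can be run (or called) inside the same container as the app.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Tapestry or WildFly.
public class GcSnapshot {

    /** Builds a one-line summary of heap limits and cumulative GC activity. */
    public static String summary() {
        Runtime rt = Runtime.getRuntime();
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        long usedHeapMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);

        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        return "maxHeapMb=" + maxHeapMb
                + " usedHeapMb=" + usedHeapMb
                + " gcCount=" + gcCount
                + " gcMillis=" + gcMillis
                + " cpus=" + rt.availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Taking two snapshots a minute apart while the CPU is pegged would show whether gcMillis is climbing quickly, and a maxHeapMb close to or above the Docker memory limit would point at the configuration mismatch mentioned above.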

It seems to be repeatable, which is good for troubleshooting at least.

Mats


On 2019-05-16 03:33, JumpStart wrote:

> Hi all,
>
> My app works brilliantly under load, but after a quiet period it can be very slow to respond. That leads our first user of the day to tap the same thing several times, and the next thing you know the CPU hits 100% and stays there. None of those requests returns a response, nor do any new requests. The Apache logs show every request timing out after 60 seconds, unanswered, and the health checkers start messaging the support staff.
--
---------------------- Mats Andersson | Ronsoft AB | +46(0)73 368 79 82


Re: Laggy response after a quiet time

Dmitry Gusev
Hi,

I'd also try to collect a thread dump while this is happening, to get an
idea of where the CPU time is going.
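
A thread dump is normally taken from outside the process, for example with the JDK's jstack tool. If attaching a tool inside the container is awkward, roughly the same information can be gathered in-process. The sketch below is an illustration only, using the standard ThreadMXBean API; the class name BusyThreads and the output format are assumptions, not anything Tapestry or WildFly provides.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: lists the threads that have used the most CPU time.
public class BusyThreads {

    /** Returns lines of the form "cpuMs state name @ topFrame", busiest thread first. */
    public static List<String> report(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = mx.isThreadCpuTimeSupported();

        List<long[]> byCpu = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            // getThreadCpuTime returns -1 if the thread has died or timing is disabled.
            long nanos = cpuSupported ? mx.getThreadCpuTime(id) : -1;
            byCpu.add(new long[] {id, nanos});
        }
        byCpu.sort(Comparator.comparingLong((long[] e) -> e[1]).reversed());

        List<String> lines = new ArrayList<>();
        for (long[] e : byCpu) {
            if (lines.size() >= limit) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(e[0], 1); // depth 1: top stack frame only
            if (info == null) {
                continue; // thread finished after we listed it
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no frame)";
            lines.add((e[1] / 1_000_000) + "ms " + info.getThreadState()
                    + " " + info.getThreadName() + " @ " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        report(10).forEach(System.out::println);
    }
}
```

Running this twice a few seconds apart and comparing the CPU figures gives a rough idea of which threads are actually burning CPU, as opposed to merely being blocked.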

On Thu, May 16, 2019 at 10:38 AM Mats Andersson <[hidden email]>
wrote:

> Do you have some scheduled services that could eat memory? I would
> suspect that the garbage collector is using most of that CPU. Monitor
> the GC and how memory is used over time. Maybe there is a mismatch
> between the memory configurations in Docker and the JVM.

--
Dmitry Gusev

AnjLab Team
http://anjlab.com

Re: Laggy response after a quiet time

Diego Dakszewicz
Hi Dmitry!
Since 5.3, Tapestry minifies CSS and JS on the fly. That's why your CPU hits
100%: this happens at startup, when Tapestry warms up resources on the first
request, and when your site has low traffic.
A workaround is to disable this feature.
Take a look at this method in your AppModule:

    @Contribute(ResourceMinimizer.class)
    public static void contributeMinimizers(
            MappedConfiguration<String, ResourceMinimizer> configuration) {}

You can implement your own minimizer or just disable it. I prefer to minify my
resources at deploy time. That's recommended if you have to warm up a lot of
apps hosted on the same server.
Cheers,
Diego

On Thu, 16 May 2019 at 4:59, Dmitry Gusev (<[hidden email]>)
wrote:

> Hi,
>
> I'd also try to collect a thread dump while this is happening, to get an
> idea of where the CPU time is going.

Re: Laggy response after a quiet time

Cezary Biernacki
Obviously it would help to learn what the CPU is doing. Does the application
become unstuck and start working normally after the initial spike of activity?

I have not experienced exactly the same behaviour, though I have a relatively
similar setup: a Tapestry application running inside Docker on AWS
EC2, although it is a standalone process with Dropwizard/Jetty. But years
ago I had a related problem with a different WAR-based application that
crashed after not being used for some time. It was caused by the "tmpwatch"
daemon on Linux deleting files in /tmp that had not been accessed for
some time.

I recommend checking whether anything in your application can be described as
"it can be removed if not used after a predetermined amount of time" or "it
needs refreshing from time to time, and this is determined on access".
Potential culprits:

   - in-application caches, e.g. one built with Guava's Cache;
   - database connections or other TCP connections where an external
   service can terminate idle sessions;
   - an auto-scaling mechanism;
   - the application server deciding that the previous day's user sessions
   have expired and killing them, with a listener doing something expensive
   on session termination.
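
To make the first item on that list concrete, here is a minimal sketch of an expire-after-access cache. It is written against the JDK only rather than Guava, and the class name IdleExpiringCache, the injected clock and the loader function are all assumptions made for illustration. It shows why the first request after a quiet period pays the full reload cost while requests under steady load do not.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache: an entry is reloaded if it has not been read for maxIdleMillis.
public class IdleExpiringCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long lastAccessMillis;

        Entry(V value, long lastAccessMillis) {
            this.value = value;
            this.lastAccessMillis = lastAccessMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long maxIdleMillis;
    private final LongSupplier clock;    // injected so idle time can be simulated
    private final Function<K, V> loader; // the expensive part (query, render, ...)

    public IdleExpiringCache(long maxIdleMillis, LongSupplier clock, Function<K, V> loader) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
        this.loader = loader;
    }

    public V get(K key) {
        long now = clock.getAsLong();
        // compute() is atomic per key, so concurrent callers for the same key
        // wait for a single reload instead of each starting their own.
        return map.compute(key, (k, old) -> {
            boolean fresh = old != null && now - old.lastAccessMillis <= maxIdleMillis;
            V value = fresh ? old.value : loader.apply(k);
            return new Entry<>(value, now);
        }).value;
    }
}
```

Under load every read refreshes the access time, so the loader rarely runs. After a quiet night every entry is stale at once, and the first few requests trigger all the reloads together.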

Another option is weird garbage collection interactions, e.g. caused by
non-trivial work in the "finalize" method somewhere in the application.

Cezary



Re: Laggy response after a quiet time

JumpStart
Thanks guys. These are all helpful suggestions that I will look into.

Geoff

> On 16 May 2019, at 8:59 pm, Dakszewicz Diego <[hidden email]> wrote:
>
> Hi Dmitry!
> Since 5.3, Tapestry minifies CSS and JS on the fly. That's why your CPU hits
> 100%: this happens at startup, when Tapestry warms up resources on the first
> request, and when your site has low traffic.
> A workaround is to disable this feature.

Re: Laggy response after a quiet time

JumpStart
Thanks Cezary. As I feared, there are many potential culprits to sniff out.

Geoff

> On 16 May 2019, at 10:05 pm, Cezary Biernacki <[hidden email]> wrote:
>
> Obviously it would help to learn what the CPU is doing. Does the application
> become unstuck and start working normally after the initial spike of activity?