APM-less SAP Commerce performance analysis

In this day and age it’s all about SAP Commerce Cloud (CCV2). New implementations are deployed directly there, or you are on your way to upgrading your SAP Commerce version and migrating to the “Cloud” ⛈

CCV2 comes with that fancy Dynatrace, but a lot of projects are still running as on-premise deployments on VMware ESXi, GCE, AWS or Azure. It would not come as a surprise if many of those run without an APM; maybe it was “too expensive” or there was “no budget for it”.

Next, I’ll introduce different ways to investigate performance issues on an SAP Commerce deployment that has no monitoring or alerting in place.

Everything is going well.. business as usual.. until one day you get that phone call: the e-commerce is slooow 🐢 in the middle of a sales campaign, and no one knows why, since the only alarm you have is customers complaining through every available channel.

Maybe it is the deployment from two weeks ago that introduced some bottleneck? 🤔 Or is it that there are not enough resources to handle the current traffic? But it was working fine hours ago! Did something change?

WHAT .. DO .. YOU .. DO?? It’s time to play detective 🕵️‍♀️

Download the access logs from every node that serves client requests and generate stacktraces every X seconds for further analysis. All of this can be done from HAC (I leave it to you to decide for how long and how often to generate them).

  • Access logs: Platform ➡️ Support
  • Stacktraces: Monitoring ➡️ Thread Dumps ➡️ Record Dumps

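Set up proper logging in Tomcat. By default Tomcat uses the combined pattern for the AccessLogValve, and this is not very useful because it does not include the thread, the real user IP or the time taken to process the request.

Update the pattern in server.xml to obtain this vital information, for example by adding %I (name of the thread handling the request), %D (time taken to process the request, in milliseconds on Tomcat 8) and %{X-Forwarded-For}i (the client IP when a proxy or load balancer sits in front of the node). More info at 🔗 Apache Tomcat 8 Configuration reference.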

SAP Commerce Access Log Valve configuration

SAP Commerce issue analysis

Now that you have all the information needed, let’s start!

1. Check for BLOCKED threads in stacktraces

This is a common issue and most of the time it is related to Cart recalculations / promotions, an integration that has no timeout configuration, or Populators. You can use the grep command or any Java thread dump analyzer tool like 🔗 FastThread.io to look for threads that are in the BLOCKED or WAITING state.
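If you prefer a small script over grep, the check can be sketched in Python. This is only a minimal sketch under assumptions: it expects jstack-style text (a quoted thread name on the first line of each block and a `java.lang.Thread.State:` line), so the regular expressions may need adjusting to whatever HAC writes in your version. The thread and class names in the sample are made up.

```python
import re
from collections import Counter

# Assumption: jstack-style text, one thread per blank-line-separated block.
HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE = re.compile(r"java\.lang\.Thread\.State: (?P<state>[A-Z_]+)")


def thread_states(dump_text):
    """Return (thread_name, state, top_frame) for every thread in one dump."""
    threads = []
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        if not lines:
            continue
        header = HEADER.match(lines[0])
        state = STATE.search(block)
        if not header or not state:
            continue  # not a thread block (e.g. the dump preamble)
        frames = [l.strip()[3:] for l in lines if l.strip().startswith("at ")]
        threads.append((header["name"], state["state"], frames[0] if frames else ""))
    return threads


def stuck_threads(dump_text, states=("BLOCKED", "WAITING")):
    """Keep only the threads whose state is one of `states`."""
    return [t for t in thread_states(dump_text) if t[1] in states]


# Made-up sample; thread and class names are hypothetical.
SAMPLE_DUMP = """\
"hybrisHTTP8" #88 daemon prio=5 tid=0x1 nid=0x2 runnable [0x3]
   java.lang.Thread.State: RUNNABLE
    at com.cool.core.COOLCartCalculationService.recalculate(COOLCartCalculationService.java:42)

"hybrisHTTP12" #92 daemon prio=5 tid=0x4 nid=0x5 waiting for monitor entry [0x6]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.cool.core.COOLCartCalculationService.recalculate(COOLCartCalculationService.java:40)
    - waiting to lock <0x7> (a java.lang.Object)
"""

if __name__ == "__main__":
    print(Counter(state for _, state, _ in thread_states(SAMPLE_DUMP)))
    for name, state, frame in stuck_threads(SAMPLE_DUMP):
        print(name, state, frame)
```

Keep in mind that idle pool threads are normally WAITING too, so the interesting ones are those that stay BLOCKED or WAITING on the same frame across several consecutive dumps.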

2. Find the most executed code

I don’t like to point fingers 👉 but it’s not uncommon for custom code to be the reason your e-commerce is suffering. Let’s suppose you are working on the COOL project and your custom Java classes all have COOL as a name prefix; then a simple chain of commands can go a long way.

Script to filter stacktraces for the most commonly executed code
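The original script is not reproduced here, so what follows is just one way to do it, sketched in Python. It counts how often each stack frame containing the project marker shows up across the recorded dumps; the more often a method appears, the more time threads are spending in it. The `COOL` marker comes from the example above, while the class names in the sample are invented for illustration.

```python
import re
from collections import Counter

# Matches "at package.Class.method(" lines of a Java stack trace.
FRAME = re.compile(r"^\s*at\s+(?P<method>[\w.$<>]+)\(")


def top_custom_frames(dump_text, marker="COOL", limit=10):
    """Count the stack frames whose qualified method name contains `marker`."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = FRAME.match(line)
        if match and marker in match["method"]:
            counts[match["method"]] += 1
    return counts.most_common(limit)


# Made-up frames; in practice, pass the concatenated text of all recorded dumps.
SAMPLE_FRAMES = """\
    at com.cool.core.COOLCartCalculationService.recalculate(COOLCartCalculationService.java:42)
    at java.lang.Thread.run(Thread.java:748)
    at com.cool.core.COOLCartCalculationService.recalculate(COOLCartCalculationService.java:42)
    at com.cool.facades.COOLProductPopulator.populate(COOLProductPopulator.java:17)
"""

if __name__ == "__main__":
    for method, hits in top_custom_frames(SAMPLE_FRAMES):
        print(hits, method)
```

The match is a plain case-sensitive substring test, which is crude but usually enough when the prefix is as distinctive as a project name.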

Oops! Seems like some custom cart calculations are the issue! 😟

3. Find which endpoints are failing or taking too long to respond

This is where the change to the access log valve in Tomcat comes in handy. Also, the list of resources that are failing and the exceptions being thrown will point directly to the incident.

Script to filter slow interfaces, 40x / 50x responses and ERRORs / WARNs in logs
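As a rough Python sketch of the access log part (it does not cover the ERROR / WARN lines of the application logs), assume the valve pattern was changed to `%{X-Forwarded-For}i %I %t "%r" %s %b %D`. That pattern is an assumption for this example, not necessarily the one you configured, so adapt the regular expression to your own. It also assumes a single address in X-Forwarded-For and milliseconds for %D. The sample lines are made up.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern: %{X-Forwarded-For}i %I %t "%r" %s %b %D
LINE = re.compile(
    r'^(?P<ip>\S+) (?P<thread>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ (?P<millis>\d+)$'
)


def parse(lines):
    """Yield one dict per line that matches the assumed pattern."""
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            yield {
                "thread": m["thread"],
                "path": m["path"].split("?")[0],  # group by path, ignore query
                "status": int(m["status"]),
                "millis": int(m["millis"]),
            }


def slow_endpoints(lines, threshold_ms=2000):
    """Return (path, slow_hits, worst_millis) tuples, slowest first."""
    stats = defaultdict(lambda: [0, 0])
    for entry in parse(lines):
        if entry["millis"] >= threshold_ms:
            stats[entry["path"]][0] += 1
            stats[entry["path"]][1] = max(stats[entry["path"]][1], entry["millis"])
    rows = [(path, hits, worst) for path, (hits, worst) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def failing_endpoints(lines):
    """Return ((status, path), count) pairs for 4xx / 5xx responses."""
    return Counter(
        (e["status"], e["path"]) for e in parse(lines) if e["status"] >= 400
    ).most_common()


# Made-up sample lines in the assumed format.
SAMPLE_LOG = [
    '203.0.113.7 hybrisHTTP12 [10/Oct/2023:13:55:36 +0200] "GET /cart?x=1 HTTP/1.1" 200 5120 8342',
    '203.0.113.8 hybrisHTTP3 [10/Oct/2023:13:55:37 +0200] "GET /p/123 HTTP/1.1" 200 2048 120',
    '203.0.113.9 hybrisHTTP5 [10/Oct/2023:13:55:38 +0200] "POST /cart/add HTTP/1.1" 500 - 3010',
    '203.0.113.9 hybrisHTTP5 [10/Oct/2023:13:55:39 +0200] "POST /cart/add HTTP/1.1" 500 - 95',
]

if __name__ == "__main__":
    print(slow_endpoints(SAMPLE_LOG))
    print(failing_endpoints(SAMPLE_LOG))
```

The 2000 ms threshold is arbitrary; pick whatever counts as slow for your storefront.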

4. Check HAC monitoring information

  • Cache regions Hit/Miss ratio: is the cache working properly? do you have a lot of invalidations?
  • Cluster status: are all the nodes listed?
  • Database: too many open connections? problems obtaining a connection to the DB?
  • Memory & CPU load: is CPU usage over 100%? are there any peaks?

Figure: hybris administration console (HAC)

Congratulations! By now you should have a pretty good idea of what the issue could be. Good job! 👍

You can continue with the case investigation 🔎 and combine all the evidence. For example: take the most time-consuming requests from the access log, obtain the Java thread name they ran on and look it up in the thread dumps to find what is causing the bottleneck.
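That lookup could be sketched like this in Python, again assuming jstack-style dumps where each thread block starts with the quoted thread name. The thread name would come from the access log (the %I field); the one used here is made up.

```python
import re


def stack_for_thread(dump_text, thread_name):
    """Return the stack frames of `thread_name` in one jstack-style dump."""
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        if lines and lines[0].startswith(f'"{thread_name}"'):
            return [l.strip() for l in lines[1:] if l.strip().startswith("at ")]
    return []  # thread not present in this dump


# Made-up dump; thread and class names are hypothetical.
SAMPLE_THREAD_DUMP = """\
"hybrisHTTP3" #83 daemon prio=5 tid=0x1 nid=0x2 runnable [0x3]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.run(Thread.java:748)

"hybrisHTTP12" #92 daemon prio=5 tid=0x4 nid=0x5 waiting for monitor entry [0x6]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.cool.core.COOLCartCalculationService.recalculate(COOLCartCalculationService.java:40)
    at java.lang.Thread.run(Thread.java:748)
"""

if __name__ == "__main__":
    for frame in stack_for_thread(SAMPLE_THREAD_DUMP, "hybrisHTTP12"):
        print(frame)
```

Thread names are reused by the pool, so only trust a match when the dump was taken while the slow request was still in flight; compare the access log timestamp and duration with the time of the dump.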

Wrapping up

Monitoring is an essential tool to have in place. It’s an investment that will save you time 🕐 and money 💰, and options are plentiful. Making good use of it will help you optimize your SAP Commerce solution, where trivial changes in configuration or custom code can have a huge impact on performance and conversion rate.

I’ll leave for a future post how to run a performance review of an SAP Commerce implementation with load testing, and what to check on your APM for hints of improvements to be made.

Cheers! 🍻

Nico

Tech Lead and e-commerce Architect. Specialized in SAP Commerce