Debugging AEM: A Primer
Follow along with my quest to solve a bug with the DAM Update Asset workflow
One of the hardest things for me when starting with AEM was learning how to break problems down to figure out the root cause. There are so many moving parts in AEM, and most of them were either unfamiliar territory (Sling, Felix, the JCR), or are unique to AEM itself (things like Granite, Coral, the workflow engine, etc.) So, when I recently was forced to debug a problem which occurs in AEM out-of-the-box, I decided to write a detailed article laying some of the techniques I use to debug AEM when it isn’t doing what I think it should.
On the other hand, if you’re someone who is finding that their “DAM Update Asset” workflow disappears from the “Timeline” view, and you’d like to skip to the end, jump directly to “Restoring the DAM Update Asset Workflow”.
Introducing the Problem
For the past few AEM releases, every few months my end-users would re-open the same internal support ticket. “‘DAM Update Asset’ Workflow is unavailable in the siderail.”
We have a fairly large implementation of AEM Assets, disconnected from any AEM Sites instance, with a number of users who work in AEM every day but aren’t exactly…well, they’re graphics people. They refer to me as their “propeller-head”, so I think I’ll refer to them as my “Luddites”.
One of the operations they need to do on a fairly regular basis is to perform the “DAM Update Asset” workflow on 20-30 assets at a time. To do this, they often use the “Timeline” rail in the Assets view:
So you can perhaps imagine the consternation when, every few days, that workflow just completely disappears from the rail.
Even more frustrating, it would happen between releases, when we were pretty sure no one should be changing anything. At first we suspected it was a permissions issue, but as we dug deeper we found that even the built-in “admin” user couldn’t see the desired workflow in this select list. So began a hunt for what may be the most subtle bug I’ve ever found in AEM.
How does that rail work, anyway?
As I began to dig into the problem, I decided the first task was to see what builds that siderail, and specifically the dropdown list of Workflow Models that we are allowed to select from. Of course, AEM is all built on top of a series of HTTP Requests, so the first thing I did was open my Chrome DevTools panel and load the page.
…238 requests. OK, this is going to take some time.
My first assumption was that there would be a .json request for the list of Workflow models, so I tried using the “filter” box to just show requests containing .json, but after 30 minutes or so of going through the JSON responses, I realized this assumption was flawed. Nothing I could find led to the list of workflows.
Taking a different tack, I cleared the DevTools and reloaded the page, this time instructing Chrome to capture screenshots as the page loaded. I had a strong hunch that whatever it was would be loaded before the Workflow Selector UI was visible, so this led me to focus in on the requests made sometime between 2.24 seconds into the request and 2.39 seconds; a much smaller body of requests to work with.
Here, I have filtered the DevTools requests for those items which came in between approximately 2.1 and 2.3 seconds:
I got excited seeing those
timeline.js entries, but upon further inspection these turned out to simply be the code that defines some of rendering of that side rail, but it didn’t include any of the actual content. When none of the others panned out either, I moved back to that moment between 2.0s and 2.2s, where you can see the long green bar.
From my previous dive into the JSON requests, I already know
assetdetails.assetnavigation.json weren’t the culprits, so I zeroed in on that
timeline.html request, and found the data I was looking for!
Now we have a path we can use to dig into the CRX:
/mnt/overlay/dam/gui/content/assetdetails/jcr:content/rails/timeline/items/timeline.html. My first guess when looking for anything that lives behind
/mnt/overlay is always to start with
/libs - because that’s the OOTB configuration for how AEM itself decodes these. So let’s take a peek in the CRX DeLite at
/libs/dam/gui/content/assetdetails/jcr:content/rails/timeline/items/timeline. Stick with me for a bit longer, because crawling around the dependency tree of built-in components can feel more than a little tedious.
/libs/dam/gui/content/assetdetails/jcr:content/rails/timeline/items/timeline is a Granite Include (
granite/ui/components/coral/foundation/include) referencing the path
/mnt/overlay/dam/gui/coral/content/commons/sidepanels/timeline. Again, we replace
/libs, and find that
/libs/dam/gui/coral/content/commons/sidepanels/timeline has a
cq/gui/components/coral/common/admin/timeline; again, we know we need to resolve that, most likely under
/libs. At first blush, it feels like the trail may have gone a bit cold on us here, because that path has a ton of structure:
But ultimately I’m looking for some datasources, and it didn’t take very many expansions of childnodes to find
/libs/cq/gui/components/coral/common/admin/timeline/events/workflow/datasources, which has a very useful JSP nested under it at
/libs/cq/gui/components/coral/common/admin/timeline/events/workflow/datasources/models/models.jsp. I wish I could link to a GitHub Gist of the code behind this, marking up the lines of code that are the most interesting, but there’s a big, scary header at the top of the file:
<%-- ADOBE CONFIDENTIAL Copyright 2015 Adobe Systems Incorporated All Rights Reserved. NOTICE: All information contained herein is, and remains the property of Adobe Systems Incorporated and its suppliers, if any. The intellectual and technical concepts contained herein are proprietary to Adobe Systems Incorporated and its suppliers and may be covered by U.S. and Foreign Patents, patents in process, and are protected by trade secret or copyright law. Dissemination of this information or reproduction of this material is strictly forbidden unless prior written permission is obtained from Adobe Systems Incorporated. --%>
Anyway, if you’ve got a copy of AEM, you can open it yourself and follow along as I explain at a higher level what’s going on in this JSP.
As the path suggests, this JSP generates the list of workflows for the “Timeline” siderail; it uses a ComponentHandler to look for a property named “workflowtags”, parses that as CSV data, and then it looks for workflows that have cq:tags attributes on them that exist in that CSV data. So we need to find where AEM is reading that “workflowtags” property from, in order to determine why the “DAM Update Asset” workflow isn’t visible here.
After hunting around manually for a while, I returned to our
timeline.html rendered content to look for more clues, and saw an identifier that seemed like it might pretty useful for digging further:
I took this value to the “Search” box on the main page of CRX DeLite, and one of the first hits it led me to was
So, we have found that only things with the cq:tag “workflow:dam” will appear in the Timeline siderail. Venturing out to
/var/workflow/models/dam/update_asset/metaData, we can see it has actually two tags:
Well, that didn’t seem to help at ALL! It should be included in our Timelines siderail, based on all this logic. It was at this point that I spun up a new version of AEM 6.5, installed SP1, and to confirm that I wasn’t losing my mind, added a logger to the
shouldBeIncluded method found in the aforementioned
models.jsp file. Watching my logs, I could see that the workflow came through just fine, so I cloned our production AEM Assets instance and added the same logger, and found to my amazement that the “DAM Update Asset” workflow was missing the “dam” tag!
A diversion on
cq:tags have, more than once, been a source of frustration in an AEM implementation. For example, if you create a bunch of content, and tag it up, and then export a package to load that content onto another server, you might be frustrated to discover many of your tags have gone missing.
I really feel this is an under-documented area of AEM, as I could only find one single line that explained the behaviour, buried in the middle of a page about “programmatically working with tags”:
The TagManager ensures that tags entered as values on the cq:tags string array property are not duplicated, it removes TagIDs pointing to non-existing tags and updates TagIDs for moved or merged tags.
Even more interestingly, these are not direct
cq:tag properties, but rather a sort of hybrid use of tags, with enforcement enabled by the
model.jsp file. With the repository restructuring, I knew that
cq:tags moved from
/content/cq:tags (not the most intuitive of the moves in the repository restructuring, I would really have expected those to be part of the new
/conf tree, but ), and that there was a special “Backwards Compatibility” mode where even in 6.4 instances, if AEM saw an
/etc/tags tree it would continue to persist new
cq:tag nodes there, rather than in the preferred location.
Returning back to our hunt for the missing model, we are expecting this model to have the requisite “dam” tag, but our logger has shown us that something is off. At this point, I went into the new /conf tree and found that the workflow had been modified
some-user-who-should-know-better has modified the workflow, and you can see RIGHT THERE - the modified the
cq:tags, so that now only
Workflow:WCM is part of this workflow. After hours of hunting around in AEM to find the root cause, I had my smoking gun!
I proceeded to
some-user-who-should-know-better’s desk and read them the riot act about changing things in Prod where they had no business being. They insisted they had not touched the tags, they had only marked the workflow as non-transient; because we have an internal utility which queries our AEM Assets implementation for in-progress workflows to show them on a dashboard, we need the workflow history to be persisted in the tree. Trying to calm my raging emotions, I allowed for how perhaps they had ACCIDENTALLY removed the tag, not realizing they’d done it. They seemed really confident this wasn’t the case, so we took a little trip to the office of some-user-who-should-know-better’s boss to find a mediator for the discussion.
To demonstrate what I believed to have happened, I sat at the computer of some-user-who-should-know-better’s boss and went to my newly created AEM 6.5 SP1 instance, navigated to the “DAM Update Asset” workflow as I explained my theory of the crime.
“So when you load this page here,” I explained “you’ll see there are two tags on the…”. I stopped, mid-sentence. There was only ONE TAG on the workflow - “Workflow: WCM”.
“Well, ” I vamped. “You can see how easy it is to accidentally remove the tags, as I appear to have done the same myself. Now, we can just go into the Tag Manager here and add that critical
Workflow:DAM tag back onto this workflow…”
Again, stunned silence. There WAS NO TAG for
Workflow:Dam. It simply wasn’t there!
Restoring the DAM Update Asset Workflow
I backed my way out of the conversation and headed back to my desk, wondering when I had deleted that so-critical tag. I nuked my AEM 6.5 SP1 instance and created a new one, navigating to the same workflow. There it was again: the
Workflow:DAM tag was missing, and this was a BRAND NEW instance of AEM. I opened the tag browser and found the tag was missing, and it all finally came together.
At this point, I have checked every release between 184.108.40.206 and 220.127.116.11, and I cannot find this
Workflow:DAM tag in ANY of these versions OOTB.
Because the original nodes in
/var/workflows use a normal string property named ‘tags’ rather than actually using true a
cq:tag property, this tag being missing doesn’t actually get caught until someone tries to edit one of these workflows; when saving those edits, the new version of the workflow gets saved under
/conf and the property is converted to a real
cq:tag property, at which point AEM deletes the
Workflow:DAM tag, because the tag doesn’t ACTUALLY exist in an OOTB instance of AEM.
The simple resolution, until Adobe fixes this, is to manually create the
Workflow:DAM tag in your
cq:tags list; or, you can install this package via your Package Manager, which will create the tag for you.