Use case: You want to write a monitoring script that finds all pending and blocking jobs in the replication queue and takes some action.
Solution: Use the following curl commands (change the server name, port, user ID, and password to match your instance).
To find all pending jobs:
curl -s -u admin:admin "http://localhost:4504/bin/querybuilder.json?path=/var/eventing/jobs&type=slingevent:Job&p.limit=-1&fulltext=/com/day/cq/replication/job&fulltext.relPath=@slingevent:topic&property.and=true&property=slingevent:finished&property.operation=not&orderby=slingevent:created&orderby.sort=asc" | tr ",[" "\n" | grep path | awk -F \" '{print $4 "\n"}'
To find all blocking jobs:
curl -s -u admin:admin "http://localhost:4504/bin/querybuilder.json?path=/var/eventing/jobs/anon&type=slingevent:Job&rangeproperty.property=event.job.retrycount&rangeproperty.lowerBound=1" | tr ",[" "\n" | grep path | awk -F \" '{print $4 "\n"}'
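As a rough illustration of the "do some action" part, the blocking-job query can be wrapped in a small script. This is only a sketch: the AEM_HOST, AEM_CRED and MAX_BLOCKING variables and the function names are made up for this example, and the "action" is just a warning message that you would replace with your own alerting.

```shell
#!/bin/sh
# Hypothetical settings; adjust to your instance (these names are assumptions).
AEM_HOST="${AEM_HOST:-http://localhost:4504}"
AEM_CRED="${AEM_CRED:-admin:admin}"
MAX_BLOCKING="${MAX_BLOCKING:-0}"

# Read querybuilder JSON on stdin and print one job path per line
# (same tr/grep/awk idea as the one-liners above, without the blank lines).
extract_paths() {
    tr ',[' '\n\n' | grep '"path"' | awk -F '"' '{print $4}'
}

# Run the blocking-job query from the post and list the job paths.
blocking_jobs() {
    curl -s -u "$AEM_CRED" "$AEM_HOST/bin/querybuilder.json?path=/var/eventing/jobs/anon&type=slingevent:Job&rangeproperty.property=event.job.retrycount&rangeproperty.lowerBound=1" | extract_paths
}

# Print a warning and return 1 when more than MAX_BLOCKING jobs are found.
check_blocking() {
    count=$(blocking_jobs | grep -c . || true)
    if [ "$count" -gt "$MAX_BLOCKING" ]; then
        echo "WARNING: $count blocking replication job(s)"
        return 1
    fi
    echo "OK: no blocking replication jobs"
}
```

Calling check_blocking at the end of the script (for example from cron) gives a non-zero exit status whenever blocking jobs are present, which most monitoring tools can pick up.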
Once you have found the blocking jobs, you can go to CRX and remove the blocking entry (first make sure that this entry is actually causing the problem).
You can also use the replication cleanup script (a custom script I wrote) to remove a single entry from the queue and then, if necessary, activate the content again.
A replication queue can become blocked because:
1) There is a problem with Sling eventing and the queue is not being processed (in that case, restart the Sling Event Support bundle from the Felix console)
2) There is a blocking job in the queue (in that case, find the blocking entry with the curl command above and remove it)
3) There is a problem with the publish server (503, OOM, etc.; in that case, restarting the publish server should resolve the issue)
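For the third cause, a quick way to see whether the publish server is answering at all is to look at its HTTP status code. The sketch below assumes the publish instance is at http://localhost:4503/ (change PUBLISH_URL for your setup); the function names are invented for this example.

```shell
#!/bin/sh
# Assumed publish address; this is not taken from the commands above.
PUBLISH_URL="${PUBLISH_URL:-http://localhost:4503/}"

# Print the HTTP status code of the publish server.
# curl prints 000 when no HTTP response was received.
publish_status() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$PUBLISH_URL"
}

# Map a status code to a coarse verdict.
classify_status() {
    case "$1" in
        000) echo "unreachable" ;;
        5??) echo "server-error" ;;
        *)   echo "responding" ;;
    esac
}
```

For example, classify_status "$(publish_status)" prints server-error while the publish server is returning 503. A "responding" result only means the server answered; it does not prove that replication itself is healthy.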
How often do you run these commands for monitoring? Every minute, every 15 seconds, once in a while, or only when there is a problem?
You can run this command when there is a problem. Also, in CQ 5.5 you can use the JMX console to find this information.
Hi Yogesh,
Can you provide details on how to analyse the blocking job details in JMX, or point to any supporting documentation?
Hi
Can you please tell me how exactly we look for blocking jobs? I mean, what does the query actually do? I am facing a similar issue, but this query isn't returning anything, and we are sure that there is a blocking event causing issues in our replication queues.
Thanks
Dipti
Dipti,
You should be able to see that in the replication queue; go to Tools -> Replication Agents for that. If that does not help, modify the query to look only under /var/eventing/jobs.
Yogesh
Where can I get the script to clean one entry from a replication queue? Is there any feature that lets a priority item go ahead of the others? If I get one more priority page to activate, I would like to inject it at the front of the queue so that it is processed first and the other items in the queue are processed afterwards. I am not sure whether that is possible. If it is, please let me know how I can do it.
Harry,
Replication jobs are always processed in FIFO order, for obvious reasons, so one bad replication job will block all the others. Please check http://www.wemblog.com/2012/07/how-to-clear-replication-queue-in-cq.html for an example of how to clear the replication queue.
Yogesh
Hi Yogesh,
Mahesh here. I would like to know the script for finding pending jobs in the replication queue; it should also let me know if replication is disabled. The version is AEM CQ5.6.1.
You can simply curl /etc/replication/agents.author/publish/jcr%3Acontent.json | jsawk -a 'return this.enabled'
More info about jsawk:
https://github.com/micha/jsawk
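If jsawk is not installed, something along these lines may be enough for a simple check. It is a sketch, not a tested recipe: AEM_HOST, AEM_CRED and the function names are assumptions, and it simply looks for an "enabled" property in the JSON returned for the agent's jcr:content node, reporting "unknown" when the property is absent.

```shell
#!/bin/sh
# Hypothetical settings; adjust to your instance.
AEM_HOST="${AEM_HOST:-http://localhost:4504}"
AEM_CRED="${AEM_CRED:-admin:admin}"

# Fetch the publish agent's jcr:content node as JSON (path from the comment above).
agent_json() {
    curl -s -u "$AEM_CRED" "$AEM_HOST/etc/replication/agents.author/publish/jcr%3Acontent.json"
}

# Read that JSON on stdin and print enabled, disabled or unknown.
agent_enabled() {
    line=$(tr ',{}' '\n\n\n' | grep '"enabled"' | head -n 1)
    case "$line" in
        *true*)  echo "enabled" ;;
        *false*) echo "disabled" ;;
        *)       echo "unknown" ;;
    esac
}
```

Usage would be agent_json | agent_enabled, for instance inside the same script that checks for pending jobs.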
Is there any way I can use curl to get the number of queued jobs shown on the /system/console/slingevent page?
For that you can curl the page and use sed or any other tool to parse the information. I would suggest using the query builder as mentioned above.
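To get just a number out of the query builder, one option is to read the count from the JSON response instead of listing every path. The sketch below assumes the response contains a "total" field, which you should confirm against your own instance; query_total is a made-up helper name.

```shell
#!/bin/sh
# Read querybuilder JSON on stdin and print the value of its "total" field,
# or "unknown" if no such field is found (the field name is an assumption).
query_total() {
    n=$(tr ',{' '\n\n' | grep '"total"' | head -n 1 | tr -cd '0-9')
    echo "${n:-unknown}"
}
```

Piping the output of the pending-jobs curl command from the post into query_total (in place of the tr, grep and awk steps) would then print the number of pending jobs.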
Hello,
I have to build a curl command for setting up the replication agent from author to publisher. Can you help me out?
Thank you