There are various approaches; I am listing them here with their pros and cons.
Approach 1: Use clustering, but direct the requests only to one node. In case of problems with this node just switch over to the other node. Essentially this is an active-passive scenario.
+ proven technology, documented and fully supported feature
+ automatic failover is easily possible
- additional license cost
- latency between the nodes might affect performance
- managing a cluster is sometimes difficult
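The switch-over idea in approach 1 can be illustrated with a minimal sketch. The node names and the health check are hypothetical placeholders, not part of any CQ API; in practice this decision usually lives in a load balancer or dispatcher configuration.

```python
from typing import Callable, Optional, Sequence

# Hypothetical node names: the first entry is the active node,
# the second is the passive (DR) node that normally takes no requests.
CLUSTER = ["author-active:4502", "author-dr:4502"]


def choose_target(nodes: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None


if __name__ == "__main__":
    down = {"author-active:4502"}  # simulate a failed active node
    print(choose_target(CLUSTER, lambda node: node not in down))
```

All requests go to the first node for as long as it is healthy, which is what makes the setup active-passive rather than load-balanced.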
Approach 2: Use replication (on modification, no version, no status update).
+ proven technology, documented and fully supported
+ easy handling, reconfiguration during runtime
- manual reconfiguration of replication in case of switch
- "on modification" only works for cq:Page nodes and not for other types such as DAM assets, workflows, events, and replication queues. (But if you activate a DAM asset, it will be there on the standby system.)
So approach 2 doesn't achieve a "full-standby" system; it is more a way not to lose content. Everything else is probably gone, so it is only for a real worst-case scenario.
Approach 3: Periodically build content packages and replicate them to the standby system.
+ you can also provision other staging systems with this content
+ load on the active system is predictable; normal editing actions are not burdened by it
- self-written, not supported
- you can package workflows and events when grabbing them from the repository
- needs testing to see whether it works
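As a rough sketch of approach 3, a scheduled job could rebuild a package on the active system and then fetch it for installation on the standby. The helper below only constructs the URLs. It assumes the CRX Package Manager HTTP service layout used by CQ 5.x (a POST to /crx/packmgr/service/.json/<package path>?cmd=build); check this against your version. The group and package names are hypothetical.

```python
from urllib.parse import quote


def package_path(group: str, name: str) -> str:
    """Repository path of a package, assuming it lives under /etc/packages."""
    return "/etc/packages/{}/{}.zip".format(quote(group), quote(name))


def build_url(base: str, group: str, name: str) -> str:
    """URL to POST to in order to rebuild the package (assumed endpoint layout)."""
    return (base.rstrip("/") + "/crx/packmgr/service/.json"
            + package_path(group, name) + "?cmd=build")


def download_url(base: str, group: str, name: str) -> str:
    """URL to GET in order to download the built package."""
    return base.rstrip("/") + package_path(group, name)


if __name__ == "__main__":
    # Hypothetical host, group, and package name.
    print(build_url("http://localhost:4502", "dr", "content-snapshot"))
    print(download_url("http://localhost:4502", "dr", "content-snapshot"))
```

Both requests need valid credentials, and the downloaded package still has to be uploaded and installed on the standby system.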
Some Questions:
Q: What is the best way to create an active-passive cluster, and what should I be careful about?
A: Create a multi-node cluster and make sure that the DR nodes are not taking any requests. You have to be careful that there is not a lot of latency between the DR node and the active nodes.
Also, it is very difficult to ensure that one of the active nodes becomes master if the current master goes down. There is no harm in having the DR node as master, but it is not recommended (as writes always go through the master). You can use either the Felix console or the preferredMaster property to set up the master in advance. Please read http://crxcluster.wemblog.com very carefully to understand CRX better.
To make a node master:
http://HOST:PORT/system/console/jmx/com.adobe.granite%3Atype%3DRepository
Since this is exposed as a JMX MBean, you can monitor it and invoke it at runtime.
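If you script against several nodes, the console URL above can be generated per host. This is a small sketch that only reproduces the URL pattern shown here by percent-encoding the MBean name; the host and port values are placeholders.

```python
from urllib.parse import quote

# MBean name taken from the URL above (decoded form).
REPOSITORY_MBEAN = "com.adobe.granite:type=Repository"


def jmx_console_url(host: str, port: int, mbean: str = REPOSITORY_MBEAN) -> str:
    """Build the Felix console JMX URL for the given MBean on one node."""
    # safe="" forces ':' and '=' to be percent-encoded, as in the URL above.
    return "http://{}:{}/system/console/jmx/{}".format(host, port, quote(mbean, safe=""))


if __name__ == "__main__":
    print(jmx_console_url("localhost", 4502))
```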
Q: Do I have to take a cold backup of all the nodes?
A: No. If you are using clustering, then taking a backup of any one node (usually the master) is enough.
Q: What about publish instances?
A: It is not recommended to use clustering on publish instances unless there is no other way to support some use case. In that case each publish instance is a hot backup of the others. Note that you cannot recover a publish instance from a nightly backup alone (as content might have been updated since the backup was created). Usually it is recommended to have a backup publish instance which does not take load but has a replication agent configured for it on author.
Q: I am left with only a nightly backup. How should I create a new publish instance?
A: In this case you have to find out what was replicated between the time the last nightly backup was created and now. You can use the nightly backup in conjunction with http://www.wemblog.com/2011/10/how-to-find-all-pages-modified-or.html to support this use case.
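Once you have a list of pages with their last replication time (for example from a query like the one in the linked post), picking out what must be re-activated is a simple filter. The sketch below assumes the input is a list of (path, timestamp) pairs; the paths and dates are made up for illustration.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def replicated_since(pages: Iterable[Tuple[str, datetime]],
                     backup_time: datetime) -> List[str]:
    """Return paths replicated after the backup was taken, sorted by path."""
    return sorted(path for path, replicated in pages if replicated > backup_time)


if __name__ == "__main__":
    backup = datetime(2012, 1, 10, 2, 0)  # hypothetical nightly backup time
    pages = [
        ("/content/site/en/news", datetime(2012, 1, 10, 9, 30)),
        ("/content/site/en/about", datetime(2012, 1, 9, 17, 0)),
        ("/content/site/en/home", datetime(2012, 1, 10, 11, 15)),
    ]
    for path in replicated_since(pages, backup):
        print(path)
```

The resulting paths are the ones to activate again from author after restoring the publish instance from the nightly backup.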