Thursday, May 29, 2014

Fault-tolerant data replication and synchronization with Infinispan

Figure 1. Deployment
Fault-tolerance
Running multiple instances on different nodes provides fault tolerance: when one node terminates, the other nodes hold backup replicas of the partitions that were stored on the terminated node. Figure 1 shows a high-level deployment view of the solution.
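A toy model can make the replication argument concrete. The sketch below is a simplification, not Infinispan's actual consistent-hashing or rebalancing logic: each key hashes to a primary node, `numOwners - 1` backup copies go to the next nodes around a ring, and reads fall back to a surviving owner. `ToyCluster` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a partitioned cache with backup copies (hypothetical; not Infinispan's algorithm).
class ToyCluster {
    private final List<Map<String, String>> nodes = new ArrayList<>(); // local store of each node
    private final boolean[] alive;
    private final int numOwners; // copies kept per key (primary + backups)

    ToyCluster(int nodeCount, int numOwners) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
        this.alive = new boolean[nodeCount];
        Arrays.fill(alive, true);
        this.numOwners = numOwners;
    }

    // Primary owner is chosen by hash; backups are the next nodes around the ring.
    private int[] owners(String key) {
        int n = nodes.size();
        int primary = Math.floorMod(key.hashCode(), n);
        int[] result = new int[Math.min(numOwners, n)];
        for (int i = 0; i < result.length; i++) {
            result[i] = (primary + i) % n;
        }
        return result;
    }

    // Writes go to every live owner of the key.
    void put(String key, String value) {
        for (int owner : owners(key)) {
            if (alive[owner]) {
                nodes.get(owner).put(key, value);
            }
        }
    }

    // Reads are served by the first owner that is still alive.
    String get(String key) {
        for (int owner : owners(key)) {
            if (alive[owner]) {
                return nodes.get(owner).get(key);
            }
        }
        return null; // all owners of this key have terminated
    }

    // Simulates a node terminating: its in-memory data is gone.
    void kill(int node) {
        alive[node] = false;
        nodes.get(node).clear();
    }

    int primaryOwner(String key) {
        return owners(key)[0];
    }
}
```

With three nodes and `numOwners = 2`, killing a key's primary node still leaves the value readable from its backup; with `numOwners = 1` the same failure loses it. A real cluster would also rebalance and restore the missing copies after a node leaves, which this toy omits.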

Design
InfDataAccessIntegration holds two distributed cache instances:
    protected static Cache userReplicasMap;
    protected static Cache replicaSetsMap;
userReplicasMap maps a userID to an array of replicaSetIDs. The userID could be the logged-in user's name (for now, it is tested with random strings).
replicaSetsMap maps a replicaSetID to its replicaSet.
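A minimal sketch of the two-map layout follows. Plain `ConcurrentHashMap` instances stand in for the distributed caches (Infinispan's `Cache` interface extends `java.util.concurrent.ConcurrentMap`, so the same calls apply to a real cache). It assumes, hypothetically, that a replicaSet is its query string and that its replicaSetID is a name-based UUID derived from that string; `ReplicaStore`, `store`, and `replicaSetsFor` are illustrative names, not the project's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the two-cache layout; ConcurrentHashMap stands in for the distributed caches.
class ReplicaStore {
    // userID -> IDs of the replicaSets that user has stored
    final ConcurrentMap<String, List<String>> userReplicasMap = new ConcurrentHashMap<>();
    // replicaSetID -> replicaSet (here: the query string)
    final ConcurrentMap<String, String> replicaSetsMap = new ConcurrentHashMap<>();

    // Assumption: a deterministic ID, so the same query always maps to the same replicaSetID.
    static String replicaSetId(String replicaSet) {
        return UUID.nameUUIDFromBytes(replicaSet.getBytes(StandardCharsets.UTF_8)).toString();
    }

    void store(String userId, String replicaSet) {
        String id = replicaSetId(replicaSet);
        replicaSetsMap.putIfAbsent(id, replicaSet); // shared replicaSets are stored only once
        // Write a new ID list instead of mutating the stored value in place.
        userReplicasMap.merge(userId, List.of(id), (existing, added) -> {
            if (existing.contains(added.get(0))) {
                return existing; // user already references this replicaSet
            }
            List<String> copy = new ArrayList<>(existing);
            copy.addAll(added);
            return Collections.unmodifiableList(copy);
        });
    }

    // Resolves a user's replicaSetIDs through the second map.
    List<String> replicaSetsFor(String userId) {
        List<String> result = new ArrayList<>();
        for (String id : userReplicasMap.getOrDefault(userId, List.of())) {
            String replicaSet = replicaSetsMap.get(id);
            if (replicaSet != null) {
                result.add(replicaSet);
            }
        }
        return result;
    }
}
```

Because the ID is deterministic in this sketch, a replicaSet stored by several users is kept once in `replicaSetsMap` and only referenced from each user's entry, which is one way the two-cache design can help with duplicates. The user's ID list is replaced with a fresh copy rather than mutated in place, so every change is an explicit write that a distributed cache can replicate.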
Figure 2. Core class hierarchy

This could be replaced with a single cache instance mapping userID -> replicaSets, but I chose the two-cache design, as it is more efficient for searches, handling duplicates, and pushing changes.

InfDataAccessIntegration provides the API for the publisher/consumer. TCIAInvoker (which extends InterfaceManager, an abstract class I created) implements the TCIA integration that invokes these methods. Figure 2 shows the core class hierarchy of the system.
 
Figure 3. Execution Flow
Execution Flow
Figure 3 depicts the execution flow.
* The user logs in -> logIn() checks whether the user already has replicaSets stored in the Infinispan distributed cache. If so, it executes all of them again. This will change later, as not all of them need to be executed; only the diffs do.

* The user performs new searches for images, series, collections, and other metadata. Each new search creates a replicaSet and writes it to the distributed cache before returning the results.
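The login and search steps above can be sketched as follows. For brevity this collapses the two caches into a single `userID -> replicaSets` map, uses a `ConcurrentHashMap` in place of the Infinispan cache, and models the TCIA call as a plain `Function<String, String>`; `DataAccessFlow`, `logIn`, and `search` are hypothetical names and signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch of the logIn/search flow (two caches simplified to one map).
class DataAccessFlow {
    private final ConcurrentMap<String, List<String>> userReplicas; // userID -> stored replicaSets
    private final Function<String, String> executor;               // runs one query against the data source

    DataAccessFlow(ConcurrentMap<String, List<String>> sharedCache, Function<String, String> executor) {
        this.userReplicas = sharedCache;
        this.executor = executor;
    }

    // On login, re-execute every replicaSet the user stored earlier.
    List<String> logIn(String userId) {
        List<String> results = new ArrayList<>();
        for (String replicaSet : userReplicas.getOrDefault(userId, List.of())) {
            results.add(executor.apply(replicaSet));
        }
        return results;
    }

    // A new search is written to the cache before its results are returned.
    String search(String userId, String replicaSet) {
        userReplicas.merge(userId, List.of(replicaSet), (existing, added) -> {
            List<String> copy = new ArrayList<>(existing);
            copy.addAll(added);
            return copy;
        });
        return executor.apply(replicaSet);
    }
}
```

Two `DataAccessFlow` objects sharing one map behave like two instances sharing the distributed cache: a search made through the first is replayed by `logIn` on the second. As the post notes, replaying everything is a placeholder for executing only the diffs.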

The replicaSet for an image takes the form:
TCIAConstants.IMAGE_TAG + "getImage?SeriesInstanceUID=" + seriesInstanceUID

For other information (metadata), such as collections and series, it takes the form:
TCIAConstants.META_TAG + query;
Here, query is something like:
"getSeries?format=" + format +
        "&Collection=" + collection +
        "&PatientID=" + patientID +
        "&StudyInstanceUID=" + studyInstanceUID +
        "&Modality=" + modality;
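The replicaSet strings can be built with a small helper such as the one below. The tag values are placeholders, since the real `TCIAConstants` values are not shown here, and the class name is hypothetical. Unlike plain concatenation, it URL-encodes each value and skips parameters that are `null`; treating omitted parameters as unset filters on the TCIA side is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; the tag values below are placeholders, not the real TCIAConstants.
class ReplicaSetKeys {
    static final String IMAGE_TAG = "IMAGE:"; // placeholder for TCIAConstants.IMAGE_TAG
    static final String META_TAG = "META:";   // placeholder for TCIAConstants.META_TAG

    static String imageReplicaSet(String seriesInstanceUID) {
        return IMAGE_TAG + "getImage?SeriesInstanceUID=" + encode(seriesInstanceUID);
    }

    // Builds e.g. "getSeries?format=...&Collection=..." and prefixes it with the metadata tag.
    static String metaReplicaSet(String method, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", method + "?", "");
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (p.getValue() != null) { // skip unset parameters instead of sending "null"
                query.add(p.getKey() + "=" + encode(p.getValue()));
            }
        }
        return META_TAG + query;
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Passing the parameters in a `LinkedHashMap` keeps their order stable, so the same search always produces the same replicaSet string, which matters if that string is also used to detect duplicate replicaSets.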
When a new instance now starts and invokes the login action for the same user, it executes the queries for the stored replicaSets again and reproduces the same results.

Further updates will be posted when they are available. :-)
