Tech Note 0034
Load Balancing Strategies
Recommendations for server load balancing and fail-over
ExpeDat and SyncDat transactions may be distributed among multiple servers for balanced loads and automatic fail-over. This improves performance and reliability by pooling infrastructure resources and by automatically routing around problems. Servers can be grouped into a single active-active pool, or prioritized for active-passive fail-over.
Rules are specified per-transaction, allowing each operation to customize its use of server resources. Load balancing and fail-over logic is executed by the clients so that there is no central point of failure. Each client, and each transaction, can operate independently across many servers.
Servers may be grouped as singletons, peers, and fallbacks.
A singleton is one server host, specified by DNS name or IP address, plus a port number.
A peer group is a comma-separated list of one or more singletons which will all be queried at the same time. The client will choose the best available server from the list. Peer groups allow transactions from many clients to be spread evenly among multiple servers. If one of the servers in a peer group becomes unreachable, loads automatically shift to the others.
A fallback group is a semicolon-separated list of one or more peer groups which will be queried in the order given. Peer groups later in the list will only be used if no servers are reachable in an earlier group. This can be used to specify servers which should only be used in case of disaster, such as those with limited resources or disaster recovery (DR) restricted licenses.
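The nesting of singletons, peer groups, and fallback groups can be illustrated with a small parser. This is only a sketch: the `host:port` form for a singleton, the hostnames, and the function names are assumptions made for illustration, and IPv6 literals are not handled. Consult the client documentation for the exact syntax it accepts.

```python
from typing import List, Tuple

Singleton = Tuple[str, int]        # (host, port)
PeerGroup = List[Singleton]        # queried at the same time
FallbackGroup = List[PeerGroup]    # queried in the order given


def parse_singleton(text: str) -> Singleton:
    """Parse 'host:port' into (host, port). The colon form is an assumption."""
    host, sep, port = text.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {text!r}")
    return host, int(port)


def parse_host_group(text: str) -> FallbackGroup:
    """Split on ';' into peer groups, then on ',' into singletons."""
    groups: FallbackGroup = []
    for peer_text in text.split(";"):
        peers = [parse_singleton(s) for s in peer_text.split(",") if s.strip()]
        if not peers:
            raise ValueError("each peer group needs at least one singleton")
        groups.append(peers)
    return groups


# Two peers in the first group, one disaster-recovery server as the fallback.
example = parse_host_group(
    "a.example.com:8080,b.example.com:8080;dr.example.com:8080"
)
```

Here `example` holds two peer groups: the first with two singletons, the second with one.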
Host groups are specified per-transaction and are managed entirely by individual clients. A server does not need any special configuration to participate in a group. However, with respect to any given transaction, all servers in a group must be functionally equivalent. For example, to download a file with a host group target, that file must be accessible at the same path using the same credentials on all servers in the host group. All servers in a host group must share the following properties:
Because the client may fail-over to a different server in the middle of a session, all servers in a host group should be as identical as possible. Ideally, they should have the same bandwidth, storage, and load capacity.
Load Balancing & Active-Active Failover
At the start of each transaction to a host group, the client evaluates the availability of the first peer group. If one or more servers are reachable, the client will choose the one with the greatest availability, as measured by each server's load and capacity. If no servers are reachable, the client will try the fallback groups in order.
Under normal operations, all servers in the first peer group will have approximately the same load. If one server becomes unreachable, clients will automatically choose from among the other servers. No transactions will be directed to a fallback group unless all servers in the first peer group are unreachable.
If all reachable servers in a peer group are at maximum capacity, clients will auto-retry until capacity becomes available. The fallback group will not be used to extend capacity; it will only be used when all servers in the first peer group are unreachable.
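The selection rules described above can be sketched as a short decision function. This is not the client's actual implementation: the `ServerStatus` fields, the "capacity minus load" availability measure, and the `"use"` / `"retry"` / `"fail"` result labels are all hypothetical, chosen only to make the ordering of the rules concrete.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ServerStatus:
    name: str
    reachable: bool
    load: int       # transactions in progress (hypothetical metric)
    capacity: int   # maximum transactions (hypothetical metric)

    @property
    def availability(self) -> int:
        # Assumed measure: remaining headroom on the server.
        return self.capacity - self.load


def choose_server(
    host_group: List[List[ServerStatus]],
) -> Tuple[str, Optional[ServerStatus]]:
    """Walk the peer groups in order and decide what the client should do."""
    for peer_group in host_group:
        reachable = [s for s in peer_group if s.reachable]
        if not reachable:
            continue  # entire group unreachable: try the next fallback
        best = max(reachable, key=lambda s: s.availability)
        if best.availability > 0:
            return "use", best
        # Reachable but full: retry later rather than spilling to a fallback.
        return "retry", None
    return "fail", None  # no server reachable in any group
```

Note that a full but reachable first peer group yields `"retry"`, never a fallback server, which mirrors the rule that fallback groups are not used to extend capacity.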
Disaster Recovery & Active-Passive Failover
Some servers may be undesirable for normal use because they have limited resources, are licensed only for disaster recovery operation, or are reserved for special functions. Including such servers in a fallback group ensures that they will only be used when all of the first peer group servers are unreachable.
AWS EC2 instances are allowed up to 5 gigabits per second of internet bandwidth. Operating four such instances together in a single peer group raises the total available bandwidth to as much as 20 gigabits per second.
A data center in New York may have two ExpeDat servers, with content mirrored at a disaster recovery location in Las Vegas. With two ExpeDat Disaster Recovery licenses deployed in Las Vegas, the New York servers would be specified as the first peer group, and the Las Vegas DR servers as the fallback group.
Two workgroups may maintain separate High Performance Computing clusters in Virginia and Singapore. Scheduling requirements dictate that workers in the US should use the Virginia servers and workers in Asia should use the Singapore servers, but sharing resources is permitted for disaster recovery. Workers in the US would use the Virginia servers as their first peer group and the Singapore servers as their fallback. Workers in Asia would use the Singapore servers as their first peer group, and the Virginia servers as their fallback group.
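As a rough illustration of the two-region example, the sketch below builds one host group string per region from the same server lists, with only the order reversed. The hostnames, the port, the `host:port` form, and the `build_host_group` helper are hypothetical; only the comma and semicolon separators come from the definitions earlier in this note.

```python
from typing import Sequence


def build_host_group(peer_groups: Sequence[Sequence[str]]) -> str:
    """Join singletons with ',' and peer groups with ';'."""
    return ";".join(",".join(group) for group in peer_groups)


# Placeholder server names; substitute real hosts and ports.
VIRGINIA = ["va1.example.com:8080", "va2.example.com:8080"]
SINGAPORE = ["sg1.example.com:8080", "sg2.example.com:8080"]

# US workers prefer Virginia and fall back to Singapore; Asia is the reverse.
us_workers = build_host_group([VIRGINIA, SINGAPORE])
asia_workers = build_host_group([SINGAPORE, VIRGINIA])

print(us_workers)
print(asia_workers)
```

Because host groups are specified per-transaction by the client, each region can use its own ordering without any change on the servers.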
Tech Note History
Jan 03 2022: Updated AWS Example
Feb 20 2019: Updated CloudDat Example