A Simple Server Cluster Design - Part 5
Written by Administrator   

This series is intended to provide documentation of a simple server cluster based on two physical servers and four virtual servers per machine, and configured to host an instance of the Moodle Learning Management System.

File Server, Repositories, Synchronisation & Network Issues


 

Moodle stores uploaded files in an area commonly referred to as "Moodledata". Moodle 2.x introduced the concept of repositories, with the ability to link to products such as Alfresco that manage documents and other file-based media. The Moodledata area is now used to manage a local repository, where files are stored as blobs and the information about each file is managed in the database.
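To make the blob idea concrete, here is a minimal sketch of content-addressed storage. It assumes the layout used by the Moodle 2.x file pool as I understand it: each blob is named after the SHA1 hash of its content and sits under a "filedir" directory, in subdirectories taken from the first characters of the hash. The /var/moodledata path and the function names are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def content_hash(data: bytes) -> str:
    """Return the SHA1 hex digest used to name a blob."""
    return hashlib.sha1(data).hexdigest()


def blob_path(moodledata: str, data: bytes) -> PurePosixPath:
    """Build the assumed on-disk location of a blob from its content."""
    digest = content_hash(data)
    # Assumed layout: <moodledata>/filedir/<chars 0-1>/<chars 2-3>/<full hash>
    return PurePosixPath(moodledata) / "filedir" / digest[:2] / digest[2:4] / digest


if __name__ == "__main__":
    # /var/moodledata is a hypothetical location
    print(blob_path("/var/moodledata", b"example upload"))
```

Because the name depends only on the content, two uploads of the same file share one blob, and a blob never changes once written. That is convenient for replication: the file tools below mostly have to copy new blobs rather than track edits to existing ones.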

In a replicated system the file data must be replicated either locally (if the Moodle repository is used) or within the repository itself (beyond the scope of this article).

We will also look at some of the network issues associated with clustering.

Replicating the Repository

If you are using a repository such as Alfresco or Harvest Hive, you will need to refer to its documentation.

In the rest of this article we will look at the tools available for replicating the Moodledata area on a Linux server (why would you want to use anything else?). The first tools presented work on top of the file system, while DRBD works on block devices, underneath the file system.

Rsync

Rsync is an open source utility that provides fast incremental file transfer, and can be used to synchronise file data between two networked servers. It has to be run periodically, typically as a scheduled background job, so changes are not available immediately on the alternate server.

Here is an article on how to configure Rsync for file synchronisation. And this one from IBM is quite good.

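Note that by default Rsync does not remove files from the destination when they are deleted at the source, so deleted files are retained on the alternate server unless you run it with the --delete option. If you need changes and deletions propagated in both directions, you will need a more sophisticated system.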
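As a rough sketch of how a scheduled job might call rsync, the helper below builds the command line and runs it. The --archive, --compress and --delete options are standard rsync options; the host name node2 and the /var/moodledata path are assumptions made up for this example, and the job is assumed to run under an account that can reach the other server over SSH.

```python
import subprocess


def build_rsync_command(source, destination, delete=False):
    """Return the rsync argument list for a one-way copy."""
    command = ["rsync", "--archive", "--compress"]
    if delete:
        # Remove destination files that no longer exist at the source
        command.append("--delete")
    command += [source, destination]
    return command


def sync(source, destination, delete=False):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_rsync_command(source, destination, delete), check=True)


if __name__ == "__main__":
    # Hypothetical paths and host; the trailing slash copies the directory contents
    print(" ".join(build_rsync_command("/var/moodledata/", "node2:/var/moodledata/", delete=True)))
```

Treat --delete with care: it is one-way, so a file removed by mistake on the source disappears from the copy at the next run.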


Csync

Csync is a client-only bidirectional file synchroniser. It was developed to provide Roaming Home Directories for Linux, but you can use it to synchronise any directory.


Unison

Unison is a file-synchronization tool for Unix and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.
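The idea of two replicas being modified separately and then reconciled can be sketched as a three-way comparison against the state recorded at the last synchronisation. This is only an illustration of the principle, not Unison's actual algorithm; the function name and the use of simple fingerprints (for example content hashes) are assumptions.

```python
def reconcile(last_sync, replica_a, replica_b):
    """Decide what to do with each path.

    Each argument maps a relative path to a content fingerprint.
    A missing path means the file does not exist in that state.
    """
    actions = {}
    for path in sorted(set(last_sync) | set(replica_a) | set(replica_b)):
        base = last_sync.get(path)
        a = replica_a.get(path)
        b = replica_b.get(path)
        if a == b:
            continue  # replicas already agree, nothing to do
        if a == base:
            actions[path] = "b -> a"  # only b changed (or deleted) the file
        elif b == base:
            actions[path] = "a -> b"  # only a changed (or deleted) the file
        else:
            actions[path] = "conflict"  # both sides changed it differently
    return actions


if __name__ == "__main__":
    last = {"x": 1, "y": 1, "z": 1}
    a = {"x": 2, "y": 1, "z": 3}
    b = {"x": 1, "z": 4, "w": 5}
    print(reconcile(last, a, b))
```

In this sketch a deletion is just another change: "y" was removed from replica b and left untouched on a, so the deletion is propagated to a. Conflicts are reported rather than resolved, since only the administrator knows which copy should win.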


File Synchronisation using DRBD

DRBD® (Distributed Replicated Block Device) is an open source distributed storage system for the GNU/Linux platform. It consists of a kernel module, several userspace management applications and some shell scripts and is normally used on high availability (HA) clusters.

If you are looking for a true "High Availability" solution, then DRBD is the right choice, as it provides real-time mirroring across a network.

DRBD is designed as a building block for high availability (HA) clusters. It works by mirroring a whole block device over a network assigned to it, and can be understood as network-based RAID-1.


[Illustration: DRBD]

In the illustration above, the two orange boxes represent two servers that form an HA cluster. The boxes contain the usual components of a Linux™ kernel: file system, buffer cache, disk scheduler, disk drivers, TCP/IP stack and network interface card (NIC) driver. The black arrows illustrate the flow of data between these components.

The orange arrows show the flow of data, as DRBD mirrors the data of a highly available service from the active node of the HA cluster to the standby node of the HA cluster.
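The "network RAID-1" idea can be pictured with a toy model in which every write goes to the active node's disk and to the standby node's disk before it is acknowledged. This is purely a conceptual sketch: real DRBD does this inside the kernel and over the network, and the class and method names here are invented for the illustration.

```python
class MirroredBlockDevice:
    """Toy model of a block device mirrored to a standby node."""

    def __init__(self, block_count, block_size=4096):
        self.block_size = block_size
        self.local = [bytes(block_size)] * block_count  # active node's disk
        self.peer = [bytes(block_size)] * block_count   # stands in for the standby node's disk

    def write(self, index, data):
        """Write one whole block to both copies, then acknowledge."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.local[index] = data
        self.peer[index] = data  # in a real system this copy travels over the network
        return True  # acknowledged only once both copies hold the block

    def read_standby(self, index):
        """What the standby node would see if it took over now."""
        return self.peer[index]


if __name__ == "__main__":
    device = MirroredBlockDevice(block_count=4, block_size=8)
    device.write(2, b"moodle!!")
    print(device.read_standby(2))
```

Because the mirror sits below the file system, it does not know or care which files changed. It simply keeps the two sets of blocks identical, which is why the standby can take over with current data.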

Implementing DRBD is not for the faint-hearted, and requires a good understanding of Linux file system and networking fundamentals.



Network Issues

The network configuration provides the pathways, or roads, between different system elements. Choosing a configuration requires an analysis of how those pathways are used in the overall operation of the system.

Let's look at some of the issues:

External interface

The data rates coming into the cluster are generally less demanding than those within it, and in a medium HA configuration you could get away with a 100Mb/s network. This will of course depend upon ultimate demand, so the external interface should be sized with anticipated usage over time in mind.


Internal Interfaces

In a cluster there needs to be fast access between the Web Server, Database Server, and File System / Repository. Also, if the VMs are running on a NAS you will need 1Gb/s or faster.
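A quick calculation shows why the link speed matters for internal traffic. The sketch below converts a payload in megabytes to megabits and divides by the link rate. The 500 MB payload is an arbitrary example, and the efficiency parameter is a hypothetical allowance for protocol overhead, left at 1.0 (no overhead) by default.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second, efficiency=1.0):
    """Estimate the time to move a payload over a link.

    efficiency is the fraction of the nominal rate actually achieved.
    """
    megabits = size_megabytes * 8  # 8 bits per byte
    return megabits / (link_megabits_per_second * efficiency)


if __name__ == "__main__":
    for rate in (100, 1000):
        print(rate, "Mb/s:", transfer_seconds(500, rate), "seconds")
```

At the nominal rates, moving 500 MB takes about 40 seconds at 100Mb/s and about 4 seconds at 1Gb/s, before any overhead or competing traffic is taken into account.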

One way to ensure the fastest speed is to have separate networks, or direct connections between system elements. This can be done by fitting multiple network cards, which also add redundancy to the system design. Direct connections also reduce the need for multiple, costly switches.

Once you have an overall picture of your cluster elements, then you need to look at the usage of each network path, and decide whether it needs a dedicated pathway, or a shared pathway, and then design the network accordingly.
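One simple way to work through that decision is to estimate the peak traffic on each path and compare it with the capacity of the link it would use. The sketch below does this with a hypothetical rule of thumb: a path whose peak load reaches half of the link capacity gets a dedicated pathway. The threshold, the path names and the traffic figures are all assumptions for illustration, not measurements.

```python
def plan_paths(paths, threshold=0.5):
    """Classify each path as 'dedicated' or 'shared'.

    paths maps a path name to (peak_mbps, link_mbps).
    threshold is the utilisation at which a path gets its own link.
    """
    plan = {}
    for name, (peak_mbps, link_mbps) in paths.items():
        utilisation = peak_mbps / link_mbps
        plan[name] = "dedicated" if utilisation >= threshold else "shared"
    return plan


if __name__ == "__main__":
    # Hypothetical estimates: (peak traffic in Mb/s, link capacity in Mb/s)
    estimates = {
        "web-to-database": (600, 1000),
        "web-to-files": (300, 1000),
        "external": (20, 100),
    }
    print(plan_paths(estimates))
```

The value of the exercise is less in the exact threshold than in writing down an estimate for every path, so that the busy ones stand out before the hardware is bought.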

Note that in a truly HA design you need to build in redundancy, so that any element, including the network switches and cabling, can be removed without affecting the functioning of the cluster. Check out this article for some more open source help. Otherwise call Cisco or Juniper etc.