Many Linux servers have critical data that must be available from a redundant source in case the primary server fails. Sometimes this requires a high availability cluster, and sometimes it just requires a copy of the data somewhere else. When configuring some form of data replication on a Linux server, an important point to consider is the nature of the data itself. Sometimes this can severely limit the replication methods available to you.
I recently spoke with a System Administrator who wanted a general purpose replication scheme that could be used on any kind of data. He wasn't sure what kinds of data he would eventually want to replicate, and really just wanted to cover his bases and make sure that if he chose our product it would have broad applicability within his datacenter. As it turns out, this led to an interesting discussion of what replication methods are best for different types of data, and why.
Asynchronous vs. Synchronous
Generally speaking there are two choices. The first and easiest way is to replicate files asynchronously with a tool like rsync. This can be as simple as running rsync manually or via cron every once in a while, or as complicated as having custom tools that watch for changes in your data and automatically synchronize machines when changes are detected. There is a kernel feature called inotify that can help with the latter if you want to write your own custom app to perform certain actions when specified files or directories change. Whether you keep it simple or not, this method is nice because the fact that you are replicating data won't slow down your application. But on the downside, the data is not replicated immediately. Whether the delay is one second or one day, there is still a delay, and if you have a failover there is no guarantee that your secondary server will pick up the data where the first server left off.
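As a minimal sketch of the simple end of that spectrum, the Python below builds and runs a one-way rsync of a directory to a secondary server. The function names, paths, and host are hypothetical. The -a (archive) and --delete (remove files that no longer exist on the source) options are standard rsync options.

```python
import subprocess


def build_rsync_command(source_dir, destination):
    """Return the argv for a one-way mirror of source_dir to destination.

    Both arguments are illustrative; destination may be a local path or a
    remote target such as "backup@standby:/srv/data/".
    """
    # A trailing slash on the source tells rsync to copy the directory's
    # contents rather than the directory itself.
    source = source_dir.rstrip("/") + "/"
    return ["rsync", "-a", "--delete", source, destination]


def replicate_once(source_dir, destination):
    """Run one replication pass; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source_dir, destination), check=True)
```

Running a script that calls replicate_once from cron every few minutes bounds the replication delay to roughly that interval, but never removes it.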
The second choice is to use a replicated cluster filesystem or DRBD, both of which are typically used in synchronous mode. This can ensure that two current copies of the data exist at all times. If one server goes down, the data on the other will pick up at the exact same spot and no data will be lost. All filesystem writes are guaranteed to be flushed at both servers before either one considers the transaction complete. While this sounds far better than asynchronous replication, it is much more complicated and therefore comes at a cost.
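The acknowledgement rule can be illustrated with a toy model. In this sketch two local files stand in for the primary and secondary servers; a real cluster filesystem or DRBD does the equivalent below the application, over the network. The function names are hypothetical.

```python
import os


def flush_to_disk(path, data):
    """Append data to path and force it to stable storage before returning."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push it to the device


def replicated_write(primary_path, secondary_path, data):
    """Acknowledge a write only after both copies have been flushed.

    In this toy model the two paths stand in for two servers.
    """
    flush_to_disk(primary_path, data)
    flush_to_disk(secondary_path, data)  # the caller waits for this too
    return True  # only now is the write reported as complete
```

The caller pays for two flushes on every write, which is where the extra latency of synchronous replication comes from.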
Because cluster filesystems put an additional layer of software between the block device and the application, and because they often have to perform many additional tasks when writing data (like talking to remote servers, storing version information, etc.), they can be orders of magnitude slower than normal Linux filesystems like ext3. If the setup is optimized, and especially if it uses a high-speed, low-latency interconnect like InfiniBand or Myrinet, the delays can be minimized. But these interconnects are expensive compared to traditional gigabit Ethernet, so solving the speed problem adds at least a few thousand dollars to the cost of even a very small cluster. The complexity is still much greater as well, which can reduce reliability.
Problem Summary
Is either method necessarily better? Not really. It depends on what's most important: speed or not losing your most recent data updates. The first kind of replication (asynchronous) gives us fast writes and is easy to implement, but lacks strict data integrity guarantees. The second method (synchronous) will guarantee that both copies are identical, but it's slow, more complex, and possibly very expensive as well. As it turns out, the type of data you are replicating is also a critical factor in choosing the correct method. You can't just choose either one based on your preference.
Large Files
If you are replicating large binary files like the ones used by MySQL, there is an inherent problem with rsync'ing the data to another server. If you change even one byte of that file and then try to update the mirror (i.e., run rsync again), the whole file needs to be scanned to see what has changed. It's the nature of rsync. By default it decides whether a file needs updating by comparing its size and modification time, and if they differ it has to read through the file, checksumming it block by block, to find what changed and replicate only the differences. But on a huge file, just scanning for the change takes a long time and quickly bogs the system down. In practice, rsync-style asynchronous replication generally fails for large files that change continuously. We are left with two alternatives: synchronous replication, and/or some kind of replication capability that is built in to the application itself.
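To see why a one-byte change is expensive, here is a simplified sketch of block-by-block checksum comparison. It is not rsync's actual algorithm (rsync uses rolling checksums and can match blocks that have shifted position), and it works on in-memory bytes rather than files. It only illustrates that locating a small change means reading everything. The function names and the block size are assumptions.

```python
import hashlib


def block_digests(data, block_size=4096):
    """Return one digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4096):
    """Return (indexes of differing blocks, total bytes that had to be read)."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    changed = [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    ]
    # Both copies are read in full, no matter how small the change is.
    return changed, len(old) + len(new)
```

For a 1 MiB buffer with a single byte altered, only one block is reported as changed, yet 2 MiB had to be read to find it. Scale that to a multi-gigabyte database file that changes constantly and the scan dominates everything else.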
Synchronous replication using a cluster filesystem is advantageous here because no lengthy scanning needs to take place. A filesystem write is captured before it is written to disk and the small change is immediately sent to the replication partner. Then both systems flush the change at once and report back to the application that the write was completed. Compared to the rsync method, flushing the change is incredibly fast.
The other choice for large files is to rely on replication that's built in to the application itself. MySQL has replication mechanisms but not all applications do. In addition, these replication mechanisms have pitfalls of their own. What happens when a transaction fails to complete on a slave node? Generally, replication will stop and wait for you to manually resolve the problem. Will you be notified? It's possible, but probably not unless you set up a special facility for this. If the slave node is now out of sync, can it be re-synced? Again, probably not. The slave needs to be re-created from a fresh copy of the master database, and replication restarted from a known point. If this happens frequently it can be very time consuming. Can you be certain that the slaves are identical to the master? After all, it is not a copy per se, but a set of data on another server that is supposed to have undergone the same transactions as the master. You can check externally to make sure they are identical, but this is not built in, and over time divergence due to missed or failed transactions is a real administrative concern. All of these issues need to be considered when you are trying to decide the best way to implement database or large file replication.
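The external check mentioned above can be as simple as comparing a digest of the same table on both servers. The sketch below uses Python's built-in sqlite3 module, with two in-memory databases standing in for master and slave, purely so that it runs anywhere; against MySQL you would connect with a MySQL client library instead. The function names and the table and column names passed to them are hypothetical.

```python
import hashlib
import sqlite3


def table_digest(conn, table, key_column):
    """Hash every row of table, in key order, into one hex digest.

    table and key_column are interpolated directly into the SQL, so they
    must come from trusted configuration, never from user input.
    """
    digest = hashlib.sha256()
    query = f"SELECT * FROM {table} ORDER BY {key_column}"
    for row in conn.execute(query):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def in_sync(master, slave, table, key_column):
    """Return True when both connections hold identical rows for table."""
    return (table_digest(master, table, key_column)
            == table_digest(slave, table, key_column))
```

A check like this only means something if both sides are quiescent, or compared at a known replication position, while it runs. Otherwise transactions still in flight will show up as false differences.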
Small Files
Small and/or rarely changing files require an entirely different set of strategies. As a general rule rsync and other asynchronous methods are very efficient in this case. Say you have a thousand smaller files and you randomly change 20 of them. Because everything is local and low overhead, the writes are fast and the filesystems are simple. Rsync will see very quickly that 980 of the files have not changed and that only 20 need to be updated. It will calculate the differences in those 20, which is very fast because they are small, and send them to the replication partner. The differences will be applied and the mirror will be updated.
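The quick check that makes this cheap can be sketched as follows. By default rsync skips files whose size and modification time are unchanged. The code imitates that idea by comparing file metadata between two snapshots of a directory tree. The helper names are hypothetical and this is not rsync's implementation.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to its (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            state[os.path.relpath(full, root)] = (info.st_size, info.st_mtime_ns)
    return state


def changed_paths(before, after):
    """Return files that are new or whose size or mtime differs."""
    return sorted(
        path for path, meta in after.items()
        if before.get(path) != meta
    )
```

No file contents are read at all. Deciding that 980 of 1000 files can be skipped costs one stat call per file, and only the 20 that differ need any further work.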
Asynchronous replication is probably the method of choice here because it is the simplest effective solution. The overhead of a cluster filesystem is not likely to be worth the tradeoff in speed and cost, unless those issues are unimportant in your environment or unless you need both copies of the mirror to be identical in real time. This situation where there are large numbers of small and relatively static files is commonly seen in fileservers and web servers. In fact, whenever you download something from the Internet you may notice that it comes from one of many "mirror" sites that all have a copy of the same file. A nightly rsync is the traditional way that these mirrors are maintained. If a new file is to be made available, the web developer will upload it to a master server and every night the slave servers will use rsync to update their copies of the available files. The next morning, all the mirrors will be identical thanks to asynchronous replication.
Conclusion
When choosing the best data replication method for your particular situation, there are two things that must take center stage. First, understand the strengths and weaknesses of asynchronous and synchronous replication. Asynchronous replication is fast and simple, but does not replicate in real time and thus cannot guarantee identical sets of data at the moment of failover. Speed and ease of implementation are prioritized. Synchronous replication always guarantees identical data sets, but it's usually much slower and possibly expensive. This method prioritizes data integrity above all else.
Second, think about the kind of data you are going to be replicating. This may mean that only one method or the other will be practical. If large database files need to be replicated then a synchronous cluster filesystem or application-specific tool is likely to be the best choice. If you have mostly small files or large files that rarely if ever change, asynchronous replication may have the edge by virtue of its speed and simplicity.