How to Avoid a Data Disaster
February 4, 2008
While computer hackers and viruses often garner the most attention in the realm of computer threats, many contractors overlook the countless ways that physical damage can adversely impact their business. Computer hardware failure, power surges, and/or damage from flood or fire could destroy years of work orders, contacts, accounting records, and more. In an instant, your business could find itself having to recreate information that took years to accumulate.
In the following article, we’ll lay out some of the most important safeguards for contractors who rely on a computer infrastructure to run their business. As a software provider for more than 30 years, Data-Basics has helped hundreds of firms streamline and improve their contracting business. To put it bluntly: we’ve installed enough software to know that hardware reliability is vital to a contracting firm’s bottom line.
TAMING RAW POWER
Electrical power is both the most necessary component of any hardware setup and one of the most affordable. For most businesses, the dependability of incoming electricity is rarely a concern. However, given the investment in computer equipment that is constantly fed by this electricity, any reasonable business must have a plan in place to address unforeseen power issues.
Surges, outages, line noise, and much more (see the sidebar at right) can wreak havoc on the delicate electrical components in a computer. Protecting your system with AC line conditioning, which smooths out the peaks and dips common to AC power, is therefore essential. For extremely mission-critical applications, a backup generator may be called for. All of this protects the component most likely to fail in the event of a power issue: the computer’s power supply.
The power supply - a component that converts line voltage AC to the various DC voltages that computers use - should be hot swappable (able to be replaced while a computer is still running) and redundant for the most critical machines. This configuration allows for the replacement of a failing or failed power supply without shutting down the computer.
Another best practice for supplying reliable power to critical computers is using an uninterruptible power supply (UPS). A UPS is a multipurpose device that sits between your computer and a host of power-related dangers. It provides surge protection, low-voltage and noise protection, and enough backup power to keep a computer running long enough to allow for a safe system shutdown (typically 20 to 30 minutes).
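As a rough illustration of where a runtime figure like that comes from, the sketch below divides a battery's usable energy by the load it has to carry. The function name, the example battery and load values, and the 85 percent efficiency figure are all assumptions made for illustration; real batteries deliver less energy at high loads and as they age, so the manufacturer's runtime chart should always take precedence.

```python
def estimated_runtime_minutes(battery_watt_hours, load_watts, efficiency=0.85):
    """Rough UPS runtime estimate: usable battery energy divided by the load.

    efficiency is an assumed allowance for inverter losses, not a
    figure from any particular UPS.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_watt_hours = battery_watt_hours * efficiency
    return usable_watt_hours / load_watts * 60


# Hypothetical example: a 200 Wh battery carrying a 400 W server.
runtime = estimated_runtime_minutes(battery_watt_hours=200, load_watts=400)
print(f"Estimated runtime: {runtime:.1f} minutes")
```

Running the same calculation with the combined wattage of everything plugged into the UPS is a quick way to see how adding equipment shortens the window available for a safe shutdown.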
WHEN IT'S CRITICAL, IT NEEDS TO BE REDUNDANT
How many computers do you have at your business? A few dozen? A few hundred? Regardless of the number, it is important to realize that not all computers are created equal. Certain machines perform tasks that are absolutely critical to running your business. Often, the most important computers are servers: computer systems that have been designated to run a specific - often mission-critical - application (or applications). Common examples include your company’s e-mail server, a database server, or a Web server for your company Website.
Since servers typically run 24/7/365, the likelihood of failure is greater for one of these computers than for a desktop machine that gets powered down each night. A server may fail for a variety of reasons, ranging from a faulty power supply to uncontrollable external conditions like a leaky roof. To account for the possibility of downtime (see the sidebar at right), systems are often configured in a way that more than one server is responsible for delivering information and applications to your users. This way if one server fails, a redundant machine can temporarily share the workload. But what happens if a single server is wholly responsible for a specific application or task? If your business utilizes this configuration, it is imperative to have a plan of action to restore a vital piece of hardware as expeditiously as possible. This should include an exact physical replacement of the machine, including the correct versions of all required software and a plan for integrating the new server into the network.
In any case, the cost of going without - whether it is e-mail or your accounting system - must be considered when setting up machine redundancy for components that a) are the most crucial, b) have the highest failure rate, or c) both. Simply put, not only does redundancy allow a server (or any computer for that matter) to remain active until the failing component can be repaired or replaced, it gives your business peace of mind.
RAID YOUR DATA
Of all the parts in a computer, the hard drive takes the most punishment. As a mechanical component with scores of moving parts, it performs day in and day out while a stark reality looms above: all hard drives eventually fail. Given this fact, a great deal of thought has been applied to the question of how to configure a computer so that the failure of a single disk drive does not bring an entire system down or cause data corruption. One of the most widely accepted solutions is a Redundant Array of Independent Disks (or RAID), a data storage scheme that groups disk drives together as a single storage system. These systems offer increased reliability and/or throughput depending on the setup options, known as levels.
While there are seven distinct RAID configurations, two configurations are used most widely:
• RAID 1 uses two separate disk drives to create an exact copy of each byte of data. For obvious reasons, RAID 1 is often referred to as “disk mirroring.” Many IT professionals prefer this setup not only because it is inexpensive but also because it allows for drive failure and replacement without shutting down the entire system.
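• A far more popular configuration - also known for its relatively low cost - is RAID 5. This setup uses three or more disk drives in an array acting like a single drive. Data and parity information are spread across all of the drives, so the array can lose any single drive without losing data, regardless of how many drives it contains. A second failure before the first drive has been replaced and rebuilt will take the array down, so failed drives should be swapped promptly. This setup provides a high level of protection while remaining relatively cost effective.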
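To show why a parity-based array such as RAID 5 can survive the loss of a drive, here is a minimal sketch of the underlying idea: the parity block is the XOR of the data blocks, so any one missing block can be recomputed from the others. The function names and sample data are hypothetical, and a real controller also rotates parity across the drives and works on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block in a stripe from all the others.

    Because parity = d0 ^ d1 ^ ..., XOR-ing the parity block with every
    surviving data block leaves only the missing block.
    """
    return xor_blocks(surviving_blocks)


# One stripe on a three-drive array: two data blocks plus one parity block.
data = [b"WORK", b"ORDR"]
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild its block from the rest.
recovered = rebuild_missing([data[0], parity])
print(recovered)
```

The same arithmetic shows the limit of the scheme: with two blocks missing from a stripe, a single parity block no longer carries enough information to recover either one.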
BACKUPS TO COVER YOUR BACKSIDE
Where a RAID configuration will protect against data loss from a hard disk failure, creating a backup enables your system to quickly recover from either hardware or software problems while providing a separate layer of protection. A backup consists of a copy of data used to restore information after a data loss event. Data are typically backed up by being written to another location on removable media (see the sidebar at right) or another computer.
Just as all computers are not created equal, the same can be said for backups. Two primary types of backup include:
• A full backup makes a complete copy of multiple files, a database, or an entire disk drive. Considering that a single database can grow to be many gigabytes in size, the process of creating a full backup can be quite lengthy.
Because of the time involved, this type of backup might be scheduled to take place very late at night when few users will be inconvenienced. Still, the value of having a full backup can easily outweigh the inconvenience of decreased system performance. A cost/benefit analysis can help you determine the value associated with a full backup and how often it should occur.
• Another option to consider - particularly as a stopgap between full backups - is to perform a partial backup. A partial backup typically copies a transaction log file that records only the changes to a database or disk drive since its last full backup. In the event that information needs to be restored, the most recent full backup is restored in place of the corrupted data, and then the partial backup(s) of the transaction log taken since that full backup are applied in the order they were made.
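The restore sequence for the two backup types can be sketched as follows. This is a simplified model, not how any particular database product stores its logs: the record format, the "set" and "delete" operations, and the work order numbers are all assumptions chosen for illustration.

```python
def restore(full_backup, log_backups):
    """Rebuild data from a full backup plus the log backups taken after it.

    full_backup: dict of record id -> value at the time of the full backup.
    log_backups: list of logs, oldest first; each log is a list of
                 (operation, record_id, value) tuples.
    """
    restored = dict(full_backup)  # start from the full copy
    for log in log_backups:       # replay logs in the order they were taken
        for operation, record_id, value in log:
            if operation == "set":
                restored[record_id] = value
            elif operation == "delete":
                restored.pop(record_id, None)
            else:
                raise ValueError(f"unknown operation: {operation}")
    return restored


# Hypothetical week: a full backup on Sunday, log backups on Monday and Tuesday.
sunday_full = {"WO-1001": "open", "WO-1002": "open"}
monday_log = [("set", "WO-1001", "closed"), ("set", "WO-1003", "open")]
tuesday_log = [("delete", "WO-1002", None)]

current = restore(sunday_full, [monday_log, tuesday_log])
print(current)
```

Order matters here: applying Tuesday's log before Monday's, or skipping a log altogether, can leave the restored data in a state that never actually existed.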
While running regular backups may sound like a reasonable method for ensuring data integrity, there is one important caveat to keep in mind: backups are not 100 percent reliable. Unfortunately, the process of creating a backup can become corrupted for a host of reasons, including:
• Defective media was inserted in the backup drive.
• The hard drive to which the backup is written is full.
• The program that schedules the backup is not configured correctly.
• The backup scheduling program was not restarted properly after a system reboot.
• The system was reconfigured and the backup drive no longer exists.
• And the list goes on...
Fortunately, most systems display status messages during the backup process or once it is complete. However, if these messages appear on a monitor located in the server room and are otherwise ignored, all your good intentions may count for nothing. That said, it is recommended that you check the integrity of your backup process on a weekly basis.
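Rather than relying on someone to read a monitor in the server room, part of that check can be automated. The sketch below assumes the backup lands as a single file at a known path and flags three simple symptoms: the file is missing, it is suspiciously small, or it is older than expected. The function name, thresholds, and file layout are hypothetical, and a check like this complements the backup software's own status reports rather than replacing them.

```python
import os
import time


def check_backup(path, max_age_hours=24, min_size_bytes=1):
    """Return a list of problems found with the backup file at `path`.

    An empty list means the file exists, is at least min_size_bytes
    long, and was modified within the last max_age_hours.
    """
    if not os.path.isfile(path):
        return [f"backup file not found: {path}"]

    problems = []
    size = os.path.getsize(path)
    if size < min_size_bytes:
        problems.append(f"backup is too small ({size} bytes)")

    age_hours = (time.time() - os.path.getmtime(path)) / 3600
    if age_hours > max_age_hours:
        problems.append(f"backup is {age_hours:.0f} hours old")
    return problems
```

A scheduled job could call `check_backup` each morning and e-mail the resulting list to whoever is responsible for the server, so that a silent failure is noticed within a day instead of on the day the backup is needed.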
The real test of a backup comes when it is actually used to restore a system. The process of using a backup should not only be tested whenever a new system is put into place but also at regular intervals. Vigilant contractors check this restoration process on a test system, start the restored system, and compare the test to the live system. Unfortunately, few businesses actually perform this vital step even once.
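One way to carry out the comparison step is to fingerprint every file on both systems and report the differences. The sketch below assumes the live and restored data are both reachable as ordinary folders; the function names are hypothetical, and a database would normally be compared with its own consistency tools instead.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    digests = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            digests[os.path.relpath(full_path, root)] = file_digest(full_path)
    return digests


def compare_trees(live_root, restored_root):
    """Report files that are missing from, extra in, or changed in the restore."""
    live = tree_digests(live_root)
    restored = tree_digests(restored_root)
    return {
        "missing": sorted(set(live) - set(restored)),
        "extra": sorted(set(restored) - set(live)),
        "changed": sorted(p for p in set(live) & set(restored)
                          if live[p] != restored[p]),
    }
```

Some differences are expected, since the live system keeps changing after the backup is taken. The point of the test is to confirm that the differences are explainable, not that the lists are empty.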
SENDING BACKUPS OFFSITE
Assuming that the backups execute flawlessly, what do you do with the data? All too often, we hear that backups are regularly performed, carefully dated, and then stored on removable media in a cabinet… next to the server! In the event of fire, flood, theft, or other catastrophe, your primary and backup data may both be gone forever. To avoid a devastating setback like this, consider the following procedures:
• Store the backup media in a fireproof safe or cabinet, ideally in another room.
• Send the backup media to an offsite location for storage.
• Upload an electronic copy of the backup to another location on your network.
• Upload an electronic copy of the backup to an offsite server.
Implementing just one of these steps can significantly increase the physical security of your data.
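For the two electronic options, the copy itself deserves the same suspicion as the original backup. The following sketch copies a backup file to a second location and confirms that the copy matches byte for byte. It assumes the destination is reachable as an ordinary folder, such as a mounted network share; the function name and folder layout are hypothetical, and uploading to a remote server over the Internet would need different tooling.

```python
import filecmp
import os
import shutil


def copy_backup(source_path, destination_dir):
    """Copy a backup file to a second location and confirm the copy matches."""
    os.makedirs(destination_dir, exist_ok=True)
    destination_path = os.path.join(destination_dir,
                                    os.path.basename(source_path))
    shutil.copy2(source_path, destination_path)  # copy2 also keeps timestamps

    # shallow=False forces a byte-by-byte comparison instead of
    # trusting file size and modification time alone.
    if not filecmp.cmp(source_path, destination_path, shallow=False):
        raise IOError(f"copy of {source_path} does not match the original")
    return destination_path
```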
CONCLUSION
As mentioned previously, much of the focus on computer security is on protecting against outside, intangible forces like computer viruses. The fact is, while protecting against outside threats is important, there are a number of steps that can be taken to buffer against the physical dangers related to your computer hardware. Simply put, the physical protection of your company’s computer systems does not have to be left to chance.
Having a complete understanding of how your systems are currently protected and the contingency plans in place in the event of a failure is a must with our ever-increasing reliance on technology. If nothing else, we hope the points discussed here can start a conversation on the subject of physical IT security at your contracting firm, which may help you avoid a potential data disaster.
Glossary
AC line conditioning - The process of smoothing out the voltage peaks and dips common to AC power.
CD-R disc - A “write once, read many times” optical medium that has a high level of compatibility with standard CD readers. Standard capacities are 650 or 700 megabytes (MB).
Data loss event - The unforeseen loss of data or information.
DVD-R - A “once-writable” optical disc with a standard capacity of 4.7 gigabytes (GB).
Full backup - A complete copy of a set of files, a database, or an entire disk drive, made so that the copy can be used to restore the original after a data loss event.
Hot swappable - The ability to remove and replace components of a machine (usually a computer) while operating.
Machine redundancy - Exceeding what is necessary; serving as a duplicate for preventing failure of an entire system.
Magnetic tape - A standard data storage solution similar to common cassette tape.
Partial backup - Also known as an incremental backup; a backup method in which a series of successive backups is kept, each containing only the information that changed since the previous one.
Power supply - Converts the alternating current (AC) line to the direct current (DC) needed by a computer.
RAID - Redundant Array of Independent Disks; data storage schemes that divide and/or replicate data among multiple hard drives. The schemes offer increased data reliability and/or throughput.
RAID levels - A basic set of RAID configurations that employ data striping, mirroring, or parity; there are seven levels numbered 0 through 6.
Server - A networked computer that provides services or applications to other computers.
Transaction log - A history of actions executed by a database management system.
Uninterruptible power supply (UPS) - A device that maintains a continuous supply of electric power to connected equipment by supplying power from a separate source (usually a battery) when utility power is not available.
Uptime - The amount of time that your data is up, running, accessible, and available.
Reprinted with permission from the Data-Basics white paper “How to Avoid a Data Disaster: Computer Security for Service Contractors.” Data-Basics provides field service software, work order software, dispatching, and service management software solutions to automate field service, accounting, service dispatching, and more for service contractors. For more information, visit www.databasics.com.
Publication date: 02/04/2008