Servers are an essential component of any enterprise in 2019. Did you know servers require maintenance like any other equipment?

Keeping a server running is more involved than loading the latest patches and updates. Use our server maintenance checklist to ensure the smooth operation of your server and avoid downtime.

Here’s is our list of 15 server maintenance tips to help you better manage your hardware and avoid the most common issues.

Server Data Verification

1. Double-Check & Verify Your Backups

If you’ve ever had to recover from a catastrophic drive failure, you know how important data is to the smooth operation of a business.

With a good backup strategy, it’s better to have them and not need them, than need them and not have them.  Schedule a few minutes every week (or every day) to check the server backups. Alternately, you can mirror the server environment to a virtual machine in the cloud and test it regularly.

2. Check the RAID array

Many dedicated servers run a RAID (Redundant Array of Independent Disks) array. Basically, multiple hard drives acting as one storage device in the event of a single disk failure.

Some RAID configurations are designed for performance, others for redundancy.  In most cases, modern RAID arrays have advanced monitoring tools. A quick glance at your RAID monitoring utility can alert you to potential drive failures. This lets you plan drive replacements and rebuilds in a way that minimizes downtime.

3. Verify Storage Utilization

Periodically check your server’s hard drive usage. Servers generate a lot of log files, old emails, and outdated software packages.

If it’s important to keep old log files, consider archiving them to external storage. Old emails can also be archived or deleted. Some application updaters don’t remove old files.  Fortunately, some package managers have built-in cleanup protocols that you can use. You can also find third-party utilities for managing old software files.

Hard drives are not just used for storage. They also use a swap file, which acts like physical memory. If disk utilization gets above 90%, it can interfere with the swap file, which can severely degrade performance.

Software & Server System Checks

4. Review Server Resource Usage

In addition to reviewing disk space, it’s also smart to watch other server usages.

Memory and processor usage can show how heavily a server is being used. If CPU and memory usage are frequently near 100%, it’s a sign that your server may be overtaxed. Consider reducing the burden on your hardware by upgrading, or by adding additional servers.

5. Update Your Control Panel

Control panel software (such as cPanel) must be updated manually. When updating cPanel, only the control panel is updated.  You still need to update the applications that it manages, such as Apache and PHP.

6. Update Software Applications

Depending on your server configuration, you may have many different software applications. Some systems have package managers that can automatically update software.  For those that don’t, create a schedule to review available software updates.

This is especially true for web-based applications, which account for the vast majority of breaches.  Keep in mind that some operating systems may specifically require older application versions – Python 2 for CentOS7, for example.  In cases where you must use older software in a production environment, take care to avoid exposing such software to an open network.

7. Examine Remote Management Tools

Check remote management tools including the remote console, remote reboot, and rescue mode. These are especially important if you run a cloud-based virtual server environment, or are managing your servers remotely.

Check in on these utilities regularly to make sure they are functional. Rebooting can solve many problems on its own. A remote console allows you to log in to a server without being physically present. Rescue mode is a Red Hat solution, but most server operating systems have a management or “safe” mode you can remotely boot to make repairs.

8. Verify Network Utilization

Much like memory and CPU usage, server loads have a network capacity. If your server is getting close to the maximum capacity of the network hardware, consider installing upgrades. In addition to the capacity of the network, you might consider using network monitoring tools. These tools can watch your network traffic for unusual or problematic usage.

Monitoring traffic patterns can help you optimize your web traffic. For example, you might migrate frequently-accessed resources to a faster server. You might also track unusual behavior to identify intrusion attempts and data breaches, and manage them proactively.

9. Verify Operating System Updates

OS updates can be a tricky field to navigate. One the one hand, patches, and updates can resolve security issues, expand functionality, and improve performance. Hackers often plan cybersecurity attacks around “zero-day” exploits. That is, they look at the OS patches that are released, and attack those weaknesses before a business can patch the vulnerability.

On the other hand, custom software can experience conflicts and instability with software updates. Dedicate time regularly to review OS updates. If you have a sensitive production environment, consider creating a test environment to test updates before rolling them out to production.

Server Hardware

10. Physically Clean Server Hardware

Schedule time regularly to physically clean and inspect servers to prevent hardware failure. This helps keep dust and debris out of the circuit boards and fans.

Dust buildup interferes with heat management, and heat is the enemy of server performance. While you’re cleaning, visually inspect the servers and server environment. Make sure the cabinets have plenty of airflow. Check for any unusual wiring of connections. An unexpected flash drive might be a security breach. An unauthorized network cable might create a data privacy concern.

11. Check for Hardware Errors

Modern server operating systems maintain logs of hardware errors.

A hardware error could be a SMART error on a failing hard drive, a driver error for a failing device, or random errors that could indicate a memory problem. Checking your error logs can help you pinpoint and resolve a hardware problem before it escalates to a system crash.

Security Monitoring

12. Review Password Security

Evaluate your password policy regularly. If you are not using an enterprise password management system, start now.

You should have a system that automates good password hygiene. If you don’t, this can be a good time to instruct users to change passwords manually.

13. Evaluate User Accounts

Most businesses have some level of turnover, and it’s easy for user accounts to be overlooked.

Review the user account list periodically, and remove any user accounts no longer needed. You can also check account permissions, to make sure they are appropriate for each user. While reviewing this data, you should also examine client data and accounts.  You may need to manually remove data for former clients to avoid legal or security complication

14. Consider Overall Server Security

Evaluate your server security policies to make sure that they are current and functioning. Consider using a third-party network security tool to test your network from the outside. This can help identify areas that you’ve overlooked, and help you prevent breaches before they occur.

15. Check Server Logs Regularly

Servers maintain logs that track access and errors on the server. These logs can be extensive, but some tools and procedures make them easier to manage.

Review your logs regularly to stay familiar with the operation of your servers. A logged error might identify a hardware issue that you can fix before it fails. Anomalies in access logs might mean unauthorized usage by users or unauthorized access from an intruder.

Regular Server Maintenance Reduces Downtime & Failures

With this checklist, you should have a better understanding of how to perform routine server maintenance.

Regular maintenance ensures that minor server issues don’t escalate into a disastrous system failure. Many server failures as a result of preventable situations due to poor planning.