Most of this right now is a brain dump for me to get some of my ideas and experiences out there. The idea is that I will use this brain dump as a template to write a much more comprehensive guide to building an end-to-end infrastructure that is built on this foundation.
In my 12+ years as a professional system administrator and infrastructure architect, I have gained a lot of insights into what works and what doesn't work and how IT infrastructures could be improved. I'm hoping to brain dump a lot of that here to create a cohesive vision that can be used as a template by startup companies to design their own infrastructure.
One thing to drill into your head is this: Work smarter, not harder! Too many IT departments get bogged down in mundane, repetitive tasks that could be avoided if only they would automate their regular administrative work and design a planned infrastructure rather than vomit up some big ugly ad-hoc environment.
Major kudos to Infrastructures.org for already putting a lot of thought and work into this, and sharing their findings with the world.
Really there is a very blurry line between servers and workstations. Most of the differences did not exist until Microsoft promoted the concept with its NT product line. Within the UNIX world, a "server" is just a process that runs on a host. We're going to worry about building and maintaining "hosts".
In my mind, hosts should just be disposable and replaceable components of an overall design. Ideally, one should be able to plug a new host into the wall, and after a burn-in period it should be provisioned within a few hours with very little direct intervention.
Ideally most of the heavy lifting is taken care of by the provisioning system itself. The sysadmin merely turns on the host which automatically installs the OS and incorporates it into the infrastructure. This should handle most of the OS load and configuration. Most of the real work goes into building the provisioning system.
The provisioning system isn't done after the "sign off" stage; it should still be used for deploying patches, changing roles of hosts, making configuration changes to hosts, and so on. All hosts have an ongoing relationship with the provisioning system.
Initial provisioning should be done in a secure room. The main reason for this is that you want to be able to automate the distribution of Kerberos keytab files, but you don't want to do this out on the production floor where a lot of people have physical access to the network. If you can limit your initial host provisioning to a physically secure room with an isolated subnet, it's a lot easier to automate keytab distribution safely. Once the host is provisioned it can be moved out to its production location.
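To make the isolated subnet idea a bit more concrete, here is a minimal sketch of the kind of check a keytab distribution service could make before handing anything out. The subnet 10.99.0.0/24 and the function name may_receive_keytab are made up for illustration; this is not part of Kerberos or of any provisioning tool, and a real service would need far more than an address check.

```python
import ipaddress

# Hypothetical isolated provisioning subnet; substitute your own.
PROVISIONING_SUBNET = ipaddress.ip_network("10.99.0.0/24")


def may_receive_keytab(client_ip: str) -> bool:
    """Return True only if the requesting host sits on the provisioning subnet."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is refused outright.
        return False
    return address in PROVISIONING_SUBNET


if __name__ == "__main__":
    for candidate in ("10.99.0.17", "192.168.1.5"):
        print(candidate, may_receive_keytab(candidate))
```

An address check like this only means something because the subnet is physically isolated; out on the production floor it would be trivial to get around.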
If you're a small company, just starting out, you can achieve all of this with just a few machines. Start out with desktop class machines if that's all you can afford for now, but make it a priority to get some server class hardware in place as soon as you're able. Also make sure you're getting good regular backups, and test your backup strategy out to make sure it's actually working the way you might expect.
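One simple way to test a backup is to restore it somewhere out of the way and compare it with the live data. The sketch below assumes you have already restored into a scratch directory; the function names are my own, and it only compares file contents, not ownership or permissions.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> List[str]:
    """Return relative paths that are missing or different in the restored tree."""
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(str(relative))
    return problems
```

An empty list means every file in the original tree came back intact. Files that changed after the backup ran will show up as differences, so run this against data that has been sitting still.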
One excellent way to cheat here, whether to hit the ground running or just to prove this out, would be to spec out one machine with at least two processors, 2GB of RAM, and a few hundred gigs of disk space. Not that you'll need it all, but it will make your life a lot easier. I endorse the use of VMware Server for this purpose. You can build the virtual machines on workstation class hardware now, and pick up and move those virtual machines to more resilient server class hardware later. It works, and it works well! In fact, that's how I prototyped most of this environment. I was working with a smaller budget than I would have liked, so I was using an Athlon64 3700+ with 1GB of RAM and 250GB of SATA disk. If I were to do this again, I'd prefer 2GB of RAM and at least one dual-core processor, preferably two.
I used Ubuntu Desktop (Dapper) for the physical host OS, which gave me a fairly full featured desktop interface to work with and a very easy way to fetch any components that I needed to bootstrap my environment. I added VMware Server to this, and then started running down my checklist.
Before any work commences on your infrastructure, you need a place to store configuration files and documentation, and track changes. CVS or Subversion would be ideal for something like this. Don't spend a lot of time here. This bootstrap server will go away soon once the gold server takes over. From here on out, I'm going to use CVS for examples but you could substitute Subversion here if you prefer.
This is the server that all other hosts come from. Take your time and build this one right. Any configuration files that are changed from the vendor defaults should be populated into the CVS server. Technically speaking, once your infrastructure is built, this could be replaced by a server built by the infrastructure itself, at which point the infrastructure becomes entirely self-hosting.
Most OS vendors have their own methods for auto-deploying their products. Preseed for Ubuntu, Jumpstart for Solaris, Kickstart for CentOS and Red Hat Enterprise Linux, AutoYAST for SuSE, and so on. Generally these tools can work fine. For the most part the systems we're going to install are going to be fairly straightforward, as supplied by the vendor. After the installation is complete we'll worry about bringing the OS in line with the infrastructure.
Ad-hoc changes, as a rule, should be avoided. Sometimes they cannot be avoided.
If your system clocks aren't synchronized, this will break authentication services and potentially cause you many other issues. Enter ntp. Some architects put this a little lower down their checklist, but for me it has to be one of the first services deployed. So many other parts of the infrastructure depend on this.
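Here is why the clocks matter so much for authentication. Kerberos rejects requests whose timestamps are too far from the server's own clock; MIT Kerberos tolerates five minutes by default. The sketch below only illustrates that comparison. The function name is my own, and a real check would take the reference time from your NTP servers rather than from a value passed in.

```python
from datetime import datetime, timedelta, timezone

# Five minutes is the MIT Kerberos default tolerance; adjust it if your
# realm is configured differently.
MAX_SKEW = timedelta(minutes=5)


def within_skew(local: datetime, reference: datetime,
                tolerance: timedelta = MAX_SKEW) -> bool:
    """Return True if the two clocks agree to within the tolerance."""
    return abs(local - reference) <= tolerance


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    drifted = now + timedelta(minutes=7)
    print(within_skew(now, now))      # clocks agree
    print(within_skew(drifted, now))  # seven minutes off, outside tolerance
```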
Cfengine, the configuration engine, is an autonomous agent with a middle- to high-level policy language for building expert systems that administer and configure large computer networks. This is going to be the real center of keeping your infrastructure going and making it self-hosting.
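The key idea behind a tool like this is that you describe the state you want and let the agent converge the host toward it, so running the same policy twice does no harm. The following is a rough Python illustration of that idea only. It is not Cfengine syntax, and ensure_line is a made-up helper.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True if a change was made."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        # Already in the desired state, so do nothing.
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

The first call on a host that is missing the line makes the change and returns True. Every call after that returns False and leaves the file alone, which is what lets an agent run the same policy on a schedule without piling up duplicate edits.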
How will you replicate common files from your central repository to all of your hosts?
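As a starting point for thinking about that question, here is a minimal sketch of a one-way sync from a master copy to a target directory, copying only files that are missing or different. The function name and layout are assumptions of mine. In practice you would more likely lean on an existing tool than write your own, and this sketch never deletes anything from the target.

```python
import filecmp
import shutil
from pathlib import Path
from typing import List


def sync_tree(master: Path, target: Path) -> List[str]:
    """Copy new or changed files from master to target; return what was copied."""
    copied = []
    for source in sorted(master.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(master)
        destination = target / relative
        # shallow=False compares file contents, not just size and mtime.
        if destination.is_file() and filecmp.cmp(source, destination, shallow=False):
            continue
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)  # copy2 also preserves timestamps
        copied.append(str(relative))
    return copied
```

Returning the list of copied files gives you something to log, so you can see at a glance which hosts had drifted from the master copy.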
While you could, technically speaking, authenticate from a directory service, you wouldn't want to. It's sort of like drag racing in an SUV. We'll be using MIT Kerberos V here.
Kerberos V5 System Administrator's Guide
We're starting to get close to the point that we can add actual user accounts here, and set up a true trust relationship between hosts in your infrastructure.
User metadata, printer metadata, etc. should not be copied from host to host but rather shared from a central directory. Back in the day we used NIS. These days LDAP is king.
OpenLDAP, Fedora Directory Server
Once a user is authenticated, the directory is needed to authorize that a user is indeed allowed to use a host or resource, and provide the host with vital data about the user that is not provided by Kerberos (such as the location of the user's home directory, their preferred shell, their name, email address, preferred mail server, and so on).
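The split between the two services can be sketched like this: Kerberos has already answered "who is this?", and the directory entry answers "may they log in here, and where do they live?". The DirectoryEntry class, its fields, and the example values are hypothetical stand-ins for whatever your LDAP schema actually holds. A real host would fetch the entry from the directory rather than build it by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DirectoryEntry:
    """Hypothetical subset of the per-user data a directory would supply."""
    uid: str
    home_directory: str
    login_shell: str
    allowed_hosts: List[str] = field(default_factory=list)


def authorize(entry: DirectoryEntry, hostname: str) -> bool:
    """Authorization step: is this already-authenticated user allowed on this host?"""
    return hostname in entry.allowed_hosts


if __name__ == "__main__":
    user = DirectoryEntry(
        uid="jdoe",
        home_directory="/home/jdoe",
        login_shell="/bin/bash",
        allowed_hosts=["build01.example.com"],
    )
    print(authorize(user, "build01.example.com"))
    print(authorize(user, "mail01.example.com"))
```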
I'm going to break with my contemporaries and suggest that NFSv2 or NFSv3 should not be taken seriously in an enterprise computing environment. NFSv4 shows promise, but OpenAFS deserves a serious look. OpenAFS has a client for just about every popular OS out there, it is free, it is mature, it is secure, but it does take quite a bit more work and design to get set up than NFS or Samba.
How will you keep all of your hosts patched?
yum
Postfix, Amavis, ClamAV, SpamAssassin, Cyrus IMAP, Sympa mailing list manager, Sieve mail filtering language
CUPS Common UNIX Printing System, Samba
CentOS Enterprise Linux is admittedly one of my favorites. This is a free clone of Red Hat Enterprise Linux, and it works extremely well (even on some platforms not supported by Red Hat!). I've found it to work very well with LDAP, Kerberos, and NFS. It is very stable and very well supported, with a great user community to help you through sticky issues. Contrary to popular belief, some excellent commercial support is available for CentOS. But if you're contemplating support contracts, that sort of kills the point of using CentOS in the first place (i.e. not being railroaded into buying support for all of your hosts).
Ubuntu has been getting a lot of my attention lately. I have to relearn some of my enterprise management tools for it and determine how appropriate it is for wide scale distribution, but so far I am very encouraged. In fact, most of my CentOS machines have been reloaded with Ubuntu Dapper.
Sticky subject. Don't tread lightly here. I have a lot to write here but I'd like to get the provisioning notes squared away first.
Maintained by Magnus Hedemark. Last updated August 01 2006 08:25:42.
Copyright 2009