We suffer from a number of issues around how we provision the environments that host our software. One of the most annoying is that puppet runs must complete in the correct order, because different role types “export” resources to PuppetDB which are then consumed by other role types. For example, a puppet run on an application server might produce information that is later consumed by the load balancer. When this pattern is used in many places, it builds up a complex dependency graph that dictates the order in which puppet must run to converge on the correct configuration. We developed a tool called PuppetRoll that understands these dependencies: it forces puppet runs in the correct order and requires each run to complete without error before dependent role types are allowed to run puppet. This works, but it leaves the concept of an application stack as a disjointed collection of low-level concepts.
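The run-order problem described above is essentially a topological sort over role types. As a rough illustration only (this is not PuppetRoll's actual implementation, which is not shown here), a minimal Ruby sketch using the standard library's TSort module might look like the following. The `RoleGraph` class, the `consumes` map and the `database` role are hypothetical names made up for the example.

```ruby
require 'tsort'

# Hypothetical graph of role types. Each key is a role type and its value
# lists the role types whose exported resources it consumes.
class RoleGraph
  include TSort

  def initialize(consumes)
    @consumes = consumes
  end

  # TSort calls this to enumerate every role type in the graph.
  def tsort_each_node(&block)
    @consumes.each_key(&block)
  end

  # TSort calls this to enumerate the role types that must run before `role`.
  def tsort_each_child(role, &block)
    @consumes.fetch(role, []).each(&block)
  end
end

# Assumed example data: the load balancer consumes what app servers export,
# and app servers consume what a (hypothetical) database role exports.
consumes = {
  'loadbalancer' => ['appserver'],
  'appserver'    => ['database'],
  'database'     => []
}

# Exporters come first, consumers last.
run_order = RoleGraph.new(consumes).tsort
puts run_order.inspect
```

TSort raises `TSort::Cyclic` if the roles depend on each other in a loop, which is exactly the situation where no valid run order exists.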
To create a more scalable and more deterministic infrastructure, we think we need a central description of the environments we want to represent. We want to build a model of our environments and then derive from it the specific data that populates the puppet resources used to configure each machine. With this model, building new environments will be less constrained by puppet run order, and less data will be duplicated.
I will illustrate this approach with a simple Ruby DSL whose data is exposed through puppet functions to populate networking resources.
This piece of configuration:
    environment "dev" do
      virtual_service "myservice", :vip=>"10.1.1.3.1", :port=>80 do
        realserver :ipaddress=>"10.1.1.3.2"
        realserver :ipaddress=>"10.1.1.3.3"
      end
      virtual_service "myservice2", :vip=>"10.1.1.4.1", :port=>80 do
        realserver :ipaddress=>"10.1.1.4.2"
        realserver :ipaddress=>"10.1.1.4.3"
      end
    end

describes an environment that contains two VirtualServices. Each VirtualService has a virtual IP address (a VIP) and a port on which to expose the service. It also has a list of RealServers that back this virtual service.
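To make the DSL a little more concrete, here is one way it could be backed by plain Ruby objects using `instance_eval`. This is a minimal sketch, not our actual implementation: the class names `Environment`, `VirtualService` and `RealServer` simply mirror the keywords above, and it assumes `realserver` takes only an options hash.

```ruby
# Hypothetical model classes; the names mirror the DSL keywords.
RealServer = Struct.new(:ipaddress)

class VirtualService
  attr_reader :name, :vip, :port, :realservers

  def initialize(name, options)
    @name = name
    @vip = options.fetch(:vip)
    @port = options.fetch(:port)
    @realservers = []
  end

  # DSL keyword: registers one server that backs this virtual service.
  def realserver(options)
    @realservers << RealServer.new(options.fetch(:ipaddress))
  end
end

class Environment
  attr_reader :name, :virtual_services

  def initialize(name)
    @name = name
    @virtual_services = []
  end

  # DSL keyword: builds a VirtualService and evaluates its block against it.
  def virtual_service(name, options, &block)
    service = VirtualService.new(name, options)
    service.instance_eval(&block) if block
    @virtual_services << service
    service
  end
end

# Top-level DSL entry point: evaluates the block against a new Environment.
def environment(name, &block)
  env = Environment.new(name)
  env.instance_eval(&block) if block
  env
end

dev = environment "dev" do
  virtual_service "myservice", :vip => "10.1.1.3.1", :port => 80 do
    realserver :ipaddress => "10.1.1.3.2"
    realserver :ipaddress => "10.1.1.3.3"
  end
end

puts dev.virtual_services.first.realservers.map(&:ipaddress).inspect
```

Using `instance_eval` keeps the configuration free of explicit receivers, at the cost of the block no longer seeing the caller's instance variables.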
We should be able to generate the networking configuration for the load balancers like this:
    networking:
      eth1:
        ipaddress: '10.1.1.3.1'
        ipaddress: '10.1.1.4.1'

And we should be able to generate networking configuration for the real servers in “myservice” that looks like this:
for real server 1:
    networking:
      lo:0:
        ipaddress: '10.1.1.3.1'
      eth1:
        ipaddress: '10.1.1.3.2'
and for real server 2:
    networking:
      lo:0:
        ipaddress: '10.1.1.3.1'
      eth1:
        ipaddress: '10.1.1.3.3'
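The derivation of these hashes from the model can be sketched in a few lines of Ruby. This is an illustration under assumptions rather than real code of ours: the `services` array is a hypothetical plain-data form of the model, the method names are made up, and because a Ruby hash cannot hold the same key twice, the load balancer's VIPs are collected in a list under an assumed `ipaddresses` key.

```ruby
require 'yaml'

# Hypothetical plain-data form of the "dev" environment model.
services = [
  { :name => "myservice",  :vip => "10.1.1.3.1", :realservers => ["10.1.1.3.2", "10.1.1.3.3"] },
  { :name => "myservice2", :vip => "10.1.1.4.1", :realservers => ["10.1.1.4.2", "10.1.1.4.3"] }
]

# Load balancer: every VIP in the environment is bound to one interface.
def load_balancer_networking(services, interface = "eth1")
  { "networking" => { interface => { "ipaddresses" => services.map { |s| s[:vip] } } } }
end

# Real server: the VIP sits on a loopback alias, its own address on eth1.
def real_server_networking(service, ipaddress)
  {
    "networking" => {
      "lo:0" => { "ipaddress" => service[:vip] },
      "eth1" => { "ipaddress" => ipaddress }
    }
  }
end

lb_config = load_balancer_networking(services)
myservice = services.first
rs_configs = myservice[:realservers].map { |ip| real_server_networking(myservice, ip) }

puts lb_config.to_yaml
rs_configs.each { |config| puts config.to_yaml }
```

Each node's configuration is computed from the same model, so nothing needs to be exported by one node before another can be configured.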
The puppet code could look like this:
We could of course use an ENC (External Node Classifier) to push the correct configuration onto the node. And of course the use of static IP addresses seems far from ideal!
This very simple example got us thinking about how we could extend the concept to describe entire complex environments, where each node pulls in the configuration it needs at build time. Beyond this, the provisioning of the environment could also be driven from the model. For example: I need 2 app servers for “myservice”; after auditing I discover we only have 1, so I provision a new one (similar to Orc).
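The audit-then-provision idea boils down to comparing what the model asks for with what actually exists. A minimal sketch, assuming the desired counts and the audit results are available as simple hashes (the `provisioning_plan` method and the host name are hypothetical):

```ruby
# Hypothetical desired state from the model: service name => app servers wanted.
desired = { "myservice" => 2 }

# Hypothetical audit result: service name => app servers that currently exist.
actual = { "myservice" => ["myservice-app-001"] }

# Returns service name => number of servers still to provision,
# leaving out services that already have enough.
def provisioning_plan(desired, actual)
  desired.each_with_object({}) do |(service, wanted), plan|
    running = actual.fetch(service, []).size
    missing = wanted - running
    plan[service] = missing if missing > 0
  end
end

plan = provisioning_plan(desired, actual)
plan.each { |service, count| puts "provision #{count} more server(s) for #{service}" }
```

Because the plan is derived purely from the difference between the two hashes, running it again after provisioning yields an empty plan.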
Taking this to its logical conclusion, we would end up with a rich domain model backed by a normalized datastore. It would model our environment and possess the capability to transform that data into hashes that could be used to populate puppet resources. A rough draft of this looks like this:
In this model a Server is either a LoadBalancer or a RealServer, and the correct networking configuration can be generated for both types of server. A Server lives in an Environment, and an Environment has many VirtualServices. Each VirtualService depends on a number of other VirtualServices; this allows us to generate the correct set of firewall rules and even app server configuration files, letting us “wire” our environment together with minimal effort!
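As a rough illustration of how service dependencies could turn into firewall rules, the sketch below allows every real server of a service to reach the VIP and port of each service it depends on. Everything here is an assumption made for the example: the `ServiceNode` struct, the `firewall_rules` method, the rule format, and the idea that “myservice” depends on “myservice2”.

```ruby
# Hypothetical flattened view of a VirtualService and its dependencies.
ServiceNode = Struct.new(:name, :vip, :port, :realservers, :depends_on)

services = {
  "myservice"  => ServiceNode.new("myservice", "10.1.1.3.1", 80,
                                  ["10.1.1.3.2", "10.1.1.3.3"], ["myservice2"]),
  "myservice2" => ServiceNode.new("myservice2", "10.1.1.4.1", 80,
                                  ["10.1.1.4.2", "10.1.1.4.3"], [])
}

# One "allow" rule per (real server, dependency) pair.
def firewall_rules(services)
  services.values.flat_map do |service|
    service.depends_on.flat_map do |dependency_name|
      dependency = services.fetch(dependency_name)
      service.realservers.map do |source_ip|
        { :source => source_ip, :destination => dependency.vip, :port => dependency.port }
      end
    end
  end
end

rules = firewall_rules(services)
rules.each { |rule| puts "allow #{rule[:source]} -> #{rule[:destination]}:#{rule[:port]}" }
```

Using `fetch` means a dependency on a service that is missing from the model fails loudly with a `KeyError` instead of silently producing no rules.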