Scaling systems configuration at Facebook - Phil Dibowitz
183 4 24978
For many years, Facebook managed its systems with cfengine2. With many individual clusters over 10k nodes in size, a slew of different constantly-changing system configurations, and small teams, this system was showing its age and the complexity was steadily increasing, limiting its effectiveness and usability. It was difficult to integrate with internal systems, testing was often impractical, and it provided no isolation of configurations, among many other problems. After an extensive evaluation of the tools and paradigms in modern systems configuration management - open source, proprietary, and a potential home-grown solution - we built a system based on Chef.
The evaluation process involved understanding the direction we wanted to take in managing the next many iterations of systems, clusters, and teams. More importantly, we evaluated the various paradigms behind effective configuration management and the different kinds of scale they provide. What we ended up with is an extremely flexible system that allows a tiny team to manage an incredibly large number of systems with a variety of unique configuration needs. In this talk we will look at the paradigms behind the system we built, the software we chose and why, and the system we built using that software. Further, we will look at how the philosophies we followed can apply to anyone wanting to scale their systems infrastructure.
By anonymous 2017-09-20
So, this is "primarily opinion based", but I'll answer it anyway. There are 4 distinct choices here:
A definition is just a wrapper around one or more resources with some parameterization. However, definitions are not added to the resource collection. Meaning you can't "notify" or trigger events on a definition. They are solely for wrapping and naming a series of repeatable steps found in a recipe.
An LWRP (Light-weight resource and provider) is a Chef-specific DSL that actually compiles into an HWRP (Heavy-weight resource and provider) at runtime. Both LWRPs and HWRPs are Chef extensions. In addition to wrapping a series of repeatable tasks, *WRPs will create a top-level resource in Chef (like
package) that's available for use in your recipe and other cookbook's recipes as well.
The difference between and LWRP and HWRP is really the Ruby. HWRPs use full-blown Ruby classes. If you aren't a Ruby developer, they may be a bit intimidating. Nonetheless, you should give it a try before writing and LWRP. LWRPs use a Chef-specific DSL for creating resources. At the end of the day, they compile to (roughly) the same code as the Heavy-weight counterpart. I'll link some references at the end. You have access to Chef resources inside either implementation, as well as the run_context.
Finally, "libraries" (notice the quotes) are often misunderstood and abused. They are Ruby code, evaluated as Ruby, so they can do pretty much anything. HWRPs are actually a form of a library. Sometimes people use libraries as "helpers". They will create a helper module with methods like
aggregate_some_data and then "mix" (Rubyism) that library into their recipes or resources to DRY things up. Other times, libraries can be use to "hack" Chef itself. The partial-search cookbook is a good example of this. Facebook talked about how they limited the number of attributes sent back to the server last year at ChefConf too. Libraries are really an undefined territory because they are the keys to the kingdom.
So, while I haven't actually answered your question (because it's opinion-based), I hope I've given you enough information about the best direction moving forward. Keep in mind that every infrastructure is a special snowflake and there is no right answer; there's only a best answer. I'd suggest sharing this information with your team and weighing the pros and cons of each approach. You can also try the Chef mailing list, on which people will give you many opinions.