What Does DevOps Mean?

Ideally, what a DevOps implementation looks like is a cross-functional, integrated team of Developers and Administrators who take responsibility from end-to-end on the features and fixes for which they are responsible. The specific developer who pushed the piece of code which is showing up in the error logs is responsible for diganosing the problem, developing a fix, testing the fix, and then rolling out the fix to the environment / the customers and making sure the result is what's desired. This works great if there are five employees and ten clients for your software. But again, scaling is the problem.

So what does "DevOps" look like at scale? Well, "calamatous" isn't entirely wrong.

It turns out that people who write software for a living are not that keen on also supporting a production environment running that code, nor are they particularly interested in customer service and client relations. Which is not unreasonable: the skillset for development and operations support are at best orthagonal. It's also important to point out that, for various sociocultrual reasons, developers are expensive. Especially when compared to, say, the average support or customer-facing role (the argument about why this is or whether it should be true is for another day).

So almost immediately, we see Google respond to this particular issue with the SRE role: Site Relilability Engineer. A role specifically dedicated to making sure the production environment is stable and available, with the expectation that these roles will also handle continuous improvement through automation and incident response. An SRE is expected to have deep knowledge of the infrastructure systems and the various ways they must be corrected and revised, as well as have development experience to build and maintain automation tools and services. The biggest "DevOps" company in the world invented a new role rather than commit to DevOps imbedding principles.

And what is an SRE? It's a systems administrator with bits in. It's a person who is responsible for maintenance and support and development of infrastructure and tooling and is point person for any incident response or escalation of issues. Three different jobs, requiring three different skillsets, now all collapsed into one roll. With, I would point out, no clear career path or role advancement opportunities: where does one go from SRE? Any other role available means abandoning at least one of the parts of the job that are required. An employee is either incentivised to develop and then abandon a third of their skillset, or to elide or falsify some combination of skills with the intention of either avoiding those situations or learning on the job... and then abandoning those skills.

Site Reliability Engineering is hard. It requires a level of dedication and development that is often significantly harder than pure software development due to the restrictions placed on the environment in which the SRE is operating. It is hard enough to be one thing well; SREs are expected to be at least three things well. Oh, and they're often not as well-compensated as traditional line developers. And they often have brutal on-call / off-shift responsibilities. In a healthy system, SREs shouldn't exist. The value and efficiencies of these roles do not redound onto the workers.

DevOps is terrible, and we're all fucked if we keep putting our eggs in the "devops" basket.