This is an expansion of the previous post.

Problems that sustainable ops solves:

1. Reusability of operations code. Once operations are codified, the same code can be reused across different servers. Run the same operations code against a second server and you get two identical service environments.

2. Composability of operations code. Codified operations can be stacked and composed. For example, deploying Kafka requires installing ZooKeeper first, and deploying a Codis cluster also requires ZooKeeper. Once ZooKeeper operations are codified, we can directly invoke the ZooKeeper operations code from both the Kafka and Codis operations code – like building blocks – without reimplementing ZooKeeper operations.
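The building-block idea can be sketched in a few lines of Python. This is only an illustration of composing codified operations, not how any particular tool implements it: the `ROLE_DEPENDENCIES` map and the `install_order` helper are hypothetical names, and a real setup would normally rely on the dependency mechanism of the operations tool itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each role lists the roles that must run before it.
ROLE_DEPENDENCIES = {
    "zookeeper": [],
    "kafka": ["zookeeper"],
    "codis": ["zookeeper"],
}


def install_order(target: str) -> list[str]:
    """Return every role needed to deploy `target`, dependencies first."""
    needed: dict[str, list[str]] = {}
    stack = [target]
    while stack:
        role = stack.pop()
        if role not in needed:
            needed[role] = ROLE_DEPENDENCIES[role]
            stack.extend(needed[role])
    # static_order() yields each role only after all of its dependencies.
    return list(TopologicalSorter(needed).static_order())


print(install_order("kafka"))  # ['zookeeper', 'kafka']
print(install_order("codis"))  # ['zookeeper', 'codis']
```

Both Kafka and Codis pull in the same ZooKeeper definition, so the ZooKeeper operations exist in exactly one place.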

3. Parameterized operations data. When deploying Codis clusters in different data centers, passwords and cluster names differ. With codified operations, operations data is abstracted into variables through a template language such as Jinja2. At deployment time, different variable values are passed in as needed, so the same operations code can deploy a differently configured Codis cluster in each data center.
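A minimal sketch of the separation between template and data is shown below. To keep it runnable with the standard library alone, it uses Python's `string.Template` as a stand-in for Jinja2 (`${name}` here plays the role of Jinja2's `{{ name }}`). The configuration keys, data center names, and values are all made up for illustration.

```python
from string import Template

# Stand-in for a Jinja2 template. The keys below are illustrative,
# not taken from a real Codis configuration file.
CODIS_CONFIG_TEMPLATE = Template(
    'cluster_name = "${cluster_name}"\n'
    'cluster_password = "${cluster_password}"\n'
)

# Hypothetical per-data-center variables. In practice the password would be
# stored encrypted, not as plain text.
DATACENTER_VARS = {
    "dc-a": {"cluster_name": "codis-dc-a", "cluster_password": "example-a"},
    "dc-b": {"cluster_name": "codis-dc-b", "cluster_password": "example-b"},
}


def render_config(datacenter: str) -> str:
    """Render the shared template with one data center's variables."""
    # substitute() raises KeyError if a variable is missing, so an incomplete
    # variable set fails loudly instead of producing a broken config.
    return CODIS_CONFIG_TEMPLATE.substitute(DATACENTER_VARS[datacenter])


print(render_config("dc-a"))
print(render_config("dc-b"))
```

The template is written once; only the variable values change between deployments.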

4. Encrypted storage of operations data. As mentioned above, Codis cluster data includes both plain data and sensitive data – for example, the cluster name is plain data, while the cluster password is sensitive data. With codified operations, both plain and encrypted data are stored directly in version control systems like Git. This provides a comprehensive and simple record of all operations, with sensitive data being encrypted, as shown below:

[Figure: Plain data]

[Figure: Sensitive data (encrypted)]

[Figure: Sensitive data (decrypted)]

As shown, sensitive data appears as ciphertext until it is decrypted. Both plain data and sensitive data (after decryption) are essentially variables. Encrypted storage of operations data enables access control. For example, we can choose whose public keys (one or more people) the sensitive data is encrypted for, so that only the holders of the matching private keys can decrypt it. Only authorized personnel (internal staff) can then successfully execute the operations code. (In practice, SSH key-based authentication provides more granular server access control, managing who can operate which servers, while encrypted storage of operations data protects internal data.)

Encrypted storage of operations data also enables collaboration: because the operations repository ultimately contains only operations code and securely stored operations data, it can be shared with collaborators without exposing secrets. Furthermore, open source can be seen as a form of large-scale collaboration. It may not be the fastest approach, but it goes the furthest.

5. Continuous iteration of operations. I enjoy using open-source software, but maintaining the runtime environments of several, or even dozens of, open-source applications is painful. For example, all my local Nginx instances serve HTTPS, with certificates issued and renewed automatically by lego using the dns-01 challenge. My internal Nginx, external Nginx, GitLab, and Jira each have their own certificate, and each one has to be created and set up for automatic renewal, which takes a lot of effort. Two factors make this worse. First, open-source releases are typically hosted on GitHub or GitLab, which are hard to reach from networks in China, so downloads are slow and unreliable. Second, without codified operations, keeping that many runtime environments up to date by hand costs too much time and effort, and every repeated manual step is another chance to make a mistake. Codifying operations, backed by Git and CI, is therefore essential for stable, efficient, continuous updates with the staff available (just me).

Now I only need to invoke the lego operations code (a role) from the playbooks that install Nginx, GitLab, and Jira, as shown below:

[Figure: lego role]

This automatically completes certificate issuance (wildcard certificates) and configures all the cron jobs needed for automatic renewal. (If you also try using dns-01 for automatic certificate renewal, remember to add some randomness to the renewal schedule to avoid timing conflicts.)
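One way to add that randomness while keeping the schedule stable between runs is to derive the time from the host name. The sketch below is an assumption about how this could be done, not the method used in my role: the `renewal_cron_schedule` function and the host names are hypothetical, and it assumes one renewal check per day.

```python
import hashlib


def renewal_cron_schedule(hostname: str) -> str:
    """Build a daily cron time spec whose minute and hour depend on the host.

    Hashing the host name spreads hosts across the day, and the same host
    always gets the same slot, so re-running the deployment does not keep
    rewriting the cron entry.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    minute = digest[0] % 60  # 0-59
    hour = digest[1] % 24    # 0-23
    return f"{minute} {hour} * * *"


# Hypothetical host names, for illustration only.
for host in ("nginx-internal", "nginx-external", "gitlab", "jira"):
    print(host, "->", renewal_cron_schedule(host))
```

A purely random value would also avoid collisions, but it would change on every deployment. A hash of the host name gives a spread that stays fixed per host.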

Additionally, if I need to update the lego component, I only need to change the lego version variable and commit the change to Git. My CI then automatically downloads that version of lego and updates it on all servers, as shown below:

[Figure: lego version]
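The idea of driving an upgrade from a single version variable can be sketched as follows. Everything here is illustrative: `LEGO_VERSION` is an example value rather than a recommendation, the function names are hypothetical, and the download URL pattern is an assumption about how lego release archives are named on GitHub that should be checked against the actual releases page.

```python
# The single value that gets edited and committed; example value only.
LEGO_VERSION = "4.14.2"


def lego_download_url(version: str, os_name: str = "linux", arch: str = "amd64") -> str:
    """Build the release archive URL for a version (assumed naming pattern)."""
    return (
        "https://github.com/go-acme/lego/releases/download/"
        f"v{version}/lego_v{version}_{os_name}_{arch}.tar.gz"
    )


def needs_update(installed_version: str, desired_version: str) -> bool:
    """A server needs work only when its installed version differs from the pin."""
    return installed_version != desired_version


print(lego_download_url(LEGO_VERSION))
print(needs_update("4.14.1", LEGO_VERSION))  # True
```

Because the version lives in one variable, the Git history of that variable is also the upgrade history of every server.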

This article primarily explains the necessity of separating operations and data in infrastructure as code. Codified operations enable continuous optimization, secure access control, broad collaboration, and efficient automation. This is what sustainable ops requires.