feat: replace Infisical with OpenBao as OSS secrets backend#8

Open
nbrieussel wants to merge 8 commits into main from feat/001-replace-infisical-oss-secrets

Summary

  • Deploys OpenBao (MPL 2.0) as the cluster secrets backend via ArgoCD, replacing the planned Infisical integration
  • State persisted to Scaleway Object Storage (backup-dev-id) — secrets survive full cluster destroy/recreate
  • PostSync init Job handles first-time initialization and unseals OpenBao on every boot using keys stored in S3
  • External Secrets Operator (ESO) wired as the workload integration layer via ClusterSecretStore
  • CI pipeline validates the full destroy/recreate scenario against real Scaleway S3

Architecture

wave 0 — openbao (S3 backend, creds from scaleway-s3-credentials Secret)
wave 1 — external-secrets (ESO, provides ClusterSecretStore CRD)
wave 2 — openbao-init (PostSync Job: init/unseal from S3 + ESO ClusterSecretStore)

Test plan

  • helm-lint job: lint + render all charts
  • infisical-absent job: git grep -ri infisical returns no matches
  • secrets-restore job: write canary → destroy cluster → recreate → assert canary restored (SC-001)
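
A minimal sketch of how the infisical-absent check could be scripted, assuming it runs from the repository root. The function name `assert_absent` is hypothetical and not taken from this PR. Note that `git grep` exits 0 when it finds a match and 1 when it finds none, so the exit status has to be inverted. Files that legitimately contain the word (for example the workflow that defines a job named infisical-absent) would need to be excluded with a pathspec.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when no tracked file mentions the pattern.
# Arguments after the pattern are passed to git grep as pathspecs.
assert_absent() {
  local pattern="$1"
  shift
  local status=0
  git grep -i -n -e "$pattern" -- "$@" || status=$?
  case "$status" in
    0) echo "forbidden reference to '$pattern' found" >&2; return 1 ;;
    1) return 0 ;;  # no match: the desired outcome
    *) echo "git grep failed (exit $status)" >&2; return 2 ;;
  esac
}
```

Possible usage in CI: `assert_absent infisical . ':(exclude).github/workflows'`. The excluded path is an assumption about where the self-referencing job name lives.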

Cross-repo dependency

The infra Terraform must create scaleway-s3-credentials Secret in the openbao namespace before ArgoCD first sync. Required keys: access_key, secret_key, bucket.
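
Because the Secret is created in another repository, a pre-flight check before the first sync can catch a missing key early. The sketch below is an assumption about how such a check might look, not something this PR contains. `missing_keys` is a hypothetical helper that reads the keys present in the Secret from stdin and prints any required key that is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stdin carries the keys present in the Secret, one per
# line; arguments are the required keys. Prints each missing key and returns
# non-zero if at least one is missing.
missing_keys() {
  local present key rc=0
  present="$(cat)"
  for key in "$@"; do
    if ! printf '%s\n' "$present" | grep -qxF -- "$key"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}

# Example wiring (needs cluster access, so it is left commented out):
# kubectl get secret scaleway-s3-credentials -n openbao \
#   -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#   | missing_keys access_key secret_key bucket
```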

Spec

Full design in specs/001-replace-infisical-oss-secrets/ (spec, plan, research, data-model, contracts, quickstart, tasks).

🤖 Generated with Claude Code

https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7

Deploys OpenBao (MPL 2.0) via ArgoCD app-of-apps with S3 state on
Scaleway Object Storage, so secrets survive full cluster destroy/recreate.

Key additions:
- platform/{local,scaleway}/openbao.yml  — OpenBao Helm chart (0.28.4),
  S3 backend pointing to backup-dev-id bucket, creds from K8s Secret
- platform/{local,scaleway}/external-secrets.yml  — ESO (2.6.0)
- apps/openbao-init/  — in-house Helm chart: PostSync init/unseal Job
  (reads/writes keys from S3), RBAC, per-app HCL policies, ESO
  ClusterSecretStore; idempotent across boots
- clusters/{local,scaleway}/templates/openbao-init.yaml  — ArgoCD
  Applications wiring the init chart into the app-of-apps tree
- Sync-wave order: wave 0 openbao → wave 1 ESO → wave 2 openbao-init
- .github/workflows/ci.yml  — 3 jobs: helm-lint, infisical-absent,
  secrets-restore (destroy/recreate canary test against real Scaleway S3)

Cross-repo dependency: infra Terraform must create scaleway-s3-credentials
Secret in openbao namespace before ArgoCD first sync.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7
Version 2.5.5 dropped the .zip distribution in favour of .deb / .rpm.
Also bump BAO_VERSION to 2.5.5 (latest).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7
The applicationsets CRD exceeds the 262144-byte annotation limit under
client-side apply. Passing --server-side avoids writing the
last-applied-configuration annotation.
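
A minimal sketch of the server-side install, under stated assumptions: the upstream install manifest URL, the `argocd` namespace, and reuse of ARGOCD_VERSION for the manifest are guesses about the workflow rather than details confirmed by this commit. `KUBECTL` is made overridable only so the command line can be inspected without a cluster.

```shell
#!/usr/bin/env bash
# Overridable for dry inspection, e.g. KUBECTL=echo.
KUBECTL="${KUBECTL:-kubectl}"
# Assumption: the manifest version matches the CLI version pinned in CI.
ARGOCD_VERSION="${ARGOCD_VERSION:-v2.13.0}"

install_argocd() {
  local manifest="https://raw.githubusercontent.com/argoproj/argo-cd/${ARGOCD_VERSION}/manifests/install.yaml"
  # With --server-side the API server computes the merge, so kubectl does not
  # store the full object in kubectl.kubernetes.io/last-applied-configuration.
  "$KUBECTL" apply --server-side -n argocd -f "$manifest"
}
```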

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7
The argocd binary was missing in CI. Add an explicit install step that uses
the ARGOCD_VERSION env var (v2.13.0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7
The ArgoCD app-of-apps chain (bootstrap → clusters → platform → openbao)
takes 10-15 min and is hard to debug in CI. Replace with a standalone
scripts/smoke-test.sh that installs OpenBao + ESO + openbao-init directly
via helm, then runs the full destroy/recreate canary assertion (SC-001).

The script is also runnable locally: SCW_ACCESS_KEY=... bash scripts/smoke-test.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7
helm --wait ensures deployments are ready but CRDs need an extra moment
to register in the API discovery cache. Without this, ClusterSecretStore
is not found when openbao-init chart is applied immediately after.
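
One way to wait for the CRD instead of sleeping for a fixed time is sketched below. This is an assumption about a possible approach, not necessarily what this commit does. `retry` is a hypothetical helper, and the CRD name `clustersecretstores.external-secrets.io` is the name ESO is expected to register.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $attempts attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example wiring (needs a cluster, so it is left commented out):
# kubectl wait --for=condition=Established --timeout=60s \
#   crd/clustersecretstores.external-secrets.io
# retry 10 3 kubectl get clustersecretstores
```

The `kubectl wait` call covers the CRD becoming established, and the retried `kubectl get` covers the short lag before the new kind shows up in API discovery.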

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7
- Switch OpenBao backend from S3 (unsupported) to Raft integrated storage
  with snapshot-based S3 backup for disaster recovery
- Add curl to apk install in init script (was missing, causing all health
  checks to return "000")
- Remove -f flag from curl health check (caused "501000" instead of "501")
- Remove --sse AES256 from s3_put (Scaleway doesn't support this param)
- ESO ClusterSecretStore: v1beta1 → v1 (ESO 2.6.0 promoted CRDs)
- Init Job: backoffLimit 3→0 + ttlSecondsAfterFinished 600 for debugging
- smoke-test.sh: bao_token() reads root_token from S3 (not K8s auth)
  so it works after cluster destroy/recreate (signing key changes)
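
The two curl fixes above can be illustrated with a small sketch. The likely mechanism behind "501000" is that `-f` makes curl exit non-zero on a 4xx/5xx response after `-w` has already printed the status code, so a `|| echo 000` fallback appends to it. The function names are hypothetical, and the status-code mapping assumes OpenBao keeps the health endpoint semantics it inherited from Vault.

```shell
#!/usr/bin/env bash
# Print only the HTTP status code of the health endpoint.
# No -f: the status code itself is the signal we want to read.
health_code() {
  local addr="$1"
  curl -s -o /dev/null -w '%{http_code}' "${addr}/v1/sys/health"
}

# Map a status code to a coarse state name (assumed mapping, see lead-in).
health_state() {
  case "$1" in
    200) echo "active" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    000) echo "unreachable" ;;
    *)   echo "other" ;;
  esac
}
```

Possible usage in an init script: `state="$(health_state "$(health_code http://openbao:8200)")"`, where the address is a placeholder.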

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KoJ6zVHxYtUKtSNXinmct7