feat(chart): Ensure helper secret stability across upgrades#313

Open
tidusete wants to merge 1 commit into Sentinel-One:master from tidusete:fix/preserve-helper-secrets

@tidusete
Description

This PR addresses an idempotency issue where the helper's TLS certificate (sentinelone-helper) and server-token (sentinelone-helper-token) secrets were regenerated on every helm upgrade, Flux reconciliation, or terraform apply.

Problem

Constant rotation of these secrets causes several operational issues:

  • Service Disruption: Regenerating the TLS certificate can interrupt communication between agents and the helper service.
  • GitOps Noise: Constant rotation produces spurious diffs in GitOps tools (e.g., ArgoCD or Flux), making meaningful changes harder to spot.
  • Unpredictable State: The cluster's state changes with every reconciliation, even when no configuration values have been modified.

Solution

This change introduces the use of the Helm lookup() function to check if the helper secrets already exist in the cluster before rendering them.

  • On first installation: The secrets are generated and created as before.
  • On subsequent upgrades/reconciliations: If the secrets are found, their existing data is reused, preventing rotation.

The helm.sh/resource-policy: keep annotation remains on the secrets to prevent accidental deletion during a helm uninstall, preserving their state for future installations if desired.
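
To make the approach concrete, here is a minimal sketch of the lookup-or-generate pattern for the TLS secret. It is not the chart's actual template: the secret type, the data keys (ca.crt, tls.crt, tls.key), the CA name, the service DNS name, and the 365-day validity are all assumptions chosen for illustration. Only lookup, genCA, genSignedCert, and b64enc are real Helm/Sprig functions.

```yaml
{{- /* Sketch only: secret type, key names, CA name and validity are assumptions. */ -}}
{{- $name := "sentinelone-helper" -}}
{{- $existing := lookup "v1" "Secret" .Release.Namespace $name -}}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $name }}
  namespace: {{ .Release.Namespace }}
  annotations:
    helm.sh/resource-policy: keep
type: kubernetes.io/tls
data:
{{- if $existing }}
  # Secret already exists in the cluster: copy its data verbatim (no rotation).
  {{- range $key, $value := $existing.data }}
  {{ $key }}: {{ $value | quote }}
  {{- end }}
{{- else }}
  # Secret not found (first install, or deleted on purpose): generate a new pair.
  {{- $ca := genCA "sentinelone-helper-ca" 365 }}
  {{- $cn := printf "%s.%s.svc" $name .Release.Namespace }}
  {{- $cert := genSignedCert $cn nil (list $cn) 365 $ca }}
  ca.crt: {{ $ca.Cert | b64enc | quote }}
  tls.crt: {{ $cert.Cert | b64enc | quote }}
  tls.key: {{ $cert.Key | b64enc | quote }}
{{- end }}
```

Because lookup returns an empty map whenever Helm does not contact the cluster (helm template, client-side dry runs), the else branch is taken in those cases and new material is rendered every time. That is why the pattern alone does not help tools that render charts offline.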

ArgoCD Considerations

ArgoCD renders charts with helm template, which does not query the cluster, so lookup() finds no existing secret and the chart regenerates the secrets on each sync. The recommended workaround for ArgoCD users is to add ignoreDifferences rules for the /data field on both secrets within the Application resource specification.

Example ArgoCD ignoreDifferences:

spec:
  # ...
  ignoreDifferences:
  - group: ""
    kind: Secret
    name: sentinelone-helper
    jsonPointers:
    - /data
  - group: ""
    kind: Secret
    name: sentinelone-helper-token
    jsonPointers:
    - /data

Use lookup() to check whether the helper TLS certificate secret and the
helper server-token secret already exist in the cluster before rendering
new values. If they do, their existing data is reused verbatim so that
helm upgrade, Flux reconciliations and Terraform applies no longer
rotate certificates or tokens on every run.

On a first install the secrets are generated as before. The
helm.sh/resource-policy: keep annotation prevents accidental deletion
on helm uninstall.

The same lookup-based preservation is applied to the helper secret in
webhookconfiguration.yaml (webhooks path), including the caBundle used
by the MutatingWebhookConfiguration and ValidatingWebhookConfiguration.

For ArgoCD deployments, lookup() returns an empty result during
helm-template rendering, so certs are still regenerated on each sync. The
recommended mitigation is to add ignoreDifferences for /data on both
secrets in the ArgoCD Application spec.

@oded-s1
Collaborator

oded-s1 commented Feb 26, 2026

@tidusete thanks for the PR.
Can you elaborate more on the motivation for this change?
Currently the helper certificate is valid for 365 days (this is the default and can be configured). We regenerate it on helm upgrade to reset the expiration so that it does not expire, assuming customers upgrade the agent at least once a year.
On helm upgrade we redeploy the agent and helper, so the certificate should remain valid.
For ArgoCD, our recommendation is to add the ignoreDifferences rules as you mentioned, to avoid unnecessary noise in GitOps diffs.

@tidusete
Author

Hey @oded-s1, I understand the intent behind regenerating certs on upgrade, but the problem in practice is that this happens on every helm upgrade, not just cert-related ones. Because Helm applies the new secret and webhook configuration before the StatefulSet and DaemonSet rollouts complete, there is always a window in which some pods are serving the old certificate and others the new one. This is an avoidable source of errors that makes troubleshooting harder, not easier.

I think this is also the same root issue that #242 tried to address.

From a GitOps and Helm maintainability perspective, secrets should be stable, and changes should be explicit and intentional. This PR achieves exactly that: the certificate is preserved across upgrades, so routine changes don't cause unnecessary disruption. When you actually need to rotate (whether for expiry or any other reason), you delete the secret and it is regenerated on the next helm upgrade or ArgoCD sync. Rotation becomes a deliberate, traceable operation rather than a side effect of every deployment.
