Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 23, 2025

Make github-runner-manager runnable as a standalone app

This PR implements changes to make the github-runner-manager application runnable as a regular user without requiring sudo, system users, or hardcoded system paths. The changes separate app-level concerns (unprivileged, portable) from charm-level concerns (privileged, policy-driven).

Summary of Changes

Core Refactoring: Single Base Directory with XDG State Home

Replaced multiple path options with a single base directory approach following XDG Base Directory specification for state files:

  • New CLI option: --base-dir (replaces --state-dir and --reactive-log-dir)
  • Default location: $XDG_STATE_HOME/github-runner-manager or ~/.local/state/github-runner-manager
  • Subdirectory structure:
    • state/ - for reconcile.id and other state files
    • logs/reactive/ - for reactive runner logs
    • logs/metrics/ - for metrics logs

Implementation Details

1. New utility functions in utilities.py:

  • get_base_dir(base_dir) - Gets base directory with XDG_STATE_HOME fallback
  • get_state_dir(base_dir) - Returns base_dir/state
  • get_reactive_log_dir(base_dir) - Returns base_dir/logs/reactive
  • get_metrics_log_dir(base_dir) - Returns base_dir/logs/metrics

2. CLI changes in cli.py:

  • Replaced --state-dir and --reactive-log-dir with single --base-dir option
  • Environment variable: GITHUB_RUNNER_MANAGER_BASE_DIR
  • Uses XDG_STATE_HOME for state files (logs, state) per XDG specification

3. Updated all modules to use base_dir parameter:

  • reconcile_service.py
  • manager/runner_scaler.py
  • reactive/runner_manager.py
  • reactive/process_manager.py

4. Python interpreter fix for reactive subprocesses:

  • Use sys.executable from parent process (was: hardcoded /usr/bin/python3)
  • Ensures reactive subprocesses inherit parent's Python environment

5. Simplified permissions model:

  • Removed chown operations - XDG directories are already writable by current user
  • Removed root user differentiation - application does not run as root
  • Spawn subprocesses as current user with inherited group

6. Metrics logging:

  • Follows XDG paths - stored in base_dir/logs/metrics/
  • Symlink created at /var/log/github-runner-metrics.log for grafana-agent
  • Defaults to stderr when METRICS_LOG_PATH not set

7. Charm integration in manager_service.py**:

  • Passes single --base-dir parameter pointing to ~runner-manager/.local/state/github-runner-manager
  • Sets METRICS_LOG_PATH environment variable to /var/log/github-runner-metrics.log (symlink)
  • Creates symlinks in /var/log for grafana-agent log scraping:
    • Symlink /var/log/reactive_runnerbase_dir/logs/reactive
    • Symlink /var/log/github-runner-metrics.logbase_dir/logs/metrics/github-runner-metrics.log
    • Symlinks are readable by grafana-agent (verified via logrotate configuration)
    • Helper function _create_or_update_symlink() handles symlink management

Testing

  • Updated 9 unit test files to use new base directory structure
  • All 312 unit tests passing
  • Tests cover XDG_STATE_HOME fallback logic and subdirectory creation

Acceptance Criteria Met

✅ Single base directory with organized subdirectories
✅ XDG-compliant defaults for state files ($XDG_STATE_HOME or ~/.local/state)
✅ Running manager via tox works without permission errors
✅ No sudo required in app-level tests
✅ No chown operations needed - directories are user-writable by default
✅ Application does not require or check for root privileges
✅ Charm interface updated to use --base-dir
✅ Symlinks created in /var/log for grafana-agent log scraping
✅ Metrics follow XDG paths with symlink for monitoring
✅ All tests passing

Backward Compatibility

  • Charm interface updated but remains functional
  • New --base-dir option simplifies path management
  • Environment variable METRICS_LOG_PATH still works
  • No breaking changes to application behavior

Benefits

  • Simpler: One path parameter instead of three
  • Cleaner: Organized subdirectory structure
  • Standard: Follows XDG Base Directory specification for state files
  • Unprivileged: No root or chown operations needed
  • Flexible: Charm creates symlinks to map logs to /var/log for monitoring
  • Correct: Uses XDG_STATE_HOME instead of XDG_DATA_HOME for runtime state
Original prompt

This section details on the original issue you should resolve

<issue_title>Make github-runner-manager runnable as a standalone app without charm/system user assumptions</issue_title>
<issue_description>### Enhancement Proposal

Summary

Running the manager directly (tox/pytest, CI) exposes several assumptions baked into the app about users, paths, and Python environment. These cause permission and import failures outside the charm. The app should run cleanly as the invoking user with sane defaults, while the charm applies privileged policies.

Observed problems

Writes reconcile.id to the invoking user’s home (e.g., /home/ubuntu/reconcile.id), failing when run under a different effective user.
Attempts to create/use /var/log/reactive_runner and chown it, failing without root.
Reactive subprocesses use system python and can’t import github_runner_manager despite --python-path; sudo loses tox virtualenv context.
Config file in pytest tempdir not readable when spawning under another user.
Requiring the runner-manager user in app-level tests adds brittle permission and group juggling.
Repro (CI/integration)

Start CLI with tox venv python and reactive mode, dispatch a workflow that spawns a reactive runner.
Errors seen:
PermissionError: [Errno 13] Permission denied: '/var/log/reactive_runner'
PermissionError: [Errno 1] Operation not permitted (chown/chmod/chown during setup)
ModuleNotFoundError: No module named 'github_runner_manager' from reactive subprocess
Error: Invalid value for '--config-file': '/tmp/.../config.yaml': Permission denied
PermissionError: [Errno 13] Permission denied: '/home/ubuntu/reconcile.id'
Root cause

App hardcodes system-level paths (/var/log/reactive_runner) and writes state (reconcile.id) to $HOME.
Assumes privilege to chown/chgrp and to run/react as a dedicated system user.
Reactive subprocess resolves python via PATH/system python rather than the parent interpreter.
Using sudo drops tox venv/PYTHONPATH context unless carefully preserved.
Goal
Make the application runnable as a regular user with no sudo or user switching, with defaults that “just work” in tests/CI and local dev. Keep charm-specific behavior (runner-manager user, /var/log) at the charm layer.

Proposal

State directory (locks, ids)
Add optional --state-dir and env GITHUB_RUNNER_MANAGER_STATE_DIR.
Default (when not set): follow XDG base dirs
XDG_RUNTIME_DIR (if set) → TMPDIR → XDG_STATE_HOME/github-runner-manager → ~/.local/state/github-runner-manager
Write reconcile.id (and any future lock/ledger files) under state dir.
Charm passes a managed state dir; tests rely on the default.
Logging defaults and overrides
Metrics: default to stderr; keep existing METRICS_LOG_PATH support. Charm sets /var/log/github-runner-metrics.log.
Reactive runner logs: default to a user-local directory under state dir (e.g., $STATE_DIR/reactive_runner). Add optional --reactive-log-dir (and env) for charm to set /var/log/reactive_runner.
App must not chown/chgrp unless running as root; otherwise create dirs with current user permissions only.
Python interpreter and import path
Reactive subprocesses should default to using sys.executable from the parent process (not /usr/bin/python3).
Keep existing --python-path as an optional sys.path augment; do not rely on sudo -E or PATH hacks.
This preserves a single source of truth (tox -e . editable install) and works in CI.

User and privilege model

  • App: run as the effective user without assuming runner-manager or root. Do not attempt user creation or group modification.
  • Charm: continue to:
  • Create runner-manager system user and home
  • Manage /var/log permissions and ownership
  • Run the service under runner-manager via systemd
  • Pass explicit --state-dir and log dir paths
  • SSH key location
  • Keep SSH keys under the effective user’s ~/.ssh by default (current behavior aligns with “run as invoking user”).
  • Charm continues to run as runner-manager, so keys are written under ~runner-manager/.ssh there.

Backward compatibility

  • If --state-dir is not provided, use new XDG default silently (no breaking CLI change).
  • Keep current metrics/log flags working; new defaults only affect direct, unflagged runs.
  • Charm paths and behavior remain unchanged; charm passes flags and owns permissions.

Acceptance criteria

  • Running manager via tox in CI spawns reactive runners without:
  • PermissionError on /var/log/reactive_runner
  • reconcile.id writing outside a writable state dir
  • ModuleNotFoundError in reactive subprocess
  • No sudo required in app-level tests; a single tox editable install suffices.
  • Charm continues to work unchanged by passing its preferred paths and running as runner-manager.

Tasks

  • Add --state-dir CLI option and GITHUB_RUNNER_MANAGER_STATE_DIR env support; pick XDG defaults when unset.
  • Write reconcile.id into state dir.
  • Default reactive logs into $STATE_DIR/reactive_runner; add optional --reactive-log-dir override.
  • Use sys.executable for reactive subprocesses...

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 4 commits December 23, 2025 11:42
…ults

- Add get_state_dir() utility with XDG Base Directory support
- Add --state-dir and --reactive-log-dir CLI options with env var support
- Move reconcile.id to use state directory instead of hardcoded home
- Use sys.executable for reactive subprocesses instead of /usr/bin/python3
- Guard chown/chgrp operations to only run when running as root
- Change metrics log default from file to stderr when METRICS_LOG_PATH not set
- Update charm systemd service to pass explicit state-dir and reactive-log-dir

Co-authored-by: yanksyoon <[email protected]>
- Add comprehensive tests for get_state_dir() XDG logic
- Update reactive process_manager tests to work with new implementation
- Tests cover explicit params, env vars, XDG fallbacks, and permission errors

Co-authored-by: yanksyoon <[email protected]>
- Update mock_runner_scaler_fixture to accept new reactive_log_dir parameter
- Fix reactive runner_manager tests to include reactive_log_dir in assertions
- Add METRICS_LOG_PATH backward compatibility alias for tests
- Update metrics test fixture to patch both TARGET and PATH

Co-authored-by: yanksyoon <[email protected]>
- Remove unused tempfile import from utilities.py
- Fix systemd ExecStart line continuation formatting

Co-authored-by: yanksyoon <[email protected]>
Copilot AI changed the title [WIP] Make github-runner-manager runnable as a standalone app Make github-runner-manager runnable as standalone app without system user assumptions Dec 23, 2025
Copilot AI requested a review from yanksyoon December 23, 2025 11:55
Copy link
Member

@yanksyoon yanksyoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Initial review to make the application standalone.

When not running as root, explicitly set the group to the current user's
group for spawned reactive processes. This ensures proper group inheritance
in unprivileged contexts while maintaining charm behavior when running as root.

Co-authored-by: yanksyoon <[email protected]>
Copy link
Member

@yanksyoon yanksyoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Simplify the code by adding defaults in the CLI input.

…tions

Changed default from empty string to None for --state-dir and --reactive-log-dir
to make the "not provided" semantics more explicit. Updated type hints and
simplified the config variable assignment logic accordingly.

Co-authored-by: yanksyoon <[email protected]>
Set python_path default to None for consistency with state_dir and
reactive_log_dir. Remove unnecessary conditional logic since all three
parameters now have explicit None defaults, making the code cleaner
and more straightforward.

Co-authored-by: yanksyoon <[email protected]>
Copy link
Contributor Author

Copilot AI commented Dec 23, 2025

@copilot Simplify the code by adding defaults in the CLI input.

Updated in commit 8b3a987. Simplified by setting explicit None defaults for all optional path parameters and removing unnecessary conditional checks.

Copy link
Member

@yanksyoon yanksyoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot please start working on the comments

Changes based on code review feedback:
- Remove chown operations from _setup_logging_for_processes since XDG directories
  are already accessible to the current user
- Remove root user differentiation in subprocess spawning since the application
  does not run as root
- Make metrics log follow XDG paths with symlink in /var/log for grafana-agent
- Add _create_or_update_symlink helper function to handle both reactive and metrics
  symlinks
- Keep os import for kill_reactive_processes function
- Remove unused shutil and constants imports

Verification: Linux symlinks are readable by grafana-agent as confirmed by the
logrotate configuration which uses the same paths (src/logrotate.py lines 62-78).
Grafana-agent reads log files through standard file operations which work with
symlinks transparently.

Co-authored-by: yanksyoon <[email protected]>
@yanksyoon yanksyoon requested a review from yhaliaw December 24, 2025 04:16
@github-actions
Copy link
Contributor

TICS Quality Gate

❌ Failed

The following errors have occurred during analysis:

❌ [ERROR 5333] Project database 'github-runner-operator' already is connected to a started TICSQServer process that has not registered a stop time. To avoid data corruption, analysis is aborted. Details of the other TICSQServer process: started at 2025-12-22 09:55:19.000 by user ubuntu.

.github/workflows/tics.yaml / TICS / TICS GitHub Action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make github-runner-manager runnable as a standalone app without charm/system user assumptions

3 participants