Releases: stefanDeveloper/heiDGAF
v1.0.1
Overview
This release addresses a critical stability issue in the Inspector component where message timestamps outside the configured time window could trigger an IndexError, halting data inspection and analysis.
Bug Fix: Out-of-Bounds Index
Issue:
When messages arrived with timestamps before begin_timestamp or after end_timestamp, the internal _count_errors() method attempted to write counts beyond the bounds of the counts array, resulting in:
IndexError: index XXXX is out of bounds for axis 0 with size YYYY
Resolution:
- Added bounds checking and filtering of invalid time indices.
- Safely ignore timestamps that fall outside the configured time range.
- Added clear warnings in logs when such messages are skipped.
- Introduced new unit tests verifying correct counting and stability.
Result:
The Inspector now handles out-of-range messages gracefully without crashing, ensuring uninterrupted operation and accurate in-range metric aggregation.
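A minimal sketch of the fix's core idea, assuming the counts are held in a NumPy array indexed by time bucket (names and shapes here are illustrative, not the actual implementation):

```python
import logging
import numpy as np

logger = logging.getLogger(__name__)

def count_per_bucket(timestamps, begin_ts, end_ts, n_buckets):
    """Count messages per time bucket, skipping out-of-range timestamps."""
    counts = np.zeros(n_buckets, dtype=int)
    bucket_width = (end_ts - begin_ts) / n_buckets
    indices = ((np.asarray(timestamps) - begin_ts) // bucket_width).astype(int)
    # Bounds check: keep only indices that land inside the counts array.
    valid = (indices >= 0) & (indices < n_buckets)
    if not valid.all():
        logger.warning("Skipping %d out-of-range timestamps", int((~valid).sum()))
    np.add.at(counts, indices[valid], 1)
    return counts
```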
Additional Improvements
- Minor logging improvements for debugging time range and timestamp issues.
- Enhanced test coverage for _count_errors() edge cases (empty batches, duplicate timestamps, out-of-range messages).
Verification
A new test (test_count_errors_valid_and_out_of_range) confirms that:
- Counts are correctly computed for valid timestamps.
- Out-of-range messages are ignored safely.
- No IndexError or unexpected array resizing occurs.
Upgrade Notes
No configuration changes are required.
It is recommended to upgrade to this version if your Inspector processes messages that may arrive late or have timestamp drift.
v1.0.0
heiDGAF Release Notes
Version 1.0.0-rc1 - Release Candidate
Release Date: October 2025
Overview
We are excited to announce the first release candidate of heiDGAF (Heidelberg Domain Generation Algorithm Framework), a comprehensive real-time DNS anomaly detection pipeline designed for cybersecurity applications. This release provides a complete end-to-end solution for detecting malicious domain generation algorithms (DGAs) and suspicious DNS traffic patterns.
Key Features
Real-Time Processing Pipeline
- 5-Stage Architecture: Modular pipeline design with Log Storage, Log Collection, Log Filtering, Inspection, and Detection stages
- Apache Kafka Integration: Asynchronous message processing with exactly-once semantics
- ClickHouse Database: High-performance analytics database for monitoring and logging
- Docker Support: Complete containerized deployment with docker-compose
Advanced Anomaly Detection
- Time-Series Analysis: StreamAD-based anomaly detection with univariate, multivariate, and ensemble models (see the sketch after this list)
- Machine Learning Classification: Pre-trained ML models for DGA detection using Random Forest and XGBoost
- Two-Tier Detection: Initial time-series filtering followed by domain-level classification
- Configurable Thresholds: Flexible scoring and anomaly thresholds for different deployment scenarios
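To illustrate the StreamAD-based time-series stage, here is a minimal univariate example following StreamAD's documented quickstart pattern; the detector choice, threshold, and data source are placeholders, not heiDGAF's actual wiring:

```python
from streamad.model import ZScoreDetector
from streamad.util import StreamGenerator, UnivariateDS

ds = UnivariateDS()                 # example dataset bundled with StreamAD
stream = StreamGenerator(ds.data)   # replay the data as a stream

detector = ZScoreDetector()
for x in stream.iter_item():
    score = detector.fit_score(x)   # None during warm-up, a float afterwards
    if score is not None and score > 0.9:
        print(f"anomaly candidate: {x} (score={score:.3f})")
```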
Comprehensive Data Processing
- Flexible Log Format: Configurable logline parsing with type validation and relevance filtering
- Batch Processing: Intelligent batching with subnet-based grouping and temporal windowing
- Feature Engineering: Advanced domain name feature extraction including entropy, character distributions, and linguistic patterns (entropy example after this list)
- Data Validation: Multi-stage validation ensuring data integrity throughout the pipeline
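As an example of one such feature, Shannon entropy over a domain label's character distribution tends to separate algorithmically generated names from human-chosen ones. A self-contained sketch (not heiDGAF's exact extraction code):

```python
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Shannon entropy (bits/char) of the character distribution in a label."""
    counts = Counter(label)
    total = len(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

print(shannon_entropy("google"))        # ~1.92: typical human-chosen name
print(shannon_entropy("xq7b2kd9fjw3"))  # ~3.58: DGA-like, near-uniform chars
```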
Technical Specifications
Supported Detection Models
Time-Series Anomaly Detection (StreamAD):
- Univariate: ZScoreDetector, KNNDetector, SpotDetector, SRDetector, OCSVMDetector, MadDetector, SArimaDetector
- Multivariate: xStreamDetector, RShashDetector, HSTreeDetector, LodaDetector, RrcfDetector
- Ensemble: WeightEnsemble, VoteEnsemble
Machine Learning Classification (see the sketch after this list):
- Random Forest (pre-trained, default)
- XGBoost support
- LightGBM support
- Custom model integration via URL-based download with SHA256 validation
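A hedged sketch of the second detection tier, assuming a scikit-learn-style classifier applied to extracted domain features; the file name and feature layout are illustrative (pickle is currently the only supported serialization, see Known Limitations):

```python
import pickle
import numpy as np

# Illustrative path; real models are downloaded and SHA256-validated.
with open("rf_model.pickle", "rb") as f:
    model = pickle.load(f)

# Illustrative feature vector: e.g. entropy, label length, digit ratio.
features = np.array([[3.58, 12, 0.42]])
p_dga = model.predict_proba(features)[0, 1]
print(f"P(DGA) = {p_dga:.2f}")
```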
Data Sources & Datasets
- Training Datasets: CIC-Bell-DNS-2021, DGTA-BENCH, DGArchive, Bambenek, heiCLOUD
- Input Formats: DNS log files, Kafka topics
- Output: Real-time alerts, monitoring dashboards, JSON warning logs
Monitoring & Operations
Comprehensive Monitoring
- Fill Level Tracking: Real-time monitoring of data volumes across all pipeline stages
- Performance Metrics: Batch processing statistics, timing data, and throughput monitoring
- Health Checks: Stage-specific monitoring with busy state management
- Alert Management: Structured alert generation with risk scoring
Configuration Management
- Centralized Configuration: Single YAML file for all pipeline settings (example after this list)
- Environment-Specific Settings: Separate configurations for development, testing, and production
- Hot-Swappable Parameters: Runtime configuration updates for thresholds and model parameters
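For instance, reading the centralized settings from Python might look like the following; the key names are illustrative, not heiDGAF's actual schema (consult config.yaml for that):

```python
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Illustrative keys only; the real schema is defined in config.yaml.
batch_size = config["pipeline"]["batch_size"]          # e.g. 10000
threshold = config["detection"]["anomaly_threshold"]   # e.g. 0.9
```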
Installation & Deployment
System Requirements
- Python 3.11+
- Apache Kafka cluster
- ClickHouse database
- Docker & docker-compose (optional)
Quick Start
# Clone repository
git clone https://github.com/stefanDeveloper/heiDGAF.git
# Docker deployment
HOST_IP=127.0.0.1 docker compose -f docker/docker-compose.yml up
Configuration
- Default configuration in config.yaml
- Environment-specific overrides in docker/.env
- Flexible logline format configuration with type validation
Performance Features
Scalability
- Horizontal Scaling: Multi-broker Kafka setup with partitioned topics
- Batch Optimization: Configurable batch sizes (default: 10,000 entries) with timeout handling (sketched after this list)
- Memory Efficiency: Streaming processing with bounded memory usage
- Concurrent Processing: Asynchronous processing across all pipeline stages
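The size-or-timeout batching idea can be sketched as follows; this is a simplified generator, not the actual pipeline code:

```python
import time
from typing import Iterable, Iterator

def batch_stream(messages: Iterable, max_size: int = 10_000,
                 timeout_s: float = 5.0) -> Iterator[list]:
    """Yield a batch when it reaches max_size or timeout_s has elapsed.

    Note: the timeout is only checked as messages arrive; a production
    implementation would flush from a separate timer.
    """
    batch, deadline = [], time.monotonic() + timeout_s
    for msg in messages:
        batch.append(msg)
        if len(batch) >= max_size or time.monotonic() >= deadline:
            yield batch
            batch, deadline = [], time.monotonic() + timeout_s
    if batch:  # flush whatever remains at end of stream
        yield batch
```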
Reliability
- Exactly-Once Processing: Kafka exactly-once semantics for data consistency (see the sketch after this list)
- Error Handling: Comprehensive exception handling with graceful degradation
- Data Validation: Multi-level validation ensuring data integrity
- Monitoring Integration: Full observability with ClickHouse analytics
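For reference, exactly-once publishing with confluent-kafka-python's transactional producer looks roughly like this; the broker address, transactional.id, and topic name are placeholders, and heiDGAF's actual client code may differ:

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "transactional.id": "inspector-1",       # placeholder id
})
producer.init_transactions()

producer.begin_transaction()
producer.produce("pipeline-inspections", value=b'{"alert": "..."}')
producer.commit_transaction()  # or abort_transaction() on failure
```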
Security Features
Model Integrity
- SHA256 Validation: Cryptographic validation of downloaded models (see the sketch after this list)
- Secure Downloads: HTTPS-based model retrieval with checksum verification
- Local Caching: Secure local storage of validated models
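A minimal sketch of HTTPS download plus checksum verification, assuming the expected digest is known in advance (function and file names are illustrative):

```python
import hashlib
import urllib.request

def fetch_model(url: str, expected_sha256: str, dest: str) -> None:
    """Download a model over HTTPS and verify its SHA256 before saving."""
    data = urllib.request.urlopen(url).read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch: {digest} != {expected_sha256}")
    with open(dest, "wb") as f:
        f.write(data)
```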
Data Protection
- Input Validation: Comprehensive input sanitization and type checking
- Anomaly Isolation: Secure processing of suspicious data without exposure
- Audit Trails: Complete logging of all processing stages for forensic analysis
Documentation & Training
Comprehensive Documentation
- API Documentation: Complete class and method documentation with consistent docstring style
- Configuration Guide: Detailed configuration examples and best practices
- Deployment Guide: Step-by-step deployment instructions
- Model Training: Custom model training utilities and documentation
Training & Explanation Tools
- Model Training Pipeline: End-to-end training workflow for custom models
- Feature Engineering: Advanced domain name feature extraction utilities
- Model Explanation: Interpretation and visualization tools for trained models
- Dataset Utilities: Data preprocessing and validation tools for multiple dataset formats
What's New in RC1
Code Quality Improvements
- Standardized Docstrings: All modules now follow consistent documentation style
- Type Annotations: Complete type hints throughout the codebase
- Error Handling: Enhanced exception handling and logging
Enhanced Training Pipeline
- Multi-Dataset Support: Support for DGTA, Bambenek, CIC, DGArchive, and heiCLOUD datasets
- Feature Extraction: Comprehensive domain name feature engineering
- Model Export: Automated model packaging with SHA256 checksums
- Hyperparameter Optimization: Optuna-based hyperparameter tuning (sketched below)
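A compact example of the Optuna pattern used for tuning; the search space and synthetic data below are illustrative, not the project's actual training setup:

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)  # toy data

def objective(trial: optuna.Trial) -> float:
    clf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 2, 16),
        random_state=0,
    )
    return cross_val_score(clf, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```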
Configuration Updates
- Time Window Configuration: 20ms time windows for high-resolution anomaly detection
- Model Parameters: Updated default thresholds and model configurations
- Kafka Topics: Standardized topic naming convention
Known Limitations
- IPv6 Support: Limited IPv6 subnet handling (requires manual configuration)
- Model Format: Currently supports pickle-based model serialization only
- Real-time Constraints: Processing latency dependent on batch size and model complexity
- Resource Requirements: Memory usage scales with batch size and model complexity
Upgrade Path
This is the first release candidate. The final 1.0.0 release will maintain backward compatibility for:
- Configuration file formats
- Kafka topic structures
- ClickHouse schema definitions
- Model artifact formats
Contributing
We welcome contributions! Please see our contributing guidelines for:
- Code style and formatting requirements
- Testing procedures and coverage requirements
- Documentation standards
- Model contribution guidelines
License
heiDGAF is released under the EUPL-1.2 license. Pre-trained models are also licensed under EUPL-1.2.
Repository: GitHub
Documentation: Read the Docs
Support: GitHub Issues