processor_tda: Implement Topological Data Analysis (TDA) plugin for metrics #11250
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds Ripser v1.2.1 as an optional bundled library, exposes a C wrapper API and C++ wrapper, introduces a new TDA processor plugin that computes Betti numbers from time-series via delay embedding, and wires build, packaging, tests, and installation to conditionally include Ripser support.
Sequence Diagram(s)
sequenceDiagram
participant Metrics as Metrics Stream
participant Processor as TDA Processor
participant Window as Sliding Window
participant Embed as Delay Embedding
participant DistMat as Dense→Compressed Builder
participant Ripser as Ripser Engine
participant Export as Metrics Export
Metrics->>Processor: incoming metric points
Processor->>Window: append / rotate samples
Window->>Processor: snapshot when window ready
Processor->>Embed: build embedded vectors (m, τ)
Embed->>DistMat: compute dense pairwise distances
DistMat->>Ripser: convert to compressed & run
Ripser-->>Processor: emit intervals / betti counts (via bridge)
Processor->>Export: emit betti gauges (betti0, betti1, betti2)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Signed-off-by: Hiroshi Hatake <[email protected]>
This processor plugin performs Topological Data Analysis (TDA) on metrics using ripser, which computes persistent homology.
The plugin aggregates incoming counters, gauges and untyped metrics into a 1-D time series, keeps a sliding window, builds a dense distance matrix and runs ripser through the new flb_ripser_* wrapper helpers. The resulting Betti numbers (currently betti0 and betti1) are exported as additional gauge metrics.
TDA and persistent homology can help reveal hidden order or phase transitions in complex systems that are not easily visible from raw time series. Similar approaches have already been explored in condensed matter physics, for example:
Donato, I., Gori, M., & Sarti, A. (2016). Persistent homology analysis of phase transitions. Physical Review E, 93, 052138. https://doi.org/10.1103/PhysRevE.93.052138
Signed-off-by: Hiroshi Hatake <[email protected]>
The TDA metrics processor now supports an optional delay embedding of the
aggregated metric vectors before building the dense distance matrix
used by Ripser.
When `embed_dim > 1`, we reconstruct a Takens-style delay embedding
x_t -> (x_t, x_{t-τ}, ..., x_{t-(m-1)τ})
over the sliding window, where `m = embed_dim` and `τ = embed_delay`.
Each embedded point is a flattened vector of size
feature_dim × m
and we keep using a Euclidean distance on this reconstructed phase
space.
This makes the processor more sensitive to occasional cyclic / quasi-
periodic regimes in the metric time series: loops in the reconstructed
trajectory translate into H1 features in the persistent homology. When
`embed_dim = 1`, the behaviour is unchanged and we fall back to the
original "no embedding" mode.
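The embedding described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `delay_embed` and its parameter names are hypothetical and are not taken from `tda.c`, the window is assumed to be a row-major array of `n` samples with `feature_dim` values each, and each embedded point is laid out as (x_t, x_{t-τ}, ..., x_{t-(m-1)τ}).

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the plugin's actual code).
 * Builds n - (m-1)*tau embedded points from a row-major window of n samples.
 * Embedded point k corresponds to time t = k + (m-1)*tau and stores
 * x_t first, then x_{t-tau}, ..., x_{t-(m-1)tau}, each of feature_dim values.
 * Returns a malloc'ed buffer (caller frees) or NULL on bad input.
 */
double *delay_embed(const double *window, size_t n, size_t feature_dim,
                    size_t m, size_t tau, size_t *n_embed_out)
{
    size_t span, n_embed, k, j, f, t;
    double *out;

    if (!window || !n_embed_out || feature_dim == 0 || m == 0 || tau == 0) {
        return NULL;
    }

    span = (m - 1) * tau;
    if (n <= span) {
        return NULL; /* not enough samples for a single embedded point */
    }

    n_embed = n - span;
    out = malloc(n_embed * m * feature_dim * sizeof(double));
    if (!out) {
        return NULL;
    }

    for (k = 0; k < n_embed; k++) {
        t = k + span;
        for (j = 0; j < m; j++) {
            for (f = 0; f < feature_dim; f++) {
                out[(k * m + j) * feature_dim + f] =
                    window[(t - j * tau) * feature_dim + f];
            }
        }
    }

    *n_embed_out = n_embed;
    return out;
}
```

With `m = 1` the span is zero, so the function simply copies the window, which mirrors the "no embedding" fallback.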
This change also adds two configuration options:
- `embed_dim` (int, default: 3)
Delay embedding dimension m.
Set to 1 to disable delay embedding.
- `embed_delay` (int, default: 1)
Lag τ in samples between successive delays.
The design follows the standard delay embedding approach from Takens'
theorem, which shows that (under mild conditions) the attractor of an
unknown dynamical system can be reconstructed from a single observed
time series via delay coordinates.
Reference
- F. Takens, "Detecting strange attractors in turbulence",
in D. Rand and L.-S. Young (eds.), Dynamical Systems and Turbulence,
Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381.
Signed-off-by: Hiroshi Hatake <[email protected]>
Expose threshold as a quantile-based distance scale selector. Signed-off-by: Hiroshi Hatake <[email protected]>
…tions Signed-off-by: Hiroshi Hatake <[email protected]>
The TDA processor could support other kinds of processing in the future, traces in particular, but for now it only handles the metrics pipeline. Signed-off-by: Hiroshi Hatake <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
lib/ripser-1.2.1/ripser_internal.hpp (1)
89-89: Fix include guard comment mismatch.
The closing comment references `RIPSER_INTERNAL_H` but the opening guard at line 20 uses `RIPSER_INTERNAL_HPP`. Update for consistency.
🔎 Suggested fix
-#endif /* RIPSER_INTERNAL_H */
+#endif /* RIPSER_INTERNAL_HPP */
🧹 Nitpick comments (1)
plugins/processor_tda/tda.c (1)
926-940: Unused threshold computation.
The `threshold` variable computed at line 936 is never used. The subsequent multi-quantile scan (lines 945-993) computes a fresh `thr` for each quantile candidate, making this computation dead code.
Consider removing these lines or using `threshold` as a fallback/default if the multi-quantile scan produces no valid results.
🔎 Option 1: Remove unused code
-    if (m == 1) {
-        q = 0.5; /* No delay embedding: use something like the median. */
-    }
-    else {
-        q = 0.2; /* With delay embedding: look at a smaller scale. */
-    }
-
-    /* --- choose a scale for TDA ---
-     * Use the number of embedded points n_embed to determine the threshold.
-     */
-    threshold = tda_choose_threshold_from_dist(ctx, dist, n_embed, q);
-    if (threshold <= 0.0f) {
-        threshold = 0.0f;
-    }
-
     memset(&betti, 0, sizeof(betti));
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- cmake/plugins_options.cmake
- dockerfiles/Dockerfile.centos7
- include/CMakeLists.txt
- include/fluent-bit/ripser/flb_ripser_wrapper.h
- lib/ripser-1.2.1/CMakeLists.txt
- lib/ripser-1.2.1/ripser.cpp
- lib/ripser-1.2.1/ripser_internal.hpp
- packaging/distros/centos/Dockerfile
- plugins/CMakeLists.txt
- plugins/processor_tda/CMakeLists.txt
- plugins/processor_tda/tda.c
- plugins/processor_tda/tda.h
- src/CMakeLists.txt
- src/ripser/CMakeLists.txt
- src/ripser/flb_ripser_wrapper.cpp
- tests/internal/CMakeLists.txt
- tests/internal/ripser.c
🚧 Files skipped from review as they are similar to previous changes (7)
- packaging/distros/centos/Dockerfile
- plugins/processor_tda/tda.h
- plugins/processor_tda/CMakeLists.txt
- plugins/CMakeLists.txt
- src/ripser/CMakeLists.txt
- dockerfiles/Dockerfile.centos7
- tests/internal/ripser.c
🧰 Additional context used
🧠 Learnings (14)
📓 Common learnings
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: src/ripser/flb_ripser_wrapper.cpp:73-106
Timestamp: 2025-12-08T05:21:45.014Z
Learning: In the TDA processor (processor_tda) for Fluent Bit, the dimension limit FLB_RIPSER_MAX_BETTI_DIM is intentionally capped at 3 because the plugin uses embed_dim=3 and delay=1 in practice, and computing higher dimensions (>4) would be computationally prohibitive. The 8-slot betti array allocation is conservative headroom.
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:38:56.340Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
📚 Learning: 2025-12-22T05:38:56.340Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:38:56.340Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
Applied to files:
- lib/ripser-1.2.1/CMakeLists.txt
- lib/ripser-1.2.1/ripser.cpp
- tests/internal/CMakeLists.txt
- include/CMakeLists.txt
- include/fluent-bit/ripser/flb_ripser_wrapper.h
- src/CMakeLists.txt
📚 Learning: 2025-12-08T05:21:45.014Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: src/ripser/flb_ripser_wrapper.cpp:73-106
Timestamp: 2025-12-08T05:21:45.014Z
Learning: In the TDA processor (processor_tda) for Fluent Bit, the dimension limit FLB_RIPSER_MAX_BETTI_DIM is intentionally capped at 3 because the plugin uses embed_dim=3 and delay=1 in practice, and computing higher dimensions (>4) would be computationally prohibitive. The 8-slot betti array allocation is conservative headroom.
Applied to files:
- cmake/plugins_options.cmake
- lib/ripser-1.2.1/ripser.cpp
- plugins/processor_tda/tda.c
- include/fluent-bit/ripser/flb_ripser_wrapper.h
- src/ripser/flb_ripser_wrapper.cpp
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
- cmake/plugins_options.cmake
- include/CMakeLists.txt
- src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
- cmake/plugins_options.cmake
- lib/ripser-1.2.1/ripser.cpp
- include/CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
- cmake/plugins_options.cmake
- include/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
- cmake/plugins_options.cmake
- lib/ripser-1.2.1/ripser.cpp
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
lib/ripser-1.2.1/ripser.cpp
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
lib/ripser-1.2.1/ripser.cpp
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
lib/ripser-1.2.1/ripser.cpp
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
lib/ripser-1.2.1/ripser.cpp
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
lib/ripser-1.2.1/ripser.cpp
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
- lib/ripser-1.2.1/ripser.cpp
- include/CMakeLists.txt
- include/fluent-bit/ripser/flb_ripser_wrapper.h
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
include/CMakeLists.txt
🧬 Code graph analysis (3)
lib/ripser-1.2.1/ripser.cpp (1)
- lib/ripser-1.2.1/ripser_internal.hpp (4): i (58-58), rows (59-59), dim (72-76), dim (72-72)
include/fluent-bit/ripser/flb_ripser_wrapper.h (1)
- src/ripser/flb_ripser_wrapper.cpp (4): flb_ripser_compute_betti_from_dense_distance (119-171), flb_ripser_compute_betti_from_dense_distance (119-124), flb_ripser_compute_intervals_from_dense_distance (200-235), flb_ripser_compute_intervals_from_dense_distance (200-206)
src/ripser/flb_ripser_wrapper.cpp (2)
- lib/ripser-1.2.1/ripser.cpp (12): i (236-238), i (236-236), i (241-243), i (241-241), i (269-275), i (269-269), i (288-294), i (288-288), i (409-411), i (409-409), ripser_run_from_compressed_lower (968-986), ripser_run_from_compressed_lower (968-973)
- lib/ripser-1.2.1/ripser_internal.hpp (4): i (58-58), dim (72-76), dim (72-72), ripser_run_from_compressed_lower (82-87)
🔇 Additional comments (30)
cmake/plugins_options.cmake (1)
77-77: LGTM! The TDA processor option follows the established pattern for processor plugins, with a clear description and default enabled state consistent with other processors.
tests/internal/CMakeLists.txt (1)
169-174: LGTM! The conditional test inclusion for Ripser follows the established pattern used by other optional features like LuaJIT, ensuring test coverage is properly gated by the feature flag.
include/CMakeLists.txt (1)
46-53: LGTM! The Ripser header installation block follows the same pattern as other optional components (e.g., simdutf), with proper feature flag gating and standard permissions.
src/CMakeLists.txt (2)
366-369: LGTM! The Ripser subdirectory inclusion follows the established pattern for optional components.
458-466: LGTM! The Ripser dependency linking follows the established pattern, ensuring both the core Ripser library and the Fluent Bit wrapper are linked when the feature is enabled.
lib/ripser-1.2.1/CMakeLists.txt (1)
1-10: LGTM! The CMake configuration for the Ripser static library is straightforward and correct, with appropriate include directory setup and C++11 requirement.
src/ripser/flb_ripser_wrapper.cpp (4)
39-54: LGTM! The dense-to-compressed matrix conversion correctly extracts the lower triangular portion with proper indexing (i > j) and reserves the exact size needed.
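For readers unfamiliar with the compressed layout, the conversion mentioned above can be illustrated with a small C sketch. The function name `dense_to_compressed_lower` is hypothetical (the real helper lives in the C++ wrapper and is not reproduced here); the sketch assumes a row-major `n x n` dense matrix and emits the strict lower triangle row by row.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical illustration, not the wrapper's actual code.
 * Copies entries with i > j from a row-major n x n matrix into a flat
 * array of exactly n*(n-1)/2 floats, ordered (1,0), (2,0), (2,1), ...
 * Returns a malloc'ed buffer (caller frees) or NULL on bad input.
 */
float *dense_to_compressed_lower(const float *dense, size_t n, size_t *len_out)
{
    size_t i, j, k = 0, len;
    float *out;

    if (!dense || !len_out || n < 2) {
        return NULL;
    }

    len = n * (n - 1) / 2;
    out = malloc(len * sizeof(float));
    if (!out) {
        return NULL;
    }

    for (i = 1; i < n; i++) {
        for (j = 0; j < i; j++) {
            out[k++] = dense[i * n + j];
        }
    }

    *len_out = len;
    return out;
}
```

Sizing the buffer up front avoids reallocation while the triangle is copied, which is the same idea as the "reserves the exact size needed" remark.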
73-106: LGTM! The Betti interval callback correctly filters out invalid intervals (negative dimensions, non-finite values, death ≤ birth, and low persistence < 1e-3). The dimension cap at `FLB_RIPSER_MAX_BETTI_DIM` (3) is intentional per the design constraints.
Based on learnings, the dimension limit of 3 is intentional because the plugin uses embed_dim=3 and delay=1, and higher dimensions would be computationally prohibitive.
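The filtering rules listed in this comment can be sketched as a small counting function. Everything named `sketch_*` or `SKETCH_*` below is hypothetical; in particular, whether the real callback rejects or clamps dimensions above the cap is not stated here, so rejecting them is an assumption of this sketch.

```c
#include <math.h>
#include <stddef.h>

#define SKETCH_MAX_BETTI_DIM   3      /* assumed to mirror FLB_RIPSER_MAX_BETTI_DIM */
#define SKETCH_MIN_PERSISTENCE 1e-3f  /* persistence cutoff quoted in the review */

struct sketch_betti {
    int counts[SKETCH_MAX_BETTI_DIM + 1];
};

/* Returns 1 if the interval was counted, 0 if it was filtered out. */
int sketch_count_interval(struct sketch_betti *b, int dim,
                          float birth, float death)
{
    if (!b || dim < 0 || dim > SKETCH_MAX_BETTI_DIM) {
        return 0; /* invalid or out-of-range dimension (assumption: reject) */
    }
    if (!isfinite(birth) || !isfinite(death)) {
        return 0; /* non-finite endpoints */
    }
    if (death <= birth) {
        return 0; /* empty or inverted interval */
    }
    if (death - birth < SKETCH_MIN_PERSISTENCE) {
        return 0; /* too short-lived to count */
    }

    b->counts[dim]++;
    return 1;
}
```

Note that dropping non-finite values also drops intervals that never die, so under these rules only features with a finite death are counted.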
119-171: LGTM! The public API correctly validates inputs, caps max_dim to 8, converts the dense matrix, runs Ripser with appropriate threshold handling (enclosing radius mode when threshold ≤ 0), and properly fills the output structure.
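As background for the threshold handling, the enclosing radius of a finite metric space is the smallest value, over all points, of that point's largest distance to any other point. A minimal sketch of that fallback follows; the `sketch_*` names are hypothetical and the wrapper's real code path is not shown here.

```c
#include <stddef.h>

/* Enclosing radius: min over i of (max over j of d(i, j)). */
float sketch_enclosing_radius(const float *dense, size_t n)
{
    size_t i, j;
    float best = 0.0f;

    for (i = 0; i < n; i++) {
        float row_max = 0.0f;

        for (j = 0; j < n; j++) {
            if (dense[i * n + j] > row_max) {
                row_max = dense[i * n + j];
            }
        }
        if (i == 0 || row_max < best) {
            best = row_max;
        }
    }
    return best;
}

/* Use the caller's threshold when positive, else fall back to the radius. */
float sketch_effective_threshold(const float *dense, size_t n, float threshold)
{
    return threshold > 0.0f ? threshold : sketch_enclosing_radius(dense, n);
}
```

For three collinear points at 0, 1 and 3, the row maxima are 3, 2 and 3, so the enclosing radius is 2.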
200-234: LGTM! The interval computation API correctly validates inputs, sets up the callback bridge, and runs Ripser with the same threshold semantics as the Betti computation API.
lib/ripser-1.2.1/ripser_internal.hpp (2)
30-64: LGTM! The type definitions, compressed distance matrix template, and layout enum are well-structured. The quadratic formula in the constructor (line 43) correctly computes the matrix size from the compressed vector length.
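The size recovery mentioned here follows from counting the entries of a strict lower triangle. As a short derivation, with $L$ as an assumed name for the compressed vector length and $n$ for the number of points:

```latex
L = \frac{n(n-1)}{2}
\;\Longrightarrow\; n^2 - n - 2L = 0
\;\Longrightarrow\; n = \frac{1 + \sqrt{1 + 8L}}{2}
```

For example, $L = 1770$ gives $n = (1 + \sqrt{14161})/2 = (1 + 119)/2 = 60$, i.e. a 60-point distance matrix.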
66-77: LGTM! The interval recorder struct provides a clean callback interface with default initialization and safe null-checking in the emit method.
include/fluent-bit/ripser/flb_ripser_wrapper.h (4)
29-29: LGTM! The dimension limit of 3 is intentional and appropriate for the TDA processor's use case with embed_dim=3 and delay=1.
Based on learnings, this cap prevents computationally prohibitive calculations for higher dimensions.
33-46: LGTM! The data structures are well-designed with clear documentation. The 8-slot betti array provides conservative headroom while the practical limit remains at dimension 3.
62-67: LGTM! The function signature is well-documented with clear parameter descriptions and return value semantics. The threshold behavior (≤ 0 uses enclosing radius) is properly documented.
87-93: LGTM! The interval computation API provides a flexible callback-based interface for users who need access to individual persistence intervals rather than just the Betti number summary.
plugins/processor_tda/tda.c (9)
39-140: LGTM! The comparison function and threshold selection logic are correctly implemented with proper null checks, memory allocation error handling, and boundary conditions for quantile calculation.
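A quantile-based scale selector of this kind can be sketched as follows. This is not the plugin's `tda_choose_threshold_from_dist`; the names `sketch_cmp_float` and `sketch_quantile_threshold` are hypothetical, and the index rule (floor of `q * (len - 1)` over the sorted off-diagonal distances) is an assumption chosen for simplicity.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for floats, ascending. */
static int sketch_cmp_float(const void *a, const void *b)
{
    float fa = *(const float *) a;
    float fb = *(const float *) b;

    return (fa > fb) - (fa < fb);
}

/*
 * Returns the q-quantile (0 < q < 1) of the pairwise distances found in the
 * strict lower triangle of a row-major n x n matrix, or 0.0f on bad input
 * or allocation failure.
 */
float sketch_quantile_threshold(const float *dense, size_t n, double q)
{
    size_t i, j, k = 0, len, idx;
    float *vals;
    float result;

    if (!dense || n < 2 || q <= 0.0 || q >= 1.0) {
        return 0.0f;
    }

    len = n * (n - 1) / 2;
    vals = malloc(len * sizeof(float));
    if (!vals) {
        return 0.0f;
    }

    for (i = 1; i < n; i++) {
        for (j = 0; j < i; j++) {
            vals[k++] = dense[i * n + j];
        }
    }

    qsort(vals, len, sizeof(float), sketch_cmp_float);
    idx = (size_t) (q * (double) (len - 1));
    result = vals[idx];
    free(vals);
    return result;
}
```

A smaller `q` selects a smaller distance scale, which is in line with the diff quoted earlier that uses 0.5 without delay embedding and 0.2 with it.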
142-173: LGTM! The window creation function properly handles allocation failures and cleans up resources on error paths.
179-295: LGTM! The group registration helpers properly handle memory allocation failures and roll back partial allocations when hash table insertion fails.
350-462: LGTM! The group building logic correctly handles error paths, including the fix for the potential use-after-free when `last_vec` allocation fails.
476-575: LGTM! The vector construction properly handles the first sample case, computes rates with time delta safeguards, and applies log1p normalization while preserving sign.
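One plausible way to write the normalization this comment describes is given below. The symbols are assumptions introduced for illustration: $v_t$ is the raw value sampled at time $\tau_t$, $r_t$ the rate, and $y_t$ the normalized feature. The exact form of the time-delta safeguard in `tda.c` is not reproduced here; the condition $\Delta t > 0$ stands in for it.

```latex
\begin{aligned}
r_t &= \frac{v_t - v_{t-1}}{\Delta t}, \qquad \Delta t = \tau_t - \tau_{t-1} > 0 \\
y_t &= \operatorname{sgn}(r_t)\,\ln\bigl(1 + |r_t|\bigr)
\end{aligned}
```

For instance, a rate of $r_t = -9$ maps to $y_t = -\ln 10 \approx -2.303$: the magnitude is compressed while the sign of the change is kept.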
577-638: LGTM! The ingest function correctly handles ring buffer overflow by dropping oldest samples, and properly frees all temporary allocations.
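The drop-oldest behavior noted here is the usual fixed-capacity ring buffer pattern. A minimal sketch with scalar samples follows; `sketch_ring`, `sketch_ring_push`, `sketch_ring_at` and the capacity of 4 are hypothetical and chosen only to keep the example short (the plugin stores feature vectors and sizes the window from `window_size`).

```c
#include <stddef.h>

#define SKETCH_WINDOW_CAP 4 /* illustrative capacity only */

struct sketch_ring {
    double samples[SKETCH_WINDOW_CAP];
    size_t head;  /* index of the oldest sample */
    size_t count; /* number of valid samples */
};

/* Append a sample; when the buffer is full, the oldest one is dropped. */
void sketch_ring_push(struct sketch_ring *r, double v)
{
    size_t tail;

    if (r->count == SKETCH_WINDOW_CAP) {
        r->head = (r->head + 1) % SKETCH_WINDOW_CAP;
        r->count--;
    }

    tail = (r->head + r->count) % SKETCH_WINDOW_CAP;
    r->samples[tail] = v;
    r->count++;
}

/* Read the i-th sample in age order (0 = oldest); i must be < count. */
double sketch_ring_at(const struct sketch_ring *r, size_t i)
{
    return r->samples[(r->head + i) % SKETCH_WINDOW_CAP];
}
```

Pushing the values 1 through 6 into this buffer leaves 3, 4, 5, 6 in age order, since the two oldest samples are overwritten.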
1038-1136: LGTM! The processor lifecycle functions properly initialize, clean up, and handle all allocated resources with appropriate null checks.
1138-1188: LGTM! The process metrics function correctly initializes groups and window on first call, and the gauge pointer reset is intentional since each `metrics_context` manages its own gauge objects through the cmetrics lifecycle.
1191-1233: LGTM! The configuration map and plugin definition are properly structured with sensible defaults and correctly wired callbacks.
lib/ripser-1.2.1/ripser.cpp (5)
1-78: LGTM! License headers properly attribute both the original MIT-licensed Ripser code and the Fluent Bit modifications.
219-297: LGTM! The distance matrix implementations correctly handle triangular matrix access patterns and diagonal elements.
372-817: LGTM! The core Ripser persistence algorithm implementation is correctly integrated with the `interval_recorder` callback mechanism for emitting persistence intervals.
947-986: LGTM! The edge extraction specializations and the `ripser_run_from_compressed_lower` entry point correctly integrate Ripser with the Fluent Bit wrapper, using Z/2Z coefficients for the homology computation.
988-1305: Standalone executable code disabled for Fluent Bit build.
The `#ifdef RIPSEREXE` section contains the CLI frontend and is not compiled when building for Fluent Bit. Per previous discussion, this vendored code is preserved as-is to simplify future upstream updates.
This PR introduces a new processor plugin, `tda`, which performs Topological Data Analysis (TDA) on stream metrics using persistent homology.
The plugin aggregates incoming counters, gauges, and untyped metrics into a unified n-dimensional feature vector, maintains a sliding window, and utilizes a C-wrapped version of Ripser to compute Betti numbers.
Implementation Details:
Multiple metric streams are mapped to a fixed feature dimension. To handle varying magnitudes and bursty traffic:
- Values are normalized with `log1p` (natural logarithm of 1 + magnitude) to dampen dynamic range before distance calculation.
The plugin keeps a ring buffer of these vectors. Before processing, it optionally applies Delay Embedding (see below) to reconstruct the phase space geometry.
A dense Euclidean distance matrix is computed from the window. Ripser determines the persistence intervals, which are summarized into Betti numbers exported as new gauges:
- `fluentbit.tda.betti0`: Connected components (clusters).
- `fluentbit.tda.betti1`: Loops/Cycles (recurrence).
- `fluentbit.tda.betti2`: Voids (higher-order structures).
Delay Embedding (Takens' Theorem):
This plugin supports an optional delay embedding [2] of the aggregated metric vectors. When `embed_dim > 1`, we reconstruct the state space vectors $x_t$ as:
$$x_t \to (x_t, x_{t-\tau}, \dots, x_{t-(m-1)\tau})$$
Where:
- $m$ = `embed_dim`
- $\tau$ = `embed_delay`
This transformation allows the processor to detect cyclic or quasi-periodic regimes (loops in the trajectory) even from limited metric dimensions. These loops translate into $H_1$ features in the persistent homology. If `embed_dim = 1`, the behavior falls back to the original "no embedding" mode.
Motivation:
TDA and persistent homology can help reveal hidden order, phase transitions, or subtle cyclic behaviors in complex systems that are not easily visible from raw time series or standard statistical aggregates. Similar approaches have been explored in condensed matter physics [1] for detecting phase transitions.
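Configuration Options:
- `window_size` (int, default: 60): Number of samples to keep in the TDA sliding window.
- `min_points` (int, default: 10): Minimum number of samples required before running Ripser.
- `embed_dim` (int, default: 3): Delay embedding dimension ($m$).
- `embed_delay` (int, default: 1): Lag ($\tau$).
- `threshold` (double, default: 0): Distance scale selector. 0 enables auto multi-quantile scan; (0,1) uses the specific quantile.
References:
- [1] Donato, I., Gori, M., & Sarti, A. (2016). Persistent homology analysis of phase transitions. Physical Review E, 93, 052138. https://doi.org/10.1103/PhysRevE.93.052138
- [2] Takens, F. (1981). Detecting strange attractors in turbulence. In D. Rand and L.-S. Young (eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, vol. 898, Springer, pp. 366-381.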
Enter `[N/A]` in the box, if an item is not applicable to your change.
Testing
Before we can approve your change; please submit the following in a comment:
Additional Log:
For a single, one-time failure, the betti1 and betti2 metrics do not increase.
With intermittent failures like the ones above, however, these higher-order metrics rise and detect "phase transitions", which means that the system is not in a stable phase.
This log is from the macOS memory leak detector:
There are no leaks in this plugin.
In addition, no rules are configured, yet the TDA metrics indicate that something is happening through non-zero betti1 and betti2 values:
This metrics-based detector takes a different direction from rule-based detection and sheds light on deeper aspects of anomaly detection.
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
`ok-package-test` label to test for all targets (requires maintainer to do).
Documentation
fluent/fluent-bit-docs#2277
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.