processor_tda: Implement Topological Data Analysis (TDA) plugin for metrics #11250

cosmo0920 · 2025-12-03T07:23:25Z

This PR introduces a new processor plugin, tda, which performs Topological Data Analysis (TDA) on stream metrics using persistent homology.

The plugin aggregates incoming counters, gauges, and untyped metrics into a unified n-dimensional feature vector, maintains a sliding window, and utilizes a C-wrapped version of Ripser to compute Betti numbers.

Implementation Details:

Metric Aggregation & Normalization:
Multiple metric streams are mapped to a fixed feature dimension. To handle varying magnitudes and bursty traffic:
- Counters are converted to rates (differentiated against the previous snapshot).
- Values are normalized using log1p (natural logarithm of 1 + magnitude) to dampen dynamic range before distance calculation.
Sliding Window & Phase Space Reconstruction:
The plugin keeps a ring buffer of these vectors. Before processing, it optionally applies Delay Embedding (see below) to reconstruct the phase space geometry.
Persistent Homology via Ripser:
A dense Euclidean distance matrix is computed from the window. Ripser determines the persistence intervals, which are summarized into Betti numbers exported as new gauges:
- fluentbit_tda_betti0: Connected components (clusters).
- fluentbit_tda_betti1: Loops/Cycles (recurrence).
- fluentbit_tda_betti2: Voids (higher-order structures).

Delay Embedding (Takens' Theorem):

This plugin supports an optional delay embedding [2] of the aggregated metric vectors. When embed_dim > 1, we reconstruct the state space vectors $x_t$ as:

$$x_t \to (x_t, x_{t-\tau}, \dots, x_{t-(m-1)\tau})$$

Where:

$m =$ embed_dim
$\tau =$ embed_delay

This transformation allows the processor to detect cyclic or quasi-periodic regimes (loops in the trajectory) even from limited metric dimensions. These loops translate into $H_1$ features in the persistent homology. If embed_dim = 1, the behavior falls back to the original "no embedding" mode.
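
The embedding formula can be illustrated with a short C sketch. The names `embed_count` and `delay_embed` are hypothetical and the row-major, oldest-first memory layout is an assumption; the plugin's own ring-buffer handling may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Number of delay vectors obtainable from n samples with embedding
 * dimension m and lag tau; 0 if the window is too short. */
static size_t embed_count(size_t n, size_t m, size_t tau)
{
    size_t span;

    if (m == 0) {
        return 0;
    }
    span = (m - 1) * tau;
    return (n > span) ? n - span : 0;
}

/* Build delay vectors from n samples of dimension `dim` stored row-major
 * in `x`, oldest sample first. Output row r corresponds to time
 * t = r + (m - 1) * tau and holds (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}),
 * i.e. dim * m values. Returns the number of rows written to `out`. */
static size_t delay_embed(const double *x, size_t n, size_t dim,
                          size_t m, size_t tau, double *out)
{
    size_t rows = embed_count(n, m, tau);
    size_t r, j, k;

    assert(x != NULL && out != NULL);
    for (r = 0; r < rows; r++) {
        size_t t = r + (m - 1) * tau;
        for (j = 0; j < m; j++) {
            const double *src = x + (t - j * tau) * dim;
            for (k = 0; k < dim; k++) {
                out[(r * m + j) * dim + k] = src[k];
            }
        }
    }
    return rows;
}
```

Under this sketch, a window of 60 samples with embed_dim = 3 and embed_delay = 1 yields 58 embedded points, each of size feature_dim × 3; with embed_dim = 1 the output is simply a copy of the input.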

Motivation:

TDA and persistent homology can help reveal hidden order, phase transitions, or subtle cyclic behaviors in complex systems that are not easily visible from raw time series or standard statistical aggregates. Similar approaches have been explored in condensed matter physics [1] for detecting phase transitions.

Configuration Options:

window_size (int, default: 60): Number of samples to keep in the TDA sliding window.
min_points (int, default: 10): Minimum number of samples required before running Ripser.
embed_dim (int, default: 3): Delay embedding dimension ($m$). Set to 1 to disable.
embed_delay (int, default: 1): Lag ($\tau$) in samples between successive delays.
threshold (double, default: 0): Distance scale selector. 0 enables an automatic multi-quantile scan; a value in (0, 1) selects that specific quantile of the pairwise distances.
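
To make the quantile-based `threshold` option more concrete, here is a rough C sketch of picking a distance scale as a quantile of the pairwise distances. The function `distance_quantile` is hypothetical and is not the plugin's `tda_choose_threshold_from_dist` helper; the index rule (floor of q × (count − 1) on the sorted values) is an assumption chosen for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparator for qsort over doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/* Return a q-quantile (0 < q < 1) of the strict upper triangle of a dense
 * n x n row-major distance matrix, or -1.0 on invalid input or allocation
 * failure. The sorted value at index floor(q * (count - 1)) is returned. */
static double distance_quantile(const double *dist, size_t n, double q)
{
    size_t count, i, j, idx, pos = 0;
    double *vals;
    double result;

    if (dist == NULL || n < 2 || q <= 0.0 || q >= 1.0) {
        return -1.0;
    }
    count = n * (n - 1) / 2;
    vals = malloc(count * sizeof(double));
    if (vals == NULL) {
        return -1.0;
    }
    /* Collect each unordered pair exactly once. */
    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            vals[pos++] = dist[i * n + j];
        }
    }
    assert(pos == count);
    qsort(vals, count, sizeof(double), cmp_double);
    idx = (size_t) (q * (double) (count - 1));  /* truncates toward zero */
    result = vals[idx];
    free(vals);
    return result;
}
```

A small quantile keeps only short-range structure, while a larger one connects more points. An automatic scan could, for example, evaluate several candidate quantiles in turn, though the exact candidates used by the plugin are not stated here.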

References:

[1] Donato, I., Gori, M., & Sarti, A. (2016). Persistent homology analysis of phase transitions. Physical Review E, 93, 052138.
[2] Takens, F. (1981). Detecting strange attractors in turbulence. In D. Rand & L.-S. Young (Eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, vol. 898, Springer, pp. 366-381.

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

Example configuration file for the change

service:
  http_server: On
  http_port: 2021
pipeline:
    inputs:
      - name: dummy
        tag: log.raw
        samples: 10000
      - name: fluentbit_metrics
        tag: metrics.raw

        processors:
          metrics:
            - name: metrics_selector
              metric_name: /process_start_time_seconds/
              action: exclude
            - name: metrics_selector
              metric_name: /build_info/
              action: exclude
            - name: tda

    outputs:
      - name: stdout
        match: '*'

Additional Log:

2025-12-03T07:27:16.013990065Z fluentbit_tda_betti0 = 39
2025-12-03T07:27:16.013990065Z fluentbit_tda_betti1 = 7
2025-12-03T07:27:16.013990065Z fluentbit_tda_betti2 = 0
[2025/12/03 16:27:16.930210000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:16.930442000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:16.930461000] [error] [output:http:http.0] no upstream connections available to localhost:8443
[2025/12/03 16:27:16.930554000] [ warn] [engine] failed to flush chunk '30288-1764746835.908400000.flb', retry in 9 seconds: task_id=5, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:18.017259794Z fluentbit_tda_betti0 = 40
2025-12-03T07:27:18.017259794Z fluentbit_tda_betti1 = 7
2025-12-03T07:27:18.017259794Z fluentbit_tda_betti2 = 0
2025-12-03T07:27:20.024738944Z fluentbit_tda_betti0 = 41
2025-12-03T07:27:20.024738944Z fluentbit_tda_betti1 = 7
2025-12-03T07:27:20.024738944Z fluentbit_tda_betti2 = 1
[2025/12/03 16:27:21.995837000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
2025-12-03T07:27:22.033923596Z fluentbit_tda_betti0 = 42
2025-12-03T07:27:22.033923596Z fluentbit_tda_betti1 = 7
2025-12-03T07:27:22.033923596Z fluentbit_tda_betti2 = 0
[2025/12/03 16:27:23.605981000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606029000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606022000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606048000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606089000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606101000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606148000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606162000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606243000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606255000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606328000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606340000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606400000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606412000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606437000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606458000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606472000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606471000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606560000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:23.606578000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:23.606593000] [error] [engine] chunk '30288-1764746830.908242000.flb' cannot be retried: task_id=0, input=dummy.0 > output=http.0
[2025/12/03 16:27:23.606725000] [ warn] [engine] failed to flush chunk '30288-1764746841.908663000.flb', retry in 10 seconds: task_id=8, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:23.606780000] [ warn] [engine] failed to flush chunk '30288-1764746840.908811000.flb', retry in 6 seconds: task_id=13, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:23.606825000] [ warn] [engine] failed to flush chunk '30288-1764746838.907344000.flb', retry in 10 seconds: task_id=11, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:23.606845000] [error] [engine] chunk '30288-1764746827.908136000.flb' cannot be retried: task_id=2, input=dummy.0 > output=http.0
[2025/12/03 16:27:23.606905000] [error] [engine] chunk '30288-1764746831.908588000.flb' cannot be retried: task_id=6, input=dummy.0 > output=http.0
[2025/12/03 16:27:23.606941000] [error] [engine] chunk '30288-1764746828.909077000.flb' cannot be retried: task_id=3, input=dummy.0 > output=http.0
[2025/12/03 16:27:23.606991000] [error] [engine] chunk '30288-1764746832.908679000.flb' cannot be retried: task_id=7, input=dummy.0 > output=http.0
[2025/12/03 16:27:23.607060000] [ warn] [engine] failed to flush chunk '30288-1764746839.908556000.flb', retry in 8 seconds: task_id=12, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:23.607112000] [ warn] [engine] failed to flush chunk '30288-1764746837.908369000.flb', retry in 6 seconds: task_id=10, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:23.932627000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:23.932773000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:23.932794000] [error] [output:http:http.0] no upstream connections available to localhost:8443
[2025/12/03 16:27:24.30227000] [ warn] [engine] failed to flush chunk '30288-1764746842.908959000.flb', retry in 10 seconds: task_id=0, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:24.030021859Z fluentbit_tda_betti0 = 43
2025-12-03T07:27:24.030021859Z fluentbit_tda_betti1 = 9
2025-12-03T07:27:24.030021859Z fluentbit_tda_betti2 = 0
[2025/12/03 16:27:25.777729000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:25.777762000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:25.777780000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:25.777804000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:25.777824000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:25.777842000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:25.777864000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:25.777883000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:25.777916000] [error] [engine] chunk '30288-1764746826.908579000.flb' cannot be retried: task_id=1, input=dummy.0 > output=http.0
[2025/12/03 16:27:25.777955000] [error] [engine] chunk '30288-1764746833.907394000.flb' cannot be retried: task_id=9, input=dummy.0 > output=http.0
[2025/12/03 16:27:25.777981000] [error] [engine] chunk '30288-1764746829.908135000.flb' cannot be retried: task_id=4, input=dummy.0 > output=http.0
[2025/12/03 16:27:25.778051000] [ warn] [engine] failed to flush chunk '30288-1764746843.907694000.flb', retry in 11 seconds: task_id=2, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:25.918758000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:25.918857000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:25.918872000] [error] [output:http:http.0] no upstream connections available to localhost:8443
[2025/12/03 16:27:26.13749000] [ warn] [engine] failed to flush chunk '30288-1764746844.908008000.flb', retry in 8 seconds: task_id=1, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:26.25748000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:26.25885000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:26.25899000] [error] [output:http:http.0] no upstream connections available to localhost:8443
[2025/12/03 16:27:26.25923000] [error] [engine] chunk '30288-1764746835.908400000.flb' cannot be retried: task_id=5, input=dummy.0 > output=http.0
2025-12-03T07:27:26.013531523Z fluentbit_tda_betti0 = 44
2025-12-03T07:27:26.013531523Z fluentbit_tda_betti1 = 9
2025-12-03T07:27:26.013531523Z fluentbit_tda_betti2 = 0
[2025/12/03 16:27:27.572929000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:27.572971000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:27.572986000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:27.573010000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:27.573137000] [ warn] [engine] failed to flush chunk '30288-1764746845.906586000.flb', retry in 9 seconds: task_id=3, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:27.573176000] [error] [engine] chunk '30288-1764746834.908597000.flb' cannot be retried: task_id=14, input=dummy.0 > output=http.0
[2025/12/03 16:27:27.935499000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:27.935638000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:27.935659000] [error] [output:http:http.0] no upstream connections available to localhost:8443
[2025/12/03 16:27:28.50004000] [ warn] [engine] failed to flush chunk '30288-1764746846.908355000.flb', retry in 10 seconds: task_id=4, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:28.049800623Z fluentbit_tda_betti0 = 45
2025-12-03T07:27:28.049800623Z fluentbit_tda_betti1 = 10
2025-12-03T07:27:28.049800623Z fluentbit_tda_betti2 = 1
[2025/12/03 16:27:29.881491000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:29.881542000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:29.881535000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:29.881565000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:29.881633000] [error] [http_client] broken connection to localhost:8443 ?
[2025/12/03 16:27:29.881648000] [error] [output:http:http.0] could not flush records to localhost:8443 (http_do=-1)
[2025/12/03 16:27:29.881736000] [error] [engine] chunk '30288-1764746840.908811000.flb' cannot be retried: task_id=13, input=dummy.0 > output=http.0
[2025/12/03 16:27:29.881782000] [error] [engine] chunk '30288-1764746837.908369000.flb' cannot be retried: task_id=10, input=dummy.0 > output=http.0
[2025/12/03 16:27:29.881873000] [ warn] [engine] failed to flush chunk '30288-1764746847.906725000.flb', retry in 9 seconds: task_id=5, input=dummy.0 > output=http.0 (out_id=0)
[2025/12/03 16:27:29.926923000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:29.927037000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:29.927068000] [error] [output:http:http.0] no upstream connections available to localhost:8443
[2025/12/03 16:27:30.37614000] [ warn] [engine] failed to flush chunk '30288-1764746848.907077000.flb', retry in 11 seconds: task_id=6, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:30.037452292Z fluentbit_tda_betti0 = 46
2025-12-03T07:27:30.037452292Z fluentbit_tda_betti1 = 12
2025-12-03T07:27:30.037452292Z fluentbit_tda_betti2 = 1
[2025/12/03 16:27:30.937891000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:30.938089000] [error] [net] TCP connection failed: localhost:8443 (Connection refused)
[2025/12/03 16:27:30.938120000] [error] [output:http:http.0] no upstream connections available to localhost:8443
[2025/12/03 16:27:30.938185000] [ warn] [engine] failed to flush chunk '30288-1764746849.906988000.flb', retry in 11 seconds: task_id=7, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:32.060665835Z fluentbit_tda_betti0 = 47
2025-12-03T07:27:32.060665835Z fluentbit_tda_betti1 = 12
2025-12-03T07:27:32.060665835Z fluentbit_tda_betti2 = 2
2025-12-03T07:27:34.073613867Z fluentbit_tda_betti0 = 48
2025-12-03T07:27:34.073613867Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:34.073613867Z fluentbit_tda_betti2 = 2
[2025/12/03 16:27:36.264575000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
[2025/12/03 16:27:36.264791000] [ info] [engine] flush chunk '30288-1764746839.908556000.flb' succeeded at retry 1: task_id=12, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:36.082592852Z fluentbit_tda_betti0 = 49
2025-12-03T07:27:36.082592852Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:36.082592852Z fluentbit_tda_betti2 = 2
2025-12-03T07:27:38.070223396Z fluentbit_tda_betti0 = 50
2025-12-03T07:27:38.070223396Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:38.070223396Z fluentbit_tda_betti2 = 1
2025-12-03T07:27:40.066529659Z fluentbit_tda_betti0 = 51
2025-12-03T07:27:40.066529659Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:40.066529659Z fluentbit_tda_betti2 = 1
[2025/12/03 16:27:41.265885000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
2025-12-03T07:27:42.072408891Z fluentbit_tda_betti0 = 52
2025-12-03T07:27:42.072408891Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:42.072408891Z fluentbit_tda_betti2 = 2
2025-12-03T07:27:44.090558970Z fluentbit_tda_betti0 = 53
2025-12-03T07:27:44.090558970Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:44.090558970Z fluentbit_tda_betti2 = 2
[2025/12/03 16:27:46.264449000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
2025-12-03T07:27:46.100740206Z fluentbit_tda_betti0 = 54
2025-12-03T07:27:46.100740206Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:46.100740206Z fluentbit_tda_betti2 = 2
2025-12-03T07:27:48.094721175Z fluentbit_tda_betti0 = 55
2025-12-03T07:27:48.094721175Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:48.094721175Z fluentbit_tda_betti2 = 2
2025-12-03T07:27:50.083926971Z fluentbit_tda_betti0 = 56
2025-12-03T07:27:50.083926971Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:50.083926971Z fluentbit_tda_betti2 = 2
[2025/12/03 16:27:51.265835000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
[2025/12/03 16:27:51.266086000] [ info] [engine] flush chunk '30288-1764746841.908663000.flb' succeeded at retry 1: task_id=8, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:52.104521303Z fluentbit_tda_betti0 = 56
2025-12-03T07:27:52.104521303Z fluentbit_tda_betti1 = 13
2025-12-03T07:27:52.104521303Z fluentbit_tda_betti2 = 2
2025-12-03T07:27:54.104147236Z fluentbit_tda_betti0 = 56
2025-12-03T07:27:54.104147236Z fluentbit_tda_betti1 = 12
2025-12-03T07:27:54.104147236Z fluentbit_tda_betti2 = 2
[2025/12/03 16:27:56.264185000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
[2025/12/03 16:27:56.264291000] [ info] [engine] flush chunk '30288-1764746838.907344000.flb' succeeded at retry 1: task_id=11, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:27:56.107265131Z fluentbit_tda_betti0 = 56
2025-12-03T07:27:56.107265131Z fluentbit_tda_betti1 = 12
2025-12-03T07:27:56.107265131Z fluentbit_tda_betti2 = 2
2025-12-03T07:27:58.093616840Z fluentbit_tda_betti0 = 56
2025-12-03T07:27:58.093616840Z fluentbit_tda_betti1 = 12
2025-12-03T07:27:58.093616840Z fluentbit_tda_betti2 = 2
2025-12-03T07:28:00.088709102Z fluentbit_tda_betti0 = 56
2025-12-03T07:28:00.088709102Z fluentbit_tda_betti1 = 12
2025-12-03T07:28:00.088709102Z fluentbit_tda_betti2 = 2
[2025/12/03 16:28:01.264444000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
2025-12-03T07:28:02.097946671Z fluentbit_tda_betti0 = 56
2025-12-03T07:28:02.097946671Z fluentbit_tda_betti1 = 11
2025-12-03T07:28:02.097946671Z fluentbit_tda_betti2 = 2
2025-12-03T07:28:04.093339307Z fluentbit_tda_betti0 = 56
2025-12-03T07:28:04.093339307Z fluentbit_tda_betti1 = 12
2025-12-03T07:28:04.093339307Z fluentbit_tda_betti2 = 2
[2025/12/03 16:28:06.264442000] [ info] [output:http:http.0] localhost:8443, HTTP status=200
{"status":"ok","errors":false}
[2025/12/03 16:28:06.264618000] [ info] [engine] flush chunk '30288-1764746844.908008000.flb' succeeded at retry 1: task_id=1, input=dummy.0 > output=http.0 (out_id=0)
2025-12-03T07:28:06.093379532Z fluentbit_tda_betti0 = 56
2025-12-03T07:28:06.093379532Z fluentbit_tda_betti1 = 12
2025-12-03T07:28:06.093379532Z fluentbit_tda_betti2 = 2

For a single, one-time failure, the betti1 and betti2 metrics do not increase.
With intermittent failures like the ones above, however, these higher-order metrics rise and pick up some of the "phase transitions", which means that the system is not in a stable phase.

Debug log output from testing the change

Attached Valgrind output that shows no leaks or memory corruption was found

This log is from the macOS memory leak detector (leaks):

Process 30709 is not debuggable. Due to security restrictions, leaks can only show or save contents of readonly memory of restricted processes.

Process:         fluent-bit [30709]
Path:            /Users/USER/*/fluent-bit
Load Address:    0x104aa4000
Identifier:      fluent-bit
Version:         0
Code Type:       ARM64
Platform:        macOS
Parent Process:  leaks [30708]
Target Type:     live task

Date/Time:       2025-12-03 16:33:19.616 +0900
Launch Time:     2025-12-03 16:33:06.144 +0900
OS Version:      macOS 26.0.1 (25A362)
Report Version:  7
Analysis Tool:   /usr/bin/leaks

Physical footprint:         13.0M
Physical footprint (peak):  13.1M
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 30709: 2752 nodes malloced for 419 KB
Process 30709: 0 leaks for 0 total leaked bytes.

[2025/12/03 16:33:20] [engine] caught signal (SIGCONT)
[2025/12/03 16:33:20] Fluent Bit Dump

There are no leaks in this plugin.

In addition, even though no rules are defined, the non-zero betti1 and betti2 values of the TDA metrics indicate that something is happening.

This metric-based detector takes a different direction from rule-based approaches and sheds light on deeper layers of anomaly detection.

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Run local packaging test showing all targets (including any new ones) build.
Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

Documentation required for this feature

fluent/fluent-bit-docs#2277

Backporting

Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

New Features
- Optional Ripser integration for persistent homology and a new TDA processor that emits Betti metrics (betti0/betti1/betti2).
Documentation
- Added Ripser README, license, and contributing guidance.
Tests
- New unit test validating Ripser Betti computations on a sample dataset.
Chores
- Build and Docker flags to enable/disable Ripser and wire its build/install steps.


coderabbitai · 2025-12-03T07:23:41Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Ripser v1.2.1 as an optional bundled library, exposes a C wrapper API and C++ wrapper, introduces a new TDA processor plugin that computes Betti numbers from time-series via delay embedding, and wires build, packaging, tests, and installation to conditionally include Ripser support.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Top-level CMake & options `CMakeLists.txt`, `cmake/libraries.cmake`, `cmake/plugins_options.cmake`, `src/CMakeLists.txt`, `include/CMakeLists.txt` | Add FLB_USE_RIPSER/FLB_RIPSER detection and public option, define FLB_PATH_LIB_RIPSER, add FLB_PROCESSOR_TDA option, define FLB_HAVE_RIPSER when enabled, conditional add_subdirectory(ripser), and set C++11 when Ripser is enabled. |
| Bundled Ripser library `lib/ripser-1.2.1/*` | Add Ripser 1.2.1 sources, build files (CMakeLists/Makefile), license/CONTRIBUTING, README, .gitignore, .gitmodules, examples, and define `ripser-static` target. |
| Ripser wrapper & integration `include/fluent-bit/ripser/flb_ripser_wrapper.h`, `src/ripser/flb_ripser_wrapper.cpp`, `src/ripser/CMakeLists.txt`, `lib/ripser-1.2.1/ripser_internal.hpp` | Add public C API header for Betti/intervals, implement C++ wrapper converting dense matrices to Ripser formats, interval filtering/bridging, two exported functions for Betti and intervals, and add `flb-ripser-wrapper-static` target linking to `ripser-static`. |
| Processor plugin (TDA) `plugins/processor_tda/tda.h`, `plugins/processor_tda/tda.c`, `plugins/processor_tda/CMakeLists.txt`, `plugins/CMakeLists.txt` | Add `tda` processor plugin gated by FLB_RIPSER implementing sliding windows, grouping, delay embedding, dense distance construction, Ripser-driven Betti computation, and register `processor_tda_plugin`. |
| Tests `tests/internal/ripser.c`, `tests/internal/CMakeLists.txt` | Add unit test `test_ripser_betti_circle()` and conditionally include ripser test source when FLB_RIPSER is enabled. |
| Packaging & Docker `dockerfiles/Dockerfile.centos7`, `packaging/distros/centos/Dockerfile` | Add FLB_RIPSER build ARG/ENV in CentOS Dockerfiles, pass `-DFLB_RIPSER="${FLB_RIPSER}"` to CMake; CentOS7 Dockerfile sets FLB_RIPSER=Off in build. |
| Install & headers `include/CMakeLists.txt`, `include/fluent-bit/ripser/*.h` | Install Ripser wrapper headers to `${FLB_INSTALL_INCLUDEDIR}/fluent-bit/ripser/` when FLB_RIPSER is enabled. |
| Misc / examples `lib/ripser-1.2.1/examples/*` | Add example distance-matrix data files and Ripser README documentation. |

Sequence Diagram(s)

sequenceDiagram
    participant Metrics as Metrics Stream
    participant Processor as TDA Processor
    participant Window as Sliding Window
    participant Embed as Delay Embedding
    participant DistMat as Dense→Compressed Builder
    participant Ripser as Ripser Engine
    participant Export as Metrics Export

    Metrics->>Processor: incoming metric points
    Processor->>Window: append / rotate samples
    Window->>Processor: snapshot when window ready
    Processor->>Embed: build embedded vectors (m, τ)
    Embed->>DistMat: compute dense pairwise distances
    DistMat->>Ripser: convert to compressed & run
    Ripser-->>Processor: emit intervals / betti counts (via bridge)
    Processor->>Export: emit betti gauges (betti0, betti1, betti2)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Focus areas:
- lib/ripser-1.2.1/ripser.cpp and ripser_internal.hpp (algorithmic correctness, memory/index safety, format readers)
- plugins/processor_tda/tda.c and tda.h (buffer lifecycle, embedding correctness, numeric stability, concurrency/context safety)
- src/ripser/flb_ripser_wrapper.cpp and include/fluent-bit/ripser/flb_ripser_wrapper.h (C/C++ ABI, callback bridging, input validation)
- CMake and Docker wiring (conditional build flags, target linkages, install rules)

Suggested reviewers

edsiper
fujimotos
koleini
niedbalski

Poem

🐰
I hopped through windows, stitched time's thread,
Measured loops where metrics tread,
Counted holes in curves that bend,
Betti bells ring loop and end —
A rabbit's joy in numbers spread.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.27% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately summarizes the main change: implementing a new processor_tda plugin that applies Topological Data Analysis to metrics. |


Signed-off-by: Hiroshi Hatake <[email protected]>

This processor plugin performs Topological Data Analysis (TDA) on metrics using ripser, which computes persistent homology. The plugin aggregates incoming counters, gauges and untyped metrics into a 1-D time series, keeps a sliding window, builds a dense distance matrix and runs ripser through the new flb_ripser_* wrapper helpers. The resulting Betti numbers (currently betti0 and betti1) are exported as additional gauge metrics. TDA and persistent homology can help reveal hidden order or phase transitions in complex systems that are not easily visible from raw time series. Similar approaches have already been explored in condensed matter physics, for example: Donato, I., Gori, M., & Sarti, A. (2016). Persistent homology analysis of phase transitions. Physical Review E, 93, 052138. https://doi.org/10.1103/PhysRevE.93.052138 Signed-off-by: Hiroshi Hatake <[email protected]>

Signed-off-by: Hiroshi Hatake <[email protected]>

The TDA metrics processor now supports an optional delay embedding of the aggregated metric vectors before building the dense distance matrix used by Ripser.

When `embed_dim > 1`, we reconstruct a Takens-style delay embedding x_t -> (x_t, x_{t-τ}, ..., x_{t-(m-1)τ}) over the sliding window, where `m = embed_dim` and `τ = embed_delay`. Each embedded point is a flattened vector of size feature_dim × m and we keep using a Euclidean distance on this reconstructed phase space.

This makes the processor more sensitive to occasional cyclic / quasi-periodic regimes in the metric time series: loops in the reconstructed trajectory translate into H1 features in the persistent homology. When `embed_dim = 1`, the behaviour is unchanged and we fall back to the original "no embedding" mode.

This change also adds two configuration options:
- `embed_dim` (int, default: 3) Delay embedding dimension m. Set to 1 to disable delay embedding.
- `embed_delay` (int, default: 1) Lag τ in samples between successive delays.

The design follows the standard delay embedding approach from Takens' theorem, which shows that (under mild conditions) the attractor of an unknown dynamical system can be reconstructed from a single observed time series via delay coordinates.

Reference
- F. Takens, "Detecting strange attractors in turbulence", in D. Rand and L.-S. Young (eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381.

Signed-off-by: Hiroshi Hatake <[email protected]>

Signed-off-by: Hiroshi Hatake <[email protected]>

Expose threshold as a quantile-based distance scale selector. Signed-off-by: Hiroshi Hatake <[email protected]>

…tions Signed-off-by: Hiroshi Hatake <[email protected]>

Signed-off-by: Hiroshi Hatake <[email protected]>

This is because the tda processor could support other types of processing in the future, traces in particular. For now, it only supports the metrics pipeline. Signed-off-by: Hiroshi Hatake <[email protected]>

Signed-off-by: Hiroshi Hatake <[email protected]>

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

lib/ripser-1.2.1/ripser_internal.hpp (1)
89-89: Fix include guard comment mismatch.

The closing comment references RIPSER_INTERNAL_H but the opening guard at line 20 uses RIPSER_INTERNAL_HPP. Update for consistency.
🔎 Suggested fix
-#endif /* RIPSER_INTERNAL_H */
+#endif /* RIPSER_INTERNAL_HPP */

🧹 Nitpick comments (1)

plugins/processor_tda/tda.c (1)
926-940: Unused threshold computation.

The threshold variable computed at line 936 is never used. The subsequent multi-quantile scan (lines 945-993) computes a fresh thr for each quantile candidate, making this computation dead code.

Consider removing these lines or using threshold as a fallback/default if the multi-quantile scan produces no valid results.
🔎 Option 1: Remove unused code
-    if (m == 1) {
-        q = 0.5;      /* No delay embedding: use something like the median. */
-    }
-    else {
-        q = 0.2;      /* With delay embedding: look at a smaller scale. */
-    }
-
-    /* --- choose a scale for TDA ---
-     * Use the number of embedded points n_embed to determine the threshold.
-     */
-    threshold = tda_choose_threshold_from_dist(ctx, dist, n_embed, q);
-    if (threshold <= 0.0f) {
-        threshold = 0.0f;
-    }
-
     memset(&betti, 0, sizeof(betti));

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 657ca61 and 86b83e5.

📒 Files selected for processing (17)

cmake/plugins_options.cmake
dockerfiles/Dockerfile.centos7
include/CMakeLists.txt
include/fluent-bit/ripser/flb_ripser_wrapper.h
lib/ripser-1.2.1/CMakeLists.txt
lib/ripser-1.2.1/ripser.cpp
lib/ripser-1.2.1/ripser_internal.hpp
packaging/distros/centos/Dockerfile
plugins/CMakeLists.txt
plugins/processor_tda/CMakeLists.txt
plugins/processor_tda/tda.c
plugins/processor_tda/tda.h
src/CMakeLists.txt
src/ripser/CMakeLists.txt
src/ripser/flb_ripser_wrapper.cpp
tests/internal/CMakeLists.txt
tests/internal/ripser.c

🚧 Files skipped from review as they are similar to previous changes (7)

packaging/distros/centos/Dockerfile
plugins/processor_tda/tda.h
plugins/processor_tda/CMakeLists.txt
plugins/CMakeLists.txt
src/ripser/CMakeLists.txt
dockerfiles/Dockerfile.centos7
tests/internal/ripser.c

🧰 Additional context used

🧠 Learnings (14)

📓 Common learnings

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: src/ripser/flb_ripser_wrapper.cpp:73-106
Timestamp: 2025-12-08T05:21:45.014Z
Learning: In the TDA processor (processor_tda) for Fluent Bit, the dimension limit FLB_RIPSER_MAX_BETTI_DIM is intentionally capped at 3 because the plugin uses embed_dim=3 and delay=1 in practice, and computing higher dimensions (>4) would be computationally prohibitive. The 8-slot betti array allocation is conservative headroom.

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:38:56.340Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

📚 Learning: 2025-12-22T05:38:56.340Z

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:38:56.340Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

Applied to files:

lib/ripser-1.2.1/CMakeLists.txt
lib/ripser-1.2.1/ripser.cpp
tests/internal/CMakeLists.txt
include/CMakeLists.txt
include/fluent-bit/ripser/flb_ripser_wrapper.h
src/CMakeLists.txt

📚 Learning: 2025-12-08T05:21:45.014Z

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: src/ripser/flb_ripser_wrapper.cpp:73-106
Timestamp: 2025-12-08T05:21:45.014Z
Learning: In the TDA processor (processor_tda) for Fluent Bit, the dimension limit FLB_RIPSER_MAX_BETTI_DIM is intentionally capped at 3 because the plugin uses embed_dim=3 and delay=1 in practice, and computing higher dimensions (>4) would be computationally prohibitive. The 8-slot betti array allocation is conservative headroom.

Applied to files:

cmake/plugins_options.cmake
lib/ripser-1.2.1/ripser.cpp
plugins/processor_tda/tda.c
include/fluent-bit/ripser/flb_ripser_wrapper.h
src/ripser/flb_ripser_wrapper.cpp

📚 Learning: 2025-08-31T12:46:11.940Z

Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

cmake/plugins_options.cmake
include/CMakeLists.txt
src/CMakeLists.txt

📚 Learning: 2025-08-29T06:25:27.250Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

cmake/plugins_options.cmake
lib/ripser-1.2.1/ripser.cpp
include/CMakeLists.txt

📚 Learning: 2025-08-31T12:46:11.940Z

Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

cmake/plugins_options.cmake
include/CMakeLists.txt

📚 Learning: 2025-08-29T06:25:27.250Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

cmake/plugins_options.cmake
lib/ripser-1.2.1/ripser.cpp

📚 Learning: 2025-08-29T06:24:26.170Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

lib/ripser-1.2.1/ripser.cpp

📚 Learning: 2025-08-29T06:24:55.855Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

lib/ripser-1.2.1/ripser.cpp

📚 Learning: 2025-08-29T06:25:02.561Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

lib/ripser-1.2.1/ripser.cpp

📚 Learning: 2025-08-29T06:24:44.797Z

Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

lib/ripser-1.2.1/ripser.cpp

📚 Learning: 2025-09-08T11:21:33.975Z

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

lib/ripser-1.2.1/ripser.cpp

📚 Learning: 2025-11-21T06:23:29.770Z

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

lib/ripser-1.2.1/ripser.cpp
include/CMakeLists.txt
include/fluent-bit/ripser/flb_ripser_wrapper.h

📚 Learning: 2025-09-14T09:46:09.531Z

Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.

Applied to files:

include/CMakeLists.txt

🧬 Code graph analysis (3)

lib/ripser-1.2.1/ripser.cpp (1)

lib/ripser-1.2.1/ripser_internal.hpp (4)

i (58-58)

rows (59-59)

dim (72-76)

dim (72-72)

include/fluent-bit/ripser/flb_ripser_wrapper.h (1)

src/ripser/flb_ripser_wrapper.cpp (4)

flb_ripser_compute_betti_from_dense_distance (119-171)

flb_ripser_compute_betti_from_dense_distance (119-124)

flb_ripser_compute_intervals_from_dense_distance (200-235)

flb_ripser_compute_intervals_from_dense_distance (200-206)

src/ripser/flb_ripser_wrapper.cpp (2)

lib/ripser-1.2.1/ripser.cpp (12)

i (236-238)

i (236-236)

i (241-243)

i (241-241)

i (269-275)

i (269-269)

i (288-294)

i (288-288)

i (409-411)

i (409-409)

ripser_run_from_compressed_lower (968-986)

ripser_run_from_compressed_lower (968-973)

lib/ripser-1.2.1/ripser_internal.hpp (4)

i (58-58)

dim (72-76)

dim (72-72)

ripser_run_from_compressed_lower (82-87)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (81)

GitHub Check: PR - container builds / Windows container images (2025)
GitHub Check: PR - container builds / Windows container images (2022)
GitHub Check: PR - packages build Linux / debian/bullseye package build and stage to S3
GitHub Check: PR - packages build Linux / rockylinux/10 package build and stage to S3
GitHub Check: PR - packages build Linux / debian/bullseye.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / raspbian/bookworm package build and stage to S3
GitHub Check: PR - packages build Linux / ubuntu/24.04.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / ubuntu/22.04.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / ubuntu/22.04 package build and stage to S3
GitHub Check: PR - packages build Linux / ubuntu/24.04 package build and stage to S3
GitHub Check: PR - packages build Linux / debian/buster.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / almalinux/10 package build and stage to S3
GitHub Check: PR - packages build Linux / debian/buster package build and stage to S3
GitHub Check: PR - packages build Linux / debian/trixie package build and stage to S3
GitHub Check: PR - packages build Linux / almalinux/9.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / rockylinux/9.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / debian/bookworm.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / debian/trixie.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/10 package build and stage to S3
GitHub Check: PR - packages build Linux / debian/bookworm package build and stage to S3
GitHub Check: PR - packages build Linux / almalinux/9 package build and stage to S3
GitHub Check: PR - packages build Linux / almalinux/8 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/10.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / almalinux/10.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / almalinux/8.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/9 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/8 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/7.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / rockylinux/8 package build and stage to S3
GitHub Check: PR - packages build Linux / rockylinux/9 package build and stage to S3
GitHub Check: PR - packages build Linux / rockylinux/10.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/8.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / rockylinux/8.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / amazonlinux/2 package build and stage to S3
GitHub Check: PR - packages build Linux / amazonlinux/2.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/9.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / amazonlinux/2023.arm64v8 package build and stage to S3
GitHub Check: PR - packages build Linux / amazonlinux/2023 package build and stage to S3
GitHub Check: PR - packages build Linux / centos/7 package build and stage to S3
GitHub Check: PR - container builds / arm/v7/production container image build
GitHub Check: PR - container builds / amd64/production container image build
GitHub Check: PR - container builds / amd64/debug container image build
GitHub Check: PR - container builds / arm64/debug container image build
GitHub Check: PR - container builds / arm/v7/debug container image build
GitHub Check: PR - container builds / arm64/production container image build
GitHub Check: PR - packages build Windows / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
GitHub Check: PR - packages build Windows / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
GitHub Check: PR - packages build Windows / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
GitHub Check: PR - packages build MacOS / call-build-macos-package (Intel macOS runner, macos-14-large, 3.31.6)
GitHub Check: PR - packages build MacOS / call-build-macos-package (Apple Silicon macOS runner, macos-14, 3.31.6)
GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
GitHub Check: pr-compile-centos-7
GitHub Check: pr-compile-without-cxx (3.31.6)
GitHub Check: PR - fuzzing test

🔇 Additional comments (30)

cmake/plugins_options.cmake (1)

77-77: LGTM!

The TDA processor option follows the established pattern for processor plugins, with a clear description and default enabled state consistent with other processors.

tests/internal/CMakeLists.txt (1)

169-174: LGTM!

The conditional test inclusion for Ripser follows the established pattern used by other optional features like LuaJIT, ensuring test coverage is properly gated by the feature flag.

include/CMakeLists.txt (1)

46-53: LGTM!

The Ripser header installation block follows the same pattern as other optional components (e.g., simdutf), with proper feature flag gating and standard permissions.

src/CMakeLists.txt (2)

366-369: LGTM!

The Ripser subdirectory inclusion follows the established pattern for optional components.

458-466: LGTM!

The Ripser dependency linking follows the established pattern, ensuring both the core Ripser library and the Fluent Bit wrapper are linked when the feature is enabled.

lib/ripser-1.2.1/CMakeLists.txt (1)

1-10: LGTM!

The CMake configuration for the Ripser static library is straightforward and correct, with appropriate include directory setup and C++11 requirement.

src/ripser/flb_ripser_wrapper.cpp (4)

39-54: LGTM!

The dense-to-compressed matrix conversion correctly extracts the lower triangular portion with proper indexing (i > j) and reserves the exact size needed.
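As an illustration of the conversion this comment refers to, the following C sketch copies the strictly lower triangle (i > j) of a dense row-major matrix into a flat array. The wrapper itself is C++ and may differ in detail; `lower_len` and `dense_to_lower` are hypothetical names.

```c
#include <stddef.h>

/* Number of strictly-lower-triangular entries of an n x n matrix. */
size_t lower_len(size_t n)
{
    return n * (n - 1) / 2;
}

/*
 * Copy entries d(i, j) with i > j from a dense row-major n x n matrix into
 * `out`, row by row: d(1,0), d(2,0), d(2,1), d(3,0), ...
 * `out` must hold lower_len(n) floats. Returns the number of entries written.
 */
size_t dense_to_lower(const float *dense, size_t n, float *out)
{
    size_t i, j, k = 0;

    for (i = 1; i < n; i++) {
        for (j = 0; j < i; j++) {
            out[k++] = dense[i * n + j];
        }
    }
    return k;
}
```

Because the output size is known up front as n(n-1)/2, the caller can allocate exactly once, which matches the "reserves the exact size needed" remark.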

73-106: LGTM!

The Betti interval callback correctly filters out invalid intervals (negative dimensions, non-finite values, death ≤ birth, and low persistence < 1e-3). The dimension cap at FLB_RIPSER_MAX_BETTI_DIM (3) is intentional per the design constraints.

Based on learnings, the dimension limit of 3 is intentional because the plugin uses embed_dim=3 and delay=1, and higher dimensions would be computationally prohibitive.
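The filtering rules listed in this comment can be sketched in C as below. This is an illustration, not the wrapper's callback: `count_interval`, `struct betti_counts`, and the two macros are hypothetical, and treating the dimension cap as inclusive is an assumption.

```c
#include <math.h>
#include <stddef.h>

#define MAX_BETTI_DIM   3       /* assumed to mirror FLB_RIPSER_MAX_BETTI_DIM */
#define MIN_PERSISTENCE 1e-3f   /* intervals shorter than this are ignored */

struct betti_counts {
    int betti[8];               /* 8 slots of headroom, as noted in the review */
};

/* Returns 1 if the interval was counted, 0 if it was filtered out. */
int count_interval(struct betti_counts *out, int dim, float birth, float death)
{
    if (out == NULL) {
        return 0;
    }
    if (dim < 0 || dim > MAX_BETTI_DIM) {
        return 0;               /* negative or too-high dimension */
    }
    if (!isfinite(birth) || !isfinite(death)) {
        return 0;               /* NaN or infinite endpoints */
    }
    if (death <= birth) {
        return 0;               /* empty or inverted interval */
    }
    if (death - birth < MIN_PERSISTENCE) {
        return 0;               /* low-persistence noise */
    }
    out->betti[dim]++;
    return 1;
}
```

One consequence of the non-finite check is that intervals with an infinite death value are not counted in this sketch.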

119-171: LGTM!

The public API correctly validates inputs, caps max_dim to 8, converts the dense matrix, runs Ripser with appropriate threshold handling (enclosing radius mode when threshold ≤ 0), and properly fills the output structure.
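For context on the "enclosing radius" fallback: in upstream Ripser the enclosing radius is the minimum over points i of the maximum distance from i to any other point. The C sketch below shows that computation and a threshold selector; `enclosing_radius` and `effective_threshold` are hypothetical names, and how the Fluent Bit wrapper wires this in is an assumption.

```c
#include <math.h>
#include <stddef.h>

/* min over rows i of (max over columns j of d(i, j)) for a dense
 * row-major n x n distance matrix. Returns 0 for an empty matrix. */
float enclosing_radius(const float *dense, size_t n)
{
    float best = INFINITY;
    size_t i, j;

    if (n == 0) {
        return 0.0f;
    }
    for (i = 0; i < n; i++) {
        float row_max = 0.0f;
        for (j = 0; j < n; j++) {
            if (dense[i * n + j] > row_max) {
                row_max = dense[i * n + j];
            }
        }
        if (row_max < best) {
            best = row_max;
        }
    }
    return best;
}

/* A positive threshold is used as given; otherwise fall back to the
 * enclosing radius. */
float effective_threshold(const float *dense, size_t n, float threshold)
{
    return threshold > 0.0f ? threshold : enclosing_radius(dense, n);
}
```

Beyond the enclosing radius the Vietoris-Rips complex is a cone, so no further homology changes occur; that is why it is a safe default upper bound.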

200-234: LGTM!

The interval computation API correctly validates inputs, sets up the callback bridge, and runs Ripser with the same threshold semantics as the Betti computation API.

lib/ripser-1.2.1/ripser_internal.hpp (2)

30-64: LGTM!

The type definitions, compressed distance matrix template, and layout enum are well-structured. The quadratic formula in the constructor (line 43) correctly computes the matrix size from the compressed vector length.

66-77: LGTM!

The interval recorder struct provides a clean callback interface with default initialization and safe null-checking in the emit method.

include/fluent-bit/ripser/flb_ripser_wrapper.h (4)

29-29: LGTM!

The dimension limit of 3 is intentional and appropriate for the TDA processor's use case with embed_dim=3 and delay=1.

Based on learnings, this cap prevents computationally prohibitive calculations for higher dimensions.

33-46: LGTM!

The data structures are well-designed with clear documentation. The 8-slot betti array provides conservative headroom while the practical limit remains at dimension 3.

62-67: LGTM!

The function signature is well-documented with clear parameter descriptions and return value semantics. The threshold behavior (≤ 0 uses enclosing radius) is properly documented.

87-93: LGTM!

The interval computation API provides a flexible callback-based interface for users who need access to individual persistence intervals rather than just the Betti number summary.

plugins/processor_tda/tda.c (9)

39-140: LGTM!

The comparison function and threshold selection logic are correctly implemented with proper null checks, memory allocation error handling, and boundary conditions for quantile calculation.
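A quantile-based distance scale of the kind this comment describes could look like the C sketch below: collect the pairwise distances, sort them with `qsort`, and pick the entry at the requested quantile. This is not `tda_choose_threshold_from_dist`; the names `cmp_float` and `quantile_threshold` are hypothetical, and the floor-index rule (no interpolation) is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for floats, ascending. */
int cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a;
    float y = *(const float *)b;

    return (x > y) - (x < y);
}

/*
 * Return the q-quantile (q clamped to [0, 1]) of the pairwise distances in
 * a dense row-major n x n matrix. Returns 0.0f on invalid input or
 * allocation failure.
 */
float quantile_threshold(const float *dist, size_t n, double q)
{
    size_t len, i, j, k = 0, idx;
    float *tmp;
    float thr;

    if (dist == NULL || n < 2) {
        return 0.0f;
    }
    if (q < 0.0) {
        q = 0.0;
    }
    if (q > 1.0) {
        q = 1.0;
    }
    len = n * (n - 1) / 2;
    tmp = malloc(len * sizeof(float));
    if (tmp == NULL) {
        return 0.0f;
    }
    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            tmp[k++] = dist[i * n + j];   /* each unordered pair once */
        }
    }
    qsort(tmp, len, sizeof(float), cmp_float);
    idx = (size_t)(q * (double)(len - 1));
    thr = tmp[idx];
    free(tmp);
    return thr;
}
```

With this rule, q = 0.5 lands on the median pair distance and q = 0.2 on a smaller scale, in line with the values quoted in the nitpick above.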

142-173: LGTM!

The window creation function properly handles allocation failures and cleans up resources on error paths.

179-295: LGTM!

The group registration helpers properly handle memory allocation failures and roll back partial allocations when hash table insertion fails.

350-462: LGTM!

The group building logic correctly handles error paths, including the fix for the potential use-after-free when last_vec allocation fails.

476-575: LGTM!

The vector construction properly handles the first sample case, computes rates with time delta safeguards, and applies log1p normalization while preserving sign.

577-638: LGTM!

The ingest function correctly handles ring buffer overflow by dropping oldest samples, and properly frees all temporary allocations.
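The drop-oldest behaviour can be illustrated with a small fixed-capacity ring buffer in C. This is a sketch under assumptions (caller-provided storage, capacity greater than zero); `struct ring`, `ring_push`, and `ring_at` are hypothetical and do not reflect the plugin's actual window structure.

```c
#include <stddef.h>
#include <string.h>

struct ring {
    double *data;   /* cap * dim doubles, provided by the caller */
    size_t cap;     /* maximum number of vectors (must be > 0) */
    size_t dim;     /* values per vector */
    size_t head;    /* slot index of the oldest vector */
    size_t count;   /* number of vectors currently stored */
};

/* Append one vector; when the buffer is full, the oldest one is dropped. */
void ring_push(struct ring *r, const double *vec)
{
    size_t slot;

    if (r->count == r->cap) {
        r->head = (r->head + 1) % r->cap;   /* forget the oldest sample */
        r->count--;
    }
    slot = (r->head + r->count) % r->cap;
    memcpy(r->data + slot * r->dim, vec, r->dim * sizeof(double));
    r->count++;
}

/* Pointer to the i-th vector in age order (0 = oldest). */
const double *ring_at(const struct ring *r, size_t i)
{
    return r->data + ((r->head + i) % r->cap) * r->dim;
}
```

Reading through `ring_at` in age order gives the chronological window that a distance-matrix or delay-embedding step would consume.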

1038-1136: LGTM!

The processor lifecycle functions properly initialize, clean up, and handle all allocated resources with appropriate null checks.

1138-1188: LGTM!

The process metrics function correctly initializes groups and window on first call, and the gauge pointer reset is intentional since each metrics_context manages its own gauge objects through the cmetrics lifecycle.

1191-1233: LGTM!

The configuration map and plugin definition are properly structured with sensible defaults and correctly wired callbacks.

lib/ripser-1.2.1/ripser.cpp (5)

1-78: LGTM!

License headers properly attribute both the original MIT-licensed Ripser code and the Fluent Bit modifications.

219-297: LGTM!

The distance matrix implementations correctly handle triangular matrix access patterns and diagonal elements.

372-817: LGTM!

The core Ripser persistence algorithm implementation is correctly integrated with the interval_recorder callback mechanism for emitting persistence intervals.

947-986: LGTM!

The edge extraction specializations and the ripser_run_from_compressed_lower entry point correctly integrate Ripser with the Fluent Bit wrapper, using Z/2Z coefficients for the homology computation.

988-1305: Standalone executable code disabled for Fluent Bit build.

The #ifdef RIPSEREXE section contains the CLI frontend and is not compiled when building for Fluent Bit. Per previous discussion, this vendored code is preserved as-is to simplify future upstream updates.

github-actions bot added the docs-required label Dec 3, 2025

cosmo0920 had a problem deploying to pr December 3, 2025 07:23 — with GitHub Actions Failure

cosmo0920 force-pushed the cosmo0920-ripser-for-analysis branch from 47dccf4 to 327ad4a Compare December 3, 2025 07:24

cosmo0920 had a problem deploying to pr December 3, 2025 07:24 — with GitHub Actions Failure

cosmo0920 temporarily deployed to pr December 3, 2025 08:23 — with GitHub Actions Inactive

cosmo0920 force-pushed the cosmo0920-ripser-for-analysis branch from d7c8e49 to 162f01e Compare December 3, 2025 08:23

cosmo0920 temporarily deployed to pr December 3, 2025 08:23 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr December 3, 2025 08:29 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr December 3, 2025 08:33 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr December 3, 2025 08:53 — with GitHub Actions Inactive

cosmo0920 force-pushed the cosmo0920-ripser-for-analysis branch from 7c7cad7 to 9f9d30b Compare December 3, 2025 11:47

cosmo0920 temporarily deployed to pr December 3, 2025 11:47 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr December 22, 2025 04:46 — with GitHub Actions Inactive

cosmo0920 temporarily deployed to pr December 22, 2025 04:47 — with GitHub Actions Inactive

cosmo0920 added 17 commits December 22, 2025 15:03

lib: ripser: Provide C wrapper for ripser

6e9844a

Signed-off-by: Hiroshi Hatake <[email protected]>

tests: internal: ripser: Add internal test case for TDA library

5a59894

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Make groups to construct point cloud for TDA calculations

b528678

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Provide parameters for TDA process

36662c7

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Extract structs into a header

c667b85

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Make threshold configurable

1f9a411

Expose threshold as a quantile-based distance scale selector. Signed-off-by: Hiroshi Hatake <[email protected]>

lib: ripser: Fix MSVC errors in ripser's explicit template specializa…

aeb3f12

…tions Signed-off-by: Hiroshi Hatake <[email protected]>

dockerfiles: Fix CentOS 7 build for disabling ripser support

09f73a5

Signed-off-by: Hiroshi Hatake <[email protected]>

packaging: centos: Handle ripser support properly

1f2b865

Signed-off-by: Hiroshi Hatake <[email protected]>

tests: internal: ripser: Fix building errors on Windows

d7c2c89

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Remove metrics suffix from its name

a38a0e5

The TDA processor could later support other kinds of processing, in particular traces, so the metrics suffix is dropped from its name. For now it only handles the metrics pipeline. Signed-off-by: Hiroshi Hatake <[email protected]>

build: Fix a typo

f2b2fec

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Use precise value for dimension

c8b550d

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Add a note

2f588d2

Signed-off-by: Hiroshi Hatake <[email protected]>

processor_tda: Plug allocated assignments after releasing

86b83e5

Signed-off-by: Hiroshi Hatake <[email protected]>

cosmo0920 force-pushed the cosmo0920-ripser-for-analysis branch from 657ca61 to 86b83e5 Compare December 22, 2025 06:03

cosmo0920 temporarily deployed to pr December 22, 2025 06:03 — with GitHub Actions Inactive

coderabbitai bot reviewed Dec 22, 2025


cosmo0920 temporarily deployed to pr December 22, 2025 06:23 — with GitHub Actions Inactive
