
Conversation

@fjonasALICE
Contributor

  • Finally implemented the calculation of the number-of-local-maxima (NLM) variable, which so far was filled with a dummy value.
  • The calculation correctly handles gaps between supermodules (SMs) and clusters that span multiple SMs by introducing a new geometry function that returns a global column and row. The function is ported from the clusterizer itself and inserts artificial column gaps to correctly reflect the large gaps between SMs, so the behavior of the NLM calculation is consistent with the clusterization code. For easier access, I moved the function to the geometry class (see the sketch below).
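
A minimal sketch of the artificial-column-gap idea, for illustration only: the function name, the grid constants, and the assumed two-SMs-per-sector layout are placeholders, not the actual AliceO2 geometry code.

```cpp
#include <utility>

// Hypothetical illustration of mapping (supermodule, local row, local column)
// onto one global grid. All constants below are placeholders, not the real
// EMCal geometry values.
constexpr int kRowsPerSM = 24; // cells per SM along rows (placeholder)
constexpr int kColsPerSM = 48; // cells per SM along columns (placeholder)
constexpr int kGapCols = 1;    // artificial empty columns between SMs (placeholder)

// Cells in different SMs end up at least kGapCols + 1 columns apart, so they
// can never pass a |dcol| <= 1 neighbor check across the physical SM gap.
std::pair<int, int> globalRowCol(int sm, int row, int col)
{
  const int sector = sm / 2; // two SMs side by side per sector (assumed layout)
  const int side = sm % 2;   // 0 = left SM, 1 = right SM
  return {sector * kRowsPerSM + row,
          side * (kColsPerSM + kGapCols) + col};
}
```

With such a mapping, the NLM neighbor test (|Δrow| ≤ 1 and |Δcol| ≤ 1) automatically fails for cells on opposite sides of an SM gap, matching the clusterizer's behavior.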

Tested locally for:
  • kV3Default: a few clusters with more than one NLM (as expected)
  • kV3NoSplit: the definition with the highest average NLM (as expected)
  • kV3MostSplit: NLM is always one (as expected)

To improve performance and reduce the number of geometry calls, the row/column calculation is performed only once for all cells in a given cluster; a sketch of this caching pattern follows below.
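
A minimal sketch of that per-cluster caching, under stated assumptions: the `Cell` type and `globalRowColFromCellID` are illustrative stand-ins, not the actual O2 cluster or geometry API.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative stand-ins (not the actual O2 types): a cluster is a list of
// cell IDs with energies, and the geometry resolves an ID to global (row, col).
struct Cell {
  int id;
  double energy;
};

std::pair<int16_t, int16_t> globalRowColFromCellID(int cellID); // hypothetical geometry call

void cacheCellPositions(const std::vector<Cell>& cluster,
                        std::vector<int16_t>& rows,
                        std::vector<int16_t>& cols,
                        std::vector<double>& energies)
{
  rows.resize(cluster.size());
  cols.resize(cluster.size());
  energies.resize(cluster.size());
  for (size_t i = 0; i < cluster.size(); ++i) {
    // Exactly one geometry lookup per cell; the O(n^2) neighbor search for
    // local maxima then works on the cached arrays only.
    auto [r, c] = globalRowColFromCellID(cluster[i].id);
    rows[i] = r;
    cols[i] = c;
    energies[i] = cluster[i].energy;
  }
}
```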

@github-actions
Contributor

github-actions bot commented Jan 6, 2026

REQUEST FOR PRODUCTION RELEASES:
To request that your PR be included in production software, please add the corresponding "async-" labels to your PR. Add the labels directly (if you have the permissions) or add a comment of the form below (note that labels are separated by ","):

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5
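
For example, a comment of the following form (using labels from this list) would add the first two labels and remove the third:

+async-label async-2024-pp-apass1, async-2024-PbPb-apass2, !async-2023-pp-apass4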

fjonasALICE added a commit to fjonasALICE/AliceO2 that referenced this pull request Jan 6, 2026
Please consider the following formatting changes to AliceO2Group#14943
Comment on lines 502 to 508
```cpp
struct CellInfo {
  int row;
  int column;
  double energy;
};

std::vector<CellInfo> cellInfos;
```
Contributor

Since this is done for every cluster, it might be better to use a struct of arrays here instead of an array of structs, and to use float for the energy and potentially int16_t for row and column (maybe even int8_t is enough, if I remember correctly). That should reduce memory usage and speed things up a little bit.
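
A quick sketch of the suggested layout change, with the narrower types from the suggestion; the SoA struct name is illustrative, not from the PR:

```cpp
#include <cstdint>
#include <vector>

// Array of structs, as in the diff: row, column, and energy of one cell sit
// together, 16 bytes per entry.
struct CellInfo {
  int row;
  int column;
  double energy;
};

// Struct of arrays, as suggested: each field is contiguous in memory, which
// is friendlier to the cache in the O(n^2) neighbor loop.
struct CellInfosSoA {
  std::vector<int16_t> rows;
  std::vector<int16_t> columns;
  std::vector<float> energies; // the author later kept double here for precision
};
```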

Contributor Author

For the arrays I can implement this. For the types I thought about this too, but the energy, row, and column (in the cluster and in the geometry) are double and int, so to avoid warnings I would need to cast each time before putting the values in the struct (I don't know the performance cost of that). For the energy I prefer to keep double, since the clusterizer etc. use double precision, and I do not want to reduce the precision of the energy comparisons when searching for NLM, to stay consistent.

Contributor Author

I implemented the change to short (kept energy as double) and also removed the struct. I tested it, and with -O1 optimization this change indeed improves performance significantly; however, the difference becomes smaller with more aggressive optimization levels.

Contributor

Just for fun I checked with godbolt to see how one can minimize the number of stack calls and such, and came up with this (I used some fixed-size arrays since I did not want to import half our EMCal code into godbolt 😄):

```cpp
// Pre-compute cell indices and energies for all cells in the cluster to
// avoid multiple expensive geometry lookups. M, Rows, Columns, and Energy
// are the fixed-size godbolt stand-ins mentioned above.
const size_t n = M;
int rows[64];
int columns[64];
double energies[64];

for (size_t iCell = 0; iCell < n; ++iCell) {
  rows[iCell] = static_cast<int>(Rows[iCell]);
  columns[iCell] = static_cast<int>(Columns[iCell]);
  energies[iCell] = Energy[iCell];
}

// Now find the local maxima using the pre-computed data.
int nExMax = 0;
for (size_t i = 0; i < n; i++) {
  // This cell is assumed to be a local maximum unless we find a
  // higher-energy cell in its neighborhood.
  bool isExMax = true;

  const int ri = rows[i];
  const int ci = columns[i];
  const double ei = energies[i];

  // Loop over all other cells in the cluster.
  for (size_t j = 0; j < n; j++) {
    if (i == j) {
      continue;
    }

    const double ej = energies[j];
    if (ej <= ei) {
      continue; // early rejection: not more energetic
    }

    const int dr = ri - rows[j];
    if (dr < -1 || dr > 1) {
      continue; // not adjacent in row
    }

    const int dc = ci - columns[j];
    if (dc < -1 || dc > 1) {
      continue; // not adjacent in column
    }

    isExMax = false;
    break;
  }
  if (isExMax) {
    nExMax++;
  }
}
```
    
However, as you said, I think the biggest performance cost here is our geometry function calls, so I am unsure how much we can do here...
