[EMCAL] implementation of number of local maxima variable #14943
base: dev
Please consider the following formatting changes to AliceO2Group#14943
```cpp
struct CellInfo {
  int row;
  int column;
  double energy;
};

std::vector<CellInfo> cellInfos;
```
Since this is done for every cluster, it might be better to use a struct of arrays here instead of an array of structs, with float for the energy and potentially int16_t for row and column (maybe even int8_t is enough, if I remember correctly). That should reduce memory usage and speed things up a little bit.
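To make the suggestion concrete, here is a minimal sketch of the two layouts. The names `CellInfoAoS` and `ClusterCellsSoA` are hypothetical and not taken from the O2 code, and the energy is left as double here; the point is only that the struct-of-arrays form keeps each field contiguous and avoids padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Array-of-structs layout, equivalent to the CellInfo struct under review.
struct CellInfoAoS {
  int row;
  int column;
  double energy;
};

// Struct-of-arrays layout (hypothetical name): one contiguous array per field.
struct ClusterCellsSoA {
  std::vector<std::int16_t> rows;
  std::vector<std::int16_t> columns;
  std::vector<double> energies;

  // Reserve once per cluster to avoid repeated reallocations.
  void reserve(std::size_t n)
  {
    rows.reserve(n);
    columns.reserve(n);
    energies.reserve(n);
  }

  // The casts are explicit so that narrowing does not trigger warnings.
  void add(int row, int column, double energy)
  {
    rows.push_back(static_cast<std::int16_t>(row));
    columns.push_back(static_cast<std::int16_t>(column));
    energies.push_back(energy);
  }

  std::size_t size() const { return energies.size(); }
};

// Payload per cell in each layout (excluding the vector bookkeeping itself).
constexpr std::size_t kBytesPerCellAoS = sizeof(CellInfoAoS);
constexpr std::size_t kBytesPerCellSoA = 2 * sizeof(std::int16_t) + sizeof(double);
```

Whether the smaller footprint translates into a measurable speed-up depends on the cluster sizes and on how much time the geometry lookups take in comparison.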
I can implement this for the arrays. I thought about the types too, but row and column (in the cluster and in the geometry) are int and the energy is double. So to avoid warnings I would need to cast each time before putting the values in the struct (I don't know the performance loss from that). For the energy I prefer to keep double, since the clusterizer etc. uses double precision and I do not want to reduce the precision when searching for NLM, to stay consistent in the energy comparisons.
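The precision concern can be illustrated with a small sketch. The helper names and the values used below are chosen purely for illustration and do not come from the PR: two energies that are ordered in double precision can compare equal once both are narrowed to float.

```cpp
// Hypothetical helpers: is energy b strictly higher than energy a?
bool higherAsDouble(double a, double b)
{
  return b > a;
}

// Same comparison after narrowing both values to float.
bool higherAsFloat(double a, double b)
{
  return static_cast<float>(b) > static_cast<float>(a);
}
```

For example, `higherAsDouble(1.0, 1.0 + 1e-9)` is true, while `higherAsFloat(1.0, 1.0 + 1e-9)` is false, because both values round to the same float. A neighbour that counts as higher in the clusterizer could then fail to count as higher in the NLM search.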
I implemented the change to short (kept energy as double) and also removed the struct. I tested it, and with O1 optimization this change does significantly improve performance; however, the difference becomes smaller with more aggressive optimizations.
Just for fun I checked with godbolt to see how one can minimize the number of stack calls and stuff and came up with this (I used some fixed sized arrays since I did not want to import half our EMCal code to godbolt 😄):
```cpp
// Pre-compute cell indices and energies for all cells in cluster to avoid multiple expensive geometry lookups
// (M, Rows, Columns and Energy are the godbolt test inputs; the fixed-size arrays assume M <= 64)
const size_t n = M;
int rows[64];
int columns[64];
double energies[64];
for (size_t iCell = 0; iCell < n; ++iCell) {
  rows[iCell] = static_cast<int>(Rows[iCell]);
  columns[iCell] = static_cast<int>(Columns[iCell]);
  energies[iCell] = Energy[iCell];
}

// Now find local maxima using pre-computed data
int nExMax = 0;
for (size_t i = 0; i < n; i++) {
  // this cell is assumed to be local maximum unless we find a higher energy cell in the neighborhood
  bool isExMax = true;
  int ri = rows[i];
  int ci = columns[i];
  double ei = energies[i];
  // loop over all other cells in cluster
  for (size_t j = 0; j < n; j++) {
    if (i == j) {
      continue;
    }
    double ej = energies[j];
    if (ej <= ei) {
      continue; // early rejection
    }
    int dr = ri - rows[j];
    if (dr < -1 || dr > 1) {
      continue;
    }
    int dc = ci - columns[j];
    if (dc < -1 || dc > 1) {
      continue;
    }
    // a higher-energy cell in the neighborhood: cell i is not a local maximum
    isExMax = false;
    break;
  }
  if (isExMax) {
    nExMax++;
  }
}
```
However, as you said, I think the biggest performance cost here is our geometry function calls, so I am unsure how much we can do here...
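For anyone who wants to try the search outside of godbolt, the same logic can be wrapped into a self-contained function. This is only a sketch: the function name `countLocalMaxima` and the use of `std::vector` inputs are assumptions made for illustration, not the interface used in the PR, and the three vectors are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// Counts cells that have no higher-energy cell among their up to 8 neighbours.
// Hypothetical helper: rows, columns and energies are assumed to be the same length.
int countLocalMaxima(const std::vector<int>& rows,
                     const std::vector<int>& columns,
                     const std::vector<double>& energies)
{
  const std::size_t n = energies.size();
  int nExMax = 0;
  for (std::size_t i = 0; i < n; ++i) {
    bool isExMax = true;
    for (std::size_t j = 0; j < n; ++j) {
      // only a strictly higher-energy cell can demote cell i
      if (i == j || energies[j] <= energies[i]) {
        continue;
      }
      const int dr = rows[i] - rows[j];
      const int dc = columns[i] - columns[j];
      if (dr >= -1 && dr <= 1 && dc >= -1 && dc <= 1) {
        isExMax = false;
        break;
      }
    }
    if (isExMax) {
      ++nExMax;
    }
  }
  return nExMax;
}
```

With cells at (row, column) = (0, 0), (0, 1) and (5, 5) and energies 1.0, 2.0 and 0.5, the function returns 2: the first cell is demoted by its higher-energy neighbour, while the isolated third cell counts as its own maximum. Note that two adjacent cells with exactly equal energy both count as maxima in this formulation.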
Tested locally for:
kV3Default: a few clusters with more than one NLM (as expected)
kV3NoSplit: the definition where average NLM is highest (as expected)
kV3MostSplit: NLM is always one (as expected).
To improve performance and reduce geometry calls, the cell lookup is performed only once for all cells in a given cluster.