Score set search returns fewer results than expected due to row multiplication in query

The `/score-sets/search` endpoint returns fewer score sets than expected. A search with no filters returns only ~84 results on staging/prod instead of the expected 100 (the `SCORE_SET_SEARCH_MAX_LIMIT`).

**Root cause**
There are two bugs in `search_score_sets` in `src/mavedb/lib/score_sets.py`.

**1. Row multiplication from experiment-level `joinedload` inside `contains_eager`**
The search query uses contains_eager(`ScoreSet.experiment`) with nested `joinedload` calls for one-to-many experiment relationships (lines 241-253):

- `Experiment.keyword_objs`
- `Experiment.doi_identifiers`
- `Experiment.publication_identifier_associations`
- `Experiment.raw_read_identifiers`

Because these are nested inside `contains_eager`, SQLAlchemy's subquery-wrapping logic (which normally protects `LIMIT` from row multiplication when using `joinedload` on collections) does not apply. Experiments with multiple keywords, publications, or DOIs produce multiple raw SQL rows per score set. The `LIMIT` 101 (limit + 1) **is applied to these raw rows, not to unique score sets**.

For example, if 15 of the first score sets (ordered by Experiment.title) belong to experiments with extra metadata, each consumes 2+ rows from the 101-row budget, resulting in only ~84-86 unique score sets after deduplication.

The count logic at line 290 then checks if `num_score_sets > offset + search.limit` to decide whether to run a count query. Since it only got 84 entities (less than 100), it concludes there are no more results and reports `num_score_sets = 84`.

**2. Count query discards filtered query**
If the count query path were reached (lines 294-296), it has a separate bug:

```
count_query = db.query(ScoreSet)
build_search_score_sets_query_filter(db, count_query, owner_or_contributor, search)
num_score_sets = count_query.order_by(None).limit(None).count()
```

`build_search_score_sets_query_filter` returns the modified query, but the return value is discarded. `count_query` remains unfiltered and would count all score sets in the database.

**Expected behavior**
A search with no filters should return up to `SCORE_SET_SEARCH_MAX_LIMIT` (100) score sets per page, with `num_score_sets` reflecting the true total count of matching results.

**Suggested fix**
Switch the experiment-level one-to-many `joinedload` calls (inside `contains_eager`) to selectinload, which uses separate queries and avoids row multiplication in the main query.
Capture the return value of `build_search_score_sets_query_filter` for the count query:

```
count_query = db.query(ScoreSet)
count_query = build_search_score_sets_query_filter(db, count_query, owner_or_contributor, search)
```

**Note:** Take care when testing on local dev instances, which may not have the metadata relationships which cause this bug to surface on staging / prod. Any fix should be tested carefully on staging before PR.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Score set search returns fewer results than expected due to row multiplication in query #675

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Score set search returns fewer results than expected due to row multiplication in query #675

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions