Skip to content

Score set search returns fewer results than expected due to row multiplication in query #675

@bencap

Description

@bencap

The /score-sets/search endpoint returns fewer score sets than expected. A search with no filters returns only ~84 results on staging/prod instead of the expected 100 (the SCORE_SET_SEARCH_MAX_LIMIT).

Root cause
There are two bugs in search_score_sets in src/mavedb/lib/score_sets.py.

1. Row multiplication from experiment-level joinedload inside contains_eager
The search query uses contains_eager(ScoreSet.experiment) with nested joinedload calls for one-to-many experiment relationships (lines 241-253):

  • Experiment.keyword_objs
  • Experiment.doi_identifiers
  • Experiment.publication_identifier_associations
  • Experiment.raw_read_identifiers

Because these are nested inside contains_eager, SQLAlchemy's subquery-wrapping logic (which normally protects LIMIT from row multiplication when using joinedload on collections) does not apply. Experiments with multiple keywords, publications, or DOIs produce multiple raw SQL rows per score set. The LIMIT 101 (limit + 1) is applied to these raw rows, not to unique score sets.

For example, if 15 of the first score sets (ordered by Experiment.title) belong to experiments with extra metadata, each consumes 2+ rows from the 101-row budget, resulting in only ~84-86 unique score sets after deduplication.

The count logic at line 290 then checks if num_score_sets > offset + search.limit to decide whether to run a count query. Since it only got 84 entities (less than 100), it concludes there are no more results and reports num_score_sets = 84.

2. Count query discards filtered query
If the count query path were reached (lines 294-296), it has a separate bug:

count_query = db.query(ScoreSet)
build_search_score_sets_query_filter(db, count_query, owner_or_contributor, search)
num_score_sets = count_query.order_by(None).limit(None).count()

build_search_score_sets_query_filter returns the modified query, but the return value is discarded. count_query remains unfiltered and would count all score sets in the database.

Expected behavior
A search with no filters should return up to SCORE_SET_SEARCH_MAX_LIMIT (100) score sets per page, with num_score_sets reflecting the true total count of matching results.

Suggested fix
Switch the experiment-level one-to-many joinedload calls (inside contains_eager) to selectinload, which uses separate queries and avoids row multiplication in the main query.
Capture the return value of build_search_score_sets_query_filter for the count query:

count_query = db.query(ScoreSet)
count_query = build_search_score_sets_query_filter(db, count_query, owner_or_contributor, search)

Note: Take care when testing on local dev instances, which may not have the metadata relationships which cause this bug to surface on staging / prod. Any fix should be tested carefully on staging before PR.

Metadata

Metadata

Assignees

Labels

app: backendTask implementation touches the backendtype: bugSomething isn't working

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions