
Copilot AI commented Dec 20, 2025

Synthetic control methods construct counterfactuals using non-negative weights that sum to one, i.e. a convex combination of the control units. This constrains every prediction to lie within the convex hull of the control units. When the treated unit's pre-intervention trajectory falls outside this hull (e.g., consistently above or below all controls), no choice of weights can reproduce it, so the method cannot produce an accurate counterfactual.
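As a quick numerical illustration of this bound, here is a NumPy sketch with made-up data (three controls over five time points; this is not CausalPy code). It samples many weight vectors from the simplex and checks that every resulting synthetic value stays inside the per-time-point control range. As a consequence, a treated series sitting 3 units above all controls is always missed by at least that margin.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: 3 control series observed over 5 time points (rows = time).
controls = rng.normal(loc=10.0, scale=1.0, size=(5, 3))

# Draw many random weight vectors on the simplex (non-negative, summing to one).
weights = rng.dirichlet(alpha=np.ones(3), size=1000)  # shape (1000, 3)

# Every synthetic series is controls @ w; compute all 1000 of them at once.
synthetic = controls @ weights.T  # shape (5, 1000)

lower = controls.min(axis=1, keepdims=True)
upper = controls.max(axis=1, keepdims=True)

# Each synthetic value is bounded by the control range at its own time point.
inside = np.all((synthetic >= lower - 1e-12) & (synthetic <= upper + 1e-12))
print(inside)  # True

# A treated unit sitting 3 units above every control can therefore never be matched.
treated = controls.max(axis=1) + 3.0
gap = treated[:, None] - synthetic
print(gap.min() >= 3.0 - 1e-12)  # True
```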

Changes

Core diagnostic

  • Added a check_convex_hull_violation() utility that checks whether the treated unit's values fall within the range of the control units at each time point
  • Integrated an automatic check into SyntheticControl.__init__() that issues a UserWarning when the check fails:
result = cp.SyntheticControl(
    df,
    treatment_time,
    control_units=["a", "b", "c"],
    treated_units=["treated"],
    model=cp.pymc_models.WeightedSumFitter(...)
)
# UserWarning: Convex hull assumption may be violated: 30 pre-intervention 
# time points (100.0% above, 0.0% below control range). Consider: (1) adding 
# more diverse control units, (2) using ITS with intercept, or (3) using 
# Augmented Synthetic Control Method.

Testing

  • 5 unit tests covering violation scenarios (above/below/both/boundary/pass cases)
  • 2 integration tests verifying warning behavior in actual usage
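For readers who want to see what such tests can look like, here is a minimal pytest-style sketch of the five scenarios (pass, boundary, above, below, both) plus a warning check. The function is a local stand-in that mirrors the range check proposed in the original issue; the test names and data are hypothetical, and the PR's actual tests in causalpy/tests/test_utils.py and causalpy/tests/test_input_validation.py are not reproduced here.

```python
import warnings

import numpy as np


def check_convex_hull_violation(treated_series, control_matrix):
    # Local stand-in mirroring the proposed range check (not imported from CausalPy).
    treated = np.asarray(treated_series, dtype=float)
    controls = np.asarray(control_matrix, dtype=float)
    above = treated > controls.max(axis=1)
    below = treated < controls.min(axis=1)
    n = treated.size
    return {
        "passes": not (above.any() or below.any()),
        "n_violations": int(above.sum() + below.sum()),
        "pct_above": 100.0 * above.sum() / n,
        "pct_below": 100.0 * below.sum() / n,
    }


# Three time points (rows) by two control units (columns); hypothetical values.
CONTROLS = np.array([[1.0, 3.0], [2.0, 4.0], [0.0, 5.0]])
LOW, HIGH = CONTROLS.min(axis=1), CONTROLS.max(axis=1)


def test_passes_inside_range():
    assert check_convex_hull_violation((LOW + HIGH) / 2, CONTROLS)["passes"]


def test_boundary_values_pass():
    # Values exactly on the edge of the control range are allowed.
    assert check_convex_hull_violation(HIGH, CONTROLS)["passes"]


def test_all_above():
    result = check_convex_hull_violation(HIGH + 1.0, CONTROLS)
    assert not result["passes"] and result["pct_above"] == 100.0


def test_all_below():
    result = check_convex_hull_violation(LOW - 1.0, CONTROLS)
    assert not result["passes"] and result["pct_below"] == 100.0


def test_both_directions():
    # Above at t=0, below at t=1, on the lower boundary at t=2.
    treated = np.array([HIGH[0] + 1.0, LOW[1] - 1.0, LOW[2]])
    assert check_convex_hull_violation(treated, CONTROLS)["n_violations"] == 2


def test_warning_is_emitted():
    # Mimics the warning path only; real integration tests exercise SyntheticControl itself.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if not check_convex_hull_violation(HIGH + 1.0, CONTROLS)["passes"]:
            warnings.warn("Convex hull assumption may be violated", UserWarning)
    assert any(issubclass(w.category, UserWarning) for w in caught)
```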

Documentation

  • Brief subsection in sc_pymc.ipynb introducing the concept
  • Detailed pedagogical section in sc_pymc_brexit.ipynb with a mathematical explanation, visualization code, and alternatives to use when the assumption is violated
  • Glossary term "Convex hull condition" with citation to Abadie et al. (2010)
  • Added references: Abadie et al. (2010), Ben-Michael et al. (2021)
Original prompt

This section details the original issue you should resolve

<issue_title>Add Diagnostic Test for Convex Hull Assumption in Synthetic Control</issue_title>
<issue_description>## Problem Description

Background

The synthetic control method constructs a counterfactual by finding a weighted combination of control units that best approximates the treated unit in the pre-intervention period. In CausalPy, this is implemented via the WeightedSumFitter (Bayesian) and WeightedProportion (OLS) models, both of which impose:

  1. Non-negativity constraint: All weights β_i ≥ 0
  2. Sum-to-one constraint: Σ β_i = 1

These constraints mean the synthetic control prediction μ is a convex combination of the control units:

$$\mu = \sum_{i=1}^{n} \beta_i x_i \quad \text{where } \beta_i \geq 0 \text{ and } \sum_{i=1}^{n} \beta_i = 1$$
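Writing that definition at a single time point t gives a bound that holds for every admissible set of weights. The notation here is added for this derivation: t is the time index, x_{it} is the outcome of control i at time t, μ_t is the synthetic value, and y_t is the treated outcome.

$$\min_{j} x_{jt} = \sum_{i=1}^{n} \beta_i \min_{j} x_{jt} \le \sum_{i=1}^{n} \beta_i x_{it} = \mu_t \le \sum_{i=1}^{n} \beta_i \max_{j} x_{jt} = \max_{j} x_{jt}$$

The outer equalities use Σ β_i = 1 and the inequalities use β_i ≥ 0. Hence if y_t > max_j x_{jt}, then y_t − μ_t ≥ y_t − max_j x_{jt} > 0 whatever the weights, and symmetrically when y_t < min_j x_{jt}.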

The Convex Hull Assumption

By definition, a convex combination can only produce values that lie within the convex hull of the input points. In the context of synthetic control, this means:

The treated unit's pre-intervention outcomes must be expressible as a weighted average of the control units' outcomes at each time point.

When this assumption is violated:

  • If all control series are above the treated series → at any given time point, the lowest value a convex combination can reach is the smallest control value (all weight on the lowest control), which is still too high
  • If all control series are below the treated series → at any given time point, the highest value a convex combination can reach is the largest control value (all weight on the highest control), which is still too low

In either case, no valid convex combination can match the treated unit's trajectory, leading to:

  • Poor pre-intervention fit (low R²)
  • Biased treatment effect estimates
  • Unreliable counterfactual projections

Current State

CausalPy does not currently:

  1. Check whether the convex hull assumption is satisfied
  2. Warn users when the assumption is violated
  3. Provide educational content about this critical assumption

Proposed Solution

1. Implement Diagnostic Test

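Add a function to check whether the treated unit's pre-intervention values fall within the convex hull of the control units at each time point. A simplified approach is to check, at each time point, whether the treated value lies between the minimum and maximum of the control values: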

import numpy as np


def check_convex_hull_violation(
    treated_series: np.ndarray,
    control_matrix: np.ndarray
) -> dict:
    """
    Check if treated series values fall within the range of control series.

    For each time point, verify that:
    min(controls) <= treated <= max(controls)

    This is a necessary (but not sufficient) condition for the treated unit
    to lie within the convex hull of control units.

    Args:
        treated_series: treated outcomes, shape (n_timepoints,)
        control_matrix: control outcomes, shape (n_timepoints, n_controls)

    Returns:
        dict with keys:
        - 'passes': bool - whether the check passes
        - 'n_violations': int - number of time points with violations
        - 'pct_above': float - percentage of points where treated > max(controls)
        - 'pct_below': float - percentage of points where treated < min(controls)
    """
    treated_series = np.asarray(treated_series, dtype=float).ravel()
    control_matrix = np.asarray(control_matrix, dtype=float)

    n_points = len(treated_series)
    if n_points == 0:
        # Nothing to check; also avoids dividing by zero below
        return {'passes': True, 'n_violations': 0, 'pct_above': 0.0, 'pct_below': 0.0}

    # Per-time-point envelope of the control units (rows = time points)
    control_min = control_matrix.min(axis=1)
    control_max = control_matrix.max(axis=1)

    above = treated_series > control_max
    below = treated_series < control_min

    return {
        'passes': not (above.any() or below.any()),
        'n_violations': int(above.sum() + below.sum()),
        'pct_above': 100 * above.sum() / n_points,
        'pct_below': 100 * below.sum() / n_points,
    }
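The docstring notes that the range check is necessary but not sufficient. Because a synthetic control uses a single set of weights for all time points, a treated series can sit inside the control range at every time point and still be unreachable. The toy example below shows such a case. Its data are hypothetical, and range_check_passes() is a hypothetical helper that repeats the same per-time-point test. In this case the check passes, yet the best convex fit still leaves a residual.

```python
import numpy as np


def range_check_passes(treated, controls):
    # Same per-time-point min/max test as check_convex_hull_violation.
    lo = controls.min(axis=1)
    hi = controls.max(axis=1)
    return bool(np.all((treated >= lo) & (treated <= hi)))


# Two controls over two time points (rows = time, columns = controls a and b).
controls = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
treated = np.array([0.0, 0.0])

print(range_check_passes(treated, controls))  # True: 0 lies in [0, 1] at both times

# With two controls, every convex combination is w * a + (1 - w) * b = [1 - w, w],
# so matching treated = [0, 0] would need w = 1 and w = 0 at the same time.
w = np.linspace(0.0, 1.0, 1001)
synthetic = controls[:, [0]] * w + controls[:, [1]] * (1.0 - w)  # shape (2, 1001)
sse = ((synthetic - treated[:, None]) ** 2).sum(axis=0)

print(w[sse.argmin()], sse.min())  # best w = 0.5, SSE = 0.5 > 0
```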

2. Issue Warning with Instructive Message

In SyntheticControl.__init__(), after preparing the data but before fitting, run the diagnostic and issue a warning if violated:

import warnings

# Check convex hull assumption
hull_check = check_convex_hull_violation(
    self.datapre_treated.values.flatten(),
    self.datapre_control.values
)

if not hull_check['passes']:
    warnings.warn(
        f"Convex hull assumption may be violated: {hull_check['n_violations']} "
        f"pre-intervention time points ({hull_check['pct_above']:.1f}% above, "
        f"{hull_check['pct_below']:.1f}% below control range). "
        "The synthetic control method requires the treated unit to lie within "
        "the convex hull of control units. Consider: (1) adding more diverse "
        "control units, (2) using a model with an intercept (e.g., ITS with "
        "control predictors), or (3) using the Augmented Synthetic Control Method. "
        "See glossary term 'Convex hull condition' for more details.",
        UserWarning,
        stacklevel=2
    )
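Suggestion (2) in this warning works because a regression with an intercept and unconstrained coefficients can absorb a constant level shift that no convex combination can reach. The NumPy sketch below illustrates this on synthetic data. Ordinary least squares via numpy.linalg.lstsq stands in for "a model with an intercept" and is not CausalPy's ITS API. The convex fit is a simple grid search over the weight on two controls.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
T = 30

# Hypothetical data: control b tracks a about 1 unit lower, and the treated series
# sits exactly 5 units above a, hence above both controls at every time point.
a = 10.0 + rng.normal(scale=0.5, size=T)
b = a - 1.0 + rng.normal(scale=0.1, size=T)
treated = a + 5.0

# Convex (synthetic-control style) fit: grid search over w in [0, 1] for w*a + (1-w)*b.
w = np.linspace(0.0, 1.0, 101)
synthetic = np.outer(a, w) + np.outer(b, 1.0 - w)  # shape (T, 101)
convex_rmse = np.sqrt(((synthetic - treated[:, None]) ** 2).mean(axis=0)).min()

# Unconstrained least squares with an intercept column: [1, a, b].
X = np.column_stack([np.ones(T), a, b])
coef, *_ = np.linalg.lstsq(X, treated, rcond=None)
ols_rmse = np.sqrt(((X @ coef - treated) ** 2).mean())

print(f"convex RMSE: {convex_rmse:.2f}")   # at least 5 by construction
print(f"intercept RMSE: {ols_rmse:.2e}")   # essentially zero
```

The trade-off is that unconstrained coefficients give up the interpolation guarantee of the convex combination, so the resulting counterfactual may extrapolate beyond the observed controls.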

3. Update Notebooks

sc_pymc.ipynb

Add a brief subsection titled "Convex Hull Assumption" that:

  • Explains the constraint imposed by non-negative weights summing to one
  • Notes that this means the treated unit must lie within the "range" of control units
  • Links to the glossary term

sc_pymc_brexit.ipynb

Add a more detailed pedagogical subsection titled "Understanding the Convex Hull Assumption" t...



Copilot AI changed the title from "[WIP] Add diagnostic test for convex hull assumption" to "Add convex hull violation diagnostic for synthetic control methods" Dec 20, 2025
Copilot AI requested a review from drbenvincent December 20, 2025 11:56
@drbenvincent

bugbot run

@cursor

cursor bot commented Dec 20, 2025

PR Summary

Adds check_convex_hull_violation() and integrates a pre-fit convex hull warning into SyntheticControl, with tests and glossary docs.

  • Synthetic Control:
    • Adds pre-fit convex hull check in causalpy/experiments/synthetic_control.py using check_convex_hull_violation; issues a UserWarning when violated.
  • Utilities:
    • Introduces check_convex_hull_violation() in causalpy/utils.py to detect treated series outside control range; returns pass/violation stats.
  • Tests:
    • Adds unit tests in causalpy/tests/test_utils.py covering pass/above/below/both/boundary cases.
    • Adds integration tests in causalpy/tests/test_input_validation.py verifying warning emission and no-warning scenarios in SyntheticControl.
  • Docs:
    • Adds glossary entry for "Convex hull condition" in docs/source/knowledgebase/glossary.rst.

Written by Cursor Bugbot for commit 8f8d79d.

@drbenvincent drbenvincent marked this pull request as ready for review December 20, 2025 12:51
The interrogate coverage badge in the documentation was updated to reflect a new coverage value of 96.4%.
@codecov

codecov bot commented Dec 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.38%. Comparing base (2d6bba7) to head (6a22940).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #599      +/-   ##
==========================================
+ Coverage   93.27%   93.38%   +0.11%     
==========================================
  Files          37       37              
  Lines        5632     5733     +101     
  Branches      367      370       +3     
==========================================
+ Hits         5253     5354     +101     
  Misses        248      248              
  Partials      131      131              


@drbenvincent drbenvincent added documentation Improvements or additions to documentation enhancement New feature or request labels Dec 20, 2025
@drbenvincent

drbenvincent commented Dec 20, 2025

Note to self: the "What to Do if Violated" section needs a bit more work.

Added a check in check_convex_hull_violation to safely handle empty treated and control arrays, returning default pass results. Added a corresponding test to ensure correct behavior for this edge case.
Reworded suggestions for alternative methods in the synthetic control discussion to improve clarity and consistency. Adjusted phrasing for Augmented Synthetic Control and Comparative Interrupted Time Series methods.

Development

Successfully merging this pull request may close these issues.

Add Diagnostic Test for Convex Hull Assumption in Synthetic Control
