BUG: large pivot_table has incorrect output with Python 3.14 #63314

@joshuanapoli

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import sys

import pandas as pd

print(f"Python version: {sys.version}")
print(f"pandas version: {pd.__version__}")
print()

num_indices = 100000  # OK with 10,000; fails with 100,000
# Map each metric name to the multiplier used to derive its value from idx.
metrics = {
    "apple": 2,
    "banana": 3,
    "coconut": 4,
}

data_rows = []
for idx in range(num_indices):
    for metric, multiplier in metrics.items():
        data_rows.append({"idx": idx, "metric": metric, "value": multiplier * idx})

df = pd.DataFrame(data_rows)

print(f"Generated dataset: {len(df):,} rows")
print(f"Expected rows after pivot: {num_indices:,}")
print()

print("Pivoting data...")
pivoted = df.pivot_table(
    index=["idx"],
    columns="metric",
    values="value",
    aggfunc="first",
)

print("After pivot:")
print(f"  Total rows: {len(pivoted):,}")
print(f"  Unique indices: {pivoted.index.nunique():,}")
print(f"  Has duplicate indices: {pivoted.index.duplicated().any()}")

if pivoted.index.duplicated().any():
    print("  BUG: DUPLICATE INDICES")
    print()
    print("Example duplicates:")
    dup_indices = pivoted.index[pivoted.index.duplicated(keep=False)]
    for idx in dup_indices.unique()[:3]:
        print(pivoted.loc[idx])
        print()
else:
    print()
    print("OK")

status = 0 if not pivoted.index.duplicated().any() else 1
sys.exit(status)

Issue Description

With Python 3.14, the pivot_table function produces corrupted output when the input is large. On smaller input (fewer rows or columns), the output is correct. In the example code, the pivoted result has the expected 100,000 rows, but its index contains only 33,334 unique values, so most index values are duplicated. In my production application, I see both missing output rows and duplicated index values.

With Python 3.13, the same code gives correct output.

I'm testing on pandas 2.3.3 and 3.0.0rc0+13.g8be8439bce.

Here is the failing output from the test program:

joshuanapoli@mac cvec-data-analysis % poetry run python pandas_bug_report.py
Python version: 3.14.2 (main, Dec  5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.4.4.1)]
pandas version: 3.0.0rc0+13.g8be8439bce

Generated dataset: 300,000 rows
Expected rows after pivot: 100,000

Pivoting data...
After pivot:
  Total rows: 100,000
  Unique indices: 33,334
  Has duplicate indices: True
  BUG: DUPLICATE INDICES

Example duplicates:
metric  apple  banana  coconut
idx
1           2       3        4
1           4       6        8
1           6       9       12

metric  apple  banana  coconut
idx
2           8      12       16
2          10      15       20
2          12      18       24

metric  apple  banana  coconut
idx
3          14      21       28
3          16      24       32
3          18      27       36

Expected Behavior

Python version: 3.13.3 (main, Apr 8 2025, 13:54:08) [Clang 16.0.0 (clang-1600.0.26.6)]
pandas version: 3.0.0rc0+13.g8be8439bce

Generated dataset: 300,000 rows
Expected rows after pivot: 100,000

Pivoting data...
After pivot:
  Total rows: 100,000
  Unique indices: 100,000
  Has duplicate indices: False

OK
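
For an expectation that does not depend on pandas at all, the same pivot can be sketched in plain Python. This is a minimal reference, not how pivot_table is implemented; the helper name `reference_pivot_first` is hypothetical, and it assumes the input contains no missing values (so "first" simply means the first value seen per idx/metric pair). It also makes the failing output easier to read: the three rows labelled idx 1 there carry the values expected for idx 1, 2 and 3, which suggests the cell values are right and the index labels are wrong.

```python
def reference_pivot_first(rows):
    """Pivot {"idx", "metric", "value"} rows into {idx: {metric: value}}.

    Hypothetical helper for cross-checking; keeps the first value seen for
    each (idx, metric) pair, mirroring aggfunc="first" on data without NaNs.
    """
    table = {}
    for row in rows:
        cells = table.setdefault(row["idx"], {})
        # setdefault only stores the value if the metric is not present yet.
        cells.setdefault(row["metric"], row["value"])
    return table


num_indices = 100_000
rows = [
    {"idx": idx, "metric": metric, "value": multiplier * idx}
    for idx in range(num_indices)
    for metric, multiplier in (("apple", 2), ("banana", 3), ("coconut", 4))
]

expected = reference_pivot_first(rows)
print(f"Expected rows after pivot: {len(expected):,}")
print(f"idx=1 -> {expected[1]}")
print(f"idx=3 -> {expected[3]}")
```

Each key of `expected` corresponds to exactly one row of a correct pivot, so a correct result has 100,000 unique index values and the row for idx 3 holds 6, 9 and 12.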

Installed Versions

INSTALLED VERSIONS

commit : 8be8439
python : 3.14.2
python-bits : 64
OS : Darwin
OS-release : 25.1.0
Version : Darwin Kernel Version 25.1.0: Mon Oct 20 19:34:05 PDT 2025; root:xnu-12377.41.6~2/RELEASE_ARM64_T6041
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : C.UTF-8

pandas : 3.0.0rc0+13.g8be8439bce
numpy : 1.26.4
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : 3.10.7
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
psycopg2 : None
pymysql : None
pyarrow : 22.0.0
pyiceberg : None
pyreadstat : None
pytest : 9.0.2
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : 1.16.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None
