Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import sys
import pandas as pd
print(f"Python version: {sys.version}")
print(f"pandas version: {pd.__version__}")
print()
num_indices = 100000 # OK with 10,000; fails with 100,000
metrics = [
    "apple",
    "banana",
]
data_rows = []
for idx in range(num_indices):
    data_rows.append({"idx": idx, "metric": "apple", "value": 2 * idx})
    data_rows.append({"idx": idx, "metric": "banana", "value": 3 * idx})
    data_rows.append({"idx": idx, "metric": "coconut", "value": 4 * idx})
df = pd.DataFrame(data_rows)
print(f"Generated dataset: {len(df):,} rows")
print(f"Expected rows after pivot: {num_indices:,}")
print()
print("Pivoting data...")
pivoted = df.pivot_table(
    index=["idx"],
    columns="metric",
    values="value",
    aggfunc="first",
)
print("After pivot:")
print(f" Total rows: {len(pivoted):,}")
print(f" Unique indices: {pivoted.index.nunique():,}")
print(f" Has duplicate indices: {pivoted.index.duplicated().any()}")
if pivoted.index.duplicated().any():
    print(" BUG: DUPLICATE INDICES")
    print()
    print("Example duplicates:")
    dup_indices = pivoted.index[pivoted.index.duplicated(keep=False)]
    for idx in dup_indices.unique()[:3]:
        print(pivoted.loc[idx])
        print()
else:
    print()
    print("OK")
status = 0 if not pivoted.index.duplicated().any() else 1
sys.exit(status)
Issue Description
With Python 3.14, pivot_table returns corrupted output when the input is large. With smaller input (fewer rows or columns), the output is correct. The example code produces duplicated index values. In my production application, I see both missing output rows and duplicated index values.
With Python 3.13, pivot_table always gives correct output.
I'm testing on pandas 2.3.3 and 3.0.0rc0+13.g8be8439bce.
Here is the failing output from the test program:
joshuanapoli@mac cvec-data-analysis % poetry run python pandas_bug_report.py
Python version: 3.14.2 (main, Dec 5 2025, 16:49:16) [Clang 17.0.0 (clang-1700.4.4.1)]
pandas version: 3.0.0rc0+13.g8be8439bce
Generated dataset: 300,000 rows
Expected rows after pivot: 100,000
Pivoting data...
After pivot:
Total rows: 100,000
Unique indices: 33,334
Has duplicate indices: True
BUG: DUPLICATE INDICES
Example duplicates:
metric  apple  banana  coconut
idx
1           2       3        4
1           4       6        8
1           6       9       12
metric  apple  banana  coconut
idx
2           8      12       16
2          10      15       20
2          12      18       24
metric  apple  banana  coconut
idx
3          14      21       28
3          16      24       32
3          18      27       36
Expected Behavior
Python version: 3.13.3 (main, Apr 8 2025, 13:54:08) [Clang 16.0.0 (clang-1600.0.26.6)]
pandas version: 3.0.0rc0+13.g8be8439bce
Generated dataset: 300,000 rows
Expected rows after pivot: 100,000
Pivoting data...
After pivot:
Total rows: 100,000
Unique indices: 100,000
Has duplicate indices: False
OK
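The expected behavior can also be checked programmatically rather than by reading the printed output. The following is a minimal sketch, not part of the original report: the helper name check_pivot is hypothetical, and it relies on the repro's own construction (apple is generated as 2 * idx) to recover which source idx each output row really came from. In the failing output above, the values in each row look correct while the index labels do not, so comparing the recovered idx against the label counts the mislabeled rows.

```python
import pandas as pd


def check_pivot(num_indices: int) -> dict:
    """Pivot a synthetic long table and report basic consistency diagnostics.

    Hypothetical helper for illustration; it mirrors the repro's data layout.
    """
    idx = range(num_indices)
    df = pd.DataFrame(
        {
            # Three rows per idx, one per metric, as in the repro.
            "idx": [i for i in idx for _ in range(3)],
            "metric": ["apple", "banana", "coconut"] * num_indices,
            "value": [k * i for i in idx for k in (2, 3, 4)],
        }
    )
    pivoted = df.pivot_table(
        index=["idx"],
        columns="metric",
        values="value",
        aggfunc="first",
    )
    # apple was generated as 2 * idx, so apple // 2 recovers the source idx.
    recovered = pivoted["apple"].to_numpy() // 2
    labels = pivoted.index.to_numpy()
    return {
        "rows": len(pivoted),
        "unique_labels": int(pivoted.index.nunique()),
        "mislabeled_rows": int((recovered != labels).sum()),
    }


if __name__ == "__main__":
    # Small input: the report says this size pivots correctly.
    print(check_pivot(1_000))
```

On an unaffected setup, every size should give rows equal to unique_labels and zero mislabeled_rows. Based on the output in this report, calling check_pivot(100_000) on the affected Python 3.14 setup would be expected to show fewer unique labels than rows, but I have only verified the logic on small inputs.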
Installed Versions
INSTALLED VERSIONS
commit : 8be8439
python : 3.14.2
python-bits : 64
OS : Darwin
OS-release : 25.1.0
Version : Darwin Kernel Version 25.1.0: Mon Oct 20 19:34:05 PDT 2025; root:xnu-12377.41.6~2/RELEASE_ARM64_T6041
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : C.UTF-8
pandas : 3.0.0rc0+13.g8be8439bce
numpy : 1.26.4
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : 3.10.7
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
psycopg2 : None
pymysql : None
pyarrow : 22.0.0
pyiceberg : None
pyreadstat : None
pytest : 9.0.2
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : 1.16.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None