
BUG(pandas 3.0 regression): drop(index=...) doesn't accept NA values when using arrow dtype in index #63304

@tswast

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pyarrow as pa
import pandas as pd

df = pd.DataFrame(
    index=pd.Index(
        [None, b'\xe3', b'\xe3'],
        dtype='binary[pyarrow]',
        name='bytes_col'
    )
)
df.drop(index=[pd.NA])

Issue Description

Error in pandas 3.0.0rc0 (this traceback comes from the fuller script, where line 20 also drops a second label):

---> 20 df.drop(index=[pd.NA, b'G\xc3\xbcten Tag'])

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pandas/core/frame.py:5863, in DataFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
   5705 def drop(
   5706     self,
   5707     labels: IndexLabel | ListLike = None,
   (...)   5714     errors: IgnoreRaise = "raise",
   5715 ) -> DataFrame | None:
   5716     """
   5717     Drop specified labels from rows or columns.
   5718
   (...)   5861             weight  1.0     0.8
   5862     """
-> 5863     return super().drop(
   5864         labels=labels,
   5865         axis=axis,
   5866         index=index,
   5867         columns=columns,
   5868         level=level,
   5869         inplace=inplace,
   5870         errors=errors,
   5871     )

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pandas/core/generic.py:4607, in NDFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
   4605 for axis, labels in axes.items():
   4606     if labels is not None:
-> 4607         obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   4609 if inplace:
   4610     self._update_inplace(obj)

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pandas/core/generic.py:4674, in NDFrame._drop_axis(self, labels, axis, level, errors, only_slice)
   4672     mask = ~axis.get_level_values(0).isin(labels)
   4673 else:
-> 4674     mask = ~axis.isin(labels)
   4675     # Check if label doesn't exist along axis
   4676     labels_missing = (axis.get_indexer_for(labels) == -1).any()

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pandas/core/indexes/base.py:6632, in Index.isin(self, values, level)
   6630 if level is not None:
   6631     self._validate_index_level(level)
-> 6632 return algos.isin(self._values, values)

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pandas/core/algorithms.py:546, in isin(comps, values)
    543 comps_array = extract_array(comps_array, extract_numpy=True)
    544 if not isinstance(comps_array, np.ndarray):
    545     # i.e. Extension Array
--> 546     return comps_array.isin(values)
    548 elif needs_i8_conversion(comps_array.dtype):
    549     # Dispatch to DatetimeLikeArrayMixin.isin
    550     return pd_array(comps_array).isin(values)

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pandas/core/arrays/arrow/array.py:1418, in ArrowExtensionArray.isin(self, values)
   1415 if not len(values):
   1416     return np.zeros(len(self), dtype=bool)
-> 1418 result = pc.is_in(self._pa_array, value_set=pa.array(values))
   1419 # pyarrow 2.0.0 returned nulls, so we explicitly specify dtype to convert nulls
   1420 # to False
   1421 return np.array(result, dtype=np.bool_)

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pyarrow/array.pxi:365, in pyarrow.lib.array()

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pyarrow/array.pxi:91, in pyarrow.lib._ndarray_to_array()

File ~/src/github.com/googleapis/python-bigquery-dataframes-2/venv/lib/python3.13/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowInvalid: Could not convert <NA> with type NAType: did not recognize Python value type when inferring an Arrow data type

This could be related to the deprecation under which NA is no longer treated as equivalent to None. Unfortunately, this Index reports its missing values as NA, not None, so I don't know how else to specify them in a pandas-y way.

Expected Behavior

In pandas 2.3.3, the call succeeds and the <NA> row is dropped. For the fuller script behind the traceback above, the result is:

                                                   a      b
bytes_col                                                  
b'Hello, World!'                                   1   True
b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81...  2  False
b'\xc2\xa1Hola Mundo!'                             3   None
b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81...  5  False
b'Hello\tBigFrames!\x07'                           7   True

Installed Versions

In [4]: pd.show_versions()

INSTALLED VERSIONS

commit : 1a3230d
python : 3.13.7
python-bits : 64
OS : Linux
OS-release : 6.16.12-1rodete1-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.16.12-1rodete1 (2025-10-16)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0rc0
numpy : 2.3.5
dateutil : 2.9.0.post0
pip : 25.3
Cython : None
sphinx : None
IPython : 9.8.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : 2025.12.0
html5lib : None
hypothesis : None
gcsfs : 2025.12.0
jinja2 : None
lxml.etree : None
matplotlib : 3.10.7
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : 22.0.0
pyiceberg : None
pyreadstat : None
pytest : 9.0.2
python-calamine : None
pytz : 2025.2
pyxlsb : None
s3fs : None
scipy : 1.16.3
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None

Metadata

Assignees

No one assigned

Labels

Arrow (pyarrow functionality), Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Regression (functionality that used to work in a prior pandas version)