Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/dev/user_guide/copy_on_write.html#copy-on-write
Documentation problem
Looking at the first code example, the docs basically say that df["foo"].iloc[0] = 100 no longer works. They spend some text explaining why not and then tell the user:
This statement can be rewritten into a single statement with loc or iloc if this behavior is necessary. DataFrame.where() is another suitable alternative for this case.
I don't think this is a sufficient answer to "what is the pandas 3 fix for this?".
It is unclear to me how I would solve this with a where or loc statement in a non-ugly way. Also, switching to iloc usually isn't straightforward; the docs could mention df.columns.get_loc here.
It would be great to see a "suggested pandas 3 replacement" for this line that one can simply copy.
Note also that I don't particularly like my suggested fix. I especially dislike the solutions with loc and where: those bool arrays are ugly as hell, while the iloc one seems ok.
To summarize: it would be great to see a suggestion for how best to do the assignment when you have a named column but a positional (iloc) row.
Suggested fix for documentation
To set a single value in the n-th row of column "foo" in pandas 3.0, you need to rewrite the code into a single .loc or .iloc statement like this:
df.iloc[n, df.columns.get_loc("foo")] = 100
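For reference, here is a minimal runnable sketch of that line. The frame contents and the value of n are made up for illustration, and it assumes the column labels are unique (with duplicate labels, get_loc returns a slice or boolean mask instead of a single integer):

```python
import pandas as pd

# Hypothetical example data, only for illustration.
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
n = 1  # positional row to modify

# Translate the column label into its integer position,
# then set the cell in a single iloc statement.
col = df.columns.get_loc("foo")
df.iloc[n, col] = 100
```

Because the assignment goes through one indexing call on df itself, it modifies df directly rather than a temporary object.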
You can also use loc or where like this:
df.loc[[*[False] * n, True, *[False] * (len(df) - 1 - n)], "foo"] = 100
df["foo"] = df["foo"].where([*[True] * n, False, *[True] * (len(df) - 1 - n)], 100)
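If the boolean lists feel too heavy, one possible alternative is to translate the row position into its label via df.index[n] and use loc with that label. This is only a sketch and not taken from the current docs; it assumes the index labels are unique, since with duplicate labels loc would set every row that carries the same label. The data is made up for illustration:

```python
import pandas as pd

# Hypothetical example data with a non-default index.
df = pd.DataFrame({"foo": [1, 2, 3]}, index=["a", "b", "c"])
n = 2  # positional row to modify

# Look up the label of the n-th row, then assign by label and column name.
# Assumes unique index labels.
df.loc[df.index[n], "foo"] = 100
```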
Note that assigning values to single cells usually isn't best in terms of efficiency.
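When several cells of the same column need new values, they can be set in one statement instead of one assignment per cell. A small sketch, with made-up data and hypothetical names rows and new_values:

```python
import pandas as pd

# Hypothetical example data, only for illustration.
df = pd.DataFrame({"foo": [1, 2, 3, 4], "bar": [5, 6, 7, 8]})

rows = [0, 2]            # positional rows to modify
new_values = [100, 300]  # one new value per row, in the same order

# A single iloc assignment covers all selected rows at once.
df.iloc[rows, df.columns.get_loc("foo")] = new_values
```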