Pandas: one-column Boolean DataFrame vs Boolean Series
Caution when you apply any or all on a one-column Boolean DataFrame, or you should never apply these 2 functions to such a DataFrame because it works differently from your intention.
The logics of any and all are:
def all(iterable):
for element in iterable:
if not element:
return False
return True
def any(iterable):
for element in iterable:
if element:
return True
return False
The trickiest part with a DataFrame is that when being iterated, its elements are actually its column names, with types!
A column name can be a string (in most cases), an integer (acceptable) or even a Boolean (What?). Therefore when applying any() or all() to a DataFrame, you’re actually iterating its column names, which are then evaluated in the if element: clause, and at the same time NONE of the values underneath is accessed at all.
>>> foo = pd.DataFrame(data=[1,2])
>>> foo
0
0 1
1 2
>>> for element in foo:
... print(element)
...
0
>>> foo = pd.DataFrame(data=[1,2], columns=[True])
>>> foo
True
0 1
1 2
>>> for element in foo:
... print(element)
...
True
And this exactly explains the seemingly contradicting results in the following code:
>>> one_c_df = pd.DataFrame(data=[False, False], index=["a", "b"], columns=["a_colname_not_evaluated_to_False"])
>>> one_c_df
a_colname_not_evaluated_to_False
a False
b False
>>> any(one_c_df) # I cannot believe this at first sight!
True
>>> all(one_c_df)
True
>>> series = pd.Series(data=[False, False], index=["a", "b"], name="does_not_matter")
>>> series
a False
b False
Name: does_not_matter, dtype: bool
>>> any(series)
False
>>> all(series)
False
留下评论