pandas

The df.dropna() function removes missing data from a DataFrame, that is, it drops rows or columns containing NaN (or NaT/None) values.

Official function description:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Remove missing values. See the User Guide for more on which values are considered missing, and how to work with missing data.

Returns: DataFrame. DataFrame with NA entries dropped from it.

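Parameter description:

| Parameter | Description |
| --- | --- |
| axis | 0 for rows, 1 for columns; default 0. The dimension along which entries are dropped. |
| how | {'any', 'all'}, default 'any'. any: drop a row (or column) that contains at least one NaN; all: drop a row (or column) only if all of its values are NaN. |
| thresh | int. Keep only rows (or columns) that have at least this many non-NaN values. |
| subset | list. Labels of the columns to check for missing values (when dropping rows). |
| inplace | bool, default False. Whether to modify the original DataFrame in place instead of returning a new one. |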
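To make the difference between how and thresh concrete, the following is a minimal sketch that compares each option with an explicit count of non-missing values per row. The `scores` DataFrame and the variable names are hypothetical and are not part of the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the article.
scores = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [2.0, 5.0, np.nan, 6.0],
        "c": [3.0, np.nan, np.nan, np.nan],
    }
)

# Number of non-missing values in each row: 3, 1, 0, 2.
non_na_per_row = scores.notna().sum(axis=1)

# how='any' keeps only rows with no missing value at all.
any_result = scores.dropna(how="any")
any_by_mask = scores[non_na_per_row == scores.shape[1]]

# how='all' drops only rows in which every value is missing.
all_result = scores.dropna(how="all")
all_by_mask = scores[non_na_per_row > 0]

# thresh=2 keeps rows that have at least 2 non-missing values.
thresh_result = scores.dropna(thresh=2)
thresh_by_mask = scores[non_na_per_row >= 2]

print(any_result.index.tolist())     # [0]
print(all_result.index.tolist())     # [0, 1, 3]
print(thresh_result.index.tolist())  # [0, 3]
```

In each pair, the dropna call and the boolean mask select the same rows, which is one way to read the parameter table above.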

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'], "toy": [np.nan, 'Batmobile', 'Bullwhip'], "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing:

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing:

>>> df.dropna(axis=1)
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing (none in this data, so nothing is removed):

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
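Since the sample data has no fully empty row, how='all' leaves it unchanged. As a small sketch of the case where it does remove something, the hypothetical `heroes` DataFrame below replaces the last row of the article's data with one that is entirely missing.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the article's data whose last row is fully empty.
heroes = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", None],
        "toy": [np.nan, "Batmobile", np.nan],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
    }
)

# Row 2 has no values at all, so how='all' removes it.
# Row 0 is only partly missing, so it is kept.
cleaned = heroes.dropna(how="all")

print(cleaned.index.tolist())  # [0, 1]
```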

Keep only the rows with at least 2 non-NA values:

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Look for missing values only in specific columns:

>>> df.dropna(subset=['name', 'born'])
     name        toy       born
1  Batman  Batmobile 1940-04-25
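One way to think about subset is as a boolean mask built from only the listed columns. The sketch below uses a different pair of columns ('name' and 'toy') than the example above to show that missing values in unlisted columns are ignored; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alfred", "Batman", "Catwoman"],
        "toy": [np.nan, "Batmobile", "Bullwhip"],
        "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT],
    }
)

# Only 'name' and 'toy' are checked; missing 'born' values are ignored.
by_subset = df.dropna(subset=["name", "toy"])

# Roughly equivalent mask: keep rows where both checked columns are present.
by_mask = df[df[["name", "toy"]].notna().all(axis=1)]

print(by_subset.index.tolist())  # [1, 2]
print(by_mask.index.tolist())    # [1, 2]
```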

Modify the original data in place:

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25
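A common pitfall with inplace=True is assigning the return value back to a variable. The following sketch, using a small hypothetical two-row DataFrame, shows that the in-place call returns None while the default call returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman"], "toy": [np.nan, "Batmobile"]})

# Without inplace, dropna returns a new DataFrame and leaves df unchanged.
cleaned = df.dropna()
rows_before = len(df)  # still 2

# With inplace=True the method modifies df itself and returns None,
# so the result should not be assigned back to df.
result = df.dropna(inplace=True)
rows_after = len(df)  # now 1

print(len(cleaned), rows_before, rows_after, result)  # 1 2 1 None
```

If a new object is preferred over mutation, `df = df.dropna()` achieves the same end result without inplace.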

That's all.