Fill NaN Values in Pandas
Suraj Joshi
January 30, 2023
January 22, 2021
This tutorial explains how to use the DataFrame.fillna() method to fill NaN values with a specified value.
We will use the following DataFrame throughout this article.
import numpy as np
import pandas as pd

# Sample data with missing values (np.nan) in the numeric columns
student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

print(student_df)
Output:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN
DataFrame.fillna() Method
Syntax
DataFrame.fillna(value=None,
                 method=None,
                 axis=None,
                 inplace=False,
                 limit=None,
                 downcast=None)
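The DataFrame.fillna() method lets us fill the NaN values in a DataFrame either with a specified value or by a fill method such as forward fill. Note that recent pandas releases (2.1 and later) deprecate the method and downcast arguments; DataFrame.ffill() and DataFrame.bfill() are the suggested replacements for method.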
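As a small illustration of the limit parameter from the signature above, the sketch below fills only the first NaN in each column. The DataFrame df and its column names are made up for this example and do not come from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only to illustrate the limit parameter
df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [np.nan, 2.0, np.nan],
})

# With a fill value and limit=1, at most one NaN per column is replaced
limited_df = df.fillna(0, limit=1)
print(limited_df)
```

The remaining NaN values in each column are left alone, which can be handy when only short gaps should be filled.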
Fill the Entire DataFrame With a Specified Value Using the DataFrame.fillna() Method
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Replace every NaN in the DataFrame with 0
filled_df = student_df.fillna(0)

print("DataFrame with NaN values")
print(student_df, "\n")
print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")
Output:
DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      0.0       Bob           0.0   0.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna           0.0  18.0
5    506.0     Anish           0.0   0.0
This replaces every NaN value in student_df with 0, the value passed as the argument to DataFrame.fillna().
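One way to confirm the effect is to count the missing values before and after the call. The sketch below rebuilds the same student_df; the names nan_before and nan_after are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

filled_df = student_df.fillna(0)

# fillna() returns a new DataFrame by default (inplace=False),
# so the original still holds its NaN values
nan_before = int(student_df.isna().sum().sum())
nan_after = int(filled_df.isna().sum().sum())
print(nan_before, nan_after)
```

Because inplace defaults to False, student_df itself is not modified; only the returned filled_df is free of NaN values.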
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Forward fill: replace each NaN with the last valid value above it in its column.
# Note: the method argument is deprecated since pandas 2.1;
# student_df.ffill() is the equivalent call on newer versions.
filled_df = student_df.fillna(method="ffill")

print("DataFrame with NaN values")
print(student_df, "\n")
print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")
Output:
DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2    502.0       Bob         400.0  18.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna          30.0  18.0
5    506.0     Anish          30.0  18.0
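This fills each NaN value in student_df with the last valid value above it in the same column. A NaN in the first row of a column would stay NaN, because it has no preceding value to copy.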
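The same forward fill can be written with the dedicated DataFrame.ffill() method, and DataFrame.bfill() fills in the opposite direction. The sketch below is a minimal comparison on the same data; the names ffilled_df and bfilled_df are chosen here for illustration.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Forward fill: propagate the last valid value down each column
ffilled_df = student_df.ffill()

# Backward fill: pull the next valid value up each column
bfilled_df = student_df.bfill()

print(ffilled_df, "\n")
print(bfilled_df)
```

With backward fill, the trailing NaN values in Income(in $) and Age remain NaN, since no later valid value exists to copy from.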
Fill NaN Values in Specified Columns With Specified Values
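To fill NaN values only in particular columns, we pass the fillna() method a dictionary whose keys are column names and whose values are the replacements for the NaN values in those columns.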
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 300, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Keys are column names; values are the replacements for NaN in that column
filled_df = student_df.fillna({"Age": 17, "Income(in $)": 300})

print("DataFrame with NaN values")
print(student_df, "\n")
print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")
Output:
DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma         300.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob         300.0  17.0
3    504.0      Emma         300.0  16.0
4    505.0      Luna         300.0  18.0
5    506.0     Anish         300.0  17.0
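This fills all NaN values in the Age column with 17 and all NaN values in the Income(in $) column with 300. The NaN value in the Roll No column is left unchanged because that column is not listed in the dictionary.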
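The dictionary values do not have to be hard-coded. As a sketch, assuming we would rather fill each column with a statistic computed from its own data, the example below uses the mean of Age and the median of Income(in $). The choice of statistics and the names fill_values and stats_filled_df are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 300, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Per-column fill values computed from the non-missing entries
# (mean() and median() skip NaN by default)
fill_values = {
    "Age": student_df["Age"].mean(),
    "Income(in $)": student_df["Income(in $)"].median(),
}

stats_filled_df = student_df.fillna(fill_values)
print(stats_filled_df)
```

As before, columns that are not keys in the dictionary, such as Roll No, keep their NaN values.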
Author: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.