Fill NaN Values in Pandas

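Suraj Joshi January 30, 2023 January 22, 2021
  1. DataFrame.fillna() Method
  2. Fill the Entire DataFrame With a Specified Value Using the DataFrame.fillna() Method
  3. Fill NaN Values in Specified Columns With Specified Values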

This tutorial explains how to fill NaN values with a specified value using the DataFrame.fillna() method.

We will use the DataFrame below throughout this article.

import numpy as np
import pandas as pd

# Sample student records; np.nan marks the missing entries
student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 30, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})

print(student_df)

Output:

   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN
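
Before filling anything, it can help to see how many values are missing in each column. The following is a minimal sketch using isna() and sum(), which this tutorial does not otherwise cover; the variable name nan_counts is chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as the tutorial's DataFrame
student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# isna() marks each missing cell as True; sum() counts the True values per column
nan_counts = student_df.isna().sum()
print(nan_counts)
```

Columns with a count of zero, such as Name here, are left untouched by any of the fill operations in this article.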

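DataFrame.fillna() Method

Syntax

DataFrame.fillna(value=None,
                 method=None,
                 axis=None,
                 inplace=False,
                 limit=None,
                 downcast=None)

The DataFrame.fillna() method lets us fill the NaN values in a DataFrame with a specified value or with a fill method such as forward fill. Note that the method parameter is deprecated as of pandas 2.1; newer code uses DataFrame.ffill() or DataFrame.bfill() instead.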
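
As a small illustration of the limit parameter in the signature above, the sketch below fills at most one NaN per column with 0. The one-column income_df frame is a hypothetical cut-down version of the tutorial's data, used only to keep the result easy to read.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame with three missing values
income_df = pd.DataFrame({"Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan]})

# limit=1 caps the number of NaN values filled in each column at one,
# so only the first NaN (row 2) becomes 0 and the later ones stay NaN
limited_df = income_df.fillna(0, limit=1)
print(limited_df)
```

Leaving limit at its default of None fills every NaN, which is what the remaining examples do.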

Fill the Entire DataFrame With a Specified Value Using the DataFrame.fillna() Method

import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 30, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})
# Replace every NaN in the DataFrame with 0; the result is a new DataFrame
filled_df = student_df.fillna(0)

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      0.0       Bob           0.0   0.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna           0.0  18.0
5    506.0     Anish           0.0   0.0 

It replaces all NaN values in the student_df DataFrame with 0, the value passed as an argument to the DataFrame.fillna() method.
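
The output above also shows that student_df still contains its NaN values after the call, because fillna() returns a new DataFrame by default (inplace=False). A minimal sketch that checks this on the same data:

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

filled_df = student_df.fillna(0)

# Total number of missing cells before and after the fill
missing_in_original = int(student_df.isna().sum().sum())
missing_in_filled = int(filled_df.isna().sum().sum())

print("Missing in student_df:", missing_in_original)
print("Missing in filled_df:", missing_in_filled)
```

To keep the filled data, assign the return value to a variable, as the examples in this article do.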

import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 30, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})
# Instead of a value, pass method='ffill' to forward-fill the NaN values.
# The method parameter is deprecated since pandas 2.1, where
# student_df.ffill() is the equivalent call.
filled_df = student_df.fillna(method='ffill')

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2    502.0       Bob         400.0  18.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna          30.0  18.0
5    506.0     Anish          30.0  18.0 

With method='ffill' (forward fill), each NaN value in student_df is replaced with the last non-NaN value that precedes it in the same column.
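
In recent pandas versions, the same forward fill is available as DataFrame.ffill(), and DataFrame.bfill() fills in the opposite direction. The sketch below applies both to the tutorial's data; the names forward_df and backward_df are chosen only for illustration.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Forward fill: copy the previous valid value down the column
forward_df = student_df.ffill()

# Backward fill: copy the next valid value up the column;
# trailing NaN values have nothing after them and stay NaN
backward_df = student_df.bfill()

print(forward_df, "\n")
print(backward_df)
```

Backward fill leaves the last rows of Income(in $) and Age as NaN, because no later valid value exists in those columns.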

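Fill NaN Values in Specified Columns With Specified Values

To fill the NaN values of specific columns only, we pass a dictionary to the fillna() method, with column names as keys and the replacement value for that column's NaN entries as values.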

import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 300, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})
# Keys are column names; values replace the NaN entries of that column.
# Columns not listed in the dictionary (here "Roll No") are left unchanged.
filled_df = student_df.fillna({'Age': 17, 'Income(in $)': 300})

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma         300.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob         300.0  17.0
3    504.0      Emma         300.0  16.0
4    505.0      Luna         300.0  18.0
5    506.0     Anish         300.0  17.0 

It fills all NaN values in the Age column with 17 and all NaN values in the Income(in $) column with 300. The NaN values in the Roll No column remain unchanged.
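
The dictionary values do not have to be hard-coded constants. As a sketch that goes slightly beyond the example above, the replacement values below are computed as each column's mean; choosing the mean is an assumption made for illustration, and fill_values is a hypothetical name.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 300, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# mean() skips NaN values by default, so each mean uses only the known entries
fill_values = {
    "Age": student_df["Age"].mean(),
    "Income(in $)": student_df["Income(in $)"].mean(),
}

# Only the columns named in the dictionary are filled
filled_df = student_df.fillna(fill_values)
print(filled_df)
```

As in the example above, the Roll No column is not in the dictionary, so its NaN value is left as is.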

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
