Fill NaN Values in Pandas

Suraj Joshi, Jan 30, 2023, Jan 22, 2021
  1. The DataFrame.fillna() Method
  2. Fill the Entire DataFrame With a Specified Value Using DataFrame.fillna()
  3. Fill NaN in Specified Columns With Specified Values

This tutorial explains how to fill NaN values with a specified value using the DataFrame.fillna() method.

We will use the following DataFrame throughout this article.

import numpy as np
import pandas as pd

# Sample student data; np.nan marks the missing values.

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 30, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})

print(student_df)

Output:

   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN

The DataFrame.fillna() Method

Syntax

DataFrame.fillna(value=None, 
                 method=None, 
                 axis=None, 
                 inplace=False, 
                 limit=None, 
                 downcast=None)

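The DataFrame.fillna() method lets us fill the NaN values in a DataFrame with a specified value or by a specified fill method. With the default inplace=False, it returns a new DataFrame and leaves the original unchanged.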
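
As a minimal sketch of the default behaviour, the snippet below fills a small frame and counts the NaN values before and after. The frame `df` and its column names `a` and `b` are hypothetical and used only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, not part of the tutorial's data.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# With the default inplace=False, fillna() returns a new DataFrame
# and leaves df untouched.
filled = df.fillna(0)

nan_before = int(df.isna().sum().sum())      # NaN count in the original
nan_after = int(filled.isna().sum().sum())   # NaN count in the returned copy

print(nan_before, nan_after)
```

Assigning the result to a new name, as the examples in this article do, keeps the original data available for comparison.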

Fill the Entire DataFrame With a Specified Value Using DataFrame.fillna()

import numpy as np
import pandas as pd

# Sample student data; np.nan marks the missing values.

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 30, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})
# Replace every NaN in the DataFrame with 0; student_df itself is not modified.
filled_df = student_df.fillna(0)

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      0.0       Bob           0.0   0.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna           0.0  18.0
5    506.0     Anish           0.0   0.0 

This replaces every NaN value in student_df with 0, the value passed as the argument to DataFrame.fillna().
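
Because NaN is a floating-point value, a numeric column that contains it is stored as float64, which is why the filled output shows 0.0 rather than 0. The sketch below, which uses a hypothetical one-column frame, shows one way to cast such a column back to integers once no NaN remains.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame, not part of the tutorial's data.
df = pd.DataFrame({"Roll No": [501, 502, np.nan, 504]})

dtype_before = str(df["Roll No"].dtype)   # float64, because of the NaN

# Fill the gap, then cast; astype() would raise if any NaN were still present.
filled = df.fillna(0)
filled["Roll No"] = filled["Roll No"].astype("int64")

dtype_after = str(filled["Roll No"].dtype)

print(dtype_before, dtype_after)
print(filled)
```

Whether 0 is a sensible placeholder depends on the data; for an identifier column such as a roll number it may be better to leave the NaN in place.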

To fill by a method instead of a fixed value, pass method='ffill' (forward fill):

import numpy as np
import pandas as pd

# Sample student data; np.nan marks the missing values.

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 30, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})
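# Forward fill: copy the last valid value down each column.
# Note: pandas 2.1 and later deprecate the method argument of fillna().
filled_df = student_df.fillna(method='ffill')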

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2    502.0       Bob         400.0  18.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna          30.0  18.0
5    506.0     Anish          30.0  18.0 

This fills each NaN value in student_df with the last valid value that precedes it in the same column.
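
Pandas also provides DataFrame.ffill() and DataFrame.bfill(), which perform forward and backward filling directly. The sketch below applies both to a hypothetical frame with shortened column names; it is an alternative spelling, not the form used elsewhere in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical frame that reuses the Income and Age values from above.
df = pd.DataFrame({
    "Income": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Forward fill: carry the last valid value down each column.
forward = df.ffill()

# Backward fill: pull the next valid value up each column.
# Trailing NaN values have nothing after them, so they stay NaN.
backward = df.bfill()

print(forward)
print(backward)
```

Backward filling leaves the NaN values at the end of a column untouched, just as forward filling would leave a NaN in the first row untouched.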

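Fill NaN in Specified Columns With Specified Values

To fill the NaN values of specific columns, we pass the fillna() method a dictionary whose keys are column names and whose values are the replacements for NaN in those columns.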

import numpy as np
import pandas as pd

# Sample student data; np.nan marks the missing values.

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Income(in $)': [200, 400, np.nan, 300, np.nan, np.nan],
    'Age': [17, 18, np.nan, 16, 18, np.nan]
})
# Fill per column: columns missing from the dictionary (Roll No) keep their NaN.
filled_df = student_df.fillna({'Age': 17, 'Income(in $)': 300})

print("DataFrame with NaN values")
print(student_df, "\n")

print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")

Output:

DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma         300.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN 

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob         300.0  17.0
3    504.0      Emma         300.0  16.0
4    505.0      Luna         300.0  18.0
5    506.0     Anish         300.0  17.0 

This fills all NaN values in the Age column with 17 and all NaN values in the Income(in $) column with 300. The NaN value in the Roll No column is left unchanged.
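
The replacement values in the dictionary do not have to be hard-coded. As a sketch, they can be computed from the columns themselves. The frame below and the choice of mean for Income and median for Age are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with shortened column names.
df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504],
    "Income": [200, 400, np.nan, 300],
    "Age": [17, 18, np.nan, 16],
})

# mean() and median() skip NaN by default, so they only use the valid entries.
fill_values = {
    "Income": df["Income"].mean(),
    "Age": df["Age"].median(),
}

# Only the columns named in the dictionary are filled.
filled = df.fillna(fill_values)

print(fill_values)
print(filled)
```

Which statistic suits a column depends on the data; the median is often preferred when a column has outliers.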

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
