Fill NaN Values in Pandas
Suraj Joshi
January 30, 2023
January 22, 2021
This tutorial explains how to use the DataFrame.fillna() method to fill NaN values with a specified value.
We will use the following DataFrame throughout this article.
import numpy as np
import pandas as pd

# Sample data; np.nan marks the missing entries
student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})
print(student_df)
Output:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN
Syntax of the DataFrame.fillna() Method
DataFrame.fillna(value=None,
                 method=None,
                 axis=None,
                 inplace=False,
                 limit=None,
                 downcast=None)
The DataFrame.fillna() method lets us fill the NaN values in a DataFrame, either with a specified value or by using a fill method.
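As a small illustration of the inplace parameter listed in the syntax above, the sketch below uses a hypothetical one-column DataFrame named df that is not part of this tutorial's data. It only relies on fillna() with a scalar value, and it is meant as a minimal example rather than a complete description of every parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column DataFrame used only for this illustration
df = pd.DataFrame({"Age": [17, np.nan, 16]})

# By default, fillna() returns a new DataFrame and leaves df untouched
filled = df.fillna(0)
nan_before = int(df["Age"].isna().sum())  # df still holds its NaN here
print(nan_before)

# With inplace=True, df itself is modified
df.fillna(0, inplace=True)
print(df["Age"].tolist())
```

Assigning the result to a new name, as in filled = df.fillna(0), is usually easier to follow than inplace=True because the original data stays available for comparison.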
Fill the Entire DataFrame With a Specified Value Using the DataFrame.fillna() Method
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Replace every NaN in the DataFrame with 0; student_df itself is not modified
filled_df = student_df.fillna(0)

print("DataFrame with NaN values")
print(student_df, "\n")
print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")
Output:
DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      0.0       Bob           0.0   0.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna           0.0  18.0
5    506.0     Anish           0.0   0.0
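This replaces every NaN value in the student_df DataFrame with 0, the value passed as an argument to the DataFrame.fillna() method.
Instead of a fixed value, we can also pass method='ffill' so that each NaN is filled with the previous valid value in the same column.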
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Forward fill: propagate the last valid value down each column.
# Note: the method parameter is deprecated in recent pandas releases
# (2.1 and later); student_df.ffill() is the equivalent call there.
filled_df = student_df.fillna(method="ffill")

print("DataFrame with NaN values")
print(student_df, "\n")
print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")
Output:
DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma          30.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2    502.0       Bob         400.0  18.0
3    504.0      Emma          30.0  16.0
4    505.0      Luna          30.0  18.0
5    506.0     Anish          30.0  18.0
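This fills each NaN value in student_df with the last valid value that precedes it in the same column. A NaN in the first row would remain NaN, because there is no earlier value to carry forward.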
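The same forward fill can be written with the dedicated DataFrame.ffill() method, and DataFrame.bfill() fills in the opposite direction. The sketch below is a minimal illustration on a two-column subset of the tutorial's data; the variable names forward_filled and backward_filled are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Two numeric columns taken from the tutorial's sample data
student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Income(in $)": [200, 400, np.nan, 30, np.nan, np.nan],
})

# Forward fill: each NaN takes the previous valid value in its column
forward_filled = student_df.ffill()

# Backward fill: each NaN takes the next valid value in its column;
# trailing NaN values stay NaN because nothing follows them
backward_filled = student_df.bfill()

print(forward_filled, "\n")
print(backward_filled)
```

With a backward fill, the last two Income(in $) entries remain NaN, which is the mirror image of the first-row limitation of a forward fill.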
Fill NaN Values of Specified Columns With Specified Values
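To fill the NaN values of particular columns with particular values, we pass a dictionary to the fillna() method. Each key is a column name, and the corresponding value is what replaces the NaN values in that column.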
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 300, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Keys are column names; values are the replacements for NaN in that column
filled_df = student_df.fillna({"Age": 17, "Income(in $)": 300})

print("DataFrame with NaN values")
print(student_df, "\n")
print("After applying fillna() to the DataFrame:")
print(filled_df, "\n")
Output:
DataFrame with NaN values
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob           NaN   NaN
3    504.0      Emma         300.0  16.0
4    505.0      Luna           NaN  18.0
5    506.0     Anish           NaN   NaN

After applying fillna() to the DataFrame:
   Roll No      Name  Income(in $)   Age
0    501.0  Jennifer         200.0  17.0
1    502.0    Travis         400.0  18.0
2      NaN       Bob         300.0  17.0
3    504.0      Emma         300.0  16.0
4    505.0      Luna         300.0  18.0
5    506.0     Anish         300.0  17.0
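This fills all NaN values in the Age column with 17 and all NaN values in the Income(in $) column with 300. The NaN value in the Roll No column is left unchanged because that column is not a key in the dictionary.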
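The dictionary values do not have to be hard-coded constants. As a sketch of one common variation, the example below computes a per-column statistic and uses it as the fill value. The choice of median for Age and mean for Income(in $) is an assumption made for illustration, not a recommendation from this tutorial.

```python
import numpy as np
import pandas as pd

student_df = pd.DataFrame({
    "Roll No": [501, 502, np.nan, 504, 505, 506],
    "Name": ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    "Income(in $)": [200, 400, np.nan, 300, np.nan, np.nan],
    "Age": [17, 18, np.nan, 16, 18, np.nan],
})

# Compute fill values from the non-missing entries of each column
# (mean() and median() skip NaN values by default)
fill_values = {
    "Age": student_df["Age"].median(),
    "Income(in $)": student_df["Income(in $)"].mean(),
}

# Columns that are not keys in the dictionary, such as Roll No, keep their NaN
filled_df = student_df.fillna(fill_values)
print(filled_df)
```

Whether a mean, a median, or some other value is appropriate depends on the data; identifier-like columns such as Roll No are usually better left unfilled than replaced with a statistic.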
Author: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.