Convert a NumPy Array to a Pandas DataFrame

Suraj Joshi, January 22, 2021


This tutorial explains how to convert a NumPy array to a Pandas DataFrame using the pandas.DataFrame() method.

Passing a NumPy array to pandas.DataFrame() produces a DataFrame that holds the array's values. We can also specify column names and row indices for the DataFrame.

Convert a NumPy Array to a Pandas DataFrame Using the `pandas.DataFrame()` Method

In the simplest case, we pass the NumPy array to pandas.DataFrame() as its only argument.

from numpy import random
import pandas as pd

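random.seed(5)  # fix the seed so the random values are reproducible
# The result of this call is discarded; it only advances the seeded generator
# before data_array is drawn.
random.randint(100, size=(3, 5))
data_array = random.randint(100, size=(4, 3))  # 4 rows, 3 columns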

print("NumPy Data Array is:")
print(data_array)

print("")

data_df = pd.DataFrame(data_array)
print("The DataFrame generated from the NumPy array is:")
print(data_df)

Output:

NumPy Data Array is:
[[27 44 77]
 [75 65 47]
 [30 84 86]
 [18  9 41]]

The DataFrame generated from the NumPy array is:
    0   1   2
0  27  44  77
1  75  65  47
2  30  84  86
3  18   9  41

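The code first creates a random array of shape (4, 3), that is, 4 rows and 3 columns. We then pass the array as an argument to pandas.DataFrame(), which builds a DataFrame that we store as data_df. When no labels are given, pandas.DataFrame() assigns default integer labels starting from 0 to both the columns and the rows.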
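The default labels described above can be inspected directly. The following is a minimal sketch; sample_array and its values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
sample_array = np.arange(12).reshape(4, 3)
sample_df = pd.DataFrame(sample_array)

print(sample_df.shape)          # (4, 3), the same shape as the array
print(list(sample_df.index))    # [0, 1, 2, 3], the default row labels
print(list(sample_df.columns))  # [0, 1, 2], the default column labels
```

The shape of the DataFrame matches the shape of the array, and both axes are labeled with consecutive integers.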

We can also set the row indices and column names through the index and columns arguments of pandas.DataFrame().

from numpy import random
import pandas as pd

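random.seed(5)  # fix the seed so the random values are reproducible
# The result of this call is discarded; it only advances the seeded generator
# before data_array is drawn.
random.randint(100, size=(3, 5))
data_array = random.randint(100, size=(4, 3))  # 4 rows, 3 columns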
row_indices = ["Row_1", "Row_2", "Row_3", "Row_4"]
column_names = ["Column_1", "Column_2", "Column_3"]

print("NumPy Data Array is:")
print(data_array)

print("")

data_df = pd.DataFrame(data_array, index=row_indices, columns=column_names)
print("The DataFrame generated from the NumPy array is:")
print(data_df)

Output:

NumPy Data Array is:
[[27 44 77]
 [75 65 47]
 [30 84 86]
 [18  9 41]]

The DataFrame generated from the NumPy array is:
       Column_1  Column_2  Column_3
Row_1        27        44        77
Row_2        75        65        47
Row_3        30        84        86
Row_4        18         9        41

Here, we set index to row_indices, a list that contains the label for each row. Similarly, we assign the column names by setting columns to column_names, a list that contains the name of each column.
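The label lists have to agree with the shape of the array: one label per row for index and one per column for columns. As a small sketch of what happens otherwise (sample_array and too_few_rows are hypothetical names used only here), pandas raises a ValueError when the counts do not match.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 array and a label list that is one entry short.
sample_array = np.arange(12).reshape(4, 3)
too_few_rows = ["Row_1", "Row_2", "Row_3"]

try:
    pd.DataFrame(sample_array, index=too_few_rows)
    mismatch_rejected = False
except ValueError as error:
    # pandas refuses to build a frame whose labels do not fit the data.
    mismatch_rejected = True
    print("pandas rejected the labels:", error)
```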

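In some cases, the NumPy array itself contains the row indices and column names. We then use array slicing to extract the data, the row indices, and the column names from the array.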

import numpy as np
import pandas as pd

marks_array = np.array(
    [["", "Mathematics", "Economics"], ["Sunny", 25, 23], ["Alice", 23, 24]]
)

print("NumPy Data Array is:")
print(marks_array)

print("")

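row_indices = marks_array[1:, 0]  # first column, skipping the header row
column_names = marks_array[0, 1:]  # first row, skipping the label column
# The remaining cells hold the marks as strings, so convert them to integers.
data_df = pd.DataFrame(
    data=np.int_(marks_array[1:, 1:]), index=row_indices, columns=column_names
)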

print("The DataFrame generated from the NumPy array is:")
print(data_df)

Output:

NumPy Data Array is:
[['' 'Mathematics' 'Economics']
 ['Sunny' '25' '23']
 ['Alice' '23' '24']]

The DataFrame generated from the NumPy array is:
       Mathematics  Economics
Sunny           25         23
Alice           23         24

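The NumPy array itself holds the row indices and column names. We select every value below the first row and to the right of the first column (marks_array[1:, 1:]) and pass it as the data argument of pandas.DataFrame(). The first-column values from the second row onward (marks_array[1:, 0]) become the index argument, and the first-row values from the second column onward (marks_array[0, 1:]) become the columns argument, which sets the column names.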
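If this layout comes up often, the three slices can be wrapped in a small function. The sketch below is one possible way to do it, not part of pandas: frame_from_labeled_array is a hypothetical helper name, and it converts the values with astype() rather than numpy.int_(), which is a substitution made here for illustration.

```python
import numpy as np
import pandas as pd


def frame_from_labeled_array(labeled, dtype=int):
    """Hypothetical helper: split a labeled 2-D array into data, index, and columns."""
    # Everything except the first row and first column is data.
    values = labeled[1:, 1:].astype(dtype)
    return pd.DataFrame(values, index=labeled[1:, 0], columns=labeled[0, 1:])


marks_array = np.array(
    [["", "Mathematics", "Economics"], ["Sunny", 25, 23], ["Alice", 23, 24]]
)
marks_df = frame_from_labeled_array(marks_array)
print(marks_df.loc["Sunny", "Mathematics"])  # 25
```

The helper assumes the labels sit in the first row and first column, exactly as in marks_array; an array laid out differently would need different slices.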

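Because a NumPy array holds a single data type, numpy.array() converts the integer values to strings when it builds this mixed array. We use numpy.int_() to convert the data values back to integers.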
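The type coercion can be confirmed by looking at the dtype of the array. This is a short sketch with hypothetical variable names (mixed, scores); it uses astype(int) for the conversion, which is assumed here as an alternative spelling rather than the call used in the example above.

```python
import numpy as np

# Mixing strings and integers forces NumPy to pick one common dtype.
mixed = np.array([["Sunny", 25, 23], ["Alice", 23, 24]])
print(mixed.dtype.kind)  # "U": every element is stored as a Unicode string

# Convert only the numeric columns back to integers.
scores = mixed[:, 1:].astype(int)
print(scores.dtype.kind)  # "i": signed integers
print(scores.sum())  # 95
```

Once the values are integers again, arithmetic such as sum() behaves numerically.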

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
