Pandas DataFrame.sample() Function

Minahil Noor January 30, 2023


pandas.DataFrame.sample() Syntax
Example Code: DataFrame.sample()
Example Code: DataFrame.sample() to Extract Columns
Example Code: DataFrame.sample() to Generate a Fraction of the Data
Example Code: DataFrame.sample() to Oversample a DataFrame
Example Code: DataFrame.sample() With weights

The Python Pandas DataFrame.sample() function returns a random sample of rows or columns from a DataFrame. The sample can contain one or more rows or columns.

`pandas.DataFrame.sample()` Syntax

DataFrame.sample(
    n=None, frac=None, replace=False, weights=None, random_state=None, axis=None
)

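Parameters


`n`	An integer. It is the number of rows or columns to select at random from the `DataFrame`. It cannot be used together with `frac`
`frac`	A float. It specifies the fraction of rows or columns to extract at random from the `DataFrame`. For example, `frac=0.45` means the selected rows or columns will make up 45% of the original data
`replace`	A boolean. If it is set to `True`, sampling is done with replacement, so the same row or column can be selected more than once
`weights`	A string or an array-like structure. When called on a `DataFrame` with axis 0, it accepts the name of a column; rows with larger values in the weight column are more likely to be returned in the sample
`random_state`	An integer or a `numpy.random.RandomState` object. An integer is used as the seed of the random number generator, so the same seed yields the same sample on every run. A `RandomState` object is used directly as the generator
`axis`	An integer or a string. It selects the axis to sample: 0 or `index` for rows, 1 or `columns` for columns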

Return Value

It returns a Series or a DataFrame. The returned object has the same type as the caller and contains n items randomly sampled from it.
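The statement about the return type can be checked with a short sketch. The data below is a reduced, illustrative version of the frame used later in this article, and the variable names `row_sample` and `name_sample` are chosen here only for the example.

```python
import pandas as pd

# Illustrative data; not tied to any particular dataset.
dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
    }
)

# Calling sample() on a DataFrame gives back a DataFrame.
row_sample = dataframe.sample(n=2)

# Calling sample() on a single column (a Series) gives back a Series.
name_sample = dataframe["Name"].sample(n=2)

print(type(row_sample).__name__, type(name_sample).__name__)  # DataFrame Series
```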

Example Code: `DataFrame.sample()`

By default, the function returns a sample of rows, i.e. axis=0.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
print(dataframe)

Our DataFrame is:

   Attendance    Name  Obtained Marks
0          60  Olivia              56
1         100    John              75
2          80   Laura              82
3          75     Ben              64
4          95   Kevin              67

All parameters of this function are optional. If we call it without passing any arguments, it returns a single random row.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample()
print(dataframe1)

Output 1:

   Attendance Name  Obtained Marks
3          75  Ben              64

Output 2:

   Attendance   Name  Obtained Marks
4          95  Kevin              67

Output 1 and Output 2 show two runs of the same program. Each call draws a new random row from the given DataFrame, so the result can differ between runs.
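When a repeatable result is needed, the `random_state` parameter can be given an integer seed. The sketch below assumes an arbitrary seed of 42 (any integer works) and only checks that two calls with the same seed agree; which rows are picked is not shown because it depends on the generator.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# The seed value 42 is an arbitrary choice for this illustration.
first = dataframe.sample(n=2, random_state=42)
second = dataframe.sample(n=2, random_state=42)

# Both calls use the same seed, so they select the same rows in the same order.
print(first.equals(second))  # True
```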

Example Code: `DataFrame.sample()` to Extract Columns

To sample columns instead of rows, we simply set the axis to 1.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(n=1, axis=1)
print(dataframe1)

Output:

     Name
0  Olivia
1    John
2   Laura
3     Ben
4   Kevin

The function has returned a sample consisting of a single column. The number of columns is set by the argument n=1.
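The same call works with more than one column and with the string form of the axis. A minimal sketch, assuming we want two of the three columns; which two are chosen is random, so only the shape is checked.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# axis="columns" is equivalent to axis=1.
two_columns = dataframe.sample(n=2, axis="columns")

# All five rows are kept; only the set of columns is sampled.
print(two_columns.shape)  # (5, 2)
```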

Example Code: `DataFrame.sample()` to Generate a Fraction of the Data

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=0.5)
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
3          75    Ben              64
4          95  Kevin              67
1         100   John              75

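The returned sample contains roughly 50% of the original rows. Because 5 × 0.5 is not a whole number, pandas rounds the row count to an integer.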
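To see how `frac` maps to a row count, the sketch below uses `frac=0.4`, chosen here so that 5 × 0.4 is exactly 2 and no rounding is involved. It also shows `frac=1`, a common way to shuffle all rows; the variable names are illustrative.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# 40% of 5 rows is 2 rows.
subset = dataframe.sample(frac=0.4)
print(len(subset))  # 2

# frac=1 returns every row exactly once, in random order.
shuffled = dataframe.sample(frac=1)
print(len(shuffled))  # 5
```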

Example Code: `DataFrame.sample()` to Oversample a DataFrame

If frac > 1, the replace parameter must be True so that the same row can be sampled more than once; otherwise, a ValueError is raised.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=1.5, replace=True)
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
3          75     Ben              64
0          60  Olivia              56
1         100    John              75
2          80   Laura              82
1         100    John              75
2          80   Laura              82
0          60  Olivia              56
4          95   Kevin              67

If replace is set to False while frac is greater than 1, a ValueError is raised.

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(frac=1.5, replace=False)
print(dataframe1)

Output:

Traceback (most recent call last):
  File "..\test.py", line 6, in <module>
    dataframe1 = dataframe.sample(frac=1.5, replace=False)
  File "..\lib\site-packages\pandas\core\generic.py", line 5044, in sample
    raise ValueError(
ValueError: Replace has to be set to `True` when upsampling the population `frac` > 1.
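In a script, this error can be caught instead of letting the program stop. The following is one possible sketch; falling back to `replace=True` is an assumption made for illustration and may not suit every use case.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

message = None
try:
    # Asking for more rows than exist without replacement is not possible.
    oversampled = dataframe.sample(frac=1.5, replace=False)
except ValueError as error:
    message = str(error)
    print("Sampling failed:", message)
    # Illustrative fallback: allow rows to repeat.
    oversampled = dataframe.sample(frac=1.5, replace=True)

# With replacement, the result has more rows than the original frame.
print(len(oversampled) > len(dataframe))  # True
```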

Example Code: `DataFrame.sample()` With `weights`

import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: 100, 2: 80, 3: 75, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: 56, 1: 75, 2: 82, 3: 64, 4: 67},
    }
)
dataframe1 = dataframe.sample(n=2, weights="Attendance")
print(dataframe1)

Output:

   Attendance   Name  Obtained Marks
1         100   John              75
4          95  Kevin              67

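Here, rows with larger values in the Attendance column are more likely to be selected. The selection is still random, so the rows with the highest values are not guaranteed to appear in every run.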
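Weights can also be passed as a separate Series instead of a column name. The sketch below uses a hypothetical `row_weights` Series in which three rows have weight 0, which makes the effect easy to verify: a row with zero weight is never drawn, so the two remaining rows are always the ones returned. Weights that do not sum to 1 are normalized by pandas.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": [60, 100, 80, 75, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [56, 75, 82, 64, 67],
    }
)

# Hypothetical weights, aligned with the DataFrame index.
# Rows 0, 2 and 3 have zero weight and can never be selected.
row_weights = pd.Series([0, 1, 0, 0, 1], index=dataframe.index)

picked = dataframe.sample(n=2, weights=row_weights)

# Only John (row 1) and Kevin (row 4) can appear, in either order.
print(sorted(picked["Name"]))  # ['John', 'Kevin']
```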
