Using the Pandas pipe Method Effectively to Build a Data Processing Pipeline

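Machine learning and data analysis projects depend heavily on preprocessing the dataset, for example cleaning or filtering the data. These steps usually form an ordered workflow. You can carry them out by calling Pandas methods one at a time, but if you want readable code and an automated, repeatable process, you need to build a data processing pipeline.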

Pandas has a built-in pipe() method that chains multiple custom functions into a data processing pipeline. This article shows how to apply it.

Q: How do you inspect a dataset with Pandas?

First, use the Pandas read_csv() method to read the "E-Commerce Shipping Data" dataset from Kaggle ( https://www.kaggle.com/prachi13/customer-analytics ), as in the following example:

import pandas as pd

df = pd.read_csv('Train.csv')

print(df)
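Printing the whole DataFrame is only one way to inspect it. The sketch below shows a few other common checks. It runs on a small in-memory sample rather than Train.csv: the rows are made up for illustration, and only the three columns used later in this article are included.

```python
import pandas as pd

# Hypothetical sample rows (values invented for illustration); the real
# Train.csv has more columns and many more rows.
sample = pd.DataFrame({
    "Mode_of_Shipment": ["Flight", "Ship", "Road", "Ship"],
    "Customer_rating": [2, 5, 3, 4],
    "Product_importance": ["low", "high", "medium", "high"],
})

print(sample.head())    # first rows of the dataset
print(sample.shape)     # (number of rows, number of columns)
print(sample.dtypes)    # data type of each column
```

The same three calls work unchanged on the df returned by read_csv().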

Q: How do you define custom functions for Pandas?

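Suppose we want to analyze the customer ratings of each shipping mode for a given product importance level. To keep the code reusable, we create two custom functions. The first filters on the Product_importance column, as in the following example: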

import pandas as pd

def filt_product_importance(dataframe, level):
    # Keep only the rows whose Product_importance equals the given level
    filt = (dataframe['Product_importance'] == level)
    return dataframe.loc[filt]

df = pd.read_csv('Train.csv')

The second groups by the Mode_of_Shipment column and calculates the mean of the Customer_rating column, as in the following example:

import pandas as pd

def filt_product_importance(dataframe, level):
    # Keep only the rows whose Product_importance equals the given level
    filt = (dataframe['Product_importance'] == level)
    return dataframe.loc[filt]

def shipment_rating(dataframe):
    # Mean Customer_rating per Mode_of_Shipment (the result is a Series)
    dataframe = dataframe.groupby('Mode_of_Shipment')['Customer_rating'].mean()
    return dataframe

df = pd.read_csv('Train.csv')

Q: How do you use the Pandas pipe() method?

Next, use the Pandas pipe() method to chain these two custom functions into a data processing pipeline, as in the following example:

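import pandas as pd

def filt_product_importance(dataframe, level):
    # Keep only the rows whose Product_importance equals the given level
    filt = (dataframe['Product_importance'] == level)
    return dataframe.loc[filt]

def shipment_rating(dataframe):
    # Mean Customer_rating per Mode_of_Shipment (the result is a Series)
    dataframe = dataframe.groupby('Mode_of_Shipment')['Customer_rating'].mean()
    return dataframe

df = pd.read_csv('Train.csv')

# Each pipe() call passes the previous result as the first argument of the next function
pipeline = df.pipe(filt_product_importance, 'high').pipe(shipment_rating)

print(pipeline)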

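The result is the average customer rating for each shipping mode, computed after filtering the data down to high-importance products.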
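To make the chaining concrete, here is a minimal sketch showing that the pipe() chain produces the same result as nesting the two function calls by hand. It uses a small made-up sample instead of Train.csv; the rows and ratings are assumptions for illustration only.

```python
import pandas as pd

def filt_product_importance(dataframe, level):
    # Keep only the rows whose Product_importance equals the given level
    filt = dataframe["Product_importance"] == level
    return dataframe.loc[filt]

def shipment_rating(dataframe):
    # Mean Customer_rating per Mode_of_Shipment (returns a Series)
    return dataframe.groupby("Mode_of_Shipment")["Customer_rating"].mean()

# Hypothetical sample rows (values invented for illustration)
sample = pd.DataFrame({
    "Mode_of_Shipment": ["Flight", "Ship", "Road", "Ship", "Flight"],
    "Customer_rating": [2, 5, 3, 4, 1],
    "Product_importance": ["low", "high", "medium", "high", "high"],
})

# Pipeline form: reads left to right in processing order
piped = sample.pipe(filt_product_importance, "high").pipe(shipment_rating)

# Nested form: same computation, but reads inside out
nested = shipment_rating(filt_product_importance(sample, "high"))

print(piped)
print(piped.equals(nested))
```

In this sample the high-importance rows are two Ship shipments (ratings 5 and 4) and one Flight shipment (rating 1), so the means are 4.5 for Ship and 1.0 for Flight, and Road does not appear in the result.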

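Because pipe() calls can be chained, you can quickly build a data processing pipeline with the Pandas pipe() method. The pipeline is readable, so the order of the processing steps is clear at a glance, and it is easy to extend and automate later, which makes preprocessing for data analysis more efficient.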
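As one possible illustration of extending and automating the pipeline, the sketch below adds a third step and runs the whole chain once per importance level. The sort_descending helper, the sample rows, and the level names "low" and "medium" are assumptions introduced here for illustration; they are not part of the original example.

```python
import pandas as pd

def filt_product_importance(dataframe, level):
    # Keep only the rows whose Product_importance equals the given level
    filt = dataframe["Product_importance"] == level
    return dataframe.loc[filt]

def shipment_rating(dataframe):
    # Mean Customer_rating per Mode_of_Shipment (returns a Series)
    return dataframe.groupby("Mode_of_Shipment")["Customer_rating"].mean()

def sort_descending(series):
    # Hypothetical extra step: highest average rating first
    return series.sort_values(ascending=False)

# Hypothetical sample rows (values invented for illustration)
sample = pd.DataFrame({
    "Mode_of_Shipment": ["Flight", "Ship", "Road", "Ship", "Flight", "Road"],
    "Customer_rating": [2, 5, 3, 4, 1, 4],
    "Product_importance": ["low", "high", "medium", "high", "high", "low"],
})

# Run the same three-step pipeline for every importance level
results = {
    level: (
        sample
        .pipe(filt_product_importance, level)
        .pipe(shipment_rating)
        .pipe(sort_descending)  # Series also provides pipe()
    )
    for level in ["low", "medium", "high"]
}

for level, ratings in results.items():
    print(level)
    print(ratings)
```

Adding a step is a matter of appending one more pipe() call, and the existing functions stay unchanged.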

If you would like more Python tutorials, visit the Learn Code With Mike website ( https://www.learncodewithmike.com/2021/06/pandas-pipe-method.html ) for more content.
