Mike's Python Academy

Pandas, Python, Data Analysis

Mike Ku

Founder of the Learn Code With Mike brand

2022/01/20

3 Tips for Comparing Differences Between CSV File Data with the Pandas Package (Part 2)

This article uses two datasets from Kaggle, "Coursera Course Dataset" (https://www.kaggle.com/siddharthm1698/coursera-course-dataset) and "Course Reviews on Coursera" (https://www.kaggle.com/imuhammad/course-reviews-on-coursera), as examples to share common ways of comparing CSV datasets with the Pandas package.

Q: Finding the data that differs between two datasets with Pandas

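Conversely, to find the data that the two datasets do not share, there are two cases to consider: data that the left dataset has but the right does not, and data that the right dataset has but the left does not.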

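As before, use the merge() method of the Pandas package, this time with the how keyword argument set to "outer". An outer merge keeps every row from both datasets, which makes it possible to find the data that differs between them, as in the following example: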

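import pandas as pd

# Rename course_title to name so that both datasets share a column to merge on
df1 = pd.read_csv('coursea_data.csv').rename(columns={'course_title':'name'})

df2 = pd.read_csv('Coursera_courses.csv')

# how='outer' keeps all rows from both datasets; indicator=True adds the _merge column
result = df1.merge(df2, how='outer', indicator=True)

print(result)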

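Here, indicator=True adds a final column (_merge) to the result that marks how each row compares across the two datasets. Its values are:

1. left_only: data that the left dataset has but the right dataset does not

2. right_only: data that the right dataset has but the left dataset does not

3. both: data that both datasets have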
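To see the three _merge values without downloading the Kaggle files, here is a minimal sketch on toy data. The two small DataFrames and their course names are made up for illustration, and on="name" is passed explicitly so the merge key is visible.

```python
import pandas as pd

# Hypothetical toy data standing in for the two CSV files
left = pd.DataFrame({"name": ["Python Basics", "Data Science", "Machine Learning"]})
right = pd.DataFrame({"name": ["Data Science", "Machine Learning", "Deep Learning"]})

# Outer merge keeps every row from both sides; indicator=True adds the _merge column
merged = left.merge(right, how="outer", on="name", indicator=True)

# Count how many rows fall into each _merge category
counts = merged["_merge"].value_counts().to_dict()

print(merged)
print(counts)
```

In this toy case, "Python Basics" is labeled left_only, "Deep Learning" is labeled right_only, and the two shared courses are labeled both.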

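So, to find the data that the left dataset has but the right does not, use the Pandas loc[] syntax to keep only the rows whose "_merge" column is left_only, as in the following example: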

import pandas as pd

df1 = pd.read_csv('coursea_data.csv').rename(columns={'course_title':'name'})

df2 = pd.read_csv('Coursera_courses.csv')

# Keep only the rows that exist in the left dataset (df1) but not in the right (df2)
result = df1.merge(df2, how='outer', indicator=True).loc[lambda x: x['_merge'] == 'left_only']

print(result)
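The same filter can be wrapped in a small helper. The sketch below is one possible variation, not the article's method: the function name left_only_rows, the toy DataFrames, and the choice of how="left" (a left merge never produces right_only rows) are all assumptions made for illustration.

```python
import pandas as pd

def left_only_rows(left: pd.DataFrame, right: pd.DataFrame, on: str) -> pd.DataFrame:
    """Return the rows of `left` whose `on` value does not appear in `right`."""
    # Only the key column of `right` is needed; dropping duplicate keys
    # prevents matching left rows from being repeated in the merge
    keys = right[[on]].drop_duplicates()
    merged = left.merge(keys, how="left", on=on, indicator=True)
    # Keep the unmatched rows, then remove the helper _merge column
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Hypothetical toy data: one course has reviews, the other does not
courses = pd.DataFrame({"name": ["Python Basics", "Data Science"], "rating": [4.8, 4.6]})
reviewed = pd.DataFrame({"name": ["Data Science", "Data Science"], "course_id": ["ds", "ds"]})

result = left_only_rows(courses, reviewed, on="name")
print(result)
```

Because only the key column of the right DataFrame takes part in the merge, the result has the same columns as the left DataFrame.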

For the data that the right dataset has but the left does not, keep the rows whose "_merge" column is right_only instead, as in the following example:

import pandas as pd

df1 = pd.read_csv('coursea_data.csv').rename(columns={'course_title':'name'})

df2 = pd.read_csv('Coursera_courses.csv')

# Keep only the rows that exist in the right dataset (df2) but not in the left (df1)
result = df1.merge(df2, how='outer', indicator=True).loc[lambda x: x['_merge'] == 'right_only']

print(result)
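When both directions are needed, the outer merge only has to run once. The following is a rough sketch under assumed names: split_differences is a hypothetical helper, and the toy DataFrames and their columns are invented for illustration.

```python
import pandas as pd

def split_differences(left: pd.DataFrame, right: pd.DataFrame, on: str):
    """Run one outer merge and return (left_only rows, right_only rows)."""
    merged = left.merge(right, how="outer", on=on, indicator=True)
    left_only = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
    right_only = merged.loc[merged["_merge"] == "right_only"].drop(columns="_merge")
    return left_only, right_only

# Hypothetical toy data with one shared course ("B")
left = pd.DataFrame({"name": ["A", "B"], "rating": [4.5, 4.7]})
right = pd.DataFrame({"name": ["B", "C"], "institution": ["X", "Y"]})

only_left, only_right = split_differences(left, right, on="name")
print(only_left)
print(only_right)
```

Columns that come from the dataset a row is missing from are filled with NaN, so the right_only rows here have no rating and the left_only rows have no institution.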

If you would like to learn more about applying Python, visit the Learn Code With Mike website (https://www.learncodewithmike.com/2021/10/pandas-compare-values-between-dataframes.html) for more content.
