104 Learning

Mike's Python Academy

Mike Ku

Founder of the Learn Code With Mike brand

2021/12/30

How to Export CSV Files with the Scrapy Framework to Improve Data Processing Efficiency

Q: How do you use Scrapy's CsvItemExporter (CSV item exporter)?

Open the item pipeline file (pipelines.py). Because this article exports the scraped data to a CSV file, you need to import CsvItemExporter (the CSV item exporter):

from itemadapter import ItemAdapter  # comes with Scrapy's generated pipelines.py template
from scrapy.exporters import CsvItemExporter

Next, add a CsvPipeline class that defines how the data collected by the Scrapy spider is exported to a CSV file. You can give this class any name you like:

from itemadapter import ItemAdapter
from scrapy.exporters import CsvItemExporter


class CsvPipeline:
    pass  # the methods are added in the following steps

With the class named, add a constructor (__init__) that performs the initialization:

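class CsvPipeline:
    def __init__(self):
        # Create posts.csv (overwriting any existing file) in write-binary mode
        self.file = open('posts.csv', 'wb')
        # The exporter encodes every row with the given encoding before writing
        self.exporter = CsvItemExporter(self.file, encoding='big5')
        # Tell the exporter that the export is starting
        self.exporter.start_exporting()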

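The initialization above does three things:

1. Creates the CSV file (overwriting it if it already exists) in write-binary mode (wb), because CsvItemExporter writes encoded bytes to the file object.

2. Creates a Scrapy CsvItemExporter (CSV item exporter) object, passing in the file object and the encoding. The encoding defaults to utf-8. If you plan to open the exported CSV file in Microsoft Excel (for example on a Traditional Chinese version of Windows), set it to big5; otherwise the Chinese text will appear garbled.

3. Calls the start_exporting() method to begin the export.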
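To see why the encoding setting matters, the following standard-library sketch (not Scrapy code) encodes the same Chinese text with big5 and with utf-8. The sample string is hypothetical; the point is that the two encodings produce different bytes, so a program that decodes the file with the wrong one shows garbled text.

```python
# Hypothetical sample text; any Traditional Chinese string would do
text = "新聞標題"

big5_bytes = text.encode("big5")   # Big5 uses 2 bytes per Chinese character
utf8_bytes = text.encode("utf-8")  # UTF-8 uses 3 bytes per Chinese character

print(len(big5_bytes), len(utf8_bytes))  # → 8 12

# Decoding UTF-8 bytes as Big5 does not give back the original text
print(utf8_bytes.decode("big5", errors="replace"))

# Decoding with the matching codec round-trips correctly
print(big5_bytes.decode("big5"))  # → 新聞標題
```

This is why the encoding passed to CsvItemExporter has to match whatever program later reads the file.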

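Next comes the data processing. Implement the process_item() method, which Scrapy calls for every scraped item, and inside it pass the data carried by each item to the CsvItemExporter (CSV item exporter) through the export_item() method: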

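def process_item(self, item, spider):
    # Hand the item to the exporter, which writes it as one CSV row
    self.exporter.export_item(item)
    # Return the item so that any later pipelines still receive it
    return item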
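Returning the item matters when more than one pipeline is enabled: Scrapy passes whatever process_item() returns on to the next pipeline. The plain-Python sketch below imitates that hand-off with a hypothetical run_pipelines() helper and two hypothetical pipelines. It is a simplified model, not Scrapy's actual engine code.

```python
class UppercaseTitlePipeline:
    """Hypothetical pipeline that normalises the title field."""

    def process_item(self, item, spider):
        item["title"] = item["title"].upper()
        return item  # pass the modified item on to the next pipeline


class CollectPipeline:
    """Hypothetical stand-in for an exporting pipeline: records what it receives."""

    def __init__(self):
        self.received = []

    def process_item(self, item, spider):
        self.received.append(dict(item))
        return item


def run_pipelines(pipelines, item, spider=None):
    """Simplified model: each pipeline gets the previous pipeline's return value."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


collector = CollectPipeline()
result = run_pipelines([UppercaseTitlePipeline(), collector], {"title": "scrapy csv"})
print(result)              # → {'title': 'SCRAPY CSV'}
print(collector.received)  # → [{'title': 'SCRAPY CSV'}]
```

In this model, if UppercaseTitlePipeline forgot its return statement, CollectPipeline would receive None instead of the item.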

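To define what happens once the scraped data has been exported to your CSV file and the spider is finishing, implement the close_spider() method, which Scrapy also calls for you: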

def close_spider(self, spider):
    # Let the exporter finish writing
    self.exporter.finish_exporting()
    # Close the file to release the resource
    self.file.close()

In other words, when the Scrapy spider finishes, finish_exporting() is called to complete the export, and the file object is then closed to release its resources.
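Putting the three methods together, the sketch below walks one pipeline through its whole lifecycle: construction, process_item() once per item, then close_spider(). So that it runs without Scrapy installed, it swaps CsvItemExporter for a hypothetical SimpleCsvExporter built on Python's csv module, and, as a hypothetical change for testability, the constructor takes an in-memory buffer instead of opening posts.csv. The stand-in only mimics the start/export/finish call sequence; it is not Scrapy's implementation.

```python
import csv
import io


class SimpleCsvExporter:
    """Hypothetical stand-in that mimics CsvItemExporter's three-call interface."""

    def __init__(self, file, encoding="utf-8"):
        self.file = file  # binary file object, like open(..., 'wb')
        self.encoding = encoding
        self.text = None
        self.writer = None

    def start_exporting(self):
        # Wrap the binary file so the csv module can write encoded text to it
        self.text = io.TextIOWrapper(self.file, encoding=self.encoding, newline="")

    def export_item(self, item):
        if self.writer is None:
            # Use the first item's keys as the header row
            self.writer = csv.DictWriter(self.text, fieldnames=list(item))
            self.writer.writeheader()
        self.writer.writerow(item)

    def finish_exporting(self):
        self.text.flush()
        self.text.detach()  # release the binary file without closing it


class CsvPipeline:
    def __init__(self, file):  # hypothetical: file passed in instead of opened here
        self.file = file
        self.exporter = SimpleCsvExporter(self.file, encoding="big5")
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        # The real pipeline calls self.file.close() here; the buffer is left
        # open so its contents can be inspected below


buffer = io.BytesIO()
pipeline = CsvPipeline(buffer)
for item in [{"title": "新聞一", "url": "https://example.com/1"},
             {"title": "新聞二", "url": "https://example.com/2"}]:
    pipeline.process_item(item, spider=None)
pipeline.close_spider(spider=None)
print(buffer.getvalue().decode("big5"))
```

The order of calls is the part that carries over to the real pipeline: start once, export once per item, finish once at the end.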

Once the CSV-exporting item pipeline is finished, don't forget to add it to the ITEM_PIPELINES setting in the settings.py file:

ITEM_PIPELINES = {
    'news_scraper.pipelines.CsvPipeline': 500,
}
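The number 500 is the pipeline's order value. According to the Scrapy documentation, items pass through pipelines from lower to higher values, and the values are customarily kept within 0 to 1000. The snippet below only sorts a settings dict to illustrate that order; the second entry, CleanPipeline, is a hypothetical extra pipeline.

```python
ITEM_PIPELINES = {
    "news_scraper.pipelines.CsvPipeline": 500,
    "news_scraper.pipelines.CleanPipeline": 300,  # hypothetical extra pipeline
}

# Items flow from the lowest value to the highest
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(run_order)
# → ['news_scraper.pipelines.CleanPipeline', 'news_scraper.pipelines.CsvPipeline']
```

Giving a cleaning step a lower value than the CSV pipeline would let it tidy each item before it is written.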

Finally, run the Scrapy spider with the following command:

$ scrapy crawl inside

After it runs, a posts.csv file appears in the Scrapy project, and you can open it in Microsoft Excel.
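Besides Excel, you can check the file from Python. This is a minimal sketch that assumes posts.csv was written with encoding='big5' as in the constructor above; the read_posts() helper is hypothetical, and the column names you get depend on the fields of your own items.

```python
import csv
import os


def read_posts(path="posts.csv", encoding="big5"):
    """Hypothetical helper: load the exported CSV as a list of dicts."""
    # newline="" lets the csv module handle the line endings itself
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


if __name__ == "__main__" and os.path.exists("posts.csv"):
    for row in read_posts():
        print(row)
```

If you keep the exporter's default utf-8 encoding instead, call read_posts(encoding="utf-8").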

To learn more Python tutorials, visit Learn Code With Mike (https://www.learncodewithmike.com/2021/01/scrapy-export-csv-files.html) for more content.

