Introduction

This article summarizes how to use Beautiful Soup.

Overview

The main libraries used for web scraping in Python are the following.

Beautiful Soup
Requests
Selenium

Their roles, strengths, and weaknesses are as follows.

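Library	Fetches data	Extracts data	Characteristics
Beautiful Soup	-	Yes	Extraction is fast and concise, but it is weak at scraping dynamic pages
Requests	Yes	-	-
Selenium	Yes	Yes	Strong at scraping dynamic pages, but extraction is slow and more complex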
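
To make the "weak at dynamic pages" entry concrete, here is a minimal sketch. The HTML string and the id `price` are made up for illustration: they stand in for a page whose content is filled in by JavaScript after it loads in a browser.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the price is written into the <div> by JavaScript,
# which only happens when a browser loads the page.
DYNAMIC_HTML = """
<html><body>
  <div id="price"></div>
  <script>document.getElementById("price").textContent = "100";</script>
</body></html>
"""

soup = BeautifulSoup(DYNAMIC_HTML, "html.parser")
price_div = soup.find(id="price")

# Beautiful Soup parses the markup it is given and does not run the script,
# so the <div> is still empty here.
print(repr(price_div.text))  # prints ''
```

A tool that drives a real browser, such as Selenium, would see the value after the script has run, which is the trade-off the table describes.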

Scraping with the Beautiful Soup library

To scrape with Beautiful Soup, install the following two libraries: Beautiful Soup and Requests.

Installing the Beautiful Soup library

You can install it with either of the following methods.

Using the pip install command
- pip install beautifulsoup4
Using the conda install command
- conda install beautifulsoup4

Installing the Requests library

Using the pip install command
- pip install requests
Using the conda install command
- conda install requests
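
As a quick check that both installs worked, the sketch below imports the two packages and prints their versions. Note that the package installed as `beautifulsoup4` is imported under the name `bs4`.

```python
import bs4
import requests

# Both packages expose their version as a string attribute.
print("beautifulsoup4:", bs4.__version__)
print("requests:", requests.__version__)
```

If either import fails with ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.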

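Getting the stock price

We fetch the Nikkei Stock Average from the following site: http://www.nikkei.com/markets/kabu/. Note that this sample downloads the page with the standard library's urllib.request rather than Requests, and uses Beautiful Soup only for the extraction.

The string inside a tag can be obtained with either tag.text or tag.string. For details, see the following site.

The clear difference in behavior between string and text in BeautifulSoup – Python
- https://lets-hack.tech/programming/languages/python/bs4-text-or-string/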
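
The two attributes agree only in the simple case, so a small sketch may help. The HTML fragment and the class names `a` and `b` are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one tag with a single text child, one with nested children.
html = '<span class="a">23,386.74</span><p class="b">Price: <b>100</b> yen</p>'
soup = BeautifulSoup(html, "html.parser")

single = soup.find(class_="a")
nested = soup.find(class_="b")

# A tag with exactly one text child: .text and .string return the same string.
print(single.text)    # 23,386.74
print(single.string)  # 23,386.74

# A tag with several children: .text concatenates all descendant text,
# while .string is None because there is no single string to return.
print(nested.text)    # Price: 100 yen
print(nested.string)  # None
```

For the stock price element, which holds a single text node, either attribute works. When the markup might contain nested tags, `.text` is the safer choice.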

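import urllib.request
from bs4 import BeautifulSoup

# Page that shows the Nikkei Stock Average
url = "http://www.nikkei.com/markets/kabu/"

# Fetch the page (urlopen returns a file-like response object)
html = urllib.request.urlopen(url)

# Parse the response with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# Find the first element whose class is "mkc-stock_prices"
span = soup.find(class_="mkc-stock_prices")

print(span)
print(f"span.text = {span.text}")
print(f"span.string = {span.string}")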

<span class="mkc-stock_prices">23,386.74</span>
span.text = 23,386.74
span.string = 23,386.74
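
Since the article lists Requests as the library for fetching data, the same extraction can also be sketched with Requests in place of urllib.request. The helper names `fetch_html`, `extract_price`, and `parse_price` are hypothetical, and the sketch assumes the page still marks the price with the class `mkc-stock_prices`, which may no longer hold if the site layout has changed.

```python
import requests
from bs4 import BeautifulSoup

URL = "http://www.nikkei.com/markets/kabu/"


def fetch_html(url, timeout=10):
    """Download the page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_price(html):
    """Return the text of the first mkc-stock_prices element, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(class_="mkc-stock_prices")
    return tag.get_text(strip=True) if tag is not None else None


def parse_price(text):
    """Convert a string such as '23,386.74' to a float."""
    return float(text.replace(",", ""))
```

Calling `extract_price(fetch_html(URL))` performs a real network request. Keeping the download and the extraction in separate functions means the extraction can be tried on a saved or hand-written HTML string first, and a None result signals that the expected element was not found.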

Reference links

To start with, the following page is very easy to follow.

Illustrated! A thorough guide to using Python BeautifulSoup (select, find, find_all, installation, scraping, and more)
- https://ai-inter1.com/beautifulsoup_1/

nprogram’s blog

Casual posts on programming topics

Summary of scraping with Beautiful Soup