#59 Web scraping (part 3): Splash

3 min read · Jun 23, 2022

Brief introduction

Splash is a lightweight headless browser with an HTTP API that renders JavaScript-heavy pages for web crawlers. It is open source and integrates with Scrapy and Portia.

Some of its features include:

process multiple webpages in parallel;
get HTML results and/or take screenshots;
turn off images or use Adblock Plus rules to make rendering faster;
execute custom JavaScript in page context;
write Lua browsing scripts;
develop Splash Lua scripts in Splash-Jupyter Notebooks;
get detailed rendering info in HAR format.
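
To make a couple of these features concrete, here is a minimal sketch of asking Splash for rendered HTML over its HTTP API with images turned off. It assumes a Splash instance is listening on `http://localhost:8050` (the default port); the helper names `build_render_url` and `fetch_rendered_html` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash is running locally on its default port.
SPLASH_URL = "http://localhost:8050"


def build_render_url(page_url, wait=0.5, images=False, base=SPLASH_URL):
    """Build a URL for Splash's render.html endpoint.

    wait   - seconds to wait after page load so JavaScript can finish
    images - False asks Splash to skip image downloads (faster rendering)
    """
    params = {
        "url": page_url,
        "wait": wait,
        "images": 1 if images else 0,
    }
    return f"{base}/render.html?{urlencode(params)}"


def fetch_rendered_html(page_url, **kwargs):
    """Return the JavaScript-rendered HTML of page_url (needs a running Splash)."""
    with urlopen(build_render_url(page_url, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_rendered_html("https://example.com")` should return the page's HTML after JavaScript has run, provided Splash is up. Inside a Scrapy project you would more likely use the scrapy-splash plugin than raw HTTP calls.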

Installation

Splash is distributed as a Docker image, so make sure that Docker Desktop is installed and running on your local machine.

Please refer to this documentation for more information.
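
Once the Splash container is started, a quick way to confirm it is accepting connections is to probe its port. The sketch below assumes the container was started with port 8050 published (for example with `docker run -p 8050:8050 scrapinghub/splash`, as described in the Splash installation docs); the function name `splash_is_reachable` is hypothetical.

```python
import socket


def splash_is_reachable(host="localhost", port=8050, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    This only checks that the port is open; it does not verify that the
    service behind it is actually Splash.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

If this returns `False`, check that Docker Desktop is running and that the container's port mapping matches the port you are probing.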

A small and fun practical project

REQUIREMENT: Make sure you have Anaconda Navigator and Visual Studio Code installed on your local machine.

BIG STEP: Create a new virtual environment in Anaconda Navigator, then open a terminal. Create a new project directory and, inside it, create a new Scrapy project named “ABC” with startproject and…

Written by Hang Nguyen