#59 Web scraping (part 3): Splash
3 min read · Jun 23, 2022
Brief introduction
Splash is a lightweight headless browser with an HTTP API that renders JavaScript-heavy pages for web crawlers. It is open source and integrates with Scrapy and Portia.
Some of its features include:
- process multiple webpages in parallel;
- get HTML results and/or take screenshots;
- turn off images or use Adblock Plus rules to make rendering faster;
- execute custom JavaScript in page context;
- write Lua browsing scripts;
- develop Splash Lua scripts in Splash-Jupyter Notebooks;
- get detailed rendering info in HAR format.
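To make a couple of these features concrete, here is a minimal sketch of asking Splash for the rendered HTML of a page through its render.html endpoint, with images turned off. It assumes a Splash instance is listening on http://localhost:8050 (the default port in the Splash docs); the helper names build_render_url and fetch_rendered_html are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_render_url(page_url, splash_base="http://localhost:8050",
                     wait=0.5, images=False):
    """Build a URL for Splash's render.html endpoint (hypothetical helper).

    splash_base is an assumption: adjust it to wherever your Splash runs.
    """
    params = {
        "url": page_url,               # page that Splash should load and render
        "wait": wait,                  # seconds to wait after the page has loaded
        "images": 1 if images else 0,  # 0 skips image downloads for faster rendering
    }
    return f"{splash_base}/render.html?{urlencode(params)}"


def fetch_rendered_html(page_url, **kwargs):
    """Return the HTML of page_url after Splash has executed its JavaScript."""
    with urlopen(build_render_url(page_url, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the request URL; call fetch_rendered_html()
    # once a Splash instance is actually running.
    print(build_render_url("https://example.com"))
```

Keeping the URL construction separate from the network call makes it easy to inspect the request before sending it to Splash.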
Installation
Splash is distributed as a Docker image, so make sure that Docker Desktop is installed and running on your local machine.
Please refer to this documentation for more information.
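Once the container is running, a quick way to confirm that Splash is reachable is to call its _ping endpoint. The sketch below assumes Splash was started from the scrapinghub/splash image with port 8050 published, as the Splash docs suggest; the function name splash_is_up is invented for this example.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

# Assumed setup, taken from the Splash installation docs:
#   docker pull scrapinghub/splash
#   docker run -it -p 8050:8050 --rm scrapinghub/splash


def splash_is_up(base_url="http://localhost:8050", timeout=3.0):
    """Return True if a Splash instance answers its _ping endpoint (hypothetical helper)."""
    try:
        with urlopen(f"{base_url}/_ping", timeout=timeout) as resp:
            payload = json.load(resp)
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON answer: treat as "not up".
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


if __name__ == "__main__":
    print("Splash reachable:", splash_is_up())
```

If this prints False, check that Docker Desktop is running and that port 8050 is not already used by another process.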
A small and fun practical project
REQUIREMENT: Make sure you have Anaconda Navigator and Visual Studio Code installed on your local machine.
BIG STEP: Create a new virtual environment in Anaconda Navigator and open a terminal. Create a new project directory and, inside it, create a new Scrapy project named “ABC” with the startproject command and…