#60 Web crawling (part 4): A fun project with Scrapy
Jun 28, 2022
Website to crawl: a non-Unicode website.
Tools: VS Code, Scrapy, Anaconda Navigator.
(A simple web crawler built with Scrapy is enough here; there is no need to crawl an API or use Selenium.)
1. Create a default spider
· Go to Anaconda Navigator -> create a new virtual environment and click “Open Terminal”.
· Inside the terminal, use the `scrapy startproject` and `scrapy genspider` commands to create a default spider.
· Open the project folder in VS Code. (The Python interpreter should be set to the same virtual environment that was just created above.)
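The two commands above look roughly like this. The project name, spider name, and domain below are placeholders for illustration, not the article's actual values:

```shell
# Create a new Scrapy project (project name is a placeholder)
scrapy startproject covid_crawler
cd covid_crawler

# Generate a default spider skeleton (spider name and domain are placeholders)
scrapy genspider cases example.com
```

`genspider` writes a minimal spider file under `covid_crawler/spiders/` with `name`, `allowed_domains`, `start_urls`, and an empty `parse` method, which the next step fills in.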
2. Create a spider to crawl time and new cases
- First, inspect the website structure by pressing F12 and examining the timeline entries. Notice that each page shows 10 timeline entries, and moving to a new page requires clicking the “Tiep theo” (Next) button, located near the end of the page.
- Idea for the spider: use recursion to walk sequentially to the last page by following the “Tiep theo” button. On each page, extract the time and new cases first -> go to the next page -> do the same thing (crawl time and new cases) -> repeat until the last page -> stop.
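The recursive idea above can be sketched in plain Python with mocked pages. All page URLs and data below are made up for illustration; in a real Scrapy spider the same pattern is usually expressed by yielding `response.follow(next_page, callback=self.parse)` from `parse`, so Scrapy itself drives the “recursion”:

```python
# Mocked site: each page holds up to 10 (time, new_cases) rows and a link to
# the next page, or None on the last page (no "Tiep theo" button).
PAGES = {
    "/page1": {"rows": [("2022-06-01", 100), ("2022-06-02", 120)], "next": "/page2"},
    "/page2": {"rows": [("2022-06-03", 90)], "next": "/page3"},
    "/page3": {"rows": [("2022-06-04", 80)], "next": None},  # last page: stop here
}

def parse(url, results=None):
    """Crawl one page, then recurse into the next page if one exists."""
    if results is None:
        results = []
    page = PAGES[url]
    results.extend(page["rows"])      # 1. extract time and new cases first
    if page["next"] is not None:      # 2. is there a "Tiep theo" button?
        parse(page["next"], results)  # 3. recurse into the next page
    return results                    # 4. last page reached: stop

all_rows = parse("/page1")
print(all_rows)  # rows from every page, in order
```

The key ordering matches the article: extract the current page's data *before* following the next-page link, so no page is skipped and recursion naturally terminates when the next link is absent.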