#60 Web crawling (part 4): A fun project with Scrapy
Jun 28, 2022
Website to crawl: a non-Unicode website.
Tools: VS Code, Scrapy, Anaconda Navigator.
(A simple web crawler built with Scrapy is enough here; there is no need to crawl an API or use Selenium.)
1. Create a default spider
· Go to Anaconda Navigator -> create a new virtual environment and click “Open Terminal”.
· Inside the terminal, use the `scrapy startproject` and `scrapy genspider` commands to create a default spider.
· Open the project folder in VS Code. (The Python interpreter should be set to the same virtual environment that was just created above.)
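The two commands above look roughly like this. The project name, spider name, and domain below are placeholders for illustration, not the article's actual values:

```shell
# Create a new Scrapy project (project name is a placeholder)
scrapy startproject covid_crawler
cd covid_crawler

# Generate a default spider skeleton (spider name and domain are placeholders)
scrapy genspider cases example.com
```

`genspider` writes a minimal spider file under `covid_crawler/spiders/` with `name`, `allowed_domains`, `start_urls`, and an empty `parse` method, which the next step fills in.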
2. Create a spider to crawl time and new cases
- First, inspect the website structure by pressing F12 and examining the timeline entries. Notice that each page shows 10 timeline entries, and moving to a new page requires clicking the “Tiep theo” (Next) button, located near the end of the page.
- Idea for the spider: use recursion to walk sequentially to the last page by following the “Tiep theo” button. On each page, extract the time and new cases first -> go to the next page -> do the same thing (crawl time and new cases) -> repeat until the last page -> stop.
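The recursive idea above can be sketched in plain Python with mocked pages. All page URLs and data below are made up for illustration; in a real Scrapy spider the same pattern is usually expressed by yielding `response.follow(next_page, callback=self.parse)` from `parse`, so Scrapy itself drives the “recursion”:

```python
# Mocked site: each page holds up to 10 (time, new_cases) rows and a link to
# the next page, or None on the last page (no "Tiep theo" button).
PAGES = {
    "/page1": {"rows": [("2022-06-01", 100), ("2022-06-02", 120)], "next": "/page2"},
    "/page2": {"rows": [("2022-06-03", 90)], "next": "/page3"},
    "/page3": {"rows": [("2022-06-04", 80)], "next": None},  # last page: stop here
}

def parse(url, results=None):
    """Crawl one page, then recurse into the next page if one exists."""
    if results is None:
        results = []
    page = PAGES[url]
    results.extend(page["rows"])      # 1. extract time and new cases first
    if page["next"] is not None:      # 2. is there a "Tiep theo" button?
        parse(page["next"], results)  # 3. recurse into the next page
    return results                    # 4. last page reached: stop

all_rows = parse("/page1")
print(all_rows)  # rows from every page, in order
```

The key ordering matches the article: extract the current page's data *before* following the next-page link, so no page is skipped and recursion naturally terminates when the next link is absent.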