
#60 Web crawling (part 4): A fun project with Scrapy

Hang Nguyen
2 min read · Jun 28, 2022


Website to crawl: a non-Unicode website.

Tools: VS Code, Scrapy, Anaconda Navigator

(A simple web crawler built with Scrapy is enough for this project; there is no need to call an API or use Selenium.)

1. Create a default spider

· Go to Anaconda Navigator -> create a new virtual environment and click “Open Terminal”

· Inside the terminal, use the startproject and genspider commands to create a default spider

· Open the spider folder in VS Code. (The Python interpreter should be set to the same virtual environment that was just created above.)
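The terminal commands might look like the sketch below; the project, spider, and domain names are placeholders, since the post does not give them.

```shell
# Inside the new virtual environment's terminal:
pip install scrapy

# Create a default project (project name is a placeholder)
scrapy startproject covid_crawler
cd covid_crawler

# Generate a default spider; the domain is a placeholder for the target site
scrapy genspider cases example.com
```

After this, the generated spider lives in `covid_crawler/spiders/cases.py`, which is the file to open in VS Code.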

2. Create a spider to crawl time and new cases

· One needs to inspect the website structure by pressing F12 and examining the timeline entries. Notice that there are 10 timeline entries per page, and to move to a new page one has to click the “Tiep theo” (“Next”) button, which is located near the end of the page.

· Idea for the spider: use recursion to move sequentially through to the last page by following the “Tiep theo” button. Before each recursive step, the time and new cases are extracted first -> go to the next page -> the same thing happens (crawl time and new cases) -> repeat until the last page -> stop.
