[DATA ANALYSIS] Basic data analysis combining Python, SPSS and PowerBI.

Hang Nguyen
3 min read · Apr 18, 2022


Last week I gave a presentation on a data analysis project, and in this post I would like to share the procedure with you. For the slides, please check this link.

Research goal:

Data Collection

The data comes from Kaggle, which I highly recommend to everybody! The link to the data set is here.

Data Extraction

I performed a bit of data cleaning in Python, the tool I feel most at ease with. Since the data set is small, there is no need for big data tools here :) I chose which variables to keep based on the research questions mentioned above.

I performed this process in 6 different steps:

1. Find all unique values in the Salary Estimate column and assign each one a number, or convert them into a min-max range if there are too many unique values. A value of -1 is treated as NaN (luckily this column has no NaN).
Original values
Split into 3 smaller columns as such

2. Clean the company size column: assign each category a number and merge -1 with Unknown.

3. Clean the Industry column: assign each unique industry a number and put these numbers into a new column named “Indsutry_new”.

4. Clean the Location column: split it into 2 smaller columns, one with the state name only, named “state”, and one with the state code, named “state_new”.

5. Drop all unnecessary columns

6. Write the result to .sav and .csv files

Hypothesis testing

Four questions were asked, and each one was answered with a different test in SPSS. Before running any test, the variables in question should be checked for normality, and for the correlation tests, linearity should be checked first as well. Further details on these checks can be found in my previous posts.
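The tests themselves were run in SPSS, but for readers who want to cross-check the pre-checks in Python, here is a rough sketch using SciPy. The choice of the Shapiro-Wilk test, the 0.05 threshold, and the rule of falling back to Spearman when normality is rejected are assumptions for illustration, not necessarily what was done in SPSS; the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def looks_normal(values, alpha=0.05):
    """Shapiro-Wilk test: True if normality is NOT rejected at level alpha."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing values
    _, p_value = stats.shapiro(values)
    return bool(p_value > alpha)


def pick_correlation(x, y, alpha=0.05):
    """Use Pearson if both variables look normal, otherwise Spearman."""
    if looks_normal(x, alpha) and looks_normal(y, alpha):
        r, _ = stats.pearsonr(x, y)
        return "pearson", float(r)
    rho, _ = stats.spearmanr(x, y)
    return "spearman", float(rho)
```

A scatter plot of the two variables is still the simplest way to judge linearity before trusting a Pearson coefficient.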

Data Visualization

This part was done in PowerBI, using the csv file.

Warning: Should be average, not mean

Data insight Presentation

Using Canva.

Voilà, there you have a whole data analysis process, with each tool used for what it does best!
