The dataset used in this project was obtained from "https://www.kaggle.com/datasets/azim069/goodreads-all-time-greatest-books-8k-data?_ga=2.11631028.1837326818.1678024239-443706589.1675095538" and analyzed according to the interests of the analyst.

The raw data contains 9 columns and 7,806 rows. The dataset shows readers' satisfaction with each book, as recorded on the Goodreads website.

<aside> 💡 Project Objective:


Goodreads 1.csv

This CSV file contains the dataset used in this project.


What we need to prepare

First, the star columns need to be rearranged so that all star counts sit in the same column, with one more column added to specify the star number. This makes the data easier to analyze and visualize.
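The rearrangement described above is a wide-to-long reshape. As a rough sketch of the idea outside the spreadsheet, the Python below does the same thing with plain dictionaries. The column names (`Title`, `5_star` ... `1_star`, `Star`, `Count`) and the sample values are assumptions made up for illustration. They are not taken from the real CSV.

```python
# Hypothetical wide-format rows: one column per star level.
# Column names and values are illustrative assumptions, not the real data.
wide_rows = [
    {"Title": "Book A", "5_star": 120, "4_star": 80, "3_star": 30, "2_star": 10, "1_star": 5},
    {"Title": "Book B", "5_star": 60, "4_star": 90, "3_star": 40, "2_star": 8, "1_star": 2},
]

STAR_COLUMNS = ["5_star", "4_star", "3_star", "2_star", "1_star"]


def unpivot_stars(rows, star_columns=STAR_COLUMNS, id_column="Title"):
    """Turn one row per book into one row per (book, star number)."""
    long_rows = []
    for row in rows:
        for column in star_columns:
            long_rows.append({
                id_column: row[id_column],
                "Star": int(column.split("_")[0]),  # "5_star" -> 5
                "Count": row[column],               # all counts share one column
            })
    return long_rows


long_rows = unpivot_stars(wide_rows)
for row in long_rows[:3]:
    print(row)
```

Each book now takes five rows, one per star number, which is the layout that charting tools usually expect.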

Then create a new spreadsheet and move the “Rating”, “Reviews” and “Average_star” columns into it, because these columns do not fit the new layout described above.

Last, I decided to create one more column that shows the rank of each book in 3 types: “Excellent” (average star ≥ 4.50), “Good” (average star 4.00-4.49) and “Under rated” (average star < 4.00).

<aside> 💡 Classify the type of rank by using =IF(avg.star>=value,"STRING",IF(AND(avg.star>=value,avg.star<value),"STRING","STRING")), where avg.star is the cell holding the average star, value is a threshold and STRING is the rank label.

</aside>
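For readers who prefer to check the rule outside the spreadsheet, here is a minimal Python sketch of the same classification. It uses the thresholds and labels listed above. The function name `classify_rank` is a hypothetical helper and not part of the original workbook.

```python
def classify_rank(average_star):
    """Map an average star value to one of the three rank labels."""
    if average_star >= 4.50:
        return "Excellent"
    if average_star >= 4.00:  # reached only when average_star < 4.50
        return "Good"
    return "Under rated"


# Example values chosen only to exercise each branch.
for value in (4.73, 4.20, 3.85):
    print(value, classify_rank(value))
```

Because the checks run from the highest threshold down, the second test does not need an explicit upper bound. The nested spreadsheet formula behaves the same way, so its AND(...) could be reduced to a single comparison.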

How to get the Correlation and Linear Regression

Correlation and regression can be done in Power BI by matching the data with the correct type of chart or calculation method, but the data tables need to be joined first.
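To make the calculation behind the Power BI visuals more concrete, here is a small Python sketch of the same sequence: join two tables on a shared key, then compute the Pearson correlation and a least-squares line. It is not the Power BI implementation. The key `Title`, the column `Five_star`, the helper names and the sample values are all assumptions made up for illustration.

```python
import math

# Hypothetical tables sharing the key "Title" (values are made up).
star_counts = [
    {"Title": "Book A", "Five_star": 120},
    {"Title": "Book B", "Five_star": 60},
    {"Title": "Book C", "Five_star": 200},
]
summary = [
    {"Title": "Book A", "Average_star": 4.2},
    {"Title": "Book B", "Average_star": 3.9},
    {"Title": "Book C", "Average_star": 4.6},
]


def inner_join(left, right, key):
    """Keep only rows whose key appears in both tables and merge their columns."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


joined = inner_join(star_counts, summary, key="Title")
xs = [row["Five_star"] for row in joined]
ys = [row["Average_star"] for row in joined]
print("r =", pearson_r(xs, ys))
print("slope, intercept =", linear_fit(xs, ys))
```

The join comes first because both statistics need the two columns aligned row by row. That is the same reason the tables must be related in Power BI before building the chart.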