The dataset used in this project was obtained from "https://www.kaggle.com/datasets/azim069/goodreads-all-time-greatest-books-8k-data?_ga=2.11631028.1837326818.1678024239-443706589.1675095538" and analyzed according to the analyst's own interests.

The raw data contains 9 columns and 7,806 rows. It shows how satisfied readers were with each book, as recorded on the Goodreads website.

<aside> 💡 Project Objective:

To determine whether readers who give a book a 1-star rating affect the number of reviews the book receives.
If most reviews do not come from 1-star raters, to find which star rating has the strongest effect on a book's reviews. </aside>

Goodreads 1.csv

This CSV file contains the dataset used in this project.

What we need to prepare

First, the star ratings need to be rearranged so that they sit in a single column, with one more column added to specify the star number. This makes the data easier to analyze and visualize.
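The rearrangement described above is a wide-to-long reshape (an "unpivot"). The sketch below shows the idea in plain Python rather than in a spreadsheet. The sample rows and the column names (`Title`, `5_star` ... `1_star`) are assumptions made for illustration; they are not taken from the actual Kaggle file.

```python
import csv
import io

# Hypothetical sample in the wide layout: one count column per star level.
# Column names and values are assumptions, not the real headers or data.
WIDE_CSV = """Title,5_star,4_star,3_star,2_star,1_star
Book A,120,80,30,10,5
Book B,60,90,40,20,15
"""

STAR_COLUMNS = ["5_star", "4_star", "3_star", "2_star", "1_star"]


def unpivot_stars(rows, star_columns):
    """Turn one wide row per book into one long row per (book, star level)."""
    long_rows = []
    for row in rows:
        for column in star_columns:
            long_rows.append({
                "Title": row["Title"],
                "Star": int(column.split("_")[0]),  # "5_star" -> 5
                "Count": int(row[column]),
            })
    return long_rows


wide_rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
long_rows = unpivot_stars(wide_rows, STAR_COLUMNS)

# Show the first few long rows: all counts now share one "Count" column.
for row in long_rows[:3]:
    print(row)
```

Each book now contributes five rows, one per star level, so a chart or filter can use `Star` as a single dimension.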

Then create a new spreadsheet and move “Rating, Reviews and Average_star” into it, because these fields do not fit the new template above.

Last, I decided to add one more column that ranks each book in one of three categories: “Excellent = average star ≥ 4.50”, “Good = average star 4.00-4.49” and “Under rated = average star < 4.00”.

<aside> 💡 Classify the rank with a nested IF of the form =IF(avg.star>=value,"STRING",IF(AND(avg.star>=value,avg.star<value),"STRING","STRING")), where value is a threshold and STRING is a rank label. The result is shown in the picture below.

</aside>
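For readers working outside a spreadsheet, the same three-way classification can be sketched in Python. The thresholds and labels come from the rank definitions above; the function name `classify_rank` is a hypothetical choice for this example.

```python
def classify_rank(average_star: float) -> str:
    """Map an average star value to the rank labels defined in the text."""
    if average_star >= 4.50:
        return "Excellent"
    if average_star >= 4.00:
        # Reached only when average_star < 4.50, so no upper-bound check is needed.
        return "Good"
    return "Under rated"


# Hypothetical sample values, chosen to sit on and around the thresholds.
for value in (4.72, 4.50, 4.49, 4.00, 3.85):
    print(value, classify_rank(value))
```

Because the checks run from the highest threshold down, the `AND(...)` test in the spreadsheet formula becomes unnecessary here: the second branch already implies the value is below 4.50.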

Untitled

How to get the correlation and linear regression

Correlation and regression can be done in Power BI by pairing the data with the correct chart type or calculation method, but the data tables need to be joined first.