Skip to the content.

Product Catalogue Project

Domain Proyek

The business is focused on improving the product recommendation system for an e-commerce platform. The goal is to enhance the customer shopping experience by providing more accurate and relevant product recommendations. The current recommendation system lacks effectiveness and relevance, leading to potential customer dissatisfaction and lower conversion rates.

Business Understanding

- Problem Statement

The main problem is the inefficiency of the current product recommendation system. Customers often receive recommendations that are not tailored to their preferences or needs, resulting in reduced user engagement, longer browsing times, and lower sales conversion rates. The business aims to solve this issue by implementing an improved recommendation system that provides customers with more personalized and relevant product suggestions.

- Goals

Data Understanding

Next, describe all the variables or features in the data. As an example : | Information | Decription | | —— | —— | | Number of Rows | 150000 | | Number of Columns | 16 | | Missing Value | 2384 |

Keterangan kolom data :

Name Description
uniq_id id
product_name Name of product
manufacturer The item manufacturer, as reported on Amazon. Some common “manufacturers”, like Disney, actually outsource their assembly line.
price price of product
number_avaiable_in_stock Available in stock
number_of_reviews Score review of product
number_of_answered_questions Amazon includes a Question and Answer service on all or most of its products. This field is a count of how many questions that were asked actually got answered.
average_review_rating Average score review rating
amazon_category_and_sub_category A tree-based, »-delimited categorization for the item in question.
customers_who_bought_this_item_also_bought References to other items that similar users bought. This is a recommendation engine component that played a big role in making Amazon popular initially.
description Description
product_information Information of product
product_description Description of product
items_customers_buy_after_viewing_this_item Rata-rata Pendapatan Nasabah
customer_questions_and_answers A string entry with all of the product’s JSON question and answer pairs
customer_reviews A string entry with all of the product’s JSON reviews
sellers A string entry with all of the product’s JSON seller information (many products on Amazon are sold by third parties)

Data Preparation

Data Cleaning: Remove missing values in the “price” and “amazon_category_and_sub_category” columns. Feature Engineering: Extract key information from columns like “product_name,” “price,” “product_information,” and “amazon_category_and_sub_category” to create a unified content for embedding. Text Embedding: Use a pre-trained Sentence Transformer model to convert product content into numerical embeddings. Clustering: Perform clustering on the embeddings to group similar products together.

Data Modelling

Clustering: Apply community detection methods to create clusters of similar products based on their embeddings.

Model Architecture:

The clustering process involves the following steps:

Evaluation

The silhouette score is used to evaluate the quality of the clustering. It measures how similar an object is to its own cluster compared to other clusters. A higher silhouette score indicates better-defined clusters.

Model Performance

  1. Before Reclustering:

Silhouette Score: The silhouette score measures the quality of clustering by calculating how similar an object is to its own cluster compared to other clusters. It ranges from -1 to 1, where higher values indicate better-defined clusters. The moderate silhouette score obtained indicates that the initial clustering yielded moderately well-defined clusters. Clusters: The products were categorized into different clusters based on their sizes: Single-member clusters: Products that were not part of any larger cluster.

  1. After Reclustering (Member-to-Member):
  1. After Reclustering (Member-to-Centroid):

Summary of Analysis

before_clustering

after clustering member to member

member to centroid

Given the silhouette scores you’ve provided:

Silhouette Score Before Reclustering: 0.18 Silhouette Score After Member-to-Member Reclustering: 0.10 Silhouette Score After Member-to-Centroid Reclustering: 0.12

Here’s what these scores indicate:

After applying the member-to-centroid reclustering method, the silhouette score increased slightly compared to the member-to-member reclustering. The clusters may have become more well-defined and distinct from each other compared to the previous method.

In summary, the silhouette score values suggest the following:

The initial clusters had a moderate level of separation. The member-to-member reclustering might have caused some overlap or less-defined separation. The member-to-centroid reclustering improved the cluster separation slightly compared to the member-to-member method. It’s important to note that the interpretation of silhouette scores is relative, and the scores should be evaluated in the context of the specific dataset and clustering problem. Additionally, while silhouette scores provide valuable insights, they should be considered along with other factors when making decisions about clustering methods and their outcomes.