Project description

This project is a typical analysis of a sales dataset using Pandas and Streamlit.

Sample dataset used: https://www.kaggle.com/carrie1/ecommerce-data (roughly 540,000 entries)

Cleaning the data

  1. Converted columns to appropriate types.
  2. Analysed negative values in columns.
  3. Removed negative values in columns where possible.
  4. Analysed non-numerical entries in numerical columns.
  5. Removed nulls where possible.
  6. Removed duplicates.
  7. Replaced invalid entries.
  8. Saved the cleaned data as a new dataset.
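
A minimal pandas sketch of the cleaning steps above. It assumes the column names of the Kaggle ecommerce-data file (InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country). The function name `clean_sales`, the "UNKNOWN" replacement rule and the decision to drop every negative quantity or price are illustrative assumptions, not the project's actual code.

```python
import pandas as pd


def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pipeline for the assumed ecommerce-data schema."""
    df = raw.copy()

    # Steps 1 + 4: convert to proper types; non-numeric entries in
    # numeric columns (and unparseable dates) become NaN/NaT
    for col in ["Quantity", "UnitPrice", "CustomerID"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], errors="coerce")

    # Step 7 (assumed rule): blank descriptions are replaced with "UNKNOWN"
    df["Description"] = df["Description"].astype("string").str.strip()
    df["Description"] = df["Description"].replace("", "UNKNOWN")

    # Step 5: drop rows missing values the analysis depends on
    df = df.dropna(subset=["Quantity", "UnitPrice", "CustomerID", "InvoiceDate"])

    # Steps 2 + 3 (assumed rule): drop rows with negative quantity or price
    df = df[(df["Quantity"] >= 0) & (df["UnitPrice"] >= 0)]

    # Step 6: remove exact duplicate rows, then restore integer types
    df = df.drop_duplicates()
    df = df.astype({"CustomerID": "int64", "Quantity": "int64"})
    return df.reset_index(drop=True)


# Tiny hand-made sample (not real data) exercising each rule
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "C536379", "536367"],
    "StockCode": ["85123A", "85123A", "22633", "D", "84879"],
    "Description": ["WHITE HANGING HEART", "WHITE HANGING HEART", "  ", "Discount", "ALARM CLOCK"],
    "Quantity": ["6", "6", "2", "-1", "abc"],
    "InvoiceDate": ["2010-12-01 08:26"] * 5,
    "UnitPrice": [2.55, 2.55, 1.85, 27.50, 1.69],
    "CustomerID": [17850, 17850, 17850, 14527, None],
    "Country": ["United Kingdom"] * 5,
})
clean = clean_sales(raw)
print(clean[["Description", "Quantity", "CustomerID"]])
```

Step 8 could then be a single `clean.to_csv(...)` call, so the notebook and the Streamlit app read the same cleaned file.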

KPIs

  • Which customers have spent the most money?
  • Which products overall sell the most?
  • Which products overall are the most profitable?

App Features

  1. Total profit, total sales and the most sold item, based on the applied filters.
  2. Top 10 customers, based on the applied filters.
  3. Bar charts of the products bought by the top 10 customers.
  4. An easily sortable dataframe.
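
The KPIs and app features above could be computed with pandas roughly as follows. This is an assumed implementation: the dataset has no cost column, so revenue (Quantity * UnitPrice) stands in for profit here, and the helper names `add_revenue`, `filter_sales`, `kpis` and `products_bought_by`, as well as the country/date filters, are hypothetical.

```python
import pandas as pd


def add_revenue(df: pd.DataFrame) -> pd.DataFrame:
    # Assumption: no cost data, so line revenue is used as a proxy for profit
    return df.assign(Revenue=df["Quantity"] * df["UnitPrice"])


def filter_sales(df, countries=None, start=None, end=None):
    """Hypothetical filters, like those a sidebar in the app might offer."""
    mask = pd.Series(True, index=df.index)
    if countries:
        mask &= df["Country"].isin(countries)
    if start is not None:
        mask &= df["InvoiceDate"] >= pd.Timestamp(start)
    if end is not None:
        mask &= df["InvoiceDate"] <= pd.Timestamp(end)
    return df[mask]


def kpis(df: pd.DataFrame, n: int = 10) -> dict:
    df = add_revenue(df)
    # Units sold and revenue per product
    by_product = df.groupby("Description").agg(
        units=("Quantity", "sum"), revenue=("Revenue", "sum")
    )
    return {
        "total_revenue": df["Revenue"].sum(),
        "units_sold": df["Quantity"].sum(),
        "most_sold_item": by_product["units"].idxmax(),
        "top_products_by_units": by_product["units"].nlargest(n),
        "top_products_by_revenue": by_product["revenue"].nlargest(n),
        # Customers ranked by total spending
        "top_customers": df.groupby("CustomerID")["Revenue"].sum().nlargest(n),
    }


def products_bought_by(df: pd.DataFrame, customers) -> pd.Series:
    # Units per product for the given customers, largest first (for a bar chart)
    subset = df[df["CustomerID"].isin(customers)]
    return subset.groupby("Description")["Quantity"].sum().sort_values(ascending=False)


# Hand-made sample (not real data)
sales = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "Description": ["MUG", "CANDLE", "MUG", "LAMP", "MUG"],
    "Quantity": [10, 2, 5, 1, 3],
    "UnitPrice": [1.0, 5.0, 1.0, 20.0, 1.0],
    "Country": ["United Kingdom", "United Kingdom", "France", "United Kingdom", "United Kingdom"],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-01-06", "2011-02-01", "2011-03-10", "2011-03-11"]
    ),
})
uk = filter_sales(sales, countries=["United Kingdom"])
result = kpis(uk)
bought = products_bought_by(uk, result["top_customers"].index)
print(result["total_revenue"], result["most_sold_item"])
print(bought)
```

In a Streamlit app, these values could be passed to widgets such as `st.metric` for the totals, `st.bar_chart` for the per-product charts and `st.dataframe` for the sortable table; keeping the calculations in plain functions makes them easy to test outside the app.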

Project info

This project can be roughly divided into two phases. In the first phase, the data is cleaned.

At the same time, any anomalies in the dataset are uncovered and explained thoroughly.

Once a clear picture of the dataset and its unique properties has been acquired, we move on to the visualisation phase.

In this phase, the data is presented in the clearest, most digestible format possible, making the best use of various filters, key metrics and charts.

Software used: Python (Pandas), Jupyter Notebooks, Streamlit

City: Athens, Greece