The data that I used is the data provided by BAND(Udacity) project 2. I reorganised the data using SQLite and tried to do some visualisations using excel. As excel kept crashing, I decided to learn some python, using resources from Udacity and Mode Analytics, in helping me do the analysis.

I want to look at which country the company got the highest number of orders from, which product category was selling it selling most to that country and what are the products.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
sales_data = pd.read_csv('Sales info with date.csv')
sales_data[:2]
Out[2]:
OrderDate OrderID ProductID CategoryName ProductName ShipCountry UnitPrice Quantity Discount
0 2014-07-04 10248 11 Dairy Products Queso Cabrales France 14.0 12 0.0
1 2014-07-04 10248 42 Grains/Cereals Singaporean Hokkien Fried Mee France 9.8 10 0.0
In [3]:
#Revenue
sales_data['Revenue'] = (sales_data['UnitPrice'] - sales_data['Discount'])* sales_data['Quantity']
sales_data[:2]
Out[3]:
OrderDate OrderID ProductID CategoryName ProductName ShipCountry UnitPrice Quantity Discount Revenue
0 2014-07-04 10248 11 Dairy Products Queso Cabrales France 14.0 12 0.0 168.0
1 2014-07-04 10248 42 Grains/Cereals Singaporean Hokkien Fried Mee France 9.8 10 0.0 98.0
In [4]:
count_shipCountry = sales_data['ShipCountry'].value_counts()
In [5]:
#checking how many orders were shipped to each country
count_shipCountry.plot.bar(color = 'lightblue')
plt.title('Numbers of delivery to each country')
plt.xlabel('Country'), plt.ylabel('Number of Delivery')
plt.show()

#With Shipping Country on the x-axis and count of delivery on y,
#it shows that the USA has the highest count of delivery compared to the other countries. 
#This could also mean that the USA might be a big market target for the company. 
In [6]:
category_country_sales = sales_data.pivot_table(index='ShipCountry', columns='CategoryName', values='Revenue', aggfunc='sum')
In [7]:
#breaking down where revenue generated by country for each category product
category_country_sales.plot(kind='bar', stacked=True, figsize=[18,8], title='Sales By Country', colormap='autumn')
plt.ylabel('Sales Amount/USD')
plt.show()

#With Revenue on Y axis and country on  X axis,
#we can see that the USA is an important client for the company in terms of revenue,
#followed by Germany and then Austria.
#However, 
#this visualisation makes it hard to understand which product category was selling out more.
In [8]:
#I then decided to plot the information using heatmap
#With Product Category on X axis and Ship Country on Y axis,
#the heatmap shows higher color intensity for USA and Germany in the beverage section
#and lowest in Spain, Poland and Portugal.
#Although the company makes more sales from selling beverages to the USA, 
#the USA also seems to be the country that is buying more products from other categories.

#Another thing that we should note is that, there are no orders from Argentina for meat or
#from Norway/Poland for cereal or grains.
sns.set_style("white")
cmap = sns.light_palette("#4285f4", as_cmap=True)
sns.heatmap(category_country_sales, cmap = cmap)
plt.title('Revenue by Country for each categorical product')
plt.show()
In [9]:
sales = sales_data.pivot_table(index='OrderDate', columns='CategoryName', values='Revenue', aggfunc='sum')
In [10]:
sales.plot(kind='line', figsize=[22,8], colormap='Accent')
plt.ylabel('Sales/USD')
plt.title('Line Graph showing Revenue across time')
plt.show()
#after plotting Revenue against OrderDate, we see that some revenue data was unavailable
#This shows that not all product category sells out everyday and revenue was not consistent.
#But beverages were still doing well 
In [11]:
beverages_data = sales_data[sales_data['CategoryName'] == 'Beverages']
In [12]:
#Beverage products in each country 
beverages_data_pivot = beverages_data.pivot_table(index='ProductName', columns='ShipCountry', values='Revenue', aggfunc='sum')
In [13]:
sns.set_style("white")
cmap = sns.light_palette("#fc8d62", as_cmap=True)
sns.heatmap(beverages_data_pivot, cmap=cmap)
plt.title('Beverages shipped to each country')
plt.show()

#the heatmap below shows that the top selling beverage is Cote de Blaye, which is wine. 
#At about 200 USA a bottle, it was still very popular amongst many country.
#But there were no orders from countries, such as Finland, Ireland,Poland, Switzerland or the UK.
#The company should look into why and try to market this product.
#Moreover, countries such as Norway is not ordering much beverage from the company. 
#The company should look into this.