Pre-Processing & Post-Processing Clustering Visualization 2019-09-12T15:20:27+00:00

This is the sixth in a series of seven Segmentation and Clustering articles. It presents a few of the available data visualization options for 2D and multidimensional data sets.

Overview

For analysts, the most common problem in clustering is identifying the inherent partitions of a data set. Most experimental evaluations of clustering algorithms use 2D data sets so that the reader can visually verify the validity of the results (i.e., how well the clustering algorithm discovered the clusters in the data set).
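
As a minimal illustration of this kind of visual check, the sketch clusters a synthetic 2D data set with scikit-learn's K-means and uses matplotlib to color each point by its assigned cluster. The synthetic data, the choice of three clusters and the output file name `kmeans_2d.png` are assumptions made for illustration. They do not come from the article.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line in a Jupyter notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2D data: 300 points drawn around 3 centers (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-means with the number of clusters we expect to recover.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Color each point by its assigned cluster and mark the centroids,
# so the partition can be checked by eye.
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=15)
ax.scatter(
    kmeans.cluster_centers_[:, 0],
    kmeans.cluster_centers_[:, 1],
    c="red", marker="x", s=100, label="centroids",
)
ax.set_xlabel("x1")
ax.set_ylabel("x2")
ax.set_title("K-means clusters on 2D data")
ax.legend()
fig.savefig("kmeans_2d.png", dpi=100)
```

If the colored groups line up with the blobs visible in the plot, the partition is plausible. With real data, the true number of clusters is usually unknown, which is why this check matters.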

Visualizing the data set is therefore a crucial check on clustering results. For large multidimensional data sets (i.e., more than three dimensions), however, effective visualization is difficult. Moreover, perceiving clusters with the available visualization tools is hard for people unaccustomed to higher-dimensional spaces. Available data visualization options include the scatter plot, the minimum spanning tree, the dendrogram, and the smoothed data histogram.
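
For data with more than three dimensions, SciPy can produce two of the options listed above. This sketch uses the 4-dimensional Iris data bundled with scikit-learn. It builds a Ward-linkage dendrogram and a Euclidean minimum spanning tree, then draws the tree's edges over a 2D PCA projection that serves as the scatter plot. Ward linkage, the three-cluster cut, the PCA projection and the file name `dendrogram_mst.png` are all assumptions made for illustration. The smoothed data histogram is not covered.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit this line in a Jupyter notebook
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples x 4 features

# Dendrogram input: agglomerative clustering with Ward linkage.
Z = linkage(X, method="ward")
# Cut the tree into (at most) 3 flat clusters, used only to color points.
flat_labels = fcluster(Z, t=3, criterion="maxclust")

# Minimum spanning tree of the complete Euclidean distance graph in 4D.
# SciPy treats zero entries as missing edges, so exact duplicate points
# are never linked directly to each other (they still join via other points).
D = squareform(pdist(X))
mst = minimum_spanning_tree(D).tocoo()

# 2D projection so the 4D data can be drawn as a scatter plot.
X2 = PCA(n_components=2).fit_transform(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
dendrogram(Z, ax=ax1, no_labels=True)
ax1.set_title("Dendrogram (Ward linkage)")
ax1.set_ylabel("Linkage distance")

# MST edges were computed in 4D; here they are drawn in the PCA plane,
# so long edges in the plot are only approximately long in 4D.
for i, j in zip(mst.row, mst.col):
    ax2.plot(X2[[i, j], 0], X2[[i, j], 1], color="gray", linewidth=0.7, zorder=1)
ax2.scatter(X2[:, 0], X2[:, 1], c=flat_labels, cmap="viridis", s=15, zorder=2)
ax2.set_title("MST over PCA scatter plot")
ax2.set_xlabel("PC1")
ax2.set_ylabel("PC2")
fig.savefig("dendrogram_mst.png", dpi=100)
```

In the dendrogram, a large jump in linkage distance suggests where to cut the tree. In the minimum spanning tree, unusually long edges tend to bridge separate clusters.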

Data Visualization in Python

There are numerous libraries for data visualization in Python, including matplotlib, Seaborn, ggplot, Bokeh, plotly, pygal, Altair, geoplotlib, Gleam, leather, and missingno.

Plotly is a web-based data visualization platform for data scientists and engineers. The engine behind it is plotly.js, an open-source charting library built on D3.js and stack.gl. The following graphs show the elbow curve and a 3D visualization of the K-means clusters for the Iris dataset, created with plotly. The Jupyter notebook below contains the corresponding Python implementation.

Over 2.5 quintillion bytes of data are created every day, and not all hidden patterns and trends can be found just by reading through millions of rows of data. Data visualization can affect an organization both positively and negatively: improper visualization may bias your results and lead to faulty decisions, whereas correctly applied visualization gives clearer insights and improves efficiency.

-Authored by Manshi Poonia, Data Scientist at Absolutdata

Technical articles are published by the Absolutdata Labs group and come from the Absolutdata Data Science Center of Excellence. These articles also appear in BrainWave, Absolutdata’s quarterly data science digest.
