Different ways of coding a bar chart

Software Engineering 2346 views

I have this bar chart in a web application which shows the number of new projects which are fulfilled each month. The data is extrapolated from scrutinizing a Projects table and the created_at column. The SQL is pretty simple:

select
    convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
    count(*) as count
from
    projects
group by
    year(created_at),
    month(created_at)
order by
    1, 2;

It does a count for each year/month and formats the date as the first of the month. For example:

date count
2011-06-016
2011-07-0111
2011-08-018

JavaScript + D3

In a Ruby on Rails web application I've written, I generate the bar chart using the D3.js library with 156 lines of JavaScript code I had to write. D3 is a lower level library which generates visualizations using SVG. Writing data visualizations this way is more difficult and requires more knowledge of the technologies involved. However, it is also the most flexible and interactive. Using html, css, and javascript you can make truly interactive visualizations. In this case, the bar chart does not require interactivity.

R + plot

Here's an example using a few lines of R code with plot:

library(RODBC);

channel <- odbcConnect("dsn", uid="uid", pwd="pwd");
query <- "
select
    convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
    count(*) as count
from
    projects
group by
    year(created_at),
    month(created_at)
order by
    1, 2;";

results <- sqlQuery(channel, query);
close(channel);

barplot(results$count,
        main="New Projects per Month",
        names.arg=as.Date(results$date),
        xlab="Month",
        ylab="# Projects",
        col="#3d88ba");

Which produces this simple bar chart:

R + ggplot

Using gglot you can produce a better looking chart:

library(RODBC);

channel <- odbcConnect("dsn", uid="uid", pwd="pwd");
query <- "
select
    convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
    count(*) as count
from
    projects
group by
    year(created_at),
    month(created_at)
order by
    1, 2;";

results <- sqlQuery(channel, query);

close(channel);
library(ggplot2);
image <- ggplot(results, aes(x=date, y=count, fill=count)) +
geom_bar(stat="identity", position="dodge") +
geom_text(aes(label=count), vjust=1.5, colour="white",
position=position_dodge(.9), size=3) +
ggtitle("New Projects per Month") +
xlab("Date") +
ylab("# Projects") +
guides(fill=FALSE) +
theme(axis.text.x = element_text(angle=30, hjust=1, vjust=1));
Output:

Python + Pandas (via an IPython Notebook)

As with R, there are many ways to make the same bar chart in Python. Here’s one example using the Pandas library via an IPython Notebook:

import pyodbc
import pandas.io.sql as psql

dsn = 'dsn'
user = 'user'
password = 'password'
database = 'database'

con_string = 'DSN=%s;UID=%s;PWD=%s;DATABASE=%s;' % (dsn, user, password, database)
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()

sql = """
select
    convert(date, cast(year(created_at) as char(4)) + '-' + cast(month(created_at) as varchar(2)) + '-01') as date,
    count(*) as count
from
    projects
group by
    year(created_at),
    month(created_at)
order by
    1, 2;
"""
cursor.execute(query)
df = psql.frame_query(sql, cnxn)
cnxn.close()

df.plot(kind='bar', x='date', y='count')

What's great about IPython notebooks is that you can create a reproducible, step-by-step document which shows your inputs and outputs. It also supports various markup languages such as markdown and LaTeX. Many data scientists use Ipython notebooks to demonstrate a concept or to compliment a paper or thesis.