Visualizing Energy Consumption in Philadelphia

Today is the deadline for Azavea’s Open Data Philly Visualization Contest, which is near to my heart because it calls on developers to design a data visualization application for my home city of Philadelphia!

My coworker Weilin Meng and I have been working for the past month on our submission and we are excited and honored to reveal our project, an interactive data visualization that explores energy consumption in the City of Philadelphia.

About the Open Data Philly Visualization Contest

OpenDataPhilly is a portal that provides access to free datasets related to the Philadelphia region. It contains datasets on Philadelphia’s education system, police department, maps of the Philadelphia region, property values, and more.

Azavea is a company in Philadelphia that creates GIS-based decision-support software. Azavea has a socially-conscious focus and its mission is to empower and improve communities through its work. To further this mission, Azavea has re-designed OpenDataPhilly. To celebrate their unveiling of the new design, Azavea has called on developers and data enthusiasts around the world to submit data visualizations to their Open Data Philly Visualization Contest.

About our submission

For this competition, we produced a data visualization that allows users to visualize energy consumption in the city of Philadelphia. The visualization consists of a map and scatterplot, which display the energy consumption patterns of large commercial buildings in the Philadelphia region. The map allows users to view each building’s natural gas consumption, electricity consumption, energy use intensity (EUI), energy star score, and greenhouse gas emissions using visualizations developed in D3.js and the Google Maps API. The visualization can be used to detect poor and outstanding performers against energy benchmarks, or it can be used to understand energy consumption patterns in the City of Philadelphia.

What data sources does the visualization use?

The visualization uses the following two sources of data, which are available for public use through

This dataset, produced by the GIS Services Group of the City of Philadelphia, encodes the shapes of buildings and building footprints.

Building Energy Benchmarking
Compiled by the City of Philadelphia Mayor’s Office of Sustainability, the energy benchmarking data is self-reported data on energy consumption metrics pertaining to large commercial properties in Philadelphia.

How did we do it?

We produced our submission by first programmatically geocoding every address in the energy benchmarking dataset to a longitude and latitude point using the Google Maps geocoding service. Once we had these points, we encoded the energy data as a GeoJSON file containing a collection of points, with the energy data stored in each point’s properties. We then converted OpenDataPhilly’s “buildings” shapefile into GeoJSON and linked it to the collection of points by looking up each point in the buildings GeoJSON using a “point in polygon” approach. This allowed us to associate the energy benchmarking data with the polygons encoded in the buildings GeoJSON file. With the final GeoJSON file in hand, we used the Google Maps JavaScript API and D3.js to draw the resulting polygons, heatmap, and scatterplot, each of which uses the energy data for coloring, sizing, and charting. All of the preliminary data processing was performed in Node.js and Python, while the final web-based visualization was produced primarily with D3.js and the Google Maps JavaScript API.
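The “point in polygon” linking step can be sketched in Python roughly as follows. This is an illustrative sketch rather than the project's actual code: it uses a plain ray-casting test instead of the GeoJSON-JS-Utils implementation, and the function names, the `name` and `site_eui` properties, and the coordinates are all made up for the example.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a closed ring of [lon, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, polygon_coords):
    """GeoJSON Polygon: the first ring is the outline, later rings are holes."""
    if not point_in_ring(lon, lat, polygon_coords[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in polygon_coords[1:])


def attach_energy_data(points, buildings):
    """For each building, merge in the properties of the first point inside it."""
    matched = []
    for building in buildings["features"]:
        if building["geometry"]["type"] != "Polygon":
            continue  # MultiPolygon handling is omitted in this sketch
        for point in points["features"]:
            lon, lat = point["geometry"]["coordinates"][:2]
            if point_in_polygon(lon, lat, building["geometry"]["coordinates"]):
                merged = dict(building)
                merged["properties"] = {
                    **(building.get("properties") or {}),
                    **point["properties"],
                }
                matched.append(merged)
                break
    return {"type": "FeatureCollection", "features": matched}


# Hypothetical sample data: one geocoded benchmark point and one footprint.
points = {"type": "FeatureCollection", "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-75.1635, 39.9525]},
     "properties": {"site_eui": 85.2}},
]}
buildings = {"type": "FeatureCollection", "features": [
    {"type": "Feature",
     "geometry": {"type": "Polygon", "coordinates": [[
         [-75.164, 39.952], [-75.163, 39.952], [-75.163, 39.953],
         [-75.164, 39.953], [-75.164, 39.952]]]},
     "properties": {"name": "Example Tower"}},
]}

linked = attach_energy_data(points, buildings)
print(linked["features"][0]["properties"])
```

Looping over every point for every building is quadratic, which is tolerable for a one-off preprocessing script but would likely call for a spatial index on larger datasets.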

How can I view the visualization first-hand?

You can view the visualization here:

Please be aware that the visualization will likely not function properly on older browser versions and it has not been tested on Internet Explorer.

What are some of the features of the visualization?

The heatmap is initialized with “Total Greenhouse Gas Emissions” selected upon first visiting the page:


The map supports zooming and panning:


Clicking the “Scatter Plot” button slides out a scatter plot pane from the right. Points in the scatterplot are sized by energy star rating (100 being the best score, 50 being the mean score, and 0 being the worst score) and are colored by property type:


Users can change the measure used to generate the heatmap and the scatterplot y-axis values, and the scatterplot supports tooltips on hover:


Users can toggle the heatmap on and off and can continue to pan and zoom in the map overlay while the scatterplot is showing:


What technology and tools does the project make use of?

To show our appreciation for those developing open source tools and technology, we’ve included brief descriptions and links to some of the tools we used below:

GDAL: A translator library for working with vector and raster geospatial data formats, which we used for its ogr2ogr command-line utility
Geojson-Merge: A Node.js package / utility developed by Mapbox, which provides a command-line interface for merging multiple GeoJSON files together
GeoJSON-JS-Utils: A JavaScript module that provides some useful functions for manipulating and working with GeoJSON, which we used primarily for its “point in polygon” implementation
D3.js: A powerful JavaScript library for generating data visualizations, which we used to produce the scatterplot, render the polygons on the Google Maps overlay, and color the polygons by EUI
D3-tip: A great tooltip implementation for D3.js written by Justin Palmer (if you haven’t seen his amazing data visualizations of Portland, Oregon, then check them out!)
jQuery: The JavaScript library that we all know and love…
ColorBrewer: A color scheme by Cynthia Brewer that can be used to color shapes by their values (our project used the CSS implementation)

Visualizing Provider Charges

CMS reimbursement forms a nexus between political, corporate, and economic influences. This intersection ultimately determines the prices of healthcare procedures for senior citizens, and, indirectly, for consumers in the private insurance market. There has been considerable attention paid to the notion of variability in pricing for procedures in CMS reimbursement. Recent news stories have highlighted differences in costs for identical procedures performed in both proximate and disparate geographic areas. In an effort to introduce transparency into CMS pricing and reimbursement strategies, and to shed light on variability in pricing and fraud, CMS recently released provider utilization and charges data and made an earnest call to developers to build data visualization applications that use the dataset.

In my free time I’ve been working on one such data visualization application. The application is built using basic front-end technology (jQuery and D3) and a MySQL database that stores the provider charges and utilization data. My goal was to make it possible to dynamically query the underlying dataset and produce interesting, meaningful charts without any knowledge of SQL or databases. Users can select variables to filter the dataset by and can calculate a weighted average value for charges submitted to CMS, payments made by CMS, and the CMS allowed amount. These weighted averages can be grouped by various values in order to produce bar charts for visualizing the CMS data.
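To make the grouped weighted average concrete, here is a minimal Python sketch. It assumes each row carries an average charge and a service count that serves as the weight; the post does not say which weight the application actually uses, and the field names `provider_type`, `avg_submitted_charge`, and `service_count` are hypothetical.

```python
from collections import defaultdict


def weighted_average_by_group(rows, group_key, value_key, weight_key):
    """Return {group: sum(value * weight) / sum(weight)} over the given rows."""
    totals = defaultdict(lambda: [0.0, 0.0])  # group -> [weighted sum, weight sum]
    for row in rows:
        weight = row[weight_key]
        totals[row[group_key]][0] += row[value_key] * weight
        totals[row[group_key]][1] += weight
    # Skip groups whose total weight is zero to avoid dividing by zero.
    return {group: s / w for group, (s, w) in totals.items() if w > 0}


# Made-up rows, for illustration only.
rows = [
    {"provider_type": "Chiropractic", "avg_submitted_charge": 100.0, "service_count": 10},
    {"provider_type": "Chiropractic", "avg_submitted_charge": 200.0, "service_count": 30},
    {"provider_type": "Cardiology", "avg_submitted_charge": 500.0, "service_count": 5},
]

averages = weighted_average_by_group(
    rows, "provider_type", "avg_submitted_charge", "service_count"
)
print(averages)
```

Weighting by service count keeps a provider with a handful of services from moving the group average as much as a provider with thousands.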


Adding filters opens a modal window that lets users choose from the available underlying variables. Selected filters are highlighted and can be removed by unselecting them. The list is regenerated as variables are selected so that the application stays responsive to the user’s actions.


As filters are selected they are added to the filters control section to the right of the chart

Users can remove filters either from the filters modal or by removing them from the filter control section to the right of the chart. Once the user has selected an appropriate set of filters that satisfies their interests, they can regenerate the chart and the chart will be dynamically built by querying a PHP service behind the scenes using Ajax.
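One way a back-end service could turn the selected filters and grouping variable into a query is sketched below. It is written in Python for illustration and is not the PHP service described above; the table name, column names, and the `build_chart_query` helper are all assumptions. Filter values are passed as parameters, while column names are checked against a whitelist because they cannot be parameterized.

```python
# Hypothetical schema: these names are assumptions, not the real table layout.
TABLE = "provider_charges"
ALLOWED_COLUMNS = {"state", "provider_type", "hcpcs_code"}


def build_chart_query(filters, group_by):
    """Build a parameterized SQL string and parameter list for one chart.

    filters maps a column name to the list of values the user selected.
    """
    if group_by not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported grouping column: {group_by}")

    clauses, params = [], []
    for column, values in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        if not values:
            continue  # an empty selection adds no constraint
        # "%s" is the placeholder style used by common Python MySQL drivers.
        placeholders = ", ".join(["%s"] * len(values))
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)

    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        f"SELECT {group_by}, "
        "SUM(avg_submitted_charge * service_count) / SUM(service_count) "
        "AS weighted_charge "
        f"FROM {TABLE}{where} GROUP BY {group_by}"
    )
    return sql, params


sql, params = build_chart_query({"state": ["CA"]}, "provider_type")
print(sql)
print(params)
```

The returned pair can be handed to a database cursor's `execute(sql, params)` call, so user-selected values never get spliced into the SQL text.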

For example, let’s say the user wants to view the average charges billed by the provider, average payment amounts, and average CMS allowed amounts for all procedures performed in California. Let’s also assume that they want to group the results by provider type. This will produce a fairly large chart, with one bar for each provider type.

Average charges by provider type in California.

This provides some fairly immediate observations about the underlying data. For example, the highest submitted charges are coming from ambulatory surgical centers. This is hardly surprising as surgeries would undoubtedly be more expensive than routine procedures (such as routine evaluation and management or laboratory tests).

The user can also change the group by variable, allowing them to view the averages calculated over different buckets or subsets of the data. For example, the user might want to remove all filters and change the grouping variable to “state” in order to simply view the average charges, average payment, and average CMS allowed amounts by state.



By clicking “Regenerate Chart” the data is fetched using the new criteria and the chart is rebuilt. In this case we will have one bar per state and will be able to easily view the differences in total average charges and reimbursement across all CPT codes and procedures performed within the state.


If you look closely you may see some funny states (what the heck is ZZ??) but it turns out that these are just special classifications for additional procedures performed in areas that are covered but are not one of the standard 50 states. For example, the documentation for the provider utilization and charges file says the following:

‘XX’ = ‘Unknown’
‘AA’ = ‘Armed Forces Central/South America’
‘AE’ = ‘Armed Forces Europe’
‘AP’ = ‘Armed Forces Pacific’
‘AS’ = ‘American Samoa’
‘GU’ = ‘Guam’
‘MP’ = ‘North Mariana Islands’
‘PR’ = ‘Puerto Rico’
‘VI’ = ‘Virgin Islands’
‘ZZ’ = ‘Foreign Country’

While this chart doesn’t tell us much on its own, there are a few interesting nuggets that could be explored further. For example, the average payment amount is considerably higher in the ‘XX’ bucket, which is an ‘Unknown’ state. Why would this be? Could this potentially be a signal that there is some fraud occurring in charges submitted from ‘Unknown’ states?

It’s also interesting to note that the average submitted charges are so much higher for the ‘Armed Forces Europe’, ‘Armed Forces Pacific’, and ‘Foreign Country’ categories. Finally, this chart displays regional differences in CMS reimbursement, which is based on a variety of factors that are geographically dependent, including wages.

Finally, to illustrate another use, let’s filter the dataset down to include only chiropractors. Let’s not apply any other filters to the dataset and instead let’s group by HCPCS code.


This chart makes one thing very obvious. Chiropractors are charging and being paid much more for the procedure with HCPCS code 99205. Let’s list out what some of these procedures are:

99203: New patient visit with detailed history and exam, low degree of medical decision making, presenting with a moderately severe problem
99205: New patient visit with comprehensive history and exam, high degree of medical decision making, presenting with a moderate to highly severe problem

Notice the conspicuous absence of 99202 and 99204 codes. While 99202 is similar to 99203, it indicates less effort on the part of the physician, and therefore likely translates to lower reimbursement. The case is similar for 99204.

99202: New patient visit with an expanded, problem-focused history and exam, straightforward degree of medical decision making, presenting with a low-to-moderate problem.
99204: New patient visit with a comprehensive history and exam, moderate degree of medical decision making, presenting with a moderate to high severity problem

While this may be perfectly plausible, it could also be a sign of “upcoding,” when billing specialists assign a more valuable code to a procedure that was performed even though the procedure actually is more accurately captured by a lower-value code. While this chart is in no way proof or evidence of that fact, it could possibly be an indicator that this is occurring, and could be explored further. If the chart were to indicate the presence of upcoding, this would be an example of how data visualization can help CMS accomplish their goals of identifying fraud and abuse.

Designing a CMS Data Visualization Application

The other week I became aware of ONCHIT’s call for proposals for their annual Datapalooza Code-a-Palooza competition.  The official notice called for proposals that made heavy use of the Provider Utilization and Payment Data that CMS released on April 9th.  The deadline for submissions was April 25th, giving competitors just over two weeks to analyze the various open data sources at CMS and determine how a data visualization application could be built out of them.  I decided that this would be a good project for me to work on and that maybe I’d even get lucky and get invited to Code-a-Palooza.

So needless to say, I was very busy over the last two weeks loading various data sources, exploring the data, writing user stories, and designing mockups of the data visualization application.  Now I have multiple sources of open CMS data loaded and ready to play with, including:

  • Healthcare-associated infections from the CDC
  • Death and readmission rates for hospitals around the country
  • Average inpatient charges for hospitals for some of the most common DRG codes
  • Average outpatient charges for outpatient facilities for some of the most common APC codes
  • The national downloadable database of physician data
  • Group practice quality measures from CMS’s ACO initiative
  • Physician utilization and reimbursement data for physicians in the Medicare

The proposal form was fairly short, so I wrote a few concise use cases and paired them with wireframes to give a sense of what the application might look like. Below are two of the wireframes that I submitted along with the main written use cases for the application proposal.

I’ll continue to document the project here as I work on it!



photo credit: tec_estromberg on Flickr