This page describes the data, my methods of analysis, and the results for the above task.
I used MS Excel 2003 to make all the visualizations.
Data
The following sets of numerical data were available:
Here is the first plot of the data: a simple line chart with one series for each of the three utilities and one series for temperature. It gives a clear idea of the scale relationships between the items. To enable visual comparison between the series, I scaled gas up by a factor of 10 and scaled water down by a factor of 3. At this point, nothing is assumed about the existence or absence of relationships between the items. The chart may not make much sense unless you look at two items at a time, as I do a little later.
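For anyone reproducing this chart outside Excel, the display scaling can be sketched in Python as below. The factors (gas x10, water /3) come from the text; the series names and the `scale_for_plot` helper are hypothetical. The scaled values only place the lines on a shared axis and should not be used to compare actual quantities.

```python
# Display-only factors from the text: gas is scaled up x10, water down /3.
# Electricity and temperature are left as-is. Names are hypothetical.
DISPLAY_SCALE = {"electricity": 1.0, "gas": 10.0, "water": 1.0 / 3.0, "temperature": 1.0}


def scale_for_plot(series):
    """Return a new dict with each series multiplied by its display factor.

    series: {name: [monthly values]}. Unknown names are left unscaled.
    """
    return {
        name: [value * DISPLAY_SCALE.get(name, 1.0) for value in values]
        for name, values in series.items()
    }
```

Keeping the factors in one table makes it easy to undo the scaling later, when two items are compared in their own units.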
Supplementary data
Temperature: The temperature records for Chicago were initially obtained from the bills of the gas/electric company. To verify the data, I compared the values against temperature records from the government website of the National Weather Service Weather Forecast Office (backed by NOAA data, I suppose). The data here is aggregated by month. Each column in a given set (month) of columns indicates the difference between the temperature reported on the bill in that month and the temperature recorded by NOAA. You may want to see the aggregation by year. The difference shown here is in absolute terms (degrees F). A percentage (%) difference chart can be seen here. However, it does not give an accurate picture of the differences, because the temperature's distance above 0 F varies by season: the same difference in degrees produces a much larger percentage in winter than in summer.
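A minimal sketch of the bill-versus-NOAA comparison, assuming each record is a hypothetical `(year, month, bill_temp_f, noaa_temp_f)` tuple. It computes the signed difference in degrees F and, for contrast, the same difference as a percentage of the NOAA value, grouped by month. The function name and any sample values are illustrative, not the actual records.

```python
from collections import defaultdict


def differences_by_month(records):
    """Group bill-vs-NOAA temperature differences by calendar month.

    records: iterable of (year, month, bill_temp_f, noaa_temp_f).
    Returns {month: [(year, diff_f, pct_diff), ...]} where diff_f is the
    signed difference bill - NOAA in degrees F and pct_diff expresses it
    as a percentage of the NOAA value (None when NOAA reports 0 F).
    """
    by_month = defaultdict(list)
    for year, month, bill_t, noaa_t in records:
        diff = bill_t - noaa_t
        pct = None if noaa_t == 0 else 100.0 * diff / noaa_t
        by_month[month].append((year, diff, pct))
    return dict(by_month)
```

With made-up inputs, a 2 F gap at 20 F comes out as 10%, while the same 2 F gap at 80 F is only 2.5%, which is why the degree-based view is the more honest one.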
Electricity utilised by various appliances: The advice was to look up this data on a few websites and cross-reference them to establish its reliability. I used the following web pages for this purpose.
Water usage: I used the data provided in the exam description, after cross-referencing with http://www.kingcounty.gov/environment/wastewater/WaterConservation/Tips.aspx
Analysis
Seasonal Trends - Macro Patterns
There seems to be a strong correlation between temperature and energy usage, but water usage is largely independent of temperature. So I examined water usage separately from the other two items. First, energy. Temperature clearly affects energy usage throughout the year: natural gas used for heating in winter and electricity used for cooling in summer both track the temperature.
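One way to put a number on "strong correlation" is Pearson's coefficient over paired monthly values. The sketch below uses only the standard library; the `pearson` name is hypothetical and the test readings are illustrative, not this household's data. For gas, a negative coefficient against temperature would be expected, and for water a value near zero. Pearson's r only measures linear association, so electricity, which rises mainly in summer, may show a weaker r over a full year than its seasonal pattern suggests.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation and spreads around the means (unnormalised sums).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```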
Looking at the series, we can come up with a crude estimate of the following usage pattern in this household. The May conclusion is based on the fact that the electricity (E) bill starts increasing in May while the gas (G) bill has not yet reached its lowest level. My deduction is that at this time of year the temperature is just cold enough for the heater to run in the evening and/or at night, while the air conditioner runs during the day. However, this explanation might be off the mark: it could simply be that a space heater is used more often in these months, while the central heating turns on less often anyway because of the warmer temperatures of approaching summer. A similar argument applies to October. The following bubble charts strengthen the case for natural gas heating being used in October, but not as much in May.
After observing the above charts, I came up with a revised cycle of heating and cooling...
An additional guess: the thermostat is set to around 65-68 F. This can be deduced by studying the line series; every year, gas usage starts to climb once the temperature falls below about 65 F.
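The threshold can also be estimated instead of read off the chart. A common approach from degree-day analysis is to try a range of base temperatures, regress gas usage on heating degrees `max(0, base - T)` for each base, and keep the base with the smallest squared error. The sketch below assumes monthly average temperatures in F paired with monthly gas readings; the function names and the 50-75 F search range are hypothetical choices. It estimates the outdoor temperature below which heating kicks in, which need not equal the thermostat setting itself.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # All x equal (e.g. no heating degrees at this base): flat model.
        slope = 0.0
    else:
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse


def estimate_balance_point(temps_f, gas, candidates=range(50, 76)):
    """Return (base_temp, intercept, slope) with the lowest squared error.

    Assumed model: gas = intercept + slope * max(0, base - T).
    """
    best = None
    for base in candidates:
        heating_degrees = [max(0.0, base - t) for t in temps_f]
        intercept, slope, sse = fit_line(heating_degrees, gas)
        if best is None or sse < best[0]:
            best = (sse, base, intercept, slope)
    _, base, intercept, slope = best
    return base, intercept, slope
```

With only monthly averages the result is coarse (whole degrees at best), so it is a cross-check on the visual reading rather than a precise figure.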
Next, water.
Statistical
A statistical analysis can bring out outliers. I performed a simple analysis in which I visualize the deviation of each value from the mean. The mean is calculated for each month by grouping that month's values across all years. The means and standard deviations are plotted in the graphs below the deviation chart. I found this analysis feasible because, when each year is drawn as a series over a month axis, the years are fairly cohesive for both electricity and gas (gas is extremely cohesive), so the values clearly cluster around a mean value for each month.
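The per-month mean, standard deviation and deviation can be sketched as follows, assuming the data are held as a hypothetical `{year: [12 monthly readings, January first]}` mapping. It uses the sample standard deviation (n-1 in the denominator, as Excel's STDEV does) and flags a reading as an outlier when it lies more than `k` standard deviations from its month's mean; the function names and the choice of `k` are assumptions.

```python
import statistics


def monthly_stats(usage_by_year):
    """Per-month mean and sample stdev across years, plus each deviation.

    usage_by_year: {year: [12 monthly readings, January first]}; needs
    at least two years, otherwise statistics.stdev raises an error.
    Returns (means, stdevs, deviations) with deviations[year][m] equal
    to that year's reading for month m minus the month's mean.
    """
    years = sorted(usage_by_year)
    means, stdevs = [], []
    for m in range(12):
        values = [usage_by_year[y][m] for y in years]
        means.append(statistics.mean(values))
        stdevs.append(statistics.stdev(values))  # sample (n - 1) stdev
    deviations = {
        y: [usage_by_year[y][m] - means[m] for m in range(12)] for y in years
    }
    return means, stdevs, deviations


def outliers(usage_by_year, k):
    """Return (year, month) pairs lying more than k stdevs from the mean."""
    _, stdevs, deviations = monthly_stats(usage_by_year)
    flagged = []
    for year, devs in deviations.items():
        for m, dev in enumerate(devs):
            if stdevs[m] > 0 and abs(dev) > k * stdevs[m]:
                flagged.append((year, m + 1))
    return flagged
```

With only a few years of data, `k` has to stay small: with n years, no reading can lie more than (n-1)/sqrt(n) sample standard deviations from its mean, which is about 1.15 for three years.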
WATER
For water, I presumed a normal usage level of 90 gallons/day for day-to-day activities (based on observation) and used this baseline to spot offsets that might be interesting.
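The baseline comparison amounts to expected = 90 gallons/day x days in the billing period, with the offset being actual minus expected. A minimal sketch, assuming each bill is a hypothetical `(label, days, gallons)` tuple already converted to gallons; the names are illustrative.

```python
# Presumed normal usage from the text, in gallons per day.
BASELINE_GALLONS_PER_DAY = 90.0


def water_offsets(bills, baseline=BASELINE_GALLONS_PER_DAY):
    """Compare each billing period with the baseline.

    bills: iterable of (label, days_in_period, gallons_used).
    Returns (label, expected, offset, offset_per_day) tuples where
    expected = baseline * days and offset = gallons_used - expected.
    Positive offsets mean more water than the baseline accounts for.
    """
    result = []
    for label, days, gallons in bills:
        expected = baseline * days
        offset = gallons - expected
        result.append((label, expected, offset, offset / days))
    return result
```

Reporting the offset per day as well keeps billing periods of different lengths comparable.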