Three Months of Pilot Data: What We Measured

Tobias Schulz
[Figure: Dashboard view showing 90-day demand charge comparison across pilot buildings, with monthly peak kW trend lines]

After three months of live pilot operation across a set of commercial buildings in the Minneapolis metro area — spanning office, mixed-use, and retail building types — we have enough data to share an honest accounting of outcomes. Not the best cases. Not the worst cases. What the average looked like, where the approach delivered, where it underperformed, and what the data revealed about demand charge reduction in commercial buildings that we didn't know with as much precision before starting.

This is not a vendor case study. We're not going to give you four bullet points and a 23% savings headline. We're going to walk through what we measured, how we measured it, where the numbers are solid, and where reasonable people would disagree with our interpretation.

How We Measured: The Counterfactual Problem

Measuring demand charge savings from a demand management intervention faces a fundamental methodological challenge: you can't run the same building under both conditions simultaneously. The building either has forecast-based pre-conditioning or it doesn't. The months where the forecast was active can't be compared directly to different months, because demand charges vary by season, occupancy, and weather conditions that differ month to month.

The measurement methodology we used is a regression-adjusted comparison to a synthetic baseline. For each building, we built a regression model from 12 months of pre-pilot data that predicts monthly peak demand as a function of weather variables (cooling degree days, heating degree days, peak outdoor temperature, and peak outdoor humidity for the billing month) and occupancy-related calendar variables (number of high-occupancy days, event counts). This model gives us a prediction of what the building's peak demand would have been in each pilot month, absent the intervention.
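As a minimal sketch of the baseline step described above, the regression can be fit with ordinary least squares. The monthly values below are synthetic placeholders rather than pilot data, the function names are hypothetical, and only three of the predictors are shown for brevity. The exact model form and fitting procedure used in the pilot are not reproduced here.

```python
import numpy as np

def fit_baseline(features, peak_kw):
    """Fit peak_kw ~ intercept + features by ordinary least squares."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, peak_kw, rcond=None)
    return coef

def predict_baseline(coef, features):
    """Predict monthly peak kW for one or more months of feature values."""
    features = np.atleast_2d(features)
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef

# Synthetic 12-month pre-pilot history (placeholder values, not pilot data).
cdd = np.array([0, 0, 5, 40, 120, 260, 340, 310, 180, 50, 5, 0], dtype=float)
hdd = np.array([1500, 1200, 900, 500, 200, 30, 0, 5, 150, 500, 950, 1400], dtype=float)
high_occ_days = np.array([20, 19, 22, 21, 22, 21, 20, 23, 21, 22, 19, 18], dtype=float)
features = np.column_stack([cdd, hdd, high_occ_days])

# Synthetic peaks generated from known coefficients so the fit can be checked.
peak_kw = 300.0 + 0.2 * cdd + 0.05 * hdd + 2.0 * high_occ_days

coef = fit_baseline(features, peak_kw)

# Predicted baseline for one hypothetical pilot month, then the savings estimate.
pilot_features = np.array([10.0, 800.0, 21.0])
predicted_kw = predict_baseline(coef, pilot_features)[0]
actual_kw = 340.0  # hypothetical measured peak during the pilot month
savings_kw = predicted_kw - actual_kw
print(f"baseline {predicted_kw:.1f} kW, actual {actual_kw:.1f} kW, savings {savings_kw:.1f} kW")
```

With only 12 monthly observations per building, each added predictor uses up a meaningful share of the available degrees of freedom, which is one reason the baseline carries real uncertainty.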

The savings estimate is the difference between the model-predicted baseline and actual measured peak demand during the pilot months. This approach is similar to the baseline methodology used in utility-administered demand response measurement and verification (it most closely resembles the whole-facility, regression-adjusted baseline of IPMVP Option C), adapted for demand charge rather than energy consumption measurement. It has limitations: the regression baseline may not perfectly capture all the factors that drive demand, and three months isn't enough data to fully characterize seasonal variation. But it's more rigorous than a simple month-over-month comparison that conflates weather changes with intervention effects.

The Office Building Results

Across the office building category, the average regression-adjusted demand charge reduction during the pilot period was 14.8%, with a range of 8.3% to 22.1% across individual buildings. These figures represent the reduction in monthly peak demand kW, which translates directly to demand charge savings at the applicable demand charge rate.
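To make the translation from kW to dollars concrete, here is a small worked example. The 500 kW baseline peak and the $15/kW rate are hypothetical values chosen for illustration; only the 14.8% average reduction comes from the pilot results.

```python
def demand_charge_savings(baseline_peak_kw, actual_peak_kw, rate_per_kw):
    """Return (kW reduction, percent reduction, monthly dollar savings)."""
    reduction_kw = baseline_peak_kw - actual_peak_kw
    reduction_pct = reduction_kw / baseline_peak_kw * 100.0
    savings_dollars = reduction_kw * rate_per_kw
    return reduction_kw, reduction_pct, savings_dollars

# Hypothetical building: 500 kW regression baseline, 426 kW measured peak,
# billed at an assumed $15 per kW of monthly peak demand.
reduction_kw, reduction_pct, savings_dollars = demand_charge_savings(500.0, 426.0, 15.0)
print(f"{reduction_kw:.0f} kW reduction ({reduction_pct:.1f}%), ${savings_dollars:,.0f} per month")
```

Tariffs with ratchets, seasonal rates, or coincident-peak billing would need a more detailed calculation than this single-rate sketch.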

The highest performers in this category had two characteristics in common: heavy concrete construction (thermal time constants of 8+ hours) and occupancy patterns that were moderately predictable — not perfectly predictable, but with enough regularity that the occupancy forecast model could identify high-demand days with meaningful advance accuracy. The buildings where the approach performed most strongly were those where the pre-existing setback schedule was demonstrably not optimized for the building's thermal characteristics — where the schedule's fixed pre-conditioning window was set too late for the building's actual recovery rate.

The lower performers (8–11% range) were two buildings with curtain wall construction, a high window-to-wall ratio (WWR), and shorter thermal time constants. In these buildings, the thermal mass pre-loading that drives the strongest results in heavy-mass buildings is less effective because the building's thermal flywheel is smaller. The pre-conditioning benefit is real but more limited. For these building types, the forecast's value is primarily in avoiding unnecessary pre-conditioning on low-demand days rather than in enabling aggressive thermal mass loading on high-demand days.
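To illustrate why the time constant matters, the sketch below uses a single-time-constant (first-order) model of indoor temperature drift. This is a common simplification, not the pilot's building model: it ignores internal gains, solar load, and HVAC operation. The 8-hour value comes from the text above, while the 3-hour value and the temperatures are hypothetical.

```python
import math

def indoor_temp_after(hours, start_temp_f, outdoor_temp_f, time_constant_h):
    """Indoor temperature after free drift toward outdoor temperature (first-order model)."""
    decay = math.exp(-hours / time_constant_h)
    return outdoor_temp_f + (start_temp_f - outdoor_temp_f) * decay

# Both buildings pre-cooled to 70 F, then left to drift for 4 hours at 90 F outdoors.
heavy_mass_temp = indoor_temp_after(4.0, 70.0, 90.0, time_constant_h=8.0)
light_mass_temp = indoor_temp_after(4.0, 70.0, 90.0, time_constant_h=3.0)
print(f"heavy mass: {heavy_mass_temp:.1f} F, light mass: {light_mass_temp:.1f} F")
```

Under these assumptions the heavy-mass building holds its pre-cooled state noticeably longer, which is the "thermal flywheel" effect in miniature.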

The Retail Building Results

The retail building in the pilot set — a 120,000 sq ft power center anchor tenant space — had a different demand profile and a different result pattern. Peak demand events in retail are concentrated on weekend afternoons in summer, not on weekday mornings as in office buildings. The demand management approach required adapting the pre-conditioning window for retail's afternoon peak risk rather than the morning-dominant office pattern.

The regression-adjusted demand reduction in the retail building was 11.4% over the three pilot months. This is lower than the upper end of the office building range, and there are two reasons: the pilot period included late winter and spring months with relatively moderate weather, which meant the highest-demand summer Saturday events didn't occur during the measurement period; and the foot-traffic forecast model for the retail building had higher uncertainty than the office occupancy models, producing less confident pre-conditioning recommendations on high-traffic event days.

The retail pilot surfaced a data quality issue we hadn't fully anticipated: the people counter system at the retail site had calibration drift over the pilot period, producing systematically inflated entry counts in months 2 and 3. The occupancy forecast model learned from those inflated counts and then over-predicted occupancy going forward. Identifying and correcting the sensor calibration drift required manual review of the occupancy data against POS transaction counts, a reconciliation step that we'll build into the standard pilot monitoring protocol going forward.
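One way the reconciliation step could be automated is to track the ratio of counted entries to POS transactions and flag months where it departs from a reference period. The sketch below is an assumption about how such a check might look, not the protocol itself. The counts, the 15% tolerance, and the function name are all hypothetical.

```python
def flag_counter_drift(entries, transactions, reference_months, tolerance=0.15):
    """Flag months whose entries-per-transaction ratio deviates from the reference ratio.

    Returns a dict mapping each flagged month to its fractional deviation.
    """
    ref_ratios = [entries[m] / transactions[m] for m in reference_months]
    ref_ratio = sum(ref_ratios) / len(ref_ratios)
    flagged = {}
    for month, count in entries.items():
        deviation = (count / transactions[month]) / ref_ratio - 1.0
        if abs(deviation) > tolerance:
            flagged[month] = deviation
    return flagged

# Hypothetical monthly totals: transactions are flat while counted entries climb.
entries = {"month_1": 40000, "month_2": 48000, "month_3": 52000}
transactions = {"month_1": 10000, "month_2": 10000, "month_3": 10000}

flagged = flag_counter_drift(entries, transactions, reference_months=["month_1"])
for month, deviation in flagged.items():
    print(f"{month}: entry count {deviation:+.0%} relative to reference ratio")
```

A ratio check like this cannot distinguish sensor drift from a genuine change in shopper conversion, so a flagged month is a prompt for manual review rather than proof of a calibration fault.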

Where the Forecast Was Wrong

Three forecast errors stand out from the three-month dataset as instructive failures.

The polar vortex underestimate. In late February, a polar vortex event brought outdoor temperatures to -18°F for approximately 36 hours. Our forecast model's cold-regime adjustments (described in a previous post) extended the pre-heating window and flagged the event as high-demand risk. However, the model underestimated the heating demand spike by approximately 22% for the period when outdoor temperatures were below -15°F. Our training data included few hours in this temperature range, and the infiltration spike at extreme cold (estimated 25% above baseline) exceeded what our model had parameterized. The result was an insufficient pre-heating window and a demand event higher than the forecast predicted, though still lower than the pre-pilot baseline would have produced.
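A failure like this suggests checking how much training data supports each forecast hour before trusting the model's output. The following is a hedged sketch of such a check, not the correction we deployed. The window width, the minimum-hours threshold, the function name, and the temperature data are all hypothetical.

```python
import numpy as np

def low_support_hours(forecast_temps_f, training_temps_f, half_width_f=2.5, min_hours=50):
    """Flag forecast hours with few training hours at a similar outdoor temperature."""
    training = np.asarray(training_temps_f, dtype=float)
    forecast = np.asarray(forecast_temps_f, dtype=float)
    # Count training hours within +/- half_width_f of each forecast temperature.
    support = np.array([np.sum(np.abs(training - t) <= half_width_f) for t in forecast])
    return support < min_hours

# Hypothetical training set: dense coverage from 0 F to 60 F, two hours of extreme cold.
training_temps = np.concatenate([np.linspace(0.0, 60.0, 1000), [-16.0, -17.0]])
forecast_temps = [-18.0, 20.0]

flags = low_support_hours(forecast_temps, training_temps)
for temp, flagged in zip(forecast_temps, flags):
    status = "extrapolating, widen safety margin" if flagged else "well supported"
    print(f"{temp:+.0f} F: {status}")
```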

The event day under-forecasting. One office building in the set hosted a major all-hands meeting on a day our occupancy model predicted at 72% of design capacity (a typical Friday). Actual occupancy reached 118% of design capacity. The forecast pre-conditioned for a moderate-demand day; actual conditions were a high-demand day. The demand event that resulted was approximately 40 kW higher than the forecast-based pre-conditioning had prepared for. We added a manual event override interface to the pilot tools after this incident — a way for the facilities team to flag explicitly that tomorrow's occupancy will exceed the model's estimate, triggering more aggressive pre-conditioning regardless of the model output.
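The override logic is simple to express. The sketch below assumes the override can only raise the planning occupancy, which matches the intent described above. The class and function names are hypothetical and do not reflect the actual pilot tool's interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OccupancyInput:
    model_fraction: float                      # model forecast, as a fraction of design capacity
    override_fraction: Optional[float] = None  # facilities team's manual estimate, if flagged

def planning_occupancy(inp: OccupancyInput) -> float:
    """Occupancy used for pre-conditioning: the manual flag can raise it, never lower it."""
    if inp.override_fraction is None:
        return inp.model_fraction
    return max(inp.model_fraction, inp.override_fraction)

# The all-hands day: model said 72% of design capacity, actual was 118%.
no_flag = planning_occupancy(OccupancyInput(model_fraction=0.72))
with_flag = planning_occupancy(OccupancyInput(model_fraction=0.72, override_fraction=1.18))
print(f"without override: {no_flag:.0%}, with override: {with_flag:.0%}")
```

Taking the maximum means a stale or mistaken low override cannot weaken pre-conditioning below what the model alone would have requested.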

The BMS setpoint override. During month two, at one building, the BMS supervisor automatically overrode the forecast-based cooling setpoint write at 6:15 AM because a zone temperature alarm had triggered overnight and the supervisory sequence defaulted to conservative setpoint behavior. The override was appropriate from a fault response perspective, but it meant the pre-conditioning window ran at the conservative setpoint rather than the forecast-optimized one, resulting in under-pre-cooling on a day that turned out to be the highest-demand day of the pilot. We flagged this as a BMS alarm sequence interaction issue and worked with the facilities team to add an explicit exception for pre-conditioning hours in the alarm response logic.

We're not saying these failures are acceptable or unavoidable — each one led to a concrete improvement in the pilot protocol. We're saying they're representative of the real failure modes in live building deployments, and any honest evaluation of demand management technology should account for them rather than presenting only the days the system worked as intended.

The Energy Use Interaction

A question that facilities managers consistently ask during pilot evaluation: does demand charge reduction come at the cost of increased total energy consumption? The concern is that pre-conditioning burns off-peak energy that wouldn't otherwise be consumed, and net energy savings are limited or negative even if demand charges fall.

The data from the pilot period is nuanced on this point. Total HVAC energy consumption across the pilot building set fell by 3.2% on average compared to the regression-adjusted baseline, a small but measurable reduction. The reduction comes primarily from two effects: eliminating unnecessary pre-conditioning on low-demand days (the schedule had been over-conditioning on days where demand charge risk was low), and reducing afternoon overcooling that resulted from manual operator interventions trying to compensate for inadequate morning pre-conditioning.

However, on the highest-demand days — the days where pre-conditioning ran most aggressively — total HVAC energy was 8–14% higher than baseline because more off-peak energy was consumed moving the cooling load earlier. On those specific days, we traded on-peak energy for off-peak energy, and we reduced demand charges, but total energy consumption on those days was higher. This is the correct engineering trade-off — off-peak energy is cheaper, demand charges are more expensive per kW than energy is per kWh in most commercial tariffs — but it's important to be transparent that demand charge reduction and energy reduction are distinct outcomes that don't always point in the same direction.
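A worked example shows how the trade-off can net out on a peak-setting day. Every number below is hypothetical: the tariff rates, the kWh figures, and the peak kW values are chosen only so that total energy rises by 11%, inside the 8–14% range reported above.

```python
def energy_cost(on_peak_kwh, off_peak_kwh, on_peak_rate, off_peak_rate):
    """Energy cost for one day under a two-period time-of-use tariff."""
    return on_peak_kwh * on_peak_rate + off_peak_kwh * off_peak_rate

# Assumed tariff: $0.12/kWh on-peak, $0.06/kWh off-peak, $15/kW monthly demand charge.
ON_RATE, OFF_RATE, DEMAND_RATE = 0.12, 0.06, 15.0

# Hypothetical peak-setting day, without and with aggressive pre-conditioning.
baseline = {"on_kwh": 3000.0, "off_kwh": 1000.0, "peak_kw": 500.0}
pilot = {"on_kwh": 2600.0, "off_kwh": 1840.0, "peak_kw": 430.0}

baseline_total = baseline["on_kwh"] + baseline["off_kwh"]
pilot_total = pilot["on_kwh"] + pilot["off_kwh"]
energy_increase_pct = (pilot_total / baseline_total - 1.0) * 100.0

energy_cost_delta = (
    energy_cost(pilot["on_kwh"], pilot["off_kwh"], ON_RATE, OFF_RATE)
    - energy_cost(baseline["on_kwh"], baseline["off_kwh"], ON_RATE, OFF_RATE)
)
demand_savings = (baseline["peak_kw"] - pilot["peak_kw"]) * DEMAND_RATE

print(f"energy use: {energy_increase_pct:+.1f}%")
print(f"energy cost change for the day: ${energy_cost_delta:+.2f}")
print(f"demand charge savings for the month: ${demand_savings:,.2f}")
```

In this illustration the extra kWh barely move the day's energy cost because they are bought off-peak, while the lower peak reduces the demand charge for the whole billing month.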

What the Next Three Months Need to Show

The summer cooling season is the period of highest demand charge exposure for most of these buildings. The pilot results from February through April cover the tail of heating season and the mild spring, not the conditions that generate the monthly peak demand records. The cases that matter most for evaluating the approach's economic impact are the hot summer afternoons, the back-to-back heat wave days, and the July and August occupancy surges.

That's the next data set. The regression models we've built on 12 months of historical data plus 3 months of pilot data will be updated before the summer season begins, incorporating the pilot period observations and the corrections to the forecast model from the identified failure cases. The polar vortex infiltration correction, the occupancy event override interface, and the BMS alarm sequence interaction fix will all be in place before summer demand charge season starts.

The summer data will also move us toward seasonal completeness: together with the February through April results, it extends coverage from late heating season through peak cooling season with the forecast system operating. A full annual cycle of operating data is what provides the most defensible before/after comparison, and it's what will drive the next round of performance analysis beyond this preliminary three-month snapshot.