This paper describes a model developed to support investment decisions in a service station network. The model estimates the sales potential of service stations in major Brazilian cities such as Rio de Janeiro, São Paulo and Recife. It uses explanatory variables such as traffic density, number of competitors, average number of pumps per competitor and speed on passing roads. Nonlinear regression analysis was used to develop the model, and some of the results are presented here.
1. INTRODUCTION
This article presents a model for evaluating the sales potential of urban gas stations. It is intended to support decisions on opening a new unit, to monitor the sales volume of an existing unit, and to analyze the sensitivity of the factors that most influence the sales potential of a station. The model is a multiple nonlinear regression that takes into account variables such as traffic volume, installation area, and level of competition in the vicinity of the gas station.
For fuel distribution companies operating in Brazil, the network of fuel stations is the heart of their business and their biggest source of revenue. On the other hand, a gas station is an investment of hundreds of thousands of dollars, covering not only its installation but also its continued operation. The study is therefore justified: once the sales potential inherent to a location has been determined, it is possible to estimate the Return on Investment (ROI) for a given operating configuration and degree of competition in the market, and so to assess the viability of the future station.
It is also possible, for an existing station, to compare its sales potential with its current performance, using the model as a management tool for controlling, improving and training its operators. In any case, the inherent complexity of decisions to open gas stations is a fertile field for applying modeling techniques in the development of decision support systems. After a brief literature review, the next sections detail the variables considered, the analyses developed and the results obtained.
2. LITERATURE REVIEW
There are few studies of this nature available in the literature. The work by DIXON (1995) developed in South Africa is the closest to the line developed in this article. His study was based on a sample of several urban gas stations located in Cape Town, Durban and Gauteng, with the main objective being to determine their sales potential.
Initially, 100 explanatory variables were tested, including the number of nozzles, traffic signs, traffic volume, access, area and appearance of the facilities. After some tests, DIXON (1995) arrived at a non-linear model composed of 30 explanatory variables. The explanatory power of this model can be considered good, with the Adjusted R2 being 80%, 72% and 71% for Durban, Cape Town and Gauteng respectively. The author also shows that the use of quadratic terms in the model substantially improves its explanatory power. In addition, FERNANDES and THEMIDO (1997) developed a model to forecast fuel sales at stations located along highways. The formula adopted to explain aggregate fuel sales (market potential) in a given area was as follows:
![]() |
where V is the sum of sales of all stations in the area, X1 is the sum of the areas and X2 is the number of inhabitants in the area. The explanatory power of this model is high, with an Adjusted R2 of 94.9% for the potential of the area. Subsequently, the authors used gravitational models to analyze the individual sales of each gas station.
3. SAMPLE CHARACTERISTICS AND CONSIDERED VARIABLES
Throughout 1996, data were collected from 95 gas stations in the main Brazilian urban centers, such as Rio de Janeiro, São Paulo, Belo Horizonte and Recife. Only urban service stations were chosen because, in Brazil, the sales volume of service stations in the interior is determined, above all, by the degree of empathy and cordiality in the service. It appears that, unlike at urban stations, there is a relationship of loyalty between the fuel station and its customers in rural areas, and this factor is difficult to quantify. Table 1 describes the explanatory variables and the dependent variable considered in the model. Table 2 contains a summary with descriptive statistics for each variable.
![]() |
The next section describes the initial analysis performed on the collected data.
4. PRELIMINARY ANALYSIS: MULTIPLE LINEAR REGRESSION
Once the main explanatory variables and the expected signs of their coefficients were defined, the next step consisted of the residual analysis of the following multiple linear model:
![]() |
This model had a low explanatory power (Adjusted R2 = 40%) and a very high standard error, around 131 m3/month. However, as expected, the variables TRAFFIC (T), HOURS (H) and AVERAGE NOZZLES PER COMPETITOR (B) showed positive contributions to Sales Volume (VV), while the sum of competing stations (P) and speed (V) showed negative contributions.
The objectives of the residual analysis of the multiple linear model included checking for multicollinearity between the explanatory variables and identifying outliers and observations with high leverage coefficients (h). The Cook's distance of each observation was also analyzed, as well as the homoscedasticity and normality of the residuals.
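As a minimal sketch of how the two summary figures quoted above (Adjusted R2 and standard error) are obtained for a multiple linear model, the following Python code fits an ordinary least squares regression with NumPy. The data are synthetic, the function name `fit_ols` is hypothetical, and the inclusion of an intercept is an assumption; nothing here reproduces the study's data or coefficients.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + X @ b (intercept assumed for illustration).

    Returns the coefficient vector, the adjusted R2 and the standard error
    of the regression.
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                     # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    std_err = (sse / (n - k - 1)) ** 0.5
    return beta, adj_r2, std_err

# Synthetic illustration: 95 observations, five made-up explanatory variables.
rng = np.random.default_rng(0)
n = 95
X = rng.uniform(0.0, 1.0, size=(n, 5))
true_beta = np.array([2.0, 1.0, 0.5, -1.0, -0.5])  # arbitrary signs and sizes
y = 3.0 + X @ true_beta + rng.normal(0.0, 0.1, n)

beta, adj_r2, std_err = fit_ols(X, y)
print(beta.round(2), round(adj_r2, 3), round(std_err, 3))
```

The adjusted R2 penalizes the number of explanatory variables k, which is why it is the figure of merit used when models with different numbers of terms are compared.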
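One common way to explore multicollinearity is the variance inflation factor (VIF). The sketch below is an illustration on synthetic data, not necessarily the diagnostic used in this study; the function name `vif` is hypothetical. It relies on the fact that the VIFs are the diagonal elements of the inverse of the correlation matrix of the explanatory variables.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on
    the other columns; this equals the j-th diagonal element of the inverse
    correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic illustration: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(3)
x1 = rng.normal(0.0, 1.0, 500)
x2 = x1 + rng.normal(0.0, 0.1, 500)
x3 = rng.normal(0.0, 1.0, 500)
factors = vif(np.column_stack([x1, x2, x3]))
print(factors.round(1))
```

Values close to 1 indicate little collinearity; the nearly duplicated pair x1 and x2 receives a much larger factor than x3.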
A point with high leverage is an observation that contains an unusual set of values for the explanatory variables, capable of exerting a strong influence on the result (coefficients) of the regression due to its disproportionate (leveraged) effect compared to the other observations.
![]() |
With only one explanatory variable, unusual observations can be identified by inspecting the histogram of that variable. With two or more explanatory variables, it is much harder to determine graphically whether a point is unusual. For example, in figure 1, which plots two explanatory variables (x1 and x2), the point marked in the upper right corner is unusual, lying far from the main cloud of observations. However, this difference is not noticeable when the histograms of x1 and x2 are analyzed separately. The analysis of leverage coefficients (h) is a very useful tool for identifying such observations. High values of h indicate that an observation may have a disproportionate impact on the regression coefficients. In general, points of high leverage are those that satisfy the relationship:
![]() |
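The leverage coefficients are the diagonal elements of the hat matrix. The sketch below computes them with NumPy on synthetic data in which one point is ordinary in each variable separately but unusual jointly. The cut-off 3(k+1)/n used here is a common rule of thumb and is an assumption of this example, as is the function name `leverage`.

```python
import numpy as np

def leverage(X):
    """Diagonal of the hat matrix H = A (A'A)^-1 A', where A = [1, X]."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    Q, _ = np.linalg.qr(A)            # thin QR factorization, so H = Q Q'
    return (Q ** 2).sum(axis=1)

# Synthetic illustration: x1 and x2 are strongly correlated.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(0.0, 1.0, n)
x2 = x1 + rng.normal(0.0, 0.1, n)
X = np.column_stack([x1, x2])
X[0] = [1.5, -1.5]                    # each value is ordinary, the pair is not

h = leverage(X)
k = X.shape[1]
cutoff = 3 * (k + 1) / n              # assumed rule-of-thumb threshold
flagged = np.flatnonzero(h > cutoff)
print(flagged, h[0].round(3), round(cutoff, 3))
```

Neither 1.5 nor -1.5 would stand out in a histogram of x1 or x2, yet the leverage of the first observation is far above the cut-off because it lies off the line along which the other points cluster.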
Cook's distance, in turn, is a statistic used to quantify how unusual an observation is, taking into account not only the explanatory variables (as is the case with leverage coefficients), but also its residual. The critical value for Cook's distance is given by the formula:
![]() |
In total, 15 observations with both a high leverage coefficient and a high Cook's distance were identified. These were removed from the sample for the definitive data analysis, which used non-linear multiple regression techniques.
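As an illustration of how Cook's distance combines residual and leverage, the following sketch computes it for every observation of a linear model, using the standard expression D_i = e_i^2 / ((k+1) s^2) * h_i / (1 - h_i)^2. The data are synthetic, the function name `cooks_distance` is hypothetical, and the 4/n screening value is an assumed rule of thumb rather than the critical value used in this study.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance of each observation for the model y = b0 + X @ b."""
    n, k = X.shape
    p = k + 1                                   # number of fitted coefficients
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    Q, _ = np.linalg.qr(A)
    h = (Q ** 2).sum(axis=1)                    # leverage coefficients
    s2 = float(resid @ resid) / (n - p)         # residual variance estimate
    return resid ** 2 / (p * s2) * h / (1.0 - h) ** 2

# Synthetic illustration: one planted observation far from the others.
rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 0.05, n)
x[0], y[0] = 3.0, 0.0                           # unusual in both x and y
X = x.reshape(-1, 1)

D = cooks_distance(X, y)
suspects = np.flatnonzero(D > 4.0 / n)          # assumed screening threshold
print(int(np.argmax(D)), suspects)
```

An observation needs both a sizeable residual and a sizeable leverage to obtain a large distance, which is why the two diagnostics are used together when deciding which observations to set aside.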
5. FINAL ANALYSIS: MULTIPLE NONLINEAR REGRESSION
Among the various non-linear models tested, the one that showed the best fit (measured by a smaller squared residual error and a higher Adjusted R2 coefficient) was the following:
![]() |
However, the heteroscedasticity found in the residuals led to the division of the sample of 80 observations (the initial 95, excluding the 15 observations with high leverage coefficients and Cook's distances) into two samples of similar sizes, according to several tested criteria. Among these, the criterion that most increased the explanatory power of the model for the two subsamples was the speed variable (V). Observations with speed values greater than 40 km/h were grouped in one sample (high-speed roads) and observations with speed values lower than or equal to 40 km/h were grouped in another (low-speed roads). Note that the median of the speed variable is 40 km/h.
For the subsample with V<=40 km/h, the model had a high explanatory power (Adjusted R2 = 80%), with a reasonably small average residual of around 51 m3/month. Again as expected, the variables TRAFFIC (T), HOURS (H) and AVERAGE NOZZLES PER COMPETITOR (B) showed positive contributions to Sales Volume (VV), while the sum of competing stations (P) and speed (V) showed negative contributions.
For the subsample with V>40 km/h, the model also showed a high explanatory power (Adjusted R2 = 76%), with a reasonably small average residual of around 52 m3/month. Again as expected, the variables TRAFFIC (T), HOURS (H) and AVERAGE NOZZLES PER COMPETITOR (B) showed positive contributions to Sales Volume (VV), while the sum of competing stations (P) and speed (V) showed negative contributions. Graphs 1 and 2, which show the residuals versus the sales volume for the two subsamples, show no sign of heteroscedasticity.
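The split described above can be sketched in a few lines. The 40 km/h threshold comes from the text; the function name `split_by_speed` and the speed values are made up for illustration.

```python
import numpy as np

def split_by_speed(speed_kmh, threshold=40.0):
    """Return boolean masks (low, high): speed <= threshold and speed > threshold."""
    speed_kmh = np.asarray(speed_kmh, dtype=float)
    low = speed_kmh <= threshold
    return low, ~low

# Made-up speeds in km/h, for illustration only.
speeds = np.array([20.0, 30.0, 40.0, 40.0, 50.0, 60.0, 45.0, 35.0])
low, high = split_by_speed(speeds)
print(int(low.sum()), int(high.sum()))
```

Each mask can then be used to select the rows of the data on which a separate regression is fitted. Observations exactly at the threshold fall in the low-speed group, so a threshold equal to the median gives subsamples of similar but not necessarily equal size.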
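A residual plot can be complemented by a simple numerical check. The sketch below, which is an illustrative device and not a procedure taken from this study, orders the observations by fitted value and compares the residual variance in the upper half with that in the lower half; a ratio near 1 is consistent with homoscedasticity. The function name `spread_ratio` and the data are hypothetical.

```python
import numpy as np

def spread_ratio(fitted, resid):
    """Residual variance in the upper half of fitted values over the lower half."""
    fitted = np.asarray(fitted, dtype=float)
    resid = np.asarray(resid, dtype=float)
    order = np.argsort(fitted)
    half = len(order) // 2
    lower = resid[order[:half]]
    upper = resid[order[-half:]]
    return float(upper.var(ddof=1) / lower.var(ddof=1))

# Synthetic illustration of the two situations.
rng = np.random.default_rng(4)
fitted = rng.uniform(1.0, 10.0, 2000)
noise = rng.normal(0.0, 1.0, 2000)
ratio_constant = spread_ratio(fitted, noise)           # spread does not depend on fitted value
ratio_growing = spread_ratio(fitted, noise * fitted)   # spread grows with fitted value
print(round(ratio_constant, 2), round(ratio_growing, 2))
```

With small samples the ratio is noisy, so it is best read alongside the residual plots rather than in place of them.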
![]() |
CONCLUSION
The developed model is currently in use by the fuel distribution company that owns the 95 stations where the observations were collected. Because it can be run in an electronic spreadsheet (for example, EXCEL ®), field personnel are able to use it when deciding whether or not to open a new gas station. One factor that contributed to the acceptance of the model is that it has no constant term; thus, when all explanatory variables are zero, the sales volume (VV) estimated for the gas station is also zero. Future developments of this model are directly related to its segmentation by urban center, in order to account for the particular characteristics of each city regarding fleet profile, average income, traffic pattern, municipal regulations, etc. There are clear structural differences between the various Brazilian urban centers, and accounting for them is expected to increase the explanatory power of the model.
8. BIBLIOGRAPHY
DIXON, EC, 1995; “The Management of a Service Station Network”; Decision Support Services Programs – CSIR Information Services, Pretoria, South Africa.
FERNANDES, C., THEMIDO, I., 1997; “Sales Modeling of Liquid Fuels Using Gravitational Models”; Operational Investigation, v.17, n.1, June, pp. 41-60.
FREES, EW, 1996; Data Analysis Using Regression Models – The Business Perspective. 1st ed. New Jersey, Prentice Hall International.