How does the framework proposed in Part 2 of this series perform? Can it scale to a large number of models and multiple simultaneous requests? To evaluate the performance of the implementation described in Part 2, I built 100 models using the same set of inputs. For each model, I inserted the relevant metadata into modelmd_tab. I then invoked the score_multimodel stored procedure (Part 2) with different WHERE clause arguments so that a different set of models would be selected each time. The number of selected models ranged from 20 to 100. Figure 3 shows the time required for scoring a single row as a function of the number of models. The numbers are for a Linux box with a single 3 GHz CPU and 2 GB of RAM. As indicated in the graph, the proposed architecture achieves real-time (below 1 second) performance. In fact, extrapolating the trend in the graph, it would take about 0.54 seconds to score a single row with one thousand models. Actual performance on a given system depends on the type of model and the number of attributes used for scoring. Nevertheless, the numbers are representative of an untuned database running on a single-CPU box.
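To make the extrapolation step concrete, here is a minimal sketch of one way to do it: fit a straight line to (number of models, seconds) pairs and evaluate it at a larger model count. It assumes the trend is linear, which is my reading of the graph rather than something measured here. The function names and the timing values are hypothetical; the values are synthetic points generated from a known line, not the measurements behind Figure 3.

```python
def fit_line(points):
    """Ordinary least-squares fit of seconds = intercept + slope * models."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate(points, n_models):
    """Predict the scoring time for n_models from the fitted line."""
    intercept, slope = fit_line(points)
    return intercept + slope * n_models


# Synthetic (models, seconds) pairs lying exactly on a made-up line.
# Replace these with your own timings; they are NOT the Figure 3 data.
SYNTHETIC_TIMINGS = [(n, 0.01 + 0.0005 * n) for n in (20, 40, 60, 80, 100)]
```

Calling `extrapolate(your_timings, 1000)` with timings collected on your own system gives the corresponding estimate for one thousand models. The estimate is only as good as the linearity assumption, so it is worth checking against a real run when possible.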
Besides scaling well with the number of models, the system also scales well with the number of concurrent users. The architecture can leverage multiple processors and RAC. The cursor sharing feature described in Part 2 keeps memory requirements to a minimum, while the database caching mechanisms make good use of available memory. Because each model is scored independently, the application can also assign groups of models to different servers and increase cache re-use.
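As an illustration of assigning groups of models to different servers, the sketch below maps each model name to a server with a stable hash, so that requests for a given model always land on the same server and can reuse its cache. This is one possible scheme, not the one used in this series. The server names, model names, and function names are all hypothetical.

```python
import hashlib
from collections import defaultdict


def server_for_model(model_name, servers):
    """Pick a server for a model using a hash that is stable across runs."""
    digest = hashlib.sha256(model_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket key.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def group_models_by_server(model_names, servers):
    """Return a dict mapping each server that received models to its list."""
    groups = defaultdict(list)
    for name in model_names:
        groups[server_for_model(name, servers)].append(name)
    return dict(groups)
```

For example, `group_models_by_server(["MODEL_1", "MODEL_2"], ["db1", "db2"])` returns the model names grouped by their assigned server. A simple modulo scheme like this reshuffles most assignments when the number of servers changes, so a deployment that adds servers often may prefer an explicit assignment table instead.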
It is important to note that the numbers in Figure 3 should not be used as a baseline to estimate the performance of scoring multiple records with a single model in Oracle Data Mining. In this type of task, Oracle Data Mining can score millions of records in a couple of seconds (link).
Figure 3: Time for sequentially scoring a single row with multiple models.
I started this series trying to answer the question: Can we implement a large-scale real-time scoring engine, coupled with model management, using the technologies available in the 10gR2 Oracle Database? The answer is yes. These technologies provide a flexible framework for the implementation of large-scale real-time scoring applications. As shown in the example described above, it is possible to support:
- Large number of models
- Large number of concurrent calls
The approach relies on off-the-shelf components (e.g., RAC and Oracle Data Mining). It also supports a flexible filtering scheme (Part 2) and can be extended to leverage textual and spatial information.