Explaining Predictions with Easy to Read Tree Rules

When scoring decision tree data mining models using SQL PREDICTION operators, one can obtain a node ID of the tree using the PREDICTION_DETAILS operator. This node ID indicates which node in the decision tree was used to predict the outcome for a specific record (row). Sometimes, we may also want to produce a textual rule for the specific node, and provide this rule on a per-row basis when scoring new data, for example:
PREDICTION RULE
---------- ----------------------
YES RELATIONSHIP isIn ("Husband" "Wife") and
EDUCATION_NUM greaterThan 12.5 and
EDUCATION isIn ("Bach." "Masters" "PhD" "Profsc") and
CAPITAL_GAIN greaterThan 5095.5
Peter Stengard sent me a tip on how to create this type of textual tree rules. The idea is to create a view (or a table) from the XML tree model representation and join it with the PREDICTION operator statement in the query. Before describing how this can be implemented, let's look at an example.

Consider the following data set:
SQL> SELECT empno, job, sal, deptno FROM emp;

EMPNO JOB SAL DEPTNO
---------- --------- ---------- ----------
7369 CLERK 800 20
7499 SALESMAN 1600 30
7521 SALESMAN 1250 30
7566 MANAGER 2975 20
7654 SALESMAN 1250 30
7698 MANAGER 2850 30
7782 MANAGER 2450 10
7788 ANALYST 3000 20
7839 PRESIDENT 5000 10
7844 SALESMAN 1500 30
7876 CLERK 1100 20
7900 CLERK 950 30
7902 ANALYST 3000 20
7934 CLERK 1300 10

14 rows selected.
Also consider a decision tree model tree_model for predicting JOB based on SAL and DEPTNO. We can create this model as follows:
CREATE VIEW emp_view AS
SELECT empno, job, sal, deptno FROM emp;

CREATE TABLE my_settings (
setting_name VARCHAR2(30),
setting_value VARCHAR2(30));

BEGIN
-- Populate settings table
INSERT INTO my_settings VALUES
(dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
INSERT INTO my_settings VALUES
(dbms_data_mining.tree_term_minrec_split, 2);
INSERT INTO my_settings VALUES
(dbms_data_mining.tree_term_minrec_node, 1);

COMMIT;

dbms_data_mining.create_model(
model_name => 'TREE_MODEL',
mining_function => dbms_data_mining.classification,
data_table_name => 'EMP_VIEW',
case_id_column_name => 'EMPNO',
target_column_name => 'JOB',
settings_table_name => 'my_settings');
END;
The following query scores the data using the model tree_model and also returns the tree node ID a record was assigned to:
SQL> SELECT PREDICTION(tree_model USING a.*) prediction,
PREDICTION_DETAILS(tree_model USING a.*) node_id
FROM emp a
ORDER BY empno;

PREDICTION NODE_ID
-------------------- --------------------
CLERK <Node id="2"/>
CLERK <Node id="2"/>
CLERK <Node id="2"/>
MANAGER <Node id="3"/>
CLERK <Node id="2"/>
MANAGER <Node id="3"/>
MANAGER <Node id="3"/>
ANALYST <Node id="4"/>
ANALYST <Node id="4"/>
CLERK <Node id="2"/>
CLERK <Node id="2"/>
CLERK <Node id="2"/>
ANALYST <Node id="4"/>
CLERK <Node id="2"/>

14 rows selected.
Using Peter's approach we could write the following query:
SELECT
PREDICTION(tree_model using a.*) prediction,
rule_details.description rule
FROM emp a,
tree_model_rules rule_details
WHERE
extractvalue(PREDICTION_DETAILS(tree_model USING a.*),
'Node/@id') = rule_details.node_id
ORDER BY empno;

PREDICTION RULE
---------- --------------------------------------------------
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
MANAGER SAL greaterThan 2025 and SAL lessOrEqual 2987.5
CLERK SAL lessOrEqual 2025
MANAGER SAL greaterThan 2025 and SAL lessOrEqual 2987.5
MANAGER SAL greaterThan 2025 and SAL lessOrEqual 2987.5
ANALYST SAL greaterThan 2025 and SAL greaterThan 2987.5
ANALYST SAL greaterThan 2025 and SAL greaterThan 2987.5
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
CLERK SAL lessOrEqual 2025
ANALYST SAL greaterThan 2025 and SAL greaterThan 2987.5
CLERK SAL lessOrEqual 2025

14 rows selected.
tree_model_rules_details is a view created from the XML model representation. The query uses the PREDICTION operator to get the predictions and the join with the tree_model_rules_details to get the textual rule for the node_id.

The following code illustrates how to create the tree_model_rules_details view. This example uses tree_model as the model name. For other models, we can use the same code and replace tree_model with the name of the new model.
CREATE VIEW tree_model_rules AS
SELECT * FROM
XMLTABLE('
declare function local:parent_node($e as element()*) as element()*
{
for $n in $e
return
<rule>
{
local:parent_node($n/../../Node[@id=$n/../@id])
}
{
if (count($n/SimplePredicate) > 0) then
concat(
string($n/SimplePredicate/@field),string('' ''),
string($n/SimplePredicate/@operator),string('' ''),
string($n/SimplePredicate/@value),string('' and '')
)
else
string('''')
}
{
if (count($n/SimpleSetPredicate) > 0) then
concat(
string($n/SimpleSetPredicate/@field),string('' ''),
string($n/SimpleSetPredicate/@booleanOperator),string('' ''),
string($n/SimpleSetPredicate/Array/text()),string('' and '')
)
else
string('''')
}
{
if ($n/CompoundPredicate/*[1] instance of
element(SimplePredicate)) then
concat(
string($n/CompoundPredicate/SimplePredicate[1]/@field),
string('' ''),
string($n/CompoundPredicate/SimplePredicate[1]/@operator),
string('' ''),
string($n/CompoundPredicate/SimplePredicate[1]/@value),
string('' and '')
)
else if ($n/CompoundPredicate/*[1]
instance of element(SimpleSetPredicate)) then
concat(
string($n/CompoundPredicate/SimpleSetPredicate[1]/@field),
string('' ''),
string(
$n/CompoundPredicate/SimpleSetPredicate[1]/@booleanOperator),
string('' (''),
string(
$n/CompoundPredicate/SimpleSetPredicate[1]/Array/text()),
string('') and '')
)
else
string('''')
}
</rule>
}; for $a in /PMML/TreeModel//Node
let $p := local:parent_node($a)
where count($a/../../Node) > 0
return
<SimpleRule id="{$a/@id}">
{normalize-space(substring($p,0,string-length($p)-4))}
</SimpleRule>
' PASSING dbms_data_mining.get_model_details_xml('tree_model')
COLUMNS
"NODE_ID" NUMBER PATH '/SimpleRule/@id',
"DESCRIPTION" VARCHAR2(4000) PATH '/SimpleRule'
);
If you have data mining or analytics tips that you would like to share send me an email at oracledmt@gmail.com.
Posted on April 11, 2006 .