object

c45

C4.5 decision tree learning algorithm. Builds a decision tree from a dataset object implementing the dataset_protocol protocol and provides predicates for exporting the learned tree as a list of predicate clauses or for writing it to a file. Supports both discrete and continuous attributes, handles missing values, and provides tree pruning.

Availability:
logtalk_load(c45(loader))
Author: Paulo Moura
Version: 1:0:0
Date: 2026-02-20
Compilation flags:
static, context_switching_calls
Implements:
Uses:
Remarks:
  • Algorithm: C4.5 is an extension of the ID3 algorithm that uses information gain ratio instead of information gain for attribute selection, which avoids bias towards attributes with many values.

  • Discrete attributes: The learned decision tree is represented as leaf(Class) for leaf nodes and tree(Attribute, Subtrees) for internal nodes with discrete attributes, where Subtrees is a list of Value-Subtree pairs.

  • Continuous attributes: For continuous (numeric) attributes, the tree uses binary threshold splits represented as tree(Attribute, threshold(Threshold), LeftSubtree, RightSubtree) where LeftSubtree corresponds to values =< Threshold and RightSubtree to values > Threshold.

  • Missing values: Missing attribute values are represented using anonymous variables. During tree construction, examples with missing values for the split attribute are distributed to all branches. Entropy and gain calculations use only examples with known values for the attribute being evaluated.

  • Tree pruning: The prune/3 and prune/5 predicates implement pessimistic error pruning (PEP), which estimates error rates using the upper confidence bound of the binomial distribution (Wilson score interval) with a configurable confidence factor (default 0.25, range (0.0, 1.0)) and minimum instances per leaf (default 2). Subtrees are replaced with leaf nodes when doing so would not increase the estimated error.
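As an illustration of the gain-ratio selection described above, the following Python sketch computes entropy, information gain, split information, and their ratio for a discrete attribute, skipping examples with missing values as the remarks describe. This is a hand-written illustration of the formulas, not code from this library, and the data layout (a list of attribute-dictionary/class pairs) is an assumption made for the example.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def gain_ratio(examples, attribute):
    """Information gain ratio for splitting `examples` on a discrete
    `attribute`. `examples` is a list of (attributes_dict, class) pairs;
    examples with a missing value (None) for `attribute` are ignored,
    matching how entropy and gain calculations use only known values.
    """
    known = [(a, c) for a, c in examples if a.get(attribute) is not None]
    n = len(known)
    labels = [c for _, c in known]
    partitions = {}
    for a, c in known:
        partitions.setdefault(a[attribute], []).append(c)
    gain = entropy(labels)
    split_info = 0.0
    for subset in partitions.values():
        w = len(subset) / n
        gain -= w * entropy(subset)     # information gain term
        split_info -= w * math.log2(w)  # penalizes many-valued attributes
    return gain / split_info if split_info > 0 else 0.0
```

Dividing the gain by the split information is what removes ID3's bias towards attributes with many values: an attribute that splits the data into many small subsets accrues a large split information, shrinking its ratio.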

Public predicates

prune/5

Prunes a decision tree using pessimistic error pruning (PEP). This post-pruning method estimates error rates using the upper confidence bound of the binomial distribution with the given confidence factor and replaces subtrees with leaf nodes when doing so would not increase the estimated error. Pruning helps reduce overfitting and can improve generalization to unseen data.

Compilation flags:
static
Template:
prune(Dataset,Tree,ConfidenceFactor,MinInstances,PrunedTree)
Mode and number of proofs:
prune(+object_identifier,+tree,+float,+positive_integer,-tree) - one
Remarks:
  • Confidence factor: The confidence factor controls the aggressiveness of pruning. It must be in the range (0.0, 1.0). Lower values result in more aggressive pruning (smaller, simpler trees), while higher values result in less pruning (larger, more complex trees). The default value is 0.25.

  • Minimum instances per leaf: The minimum number of instances required at a leaf node. When a node has fewer instances than this value, the node may be pruned. It must be a positive integer. The default value is 2.

  • Statistical basis: The pruning uses the upper confidence bound of the binomial distribution to estimate the true error rate.
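The statistical basis above can be sketched concretely. The Python below computes the one-sided Wilson score upper bound on the error rate; it is an illustration of the statistic the remarks name, not the library's implementation, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def pessimistic_error(errors, total, confidence_factor=0.25):
    """Upper confidence bound (Wilson score interval) on the true error
    rate, given `errors` misclassified examples out of `total`.

    The confidence factor must be in (0.0, 1.0); smaller values give a
    larger (more pessimistic) bound and therefore more aggressive pruning.
    """
    z = NormalDist().inv_cdf(1.0 - confidence_factor)  # one-sided z-value
    f = errors / total                                 # observed error rate
    adjust = z * math.sqrt(f * (1.0 - f) / total + z * z / (4.0 * total * total))
    return (f + z * z / (2.0 * total) + adjust) / (1.0 + z * z / total)
```

Note that the bound is strictly positive even when no errors are observed, which is what makes the estimate "pessimistic": a subtree's apparent perfection on the training data is discounted in proportion to how few examples support it.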


prune/3

Prunes a decision tree using pessimistic error pruning (PEP) with default parameter values. Calls prune/5 with ConfidenceFactor = 0.25 and MinInstances = 2.

Compilation flags:
static
Template:
prune(Dataset,Tree,PrunedTree)
Mode and number of proofs:
prune(+object_identifier,+tree,-tree) - one
Remarks:
  • Default parameters: Uses the standard C4.5 default values: confidence factor of 0.25 (the confidence level for computing the upper bound of the error estimate) and minimum instances per leaf of 2.
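To show how the pieces fit together, the sketch below reimplements the bottom-up pruning decision over a tuple encoding of the discrete-split tree representation from the remarks: ('leaf', Class) for leaf(Class) and ('tree', Attribute, Pairs) for tree(Attribute, Subtrees). All names and the data layout are hypothetical; this is an illustration of pessimistic error pruning with the default parameter values, not the library's code.

```python
import math
from statistics import NormalDist

def upper_error(errors, total, cf=0.25):
    """Wilson score upper bound on the error rate (the prune/5 statistic)."""
    z = NormalDist().inv_cdf(1.0 - cf)
    f = errors / total
    s = z * math.sqrt(f * (1.0 - f) / total + z * z / (4.0 * total * total))
    return (f + z * z / (2.0 * total) + s) / (1.0 + z * z / total)

def classify(tree, attributes):
    """Walk a discrete-split tree; fall back to the first branch for unseen values."""
    while tree[0] == 'tree':
        _, attribute, branches = tree
        tree = dict(branches).get(attributes.get(attribute), branches[0][1])
    return tree[1]

def prune(tree, examples, cf=0.25, min_instances=2):
    """Bottom-up pessimistic pruning: replace a subtree with a majority-class
    leaf when doing so would not increase the estimated (upper-bound) error."""
    if tree[0] == 'leaf' or not examples:
        return tree
    _, attribute, branches = tree
    pruned, subtree_errors = [], 0.0
    for value, subtree in branches:
        subset = [e for e in examples if e[0].get(attribute) == value]
        subtree = prune(subtree, subset, cf, min_instances)
        pruned.append((value, subtree))
        if subset:
            wrong = sum(1 for a, c in subset if classify(subtree, a) != c)
            subtree_errors += len(subset) * upper_error(wrong, len(subset), cf)
    classes = [c for _, c in examples]
    majority = max(set(classes), key=classes.count)
    leaf_wrong = sum(1 for c in classes if c != majority)
    leaf_errors = len(examples) * upper_error(leaf_wrong, len(examples), cf)
    if leaf_errors <= subtree_errors or len(examples) < min_instances:
        return ('leaf', majority)  # collapsing does not increase estimated error
    return ('tree', attribute, pruned)
```

With the defaults (cf = 0.25, min_instances = 2), a split whose branches barely improve on the majority class is collapsed, while a split that cleanly separates the classes is kept.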


Protected predicates

(no local declarations; see entity ancestors if any)

Private predicates

(no local declarations; see entity ancestors if any)

Operators

(none)