object
c45
C4.5 decision tree learning algorithm. Builds a decision tree from a dataset object implementing the dataset_protocol protocol and provides predicates for exporting the learned tree as a list of predicate clauses or to a file. Supports both discrete and continuous attributes, handles missing values, and supports tree pruning.
Availability:
logtalk_load(c45(loader))

Flags:
static, context_switching_calls
Algorithm: C4.5 is an extension of the ID3 algorithm that uses information gain ratio instead of information gain for attribute selection, which avoids bias towards attributes with many values.
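The gain ratio criterion mentioned above follows the standard C4.5 definitions: the ID3 information gain of an attribute is normalized by the split information of that attribute, penalizing attributes that fragment the data into many small subsets:

```latex
\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)},
\qquad
\mathrm{SplitInfo}(A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}
```

where \(S\) is the set of training examples and \(S_1, \ldots, S_n\) is its partition by the \(n\) values of attribute \(A\).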
Discrete attributes: The learned decision tree is represented as leaf(Class) for leaf nodes and tree(Attribute, Subtrees) for internal nodes with discrete attributes, where Subtrees is a list of Value-Subtree pairs.

Continuous attributes: For continuous (numeric) attributes, the tree uses binary threshold splits represented as tree(Attribute, threshold(Threshold), LeftSubtree, RightSubtree) where LeftSubtree corresponds to values =< Threshold and RightSubtree to values > Threshold.

Missing values: Missing attribute values are represented using anonymous variables. During tree construction, examples with missing values for the split attribute are distributed to all branches. Entropy and gain calculations use only examples with known values for the attribute being evaluated.
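As an illustration, a learned tree combining both node kinds could look like the following term (the attribute names, values, classes, and threshold are made up for the example):

```logtalk
% discrete split on outlook; each branch is a Value-Subtree pair
tree(outlook, [
    % binary threshold split on the continuous humidity attribute:
    % =< 75 goes to the left subtree, > 75 to the right subtree
    sunny    - tree(humidity, threshold(75), leaf(yes), leaf(no)),
    overcast - leaf(yes),
    rainy    - leaf(no)
])
```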
Tree pruning: The prune/3 and prune/5 predicates implement pessimistic error pruning (PEP), which estimates error rates using the upper confidence bound of the binomial distribution (Wilson score interval) with a configurable confidence factor (default 0.25, range (0.0, 1.0)) and minimum instances per leaf (default 2). Subtrees are replaced with leaf nodes when doing so would not increase the estimated error.
Public predicates
prune/5
Prunes a decision tree using pessimistic error pruning (PEP). This post-pruning method estimates error rates using the upper confidence bound of the binomial distribution with the given confidence factor and replaces subtrees with leaf nodes when doing so would not increase the estimated error. Pruning helps reduce overfitting and can improve generalization to unseen data.
Compilation flags:
static

Template:
prune(Dataset, Tree, ConfidenceFactor, MinInstances, PrunedTree)

Mode and number of proofs:
prune(+object_identifier, +tree, +float, +positive_integer, -tree) - one
Confidence factor: The confidence factor controls the aggressiveness of pruning. It must be in the range (0.0, 1.0). Lower values result in more aggressive pruning (smaller, simpler trees), while higher values result in less pruning (larger, more complex trees). The default value is 0.25.

Minimum instances per leaf: The minimum number of instances required at a leaf node. When a node has fewer instances than this value, the node may be pruned. It must be a positive integer. The default value is 2.

Statistical basis: The pruning uses the upper confidence bound of the binomial distribution to estimate the true error rate.
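A hypothetical call sketch, assuming a weather dataset object implementing dataset_protocol and a hand-built tree (both are made up for the example):

```logtalk
% prune with a more aggressive confidence factor (0.10) and a minimum
% of 5 instances per leaf; "weather" is a hypothetical dataset object
| ?- Tree = tree(outlook, [sunny-leaf(no), overcast-leaf(yes), rainy-leaf(yes)]),
     c45::prune(weather, Tree, 0.10, 5, PrunedTree).
```

Lowering the confidence factor from the 0.25 default raises the pessimistic error estimate of subtrees, so more of them collapse into leaves.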
prune/3
Prunes a decision tree using pessimistic error pruning (PEP) with default parameter values. Calls prune/5 with ConfidenceFactor = 0.25 and MinInstances = 2.
Compilation flags:
static

Template:
prune(Dataset, Tree, PrunedTree)

Mode and number of proofs:
prune(+object_identifier, +tree, -tree) - one
Default parameters: Uses the standard C4.5 default values: a confidence factor of 0.25 (the confidence level for computing the upper bound of the error estimate) and a minimum of 2 instances per leaf.
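Given the defaults, the two goals below are equivalent (the weather dataset object is hypothetical, as is the bound Tree term):

```logtalk
% prune/3 simply fills in the default confidence factor and
% minimum instances per leaf before calling prune/5
| ?- c45::prune(weather, Tree, PrunedTree).
| ?- c45::prune(weather, Tree, 0.25, 2, PrunedTree).
```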
Protected predicates
(no local declarations; see entity ancestors if any)
Private predicates
(no local declarations; see entity ancestors if any)
Operators
(none)
See also
dataset_protocol, isolation_forest, knn, naive_bayes, nearest_centroid, random_forest, ada_boost