object

isolation_forest

Extended Isolation Forest (EIF) algorithm for anomaly detection. Implements the improved version described by Hariri et al. (2019) that uses random hyperplane cuts instead of axis-aligned cuts, eliminating score bias artifacts. Builds an ensemble of isolation trees from a dataset object implementing the dataset_protocol protocol. Missing attribute values are represented using anonymous variables.

Availability:
logtalk_load(isolation_forest(loader))
Author: Paulo Moura
Version: 1:0:0
Date: 2026-02-20
Compilation flags:
static, context_switching_calls
Remarks:
  • Algorithm: The Extended Isolation Forest builds an ensemble of isolation trees (iTrees) by recursively partitioning the data using random hyperplanes. Anomalous points, being few and different, require fewer partitions (shorter path lengths) to be isolated.

  • Extended vs Original: The original Isolation Forest uses axis-aligned splits (random attribute + random value), which introduces bias in anomaly scores along coordinate axes. The extended version uses random hyperplane cuts with arbitrary slopes, producing more consistent and reliable anomaly scores.

  • Extension level: The extension level controls the dimensionality of the random hyperplane cuts. Level 0 corresponds to the original axis-aligned Isolation Forest. The default level is d - 1 (fully extended) where d is the number of dimensions.

  • Prediction: The predict/4 predicate returns anomaly if the anomaly score is above the threshold (default: 0.5) and normal otherwise. The score_all/3 predicate returns a list of all instances with their class labels and scores, sorted by descending anomaly score. By default, predictions use the options stored in the learned model; the threshold can be overridden by passing the anomaly_threshold/1 option.

  • Anomaly score: The anomaly score s(x) is computed as s(x) = 2^(-E(h(x))/c(psi)) where h(x) is the path length of instance x in a single tree, E(h(x)) is the average of that path length across all trees, c(psi) is the average path length of an unsuccessful search in a binary search tree (BST) with psi nodes, and psi is the subsample size. Scores close to 1 indicate anomalies; scores below 0.5 indicate normal points.

  • Discrete attributes: Discrete (categorical) attributes are mapped to numeric indices based on their position in the attribute value list declared by the dataset. This allows the algorithm to handle datasets with mixed attribute types.

  • Missing values: Missing attribute values are represented using anonymous variables. During tree construction, missing values are replaced with random values drawn from the observed range of the corresponding attribute. During scoring, instances with missing values are sent down both branches of the tree and the path length is computed as the weighted average of the two branches.

  • Classifier representation: The learned model is represented as an if_model(Trees, SubsampleSize, AttributeNames, Attributes, Ranges, Options) compound term.
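
The anomaly score formula in the remarks above can be sketched as a small standalone object. This is an illustration only, not the library source: the object name eif_score_sketch and its predicates are hypothetical, and the closed form used for c(psi), 2*H(psi-1) - 2*(psi-1)/psi with H(i) approximated by ln(i) + 0.5772156649, is the normalization given in the original Isolation Forest paper and is assumed here rather than confirmed for this implementation.

```logtalk
% Illustrative sketch; eif_score_sketch is a hypothetical object,
% not part of the isolation_forest library.
:- object(eif_score_sketch).

    :- public(anomaly_score/3).
    % anomaly_score(+AveragePathLength, +SubsampleSize, -Score)
    % Computes s(x) = 2^(-E(h(x))/c(psi)); fails if c(psi) is zero.
    anomaly_score(AveragePathLength, SubsampleSize, Score) :-
        average_path_length(SubsampleSize, C),
        C > 0.0,
        Score is 2.0 ** (-(AveragePathLength / C)).

    % c(psi): average path length of an unsuccessful BST search
    % (assumed closed form; harmonic number approximated using
    % the Euler-Mascheroni constant)
    average_path_length(N, C) :-
        N > 2,
        !,
        H is log(N - 1) + 0.5772156649,
        C is 2.0 * H - 2.0 * (N - 1) / N.
    average_path_length(2, 1.0) :-
        !.
    average_path_length(_, 0.0).

:- end_object.
```

Under these assumptions, an instance whose average path length equals c(psi) gets a score of 0.5, and shorter average path lengths push the score towards 1. For the default subsample size of 256, c(psi) evaluates to roughly 10.24.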

Public predicates

learn/3

Learns an isolation forest model from the given dataset object using the specified options. Valid options are number_of_trees/1 (default: 100), subsample_size/1 (default: 256 or the number of instances if smaller), extension_level/1 (default: d - 1 where d is the number of dimensions), and anomaly_threshold/1 (default: 0.5).

Compilation flags:
static
Template:
learn(Dataset,Model,Options)
Mode and number of proofs:
learn(+object_identifier,-compound,+list(compound)) - one
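
A minimal usage sketch for learn/3 follows. The dataset object my_dataset is hypothetical and is assumed to implement dataset_protocol; the option values are arbitrary examples.

```logtalk
% my_dataset is a hypothetical object implementing dataset_protocol
?- logtalk_load(isolation_forest(loader)).

% override two defaults; the remaining options keep their default values
?- isolation_forest::learn(my_dataset, Model, [number_of_trees(200), subsample_size(128)]).

% extension_level(0) selects axis-aligned cuts (original Isolation Forest)
?- isolation_forest::learn(my_dataset, Model, [extension_level(0)]).
```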

predict/4

Predicts whether an instance is an anomaly or normal using the learned model. The anomaly threshold stored in the model is used unless the given options include an anomaly_threshold/1 option. The instance is a list of Attribute-Value pairs where missing values are represented using anonymous variables. Returns anomaly if the anomaly score is above the threshold, normal otherwise.

Compilation flags:
static
Template:
predict(Model,Instance,Prediction,Options)
Mode and number of proofs:
predict(+compound,+list,-atom,+list(compound)) - one
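
A minimal usage sketch for predict/4 follows. The dataset object my_dataset and the attribute names temperature and pressure are hypothetical, and an empty option list is assumed to select the defaults.

```logtalk
% my_dataset, temperature, and pressure are hypothetical names
?- isolation_forest::learn(my_dataset, Model, []),
   % the pressure value is missing, hence the anonymous variable;
   % anomaly_threshold/1 overrides the threshold stored in the model
   isolation_forest::predict(
       Model,
       [temperature-98.5, pressure-_],
       Prediction,
       [anomaly_threshold(0.6)]
   ).
```

On success, Prediction is bound to either anomaly or normal.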

score/3

Computes the anomaly score for a given instance using the learned model. The instance is a list of Attribute-Value pairs where missing values are represented using anonymous variables. The score is in the range [0.0, 1.0]. Scores close to 1.0 indicate anomalies. Scores close to 0.5 or below indicate normal instances.

Compilation flags:
static
Template:
score(Model,Instance,Score)
Mode and number of proofs:
score(+compound,+list,-float) - one
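
A minimal usage sketch for score/3 follows, for cases where the raw score is preferred over a thresholded prediction. The dataset object my_dataset and the attribute names are hypothetical.

```logtalk
% my_dataset, temperature, and pressure are hypothetical names
?- isolation_forest::learn(my_dataset, Model, [number_of_trees(100)]),
   isolation_forest::score(Model, [temperature-98.5, pressure-1.2], Score).
```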

score_all/3

Computes the anomaly scores for all instances in the dataset. Returns a list of Id-Class-Score triples sorted by descending anomaly score.

Compilation flags:
static
Template:
score_all(Dataset,Model,Scores)
Mode and number of proofs:
score_all(+object_identifier,+compound,-list) - one
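
A minimal usage sketch for score_all/3 follows. The dataset object my_dataset is hypothetical. Because the result is sorted by descending anomaly score, the head of the list is the most anomalous instance.

```logtalk
% my_dataset is a hypothetical object implementing dataset_protocol
?- isolation_forest::learn(my_dataset, Model, []),
   isolation_forest::score_all(my_dataset, Model, Scores),
   % pick the instance with the highest anomaly score
   Scores = [Id-Class-Score| _].
```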

Protected predicates

(no local declarations; see entity ancestors if any)

Private predicates

(no local declarations; see entity ancestors if any)

Operators

(none)