Notes on Inductive Logic Programming
Reading notes on the ILP literature: FOIL, PROGOL, Metagol, FOLD-RM, and the persistent challenge of scalability. With observations on why ILP keeps returning to the foreground despite the neural wave.
These are notes accumulated while working through the ILP literature for the FOLD-RM research project. They are not a survey — for that, see Cropper and Dumančić (2022) — but rather a record of what I found interesting or surprising.
Why ILP Keeps Coming Back
The history of ILP is a series of rediscoveries. The core insight — that logic programs are a good hypothesis language because they are compositional, interpretable, and support generalisation via logical entailment — has been demonstrated repeatedly, and repeatedly overshadowed by systems that work better on standard benchmarks because the benchmarks reward interpolation over extrapolation.
The current resurgence is driven by two things: the interpretability crisis in deep learning (ILP systems are, by construction, interpretable) and the low-data learning problem (ILP systems can generalise from very few examples when the hypothesis language is appropriately constrained).
FOIL → PROGOL → Metagol
FOIL (Quinlan, 1990) is the ur-algorithm: greedy top-down rule induction, covering algorithm, information gain as the evaluation function. Simple, efficient, brittle in the face of noise. The mode declarations often associated with this lineage are not FOIL's, though; they arrive with PROGOL. FOIL constrains its search mainly through argument typing and the gain heuristic.
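To keep the shape of the algorithm in my head, here is a minimal, self-contained Python sketch of the covering loop and gain-driven specialisation. It is deliberately propositionalised (candidate "literals" are just attribute tests on example dicts rather than relational literals with variable bindings), and the helper names, toy data, and tie-breaking are mine, not Quinlan's.

```python
import math

def foil_gain(p0, n0, p1, n1):
    # Quinlan's weighted information gain for adding one literal:
    # t * (info(before) - info(after)), with t approximated here by p1,
    # the number of positives still covered after the literal is added.
    if p1 == 0:
        return 0.0
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)

def learn_one_clause(pos, neg, literals):
    # Greedy top-down specialisation: keep adding the highest-gain
    # literal until no negative example satisfies the clause body.
    body, cur_pos, cur_neg = [], list(pos), list(neg)
    while cur_neg:
        scored = []
        for name, test in literals:
            p1 = [e for e in cur_pos if test(e)]
            n1 = [e for e in cur_neg if test(e)]
            gain = foil_gain(len(cur_pos), len(cur_neg), len(p1), len(n1))
            scored.append((gain, name, p1, n1))
        gain, name, p1, n1 = max(scored)
        if gain <= 0:            # no literal helps; stop rather than loop
            break
        body.append(name)
        cur_pos, cur_neg = p1, n1
    return body

def covering_loop(pos, neg, literals):
    # Outer loop: learn clauses one at a time, removing the positives
    # each new clause covers, until every positive is explained.
    rules, remaining = [], list(pos)
    while remaining:
        body = learn_one_clause(remaining, neg, literals)
        if not body:
            break
        rules.append(body)
        remaining = [e for e in remaining
                     if not all(t(e) for n, t in literals if n in body)]
    return rules

# Toy run: "flies" from attribute tests.
pos = [{"bird": True, "penguin": False}]
neg = [{"bird": True, "penguin": True}, {"bird": False, "penguin": False}]
tests = [("bird", lambda e: e["bird"]),
         ("not_penguin", lambda e: not e["penguin"])]
print(covering_loop(pos, neg, tests))   # -> [['not_penguin', 'bird']]
```

The two nested loops are the whole story: the outer loop buys completeness over the positives, the inner loop buys consistency against the negatives.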
PROGOL (Muggleton, 1995) introduces inverse entailment: instead of growing clauses greedily from an empty body, compute, for a positive example, a most-specific clause (the bottom clause) which, together with the background knowledge, entails that example, then search the lattice of clauses that generalise it. This is a major step forward for handling background knowledge but requires the background theory to be relatively complete, which is rarely the case in practice.
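The logical step behind the bottom clause is short enough to write out (this is the standard inverse-entailment derivation; the notation is mine, and I am glossing over Skolemisation). For background theory $B$, positive example $e$, and candidate clause $H$:

$$
B \wedge H \models e \quad\Longleftrightarrow\quad B \wedge \neg e \models \neg H .
$$

So the negation of any acceptable hypothesis is entailed by $B \wedge \neg e$. Collect the ground literals entailed by $B \wedge \neg e$ into $\neg\bot(e)$; then $\neg\bot(e) \models \neg H$, equivalently $H \models \bot(e)$, which under PROGOL's restrictions reduces to $\theta$-subsumption of the bottom clause $\bot(e)$. That is why $\bot(e)$ bounds the search space from below.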
Metagol adds meta-interpretive learning: the system learns Prolog programs by searching for proofs using meta-rules — higher-order program schemas — that constrain the shape of learned programs. Metagol can learn recursive programs, which FOIL and PROGOL cannot do reliably. The trade-off is that the meta-rules must be specified by the user, introducing a new knowledge engineering burden.
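A toy illustration of what the meta-rule constraint buys: with only the chain metarule P(A,B) :- Q(A,C), R(C,B), "learning" reduces to choosing predicates for the second-order variables Q and R. The sketch below is my own brute-force Python stand-in, not Metagol's Prolog meta-interpreter, and it omits exactly the things that make Metagol interesting (proof search, predicate invention, recursion); the relation names and helpers are invented for the example.

```python
# Chain metarule: P(A,B) :- Q(A,C), R(C,B).
from itertools import product

background = {                              # toy relations, as sets of pairs
    "parent": {("ann", "bob"), ("bob", "carol")},
    "spouse": {("ann", "dan")},
}

def chain_holds(q, r, pair):
    # Does the instantiated chain clause derive target(a, b)?
    a, b = pair
    return any((a, c) in background[q] and (c, b) in background[r]
               for c in {y for (_, y) in background[q]})

def learn_chain(pos, neg):
    # Enumerate instantiations of the chain metarule; keep the first one
    # that covers every positive example and no negative example.
    for q, r in product(background, repeat=2):
        if all(chain_holds(q, r, e) for e in pos) and \
           not any(chain_holds(q, r, e) for e in neg):
            return f"target(A,B) :- {q}(A,C), {r}(C,B)."
    return None

print(learn_chain(pos={("ann", "carol")}, neg={("ann", "dan")}))
# -> target(A,B) :- parent(A,C), parent(C,B).
```

The point is that the hypothesis space is the set of metarule instantiations, not the set of all clauses, which is what makes the search tractable and is also why the choice of meta-rules matters so much.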
FOLD-RM
FOLD-RM (Wang, Shakerin, and Gupta, 2022) takes a different approach: answer set programming rather than Prolog, and a greedy-with-backtracking search that handles exceptions explicitly. The key innovation for practical use is the treatment of default rules with exceptions — instead of trying to learn a monolithic rule that covers all positive examples, FOLD-RM learns a default with a possibly-empty set of exception clauses. This makes it much more tolerant of noise and inconsistency in training data.
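A sketch of the default-with-exceptions idea as I understand it (my Python reconstruction, not the FOLD-RM implementation: the real algorithm selects literals by information gain, handles numeric features, and emits an answer set program rather than a nested dict; all names here are illustrative):

```python
def best_test(pos, neg, tests):
    # Pick the test that maximises (positives kept) - (negatives kept).
    return max(tests, key=lambda t: sum(t[1](e) for e in pos)
                                   - sum(t[1](e) for e in neg))

def learn_default_with_exceptions(pos, neg, tests, depth=2):
    # Learn a default test, then recursively learn exceptions on the
    # negatives it still covers, with positive/negative roles swapped.
    name, test = best_test(pos, neg, tests)
    false_pos = [e for e in neg if test(e)]       # negatives still covered
    exceptions = []
    if false_pos and depth > 0:
        true_pos = [e for e in pos if test(e)]
        exceptions = [learn_default_with_exceptions(false_pos, true_pos,
                                                    tests, depth - 1)]
    return {"default": name, "exceptions": exceptions}

# Toy data: birds fly unless they are penguins.
pos = [{"bird": True, "penguin": False}]
neg = [{"bird": True, "penguin": True}, {"bird": False, "penguin": False}]
tests = [("bird", lambda e: e["bird"]),
         ("penguin", lambda e: e["penguin"])]
print(learn_default_with_exceptions(pos, neg, tests))
# -> {'default': 'bird', 'exceptions': [{'default': 'penguin', 'exceptions': []}]}
# Read as: fly(X) :- bird(X), not ab(X).   ab(X) :- penguin(X).
```

The role-swapping recursion is what lets a noisy or inconsistent example end up inside an exception clause rather than wrecking the default.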
For knowledge graph applications, the most relevant feature is its scalability: FOLD-RM handles much larger hypothesis spaces than classical ILP algorithms without requiring the entire training set to fit in memory, which is essential when the “examples” are triples in a graph with millions of edges.
The Scalability Ceiling
The persistent limitation of ILP is cost that grows at least quadratically with the number of training examples and roughly exponentially with the depth of the hypothesis, since each additional body literal multiplies the number of candidate clauses. For graphs, this translates to: ILP is practical for learning rules of depth 2–3 (two or three hops in the graph) but becomes computationally intractable for deeper rules. Since many interesting knowledge graph relations require longer chains of inference, this is a real limitation.
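A back-of-the-envelope count makes the depth problem concrete. Assume a graph with 50 relation types and count candidate rule bodies where each new literal may bind its arguments to any in-scope variable or a fresh one; this is a crude over-count (no typing, no mode constraints, no symmetry reduction), but the growth rate is the point:

```python
def candidate_bodies(num_relations, depth):
    total, vars_in_scope = 1, 2          # the head supplies two variables
    for _ in range(depth):
        # choose a relation, then bind each of its two arguments to any
        # in-scope variable or to one fresh variable
        total *= num_relations * (vars_in_scope + 1) ** 2
        vars_in_scope += 2               # pessimistic: both could be fresh
    return total

for d in (2, 3, 4, 5):
    print(d, f"{candidate_bodies(num_relations=50, depth=d):.2e}")
# -> 5.62e+05 candidates at depth 2, already ~3.4e+16 at depth 5
```

Roughly three orders of magnitude per extra hop, before any data has even been touched; coverage testing over millions of triples then multiplies everything again.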
Current work on combining neural embeddings with ILP search (neural-guided ILP systems such as Neural Theorem Provers and differentiable ILP) tries to address this by using learned representations to prune the search space. The results are promising but the systems are complex. Whether the complexity is worth the payoff depends heavily on the application.
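The pruning idea itself is simple to state, even if the full systems are not. A generic illustration (not the method of NTP or any particular paper, and the embeddings below are random stand-ins for trained vectors): score candidate body relations by embedding similarity to the target relation and hand only the top few to the symbolic search.

```python
import numpy as np

# Toy relation embeddings; in practice these would come from a trained
# knowledge graph embedding model, not a random generator.
rng = np.random.default_rng(0)
relations = ["born_in", "lives_in", "citizen_of", "capital_of", "likes"]
embeddings = {r: rng.normal(size=64) for r in relations}

def top_k_candidates(target, k=2):
    # Rank all other relations by cosine similarity to the target and
    # keep the k best; only these feed the symbolic literal search.
    t = embeddings[target]
    def cos(r):
        v = embeddings[r]
        return float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))
    return sorted((r for r in relations if r != target),
                  key=cos, reverse=True)[:k]

print(top_k_candidates("citizen_of"))
```

Whether this kind of filter preserves the rules you actually want is exactly the open question; a hard top-k cut can discard the one relation a correct deep rule needs.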