Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees

Date
2021-11-10
Abstract

Decision trees are powerful tools for classification and regression that attract many researchers working in the burgeoning area of machine learning. A classification decision tree has two types of vertices: (i) branching vertices, at which datapoints are tested on a selection of discrete features, and (ii) leaf vertices, at which datapoints are assigned classes. An optimal binary classification tree is a special type of classification tree in which each branching vertex has exactly two children. It can be obtained by solving a biobjective mixed integer linear optimization problem that seeks to minimize (i) the number of misclassified datapoints and (ii) the number of branching vertices. In this thesis, we present two new multicommodity flow formulations and a new cut-based formulation to learn such optimal binary classification trees. We then compare the strength of the formulations, provide valid inequalities that strengthen all of them, and report accompanying computational results.
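To make the two objectives in the abstract concrete, the following minimal Python sketch evaluates them for a fixed tree. It is an illustration only, not any of the thesis's formulations: it does not solve an optimization problem, and the names `Leaf`, `Branch`, `predict`, `count_branching`, and `objectives` are hypothetical. It also assumes that each branching vertex tests a single binary feature, which is a simplification of "a selection of discrete features".

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: int  # class assigned to every datapoint that reaches this leaf


@dataclass
class Branch:
    feature: int    # index of the binary feature tested here (assumption)
    left: "Node"    # child followed when the feature value is 0
    right: "Node"   # child followed when the feature value is 1


Node = Union[Leaf, Branch]


def predict(node, x):
    """Route datapoint x from the root down to a leaf and return its class."""
    while isinstance(node, Branch):
        node = node.right if x[node.feature] else node.left
    return node.label


def count_branching(node):
    """Second objective: number of branching vertices in the tree."""
    if isinstance(node, Leaf):
        return 0
    return 1 + count_branching(node.left) + count_branching(node.right)


def objectives(tree, X, y):
    """Return (misclassified datapoints, branching vertices) for a fixed tree."""
    errors = sum(predict(tree, x) != label for x, label in zip(X, y))
    return errors, count_branching(tree)


# Toy data (made up for illustration): the class is the AND of two binary features.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

stump = Branch(0, Leaf(0), Leaf(1))                     # one branching vertex
full = Branch(0, Leaf(0), Branch(1, Leaf(0), Leaf(1)))  # two branching vertices

print(objectives(stump, X, y))  # (1, 1)
print(objectives(full, X, y))   # (0, 2)
```

On this toy data neither tree dominates the other: the smaller tree misclassifies one datapoint, while the larger tree classifies all four correctly at the cost of one more branching vertex. That trade-off is what makes the problem biobjective.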

Degree
Master of Arts
Type
Thesis
Keywords
MILO, classification, decision trees, mixed integer programming, machine learning
Citation

Alston, Brandon. "Mixed Integer Linear Optimization Formulations for Learning Optimal Binary Classification Trees." (2021) Master’s Thesis, Rice University. https://hdl.handle.net/1911/113687.

Rights
Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.