The Spark community has actively improved the decision tree code since then. Apache Spark is an ideal platform for a scalable distributed decision tree implementation since Spark's in-memory computing allows us to efficiently perform multiple passes over the training dataset.Ībout a year ago, open-source developers joined forces to come up with a fast distributed decision tree implementation that has been a part of the Spark MLlib library since release 1.0. However, most are designed for single-machine computation and seldom scale elegantly to a distributed setting. ![]() ![]() Decision trees are easy to interpret, handle categorical and continuous features, extend to multi-class classification, do not require feature scaling and are able to capture non-linearities and feature interactions.ĭue to their popularity, almost every machine learning library provides an implementation of the decision tree algorithm. ![]() Origami Logic provides a Marketing Intelligence Platform that uses Apache Spark for heavy lifting analytics work on the backend.ĭecision trees and their ensembles are industry workhorses for the machine learning tasks of classification and regression. ![]() This is a post written together with one of our friends at Origami Logic.
0 Comments
Leave a Reply. |