Bulletin No. 1, 2021

... ...

The sorting tree
The computer takes random samples from the emission data and feeds them through what is called a decision tree, where they are sorted into two groups by, say, the air temperatures at which the emissions occurred. The sorting is then repeated with another biophysical variable to further organize the emission data.

Planting a forest
The computer takes another batch of random samples from the emission data and makes another decision tree. This is repeated until there are many of these trees, hence a random ‘forest’. A single decision tree looks at just one iteration of the data and derives a pattern specific to that iteration, which may be too specific. In AI lingo, this is called overfitting. With an ensemble of trees, this overfitting averages out.

Finding the culprit
Now that a pattern has emerged from the emission data, we can find out how much each biophysical variable has contributed to putting the data in order and bringing out the pattern, using a metric known as the mean decrease in impurity. The variable that contributes the most or, in other words, has the greatest effect in shaping the data is the most significant driver of the emissions.

A whiter box
Unlike many AI models, which have gained infamy for being black boxes, a model based on the random forest algorithm has a reasonably straightforward and transparent decision-making process. This is what allows us to assess relatively easily the extent to which individual variables determine the emission pattern.

[Figure: decision tree diagram. Labels: methane emission (low to high); air temperature (≥ x, < x); salinity.]

THE NEW GOSPEL ACCORDING TO A.I.
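
For readers who like to see the gears turn, the steps described above can be sketched in a few dozen lines of plain Python. This is only an illustration under stated assumptions, not the model used in the research: the emission data are synthetic, the variable names and the formula generating the emissions are invented, and the function names (`variance`, `best_split`, `grow_tree`, `forest_importance`) are hypothetical. The sketch also covers only the random sampling of data described here; standard random forest implementations additionally restrict each split to a random subset of variables, which is left out for brevity.

```python
import random

def variance(ys):
    """Impurity of a group: the variance of its emission values."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def best_split(rows, ys, n_features):
    """Return (impurity decrease, variable index, threshold) for the split
    that lowers the size-weighted variance the most, or None if no split exists."""
    parent = variance(ys) * len(ys)
    best = None
    for f in range(n_features):
        # Skip the smallest value so the "below threshold" group is never empty.
        for t in sorted(set(r[f] for r in rows))[1:]:
            left = [y for r, y in zip(rows, ys) if r[f] < t]
            right = [y for r, y in zip(rows, ys) if r[f] >= t]
            gain = parent - variance(left) * len(left) - variance(right) * len(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow_tree(rows, ys, n_features, importance, depth=0, max_depth=3):
    """Sort the samples into two groups, then sort each group again.
    Each split's impurity decrease is credited to the variable it used."""
    if depth == max_depth or len(ys) < 4:
        return
    split = best_split(rows, ys, n_features)
    if split is None or split[0] <= 0:
        return
    gain, f, t = split
    importance[f] += gain
    low = [(r, y) for r, y in zip(rows, ys) if r[f] < t]
    high = [(r, y) for r, y in zip(rows, ys) if r[f] >= t]
    for group in (low, high):
        grow_tree([r for r, _ in group], [y for _, y in group],
                  n_features, importance, depth + 1, max_depth)

def forest_importance(rows, ys, n_trees=15, seed=0):
    """Grow many trees on random resamples and average each variable's
    share of the total impurity decrease (mean decrease in impurity)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    total = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # random sample with replacement
        importance = [0.0] * n_features
        grow_tree([rows[i] for i in idx], [ys[i] for i in idx], n_features, importance)
        tree_sum = sum(importance) or 1.0
        for f in range(n_features):
            total[f] += importance[f] / tree_sum / n_trees
    return total

# Synthetic data (an assumption for illustration): emissions depend strongly on
# air temperature, weakly on salinity, and not at all on the third variable.
rng = random.Random(42)
names = ["air_temperature", "salinity", "unrelated_noise"]
rows = [[rng.uniform(0, 30), rng.uniform(0, 35), rng.uniform(0, 1)] for _ in range(100)]
ys = [2.0 * r[0] - 0.3 * r[1] + rng.gauss(0, 1) for r in rows]

scores = forest_importance(rows, ys)
for name, score in sorted(zip(names, scores), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.2f}")
```

Because the made-up emissions were built to depend mostly on air temperature, that variable should come out with the largest score. With real field data, the ranking is what would point to the most significant driver.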