The Analytics 2 student teams presented some interesting projects this semester:
- Team Energy partnered with ExxonMobil to develop a refinery crack spread model using SVM regression with kernel variations.
- Team Hacker placed in the top quartile of a Kaggle competition with a creative dimension reduction strategy.
They did a great job! Model development was primarily in RStudio / R Server.
I’d like to highlight Team Audit, who built a model in Azure Machine Learning that selects transactions from the GL and billing subledger (SQL Server tables with hundreds of thousands of transactions) and applies algorithms to test those transactions for integrity. This approach was inspired by the EY forensics technology group, who came down from Charlotte to meet with us during the semester. Thanks, Scott and Atul!
The model reads the free text (descriptions, comments, etc.) in each transaction and uses that text, along with other dimensions, to predict account classification. The idea is that descriptive text can reveal the original intent of a transaction. It also tests the recorded transaction value against a predicted value. The model then combines both tests to flag exceptions. Out of the hundreds of thousands of transactions, the exceptions totaled fewer than 20.
This is good performance for text analytics and multiclass classification, and the exceptions included every one of the test transactions that I planted (unknown to the students). I won’t go into the details of parameter tuning, but to drill down a bit on the model components:
- Free text data are run in parallel through N-Gram Extraction and Latent Dirichlet Allocation (LDA, which groups words into synthetic topics), and the two outputs are merged before applying a Multiclass Decision Forest. The team found that merging n-grams with topics considerably improved classification accuracy; this is a good practice that I used at JPM.
- The predicted classifications, along with all the original dimensions, are then run through a Boosted Decision Tree Regression, which predicts the transaction value.
- Finally, decision rules (R-Script Module) test the classification and regression variances and create the exceptions (final output to CSV).
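To make the last step more concrete, here is a minimal Python sketch of what an exception rule of this kind could look like. It is not the team's R script: the field names, the thresholds (`min_confidence`, `max_variance`), and the choice to require both a confident classification mismatch and a large value variance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredTransaction:
    """A transaction after classification and regression (hypothetical fields)."""
    txn_id: str
    recorded_account: str    # account the transaction was actually posted to
    predicted_account: str   # account predicted from free text and other dimensions
    class_confidence: float  # classifier probability for predicted_account
    recorded_value: float
    predicted_value: float   # value predicted by the regression step


def value_variance(txn: ScoredTransaction) -> float:
    """Relative gap between the recorded and the predicted value."""
    denom = max(abs(txn.predicted_value), 1.0)  # avoid dividing by ~0
    return abs(txn.recorded_value - txn.predicted_value) / denom


def is_exception(txn: ScoredTransaction,
                 min_confidence: float = 0.9,
                 max_variance: float = 0.5) -> bool:
    """Assumed rule: flag only when BOTH tests fail.

    1. The classifier confidently disagrees with the recorded account.
    2. The recorded value is far from the predicted value.
    """
    class_mismatch = (
        txn.predicted_account != txn.recorded_account
        and txn.class_confidence >= min_confidence
    )
    return class_mismatch and value_variance(txn) > max_variance


def find_exceptions(txns: list[ScoredTransaction]) -> list[str]:
    """Return the ids of the flagged transactions, in input order."""
    return [t.txn_id for t in txns if is_exception(t)]


# Made-up sample rows, purely for illustration.
sample = [
    ScoredTransaction("T1", "6100", "6100", 0.95, 100.0, 105.0),
    ScoredTransaction("T2", "6100", "1500", 0.97, 50000.0, 1200.0),
    ScoredTransaction("T3", "6100", "1500", 0.55, 300.0, 290.0),
]
print(find_exceptions(sample))  # ['T2']
```

Requiring both tests to fail keeps the exception list short, which matters when auditors review every flagged item by hand; a looser rule (either test failing) would trade a longer list for higher recall.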
The final project presentations class is my favorite part of the semester – lots of fun. I thought the presentations were very professional – ready for prime time. We are very grateful for the partnership and participation of ExxonMobil, EY, Microsoft, and 2DA Analytics.
As we move forward, we will be focusing more on ERP / accounting scenarios, as a good portion of the students are MS-Accountancy (the rest are MBA and ME). Looking forward to a transfer pricing / tax optimization project with EY tax technology in the spring.