Fall 2017 Analytics Team Projects



The Analytics 2 student teams presented some interesting projects this semester:

  • Team Energy partnered with ExxonMobil to develop a refinery crack spread model using SVM regression with kernel variations.
  • Team Hacker placed in the top quartile of a Kaggle competition with a creative dimension reduction strategy.
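Team Energy’s actual model used SVM regression in R; as a rough illustration of the kernel idea behind it, here is a stdlib-only Python sketch of kernel ridge regression (a close cousin of SVM regression) with an RBF kernel. The data, parameters, and “crack spread” framing below are made up for illustration – nothing here is from the project itself:

```python
import math

def rbf_kernel(x1, x2, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x1 - x2||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x1, x2)))

def solve(A, b):
    """Solve A @ x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def kernel_ridge_fit(X, y, gamma=0.5, lam=1e-3):
    """Fit dual coefficients: alpha = (K + lam*I)^-1 y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def kernel_ridge_predict(X, alpha, x_new, gamma=0.5):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(a * rbf_kernel(x, x_new, gamma) for a, x in zip(alpha, X))

# Hypothetical inputs: (crude price, product price) -> spread
X = [[50.0, 65.0], [55.0, 72.0], [60.0, 74.0], [65.0, 85.0]]
y = [15.0, 17.0, 14.0, 20.0]
alpha = kernel_ridge_fit(X, y)
pred = kernel_ridge_predict(X, alpha, [55.0, 72.0])
```

Swapping kernels (linear, polynomial, RBF) amounts to swapping `rbf_kernel` – that is the “kernel variations” experiment in miniature.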

They did a great job! (Model development was primarily in RStudio / R Server.)

I’d like to highlight Team Audit, who built a model in Azure Machine Learning that selects transactions from the GL and billing subledger (SQL Server tables with hundreds of thousands of transactions) and applies algorithms to test them for integrity. This approach was inspired by the EY forensics technology group, who came down from Charlotte to meet with us during the semester – thanks, Scott and Atul!

The model reads the free text (descriptions, comments, etc.) in transactions and uses that data, along with other dimensions, to predict account classification – the idea being that descriptive text can reveal the original intent of a transaction. It also tests the transaction value against the predicted value. The model then brings all of this together and tests for exceptions. Out of the hundreds of thousands of transactions, the exceptions totaled fewer than 20:

GL MC Matrix
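The classification and value tests described above can be sketched with simple decision rules. This is a hypothetical Python stand-in (field names and tolerance invented for illustration), not the team’s actual AML implementation:

```python
def find_exceptions(transactions, value_tolerance=0.25):
    """Flag transactions whose classification or value looks anomalous.

    A transaction is an exception if its predicted account differs from
    its booked account, or if its value deviates from the predicted value
    by more than the tolerance (25% here, purely illustrative).
    """
    exceptions = []
    for t in transactions:
        misclassified = t["predicted_account"] != t["booked_account"]
        variance = (abs(t["value"] - t["predicted_value"])
                    / max(abs(t["predicted_value"]), 1e-9))
        if misclassified or variance > value_tolerance:
            exceptions.append(t)
    return exceptions

# Hypothetical transactions with model outputs already attached
txns = [
    {"id": 1, "booked_account": "6100", "predicted_account": "6100",
     "value": 1020.0, "predicted_value": 1000.0},   # clean
    {"id": 2, "booked_account": "6100", "predicted_account": "7200",
     "value": 500.0, "predicted_value": 480.0},     # classification mismatch
    {"id": 3, "booked_account": "5400", "predicted_account": "5400",
     "value": 9000.0, "predicted_value": 3000.0},   # value outlier
]
flagged = find_exceptions(txns)
```

Run over hundreds of thousands of rows, rules like these reduce the population to a short exception list for the auditor.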

This is good performance for text analytics and multiclass classification, and the exceptions included every one of the test transactions that I planted (unknown to the students). I won’t go into the details of parameter tuning, but will drill down a bit on the model components:


  • Free text data are run in parallel through N-Gram Extraction and Latent Dirichlet Allocation (LDA creates synthetic topics from clusters of words), and then merged before applying a Multiclass Decision Forest. The team found that merging n-grams with topics considerably improved classification accuracy – a good practice that I used at JPM.
  • The predicted classifications, along with all the original dimensions, are then run through a Boosted Decision Tree Regression, which predicts the transaction value.
  • Finally, decision rules (an R-Script module) test classification and regression variances and create the exceptions (final output to CSV).
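The merge practice in the first bullet can be sketched as follows – a stdlib Python stand-in for the AML modules, where the LDA topic scores are assumed to be computed separately and all names are illustrative:

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Lowercase word n-grams from free text (simplified; no stop-word removal)."""
    words = re.findall(r"[a-z]+", text.lower())
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def features(text, topic_scores):
    """Merge n-gram counts with topic scores into one feature vector,
    mirroring the merge step before the Multiclass Decision Forest."""
    f = Counter(ngrams(text, 1) + ngrams(text, 2))
    f.update({f"topic_{k}": v for k, v in topic_scores.items()})
    return dict(f)

# Hypothetical transaction description plus LDA topic scores from elsewhere
feats = features("Freight charge for customer shipment", {"4": 0.62})
```

The intuition: n-grams capture exact phrases (“freight_charge”), topics capture looser themes, and the classifier sees both.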

The final project presentations class is my favorite part of the semester – lots of fun. I thought the presentations were very professional – ready for prime time. We are very grateful for the partnership and participation of ExxonMobil, EY, Microsoft, and 2DA Analytics.

As we move forward, we will focus more on ERP / accounting scenarios, as a good portion of the students are MS Accountancy (the rest are MBA and ME). Looking forward to a transfer pricing / tax optimization project with EY tax technology in the spring.

Analytics in Pricing and Proposal Management



As an Industry Architect at Microsoft, I worked on a number of projects to improve pricing and proposal management for some great companies, which created a lot of value (small price improvements on the top line become large improvements on the bottom line). We used a business rules engine to embed logic in pricing workflows. That approach was good at the time (circa 2000), and definitely superior to using SAP conditions or custom code – but it had its challenges with complexity and dynamics (here’s one example).

Today, analytics platforms, like Azure Machine Learning (AML), have changed the pricing game. Algorithms can comprehend the dimensions of pricing in a way that is not humanly possible. And they are truly dynamic (continuous learning can be implemented). This is a huge leap forward for companies with complex and dynamic pricing, and most definitely a source of competitive advantage (an overloaded term, but justified here).

The images above show AML algorithms embedded in a proposal management workflow site (K2 workflow) and integrated with pricing strategy models in Excel, so pricing strategy is consistent from planning to execution. The circled section on the site displays the recommended price and the probability of winning the deal, to help people assess opportunities as they log RFPs and decide on resource commitments and approval routings for RFP responses.
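To illustrate the recommended price / win probability pairing, here’s a hypothetical sketch: a logistic win-probability curve (made-up coefficients) and a scan for the price that maximizes expected margin. This is an illustrative stand-in, not the AML model from the solution:

```python
import math

def win_probability(price, base_price, sensitivity=5.0):
    """Illustrative logistic model: win probability falls as the quoted
    price rises above the market's reference price."""
    return 1.0 / (1.0 + math.exp(sensitivity * (price / base_price - 1.0)))

def recommend_price(cost, base_price, steps=200):
    """Scan candidate prices from cost up to 2x the reference price and
    pick the one maximizing expected margin: (price - cost) * P(win)."""
    best_price, best_ev = cost, 0.0
    for i in range(steps + 1):
        p = cost + (2.0 * base_price - cost) * i / steps
        ev = (p - cost) * win_probability(p, base_price)
        if ev > best_ev:
            best_price, best_ev = p, ev
    return best_price, win_probability(best_price, base_price)

# Hypothetical deal: cost of delivery 70, market reference price 100
price, prob = recommend_price(cost=70.0, base_price=100.0)
```

In the real workflow these two numbers are what the RFP screen surfaces to the person logging the opportunity.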

Note: This solution architecture can be applied to an array of scenarios (e.g., S&OP, supply chain, HR recruitment and onboarding…).

UH / Bauer Analytics Lab




The UH Analytics Lab is a great resource for the students, and serves as a reference architecture for companies building an analytics competency. And it’s not expensive (unless you start spinning up clusters all the time – seldom needed).

We start with RStudio/RServer for data acquisition, structuring and description (and application of CRAN algorithms). We extend to Azure Machine Learning for application of MS Research algorithms. For data storage, we have SQL Server (database and analysis services) and Data Lake / Spark for heavy crunching. We also have Dynamics ERP (accounting and business topics are the focus here). Presentation tools include Excel, Tableau and Power BI. Overall, a good end-to-end experience for students and business community partners (student projects can serve as POCs – so benefits work both ways).

Analytics Adoption – Internal Resources




I discussed barriers to analytics adoption in a recent post, noting that both McKinsey and Gartner cite finding talent as the primary barrier, with Gartner suggesting that companies train existing staff into “citizen data scientists”. I like these recommendations, and would like to add some learnings:

  1. Divide the “Data Scientist” role into “Data Acquisition and Integration (DAI)” and “Algorithm Application” roles, and give the DAI role to Business Intelligence (BI) architects. The “big data” technologies (e.g., data clusters and factories) are quickly absorbed by data architects – in fact, most are thrilled with what they discover (it solves a lot of historical frustration). They have experience with integration (from years of ETL), and understand the security and scalability issues involved in building robust data pipelines. BI is a mature capability – it’s been around for 20 years (Kimball’s “The Data Warehouse Toolkit” was published in 1996 – my how time flies!). And it’s still evolving, with a lot of new releases in 2016. In fact, BI serves as a platform for analytics – it really needs to be in place before companies start building an analytics competency (e.g., analytics algorithms consume and create new data, which are often persisted in BI stores).

Keep in mind, though – not every analytics project needs “big data” technology; in my experience, it’s often not. And I wouldn’t recommend scoping it into the first few projects unless there’s a compelling business case.

  2. Then focus on filling the “Algorithm Application” role. Let’s call this role “Analytics Analyst”; it can be filled from anywhere, but BI Analysts are a good place to start. It’s important, though, to recognize that the roles are very different. BI Analysts build reports, dashboards, and functional models, but not mathematical models (and just to level-set here: Power BI and Tableau are not analytics tools – they’re BI tools). So BI Analysts will need some math (light calculus and linear algebra) and machine learning (a couple of courses), but with the right mindset a BI Analyst can get there. People in engineering roles are also a good place to look – many are already using algorithms, understand the math, and the mind shift isn’t as great. People in accounting and audit too – they know the corporate data at a business level, and do forensic and abstract modeling. The nice thing about filling the role internally is that domain knowledge of the business is already in place – no small issue. Just open up a req and see what happens…

Filling that key role can get you started. There are caveats:

  • Start small. If possible, start with small projects (e.g., deeper analysis of corporate transactions and forecasting). If the vision is larger and urgent, then you’ll need to bring in consultants. But in that case, the roles discussed above still need to be filled so competencies can be transferred and continued.
  • Mindset. The work really does require a different mindset – for management and analysts. Analytics is about building a deeper understanding, and it often requires long periods of continuous, cumulative thought (which gets lost if interruptions occur). And forensic curiosity – what Julia calls the “scout mentality” (this may be a little tangential, but I really love it).

So, most companies should be able to get analytics started using internal resources, and build from there. When to tackle the larger, more complex projects depends on the business case.

Analytics Adoption – Partnering with Academia



The 2016 McKinsey report, The Age of Analytics: Competing in a Data-Driven World, is an excellent piece of research. My take-away is: analytics can create substantial value, but only a fraction has been realized, due to organizational resistance and lack of talent. The imperative for companies to overcome that resistance is an “ever widening gap between leaders and laggards”, where “leaders are staking out large advantages”.

Change management is always a challenge in these types of transformations, but the incentives (competition and investor expectations) are there, and skepticism can be overcome with education and pilots (light bulbs go off when people see it working on their own turf). Cost should not be a barrier either – most of it is “pay-as-you-go”.

So, with those barriers down, the last one is talent. McKinsey describes analytics as a merger of four broad types of roles: data architects, data engineers, data scientists, and business translators. The business translator role (combining domain and technical knowledge) is considered key (McKinsey projects demand of 2–4 million for these people over the next decade, which should be good news to my students). Gartner also wrote about the talent issue in June [1], recommending that companies train existing staff into “citizen data scientists” (see my post), and partner with academia.

We have bright, passionate students in Analytics courses at Bauer. We also have an Analytics lab (see my post), which can be used for POCs. We research and share best practices, and we can help with education and visioning. So, we think partnering with academia is a brilliant way to jump start analytics – we’re here to help. Please contact me if you’d like to explore.

[1] Gartner, G00294588, “Doing Machine Learning Without Hiring Data Scientists”, published 20 June 2016.

Planning and Analysis Platform – Tabular Model



As someone who managed the planning function for a large corporation, the efficiency and effectiveness of analysts’ work is near and dear to my heart. A good planning function is supported by a platform that promotes data consistency across many perspectives, while enabling an agile (and creative) modeling experience, with integration of existing models and powerful functional tools (e.g., analytics).

That was the goal of the data warehouse, but the complexity and rigidity of the multidimensional model limited agility. So these environments evolve ad hoc, with each analyst creating their own model based on unique logic and links to other spreadsheets (also with unique logic). The result is a very fragile and unmanageable environment (what many call “Excel Hell”):


Companies need a way to get the analysts more intimately involved in the design of the underlying data models, and drastically reduce the design cycle. And they need to give analysts the robust functions that database systems offer, instead of all those “toy functions” (e.g., VLOOKUPs). Analysts need more agility and power.

These are among the reasons that Microsoft introduced the Tabular Model into SQL Analysis Services. This model produces a managed analysis environment with consistent data, shared logic, and relational database functionality extended to Excel:


Think of it as PowerPivot implemented across the whole environment (both the desktops and the servers), coupled with a new spreadsheet language (DAX) and real functions. An analyst can build core model / logic in PowerPivot, and push that down to the server, where it can be shared and extended. And it’s all managed with the same scalability, reliability and security as a traditional data warehouse (it’s still SQL Analysis Services).

Implementation can be incremental – realizing continuous improvement at low risk and low cost (you probably already own this). Worth considering if your analysis platform is not meeting the needs of the business.

Conceptual Framework for Forecasting



Good forecasts are critical to planning, and it surprises me how many are simple time series regression models. Time is a dimension, but it’s not a driver – sales don’t just happen because the sun comes up! I also see a lot of forecasts based on environmental variables alone, like industry or commodity forecasts. Correlations between industry indicators and enterprise-level value chains are usually weak, so those forecasts are also weak as a basis for major capital or operating investment.

I tell my students that forecasting is the process of projecting transactions into another timeframe (one of our class exercises is building forecasts based on transactional sales scoring). The most insightful and resilient models are often an ensemble of industry and transactional models. Backlog and pipeline are excellent sources of transactional data (those fall in RD1 – revenue driver group 1 – above). But that’s not enough, and you can usually pursue transactional data with predictive quality further up the chain. RD2 describes the next stage, e.g., your customers’ orders and specific demand drivers in your market. RD3 would be your customers’ customers’ orders, and drivers in that market. The same approach works on the CD (cost driver) side.
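The ensemble idea can be sketched as a weighted blend of a transactional (RD1, pipeline-scored) forecast with an industry-driver forecast. The weights and numbers below are illustrative; in practice the weighting would be fit on backtests:

```python
def transactional_forecast(pipeline):
    """Score backlog/pipeline deals: sum of deal value * close probability
    (the RD1 transactional view)."""
    return sum(value * prob for value, prob in pipeline)

def industry_forecast(last_actual, industry_growth):
    """Environmental view: scale last period by an industry growth rate."""
    return last_actual * (1.0 + industry_growth)

def ensemble_forecast(pipeline, last_actual, industry_growth, w_txn=0.7):
    """Weighted ensemble of the transactional and industry models."""
    txn = transactional_forecast(pipeline)
    ind = industry_forecast(last_actual, industry_growth)
    return w_txn * txn + (1.0 - w_txn) * ind

# Hypothetical pipeline of (deal value, close probability) pairs
deals = [(100_000, 0.9), (250_000, 0.5), (80_000, 0.2)]
fcst = ensemble_forecast(deals, last_actual=240_000, industry_growth=0.04)
```

Extending up the chain (RD2, RD3) just means adding more transactional models to the ensemble.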

The most obvious benefit of this approach is higher quality forecasts. But there are other benefits: the exercise of meeting with customers, and customers’ customers, builds relationships, understanding, and shared purpose. We all have a tendency to be insular. But the answers may not be in the boardroom and spreadsheets – sometimes, we have to get outside and seek the drivers.

Analytics in Audit and Tax



Most of my students are going to work for large accounting and consulting firms. Beyond consulting, where analytics is becoming a core service offering, analytics is emerging in the audit and tax practices. Here’s a good discussion from the ICAEW: http://www.icaew.com/-/media/corporate/files/technical/iaa/tecpln14726-iaae-data-analytics—web-version.ashx

This is a very dynamic area. Recent discussions I’m having center around the concept of a continuous audit, where big data + analytics provide constant, automated monitoring of transactions. The potential to improve assurance efficiency, as well as improve overall enterprise data quality (which impacts predictive analytics down the line), is substantial. Very exciting area!

Tableau and R




I discussed integration of Excel with Azure Machine Learning in another post (https://econolytics.org/2016/11/19/pricing-and-proposal-management/). I should also mention that Tableau integrates with R (and, more importantly for real-world applications, AML or AWS through web services), so there are plenty of options for analytics in Tableau. In the image above, I’m showing how Tableau calls R from a Calculated Field (predicting prices using a multivariate support vector machine). For more on this, take a look at Bora Beran’s blog (https://boraberan.wordpress.com/). Bora is a Microsoft alum, now with Tableau – good stuff. Tableau and Excel (with PowerPivot and Power BI) are roughly equivalent, but neither is an analytics tool (I realize that Tableau has a tab titled “Analytics”, but a few regression models and one clustering algorithm do not make an analytics tool).

For complex analysis of large datasets, it’s more productive to work within an R environment (which has strong visualization). The R CRAN library has thousands of packages that can be integrated in countless ways. Better to start in the room without walls – add graphics later.