Building Predictive Algorithms from Scratch in Excel in Minutes with Copilot
Use Copilot to build advanced predictive analytics directly into Excel
Five years ago, I first started learning Python so that I could start doing more robust data analysis. I suck at learning foreign languages, and learning the various data languages in Python was no different.
Today, I know enough to be dangerous, but I’m far from an expert. Luckily, I don’t have to be.
Getting Started with Python in Excel
First, let’s open up the Copilot pane in excel. I entered the following prompt:
I am using python in excel to build a linear regression model that will predict Receipts (located in D15#)
A15# (AC / FC) denotes whether the payment week (B15#) is a week that has occurred (Actuals: AC) or is a forecast period (Forecast: FC)
The independent variables are Shifted Sales (E15#), Avg Unit Price (F15#), and DSO (G15#)
write a python script that i can enter into cell K15 that will produce a forecasted value for receipts for all payment weeks
Here are a few things to keep in mind:
Note the context I’m providing - I’m explicitly telling Copilot that I want a linear regression model
There are a whole bunch of ways to build a multiple linear regression model in python, but because I did not specify, Copilot is going to take the liberty of determining how to deploy the model
I am very specific about what fields should be used and where those fields are located in the spreadsheet
I then am specific about which fields should be used as independent variables
I am also specific about what I want the output to be (forecasted value)
With this, Copilot will produce a new tab called “Analysis1” and walks through its analysis steps. Each response that AI makes is unique so you may not get exactly what I got, but it will be similar.
Python Steps
Note: It’s easy to spot python code in Excel, just look for the [PY] notation
Loads the data from our tab as a dataframe
Filters for Actual data and trains a Linear Regression Model to predict receipts
But something looks wrong. The preview of the results show a series of -8003.1. That’s not right.
What happened? Well, if we look at our data, the predictive model is using Shifted Sales, Avg Unit Price, and DSO to predict receipts. The problem is we have a ton of zeros in those forecast periods
Refining our Prompt
Let’s retry our prompt and account for this nuance:
I am using python in excel to build a linear regression model that will predict Receipts (located in D15#)
A15# (AC / FC) denotes whether the payment week (B15#) is a week that has occurred (Actuals: AC) or is a forecast period (Forecast: FC)
The independent variables are Shifted Sales (E15#), Avg Unit Price (F15#), and DSO (G15#)
write a python script that i can enter into cell K15 that will produce a forecasted value for receipts for all payment weeks
actual independent variables (E15#, F15#, and G15#) should be used to predict weeks denoted as "AC" while forecasted independent variables (H15#, I15#, and J15#) should be used to predict weeks denoted as "FC"
This time Copilot just repeats step two and retrains the model and produces new outputs with our forecasted values. These results look much better.
How do we Integrate with the rest of our Data?
Now, all we have to do is make a few finishing tweaks to the code and move it to where we want the predicted values to show up:
Initialize a py= formula
Combine the individual scripts into one single script by copying the script from A6 into your new python formula and repeating this with the script in A24
Change the output from a DataFrame to Excel Values
Let’s remove the Date Field
Looks great!
Now let’s put it into our main dataset but copying the script in the formula bar, selecting where we want the script to output, and entering in the code. Remember to set the output to excel values.
That’s it! In less than 15 mins, you’ve deployed a predictive model that took me five years to learn on my own.