Saturday, August 29, 2015

Customer Analysis Continued -2

For those interested in something similar to C1 and C2 in an industrial setting, there is another paper by Dr. Schmittlein and Dr. Peterson, Customer Base Analysis: An Industrial Purchase Process Application. I will refer to it as C3.

Okay, so I want to continue documenting how maths helps figure out when a particular customer will make a purchase. We can't really claim to know exactly when the purchase will occur, but we can claim to know, with reasonable assurance, the chance of the customer making a purchase given that he has shown certain buying behavior.

That chance is defined in C1 as:

P(Customer Active|Purchasing Information)

If we are persistent enough, we might be able to answer questions like what the expected number of transactions in a certain time period is and what the probability of those transactions occurring will be. Finally, if we are brave enough to venture ahead, we will unlock the potential of an individual account in terms of its expected transactions, provided we have the purchasing information: the number of transactions in a given time period, and the time of the last transaction, which gives us the recency.

Most companies nowadays have past purchases nicely stored in a database. But if we are keeping transactions with Al Capone's bookkeeper in a general ledger, then this type of analysis will probably not work. Still, I am sure almost everybody has enough historical data to play with. The analysis holds as long as we can establish a long-run transaction rate, get individual customer retention/dropout rates (which are basically a function of the time period chosen for analysis), treat the transactions as independent, and accept the heterogeneity in transaction and dropout rates mentioned in C1 and C2. Here the maths starts getting thicker, because we have to assume that the heterogeneity (the transaction rates are different for different customers, as are the dropout rates) follows certain mathematical distributions: gamma for both rates in C1, and in C2 gamma for the transaction rates and beta for the dropout probability, with the dropout itself following a geometric process.
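To make the heterogeneity idea a bit more concrete, here is a minimal sketch of the C1-style assumption that every customer gets his own transaction rate and dropout rate, each drawn from a gamma distribution. The function name and the shape/rate values (r, alpha, s, beta) are made up for illustration; they are not estimates from C1 or from any real data.

```python
import random
import statistics


def draw_customer_rates(n_customers, r=0.5, alpha=10.0, s=0.6, beta=12.0, seed=42):
    """Draw a (transaction_rate, dropout_rate) pair for each customer.

    Both rates are gamma-distributed, as C1 assumes. The parameter values
    are hypothetical and chosen only to show the spread across customers.
    """
    rng = random.Random(seed)
    customers = []
    for _ in range(n_customers):
        # random.gammavariate takes (shape, scale); scale = 1 / rate parameter
        transaction_rate = rng.gammavariate(r, 1.0 / alpha)
        dropout_rate = rng.gammavariate(s, 1.0 / beta)
        customers.append((transaction_rate, dropout_rate))
    return customers


customers = draw_customer_rates(10_000)
rates = [c[0] for c in customers]
print("mean transaction rate:", round(statistics.mean(rates), 4))
print("slowest / fastest customer:", min(rates), max(rates))
```

The point of the sketch is only that the customer base has no single "the" transaction rate: the mean sits near r / alpha, while individual customers spread widely around it.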

Now I gawk at these things just as much as any sane marketing professional would. Why make life complicated when a logistic regression with some basic assumptions (or something even simpler) will give us a reasonable estimate that works in most cases? But then again, accuracy and precision are an art, and connected themes make better sense than disconnected answers to connected questions (this thought is blatantly stolen from C1). More on this later, but for now we get further into C1, C2 and C3.

The first step in predicting a customer's future purchases is to identify the mathematical process behind the customer's transactions. In practice, customers make purchases just like you and I do. Some by impulse, some by need, and some god only knows why: peer pressure, love of shopping, economics of behavior; we can go in many directions. In the end, it is a random process, and each purchase (we can assume) is made independently of the purchases before and after it. This behavior is equivalent to a Poisson process.
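As a rough illustration of what a Poisson purchase process looks like, the sketch below simulates purchase times for one customer by adding up exponentially distributed waiting times, which is one standard way to generate such a process. The rate of 0.25 purchases per week, the 52-week window, and the function name are assumptions picked for the example, not numbers from C1, C2 or C3.

```python
import random


def simulate_purchase_times(rate, horizon, rng):
    """Simulate purchase times of a Poisson process with the given rate on [0, horizon]."""
    times = []
    t = rng.expovariate(rate)  # waiting time until the first purchase
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)  # independent waiting time to the next purchase
    return times


rng = random.Random(7)
rate = 0.25      # hypothetical: one purchase every four weeks on average
horizon = 52.0   # weeks of observation

one_customer = simulate_purchase_times(rate, horizon, rng)
print("purchases in a year:", len(one_customer))

# Repeating the simulation shows the average count settling near rate * horizon.
counts = [len(simulate_purchase_times(rate, horizon, rng)) for _ in range(5000)]
mean_count = sum(counts) / len(counts)
print("average count:", round(mean_count, 2), "expected:", rate * horizon)
```

Note that this customer never drops out; the dropout part of the story is a separate assumption layered on top of the purchase process.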

Next we will try to dive into the Gamma and Geometric distributions and figure out what they mean for the assumptions behind the solutions presented in C1 and C2.


Sunday, August 9, 2015

Statistical Distributions and Customer Analytics

This blog is more like a diary, sometimes a reference and sometimes just a scratch pad for me to get my head wrapped around a few concepts. With that, I disregard all my earlier promises about what I will write and delve right into the topics of interest and the complication at hand.

After listening to a few lectures by Dr. Peter Fader, I had to dig through the articles about customer segmentation and the surrounding predictive analysis.

Non-contractual customers are my area of interest for now. There are tons of papers on customer segmentation, but they all point to one paper that started it all, and it was ahead of its time: almost all the other papers point to the difficulty of turning its theoretical framework into meaningful practice, mainly because of the computational complexity and the computing horsepower required, which was unavailable or limited at the time. We will get into the why of that later, but that paper was by Dr. David Schmittlein, Donald Morrison, and the late Dr. Richard Colombo.

That research paper was Counting Your Customers: Who Are They and What Will They Do Next? (1987) by Dr. Schmittlein and his co-authors (I will refer to it as C1). I will skip the many papers in between that reference it and just focus on a more recent one, namely "Counting Your Customers" the Easy Way: An Alternative to the Pareto/NBD Model by Dr. Fader (I will refer to it as C2).

We might digress a bit into other papers that we need in order to understand a few things in the above two, but to understand these papers one has to have some math background, at least to enjoy the story being told. The implementation is yet another discussion. For now the aim is just to get a feeling for why changing the practical story changes the maths behind it, and how that impacts the implementation, which is complex but interesting.

Each paper points to a plethora of other research papers, and after going through a few, I have come to the conclusion that we need to look at the Negative Binomial distribution and the Beta-Geometric distribution on the side while we go through these papers. That will be the story for the next few posts, till we reach Bruce Hardie's Excel implementation.
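For reference, these are the standard textbook forms of the two distributions named above, written as a sketch rather than as a derivation taken from the papers. The symbols are my assumptions for notation: r and \alpha are the shape and rate of a gamma distribution over the transaction rate \lambda, t is the length of the observation period, and a and b are the parameters of a beta distribution over the per-transaction dropout probability p.

```latex
% Negative binomial (NBD): Poisson counts mixed over gamma-distributed rates
% X | \lambda ~ Poisson(\lambda t),  \lambda ~ Gamma(r, \alpha)
P(X = x \mid r, \alpha)
  = \frac{\Gamma(r + x)}{\Gamma(r)\, x!}
    \left(\frac{\alpha}{\alpha + t}\right)^{r}
    \left(\frac{t}{\alpha + t}\right)^{x},
  \qquad x = 0, 1, 2, \ldots

% Beta-geometric: geometric dropout mixed over beta-distributed probabilities
% P(J = j | p) = p (1 - p)^{j - 1},  p ~ Beta(a, b)
P(J = j \mid a, b)
  = \frac{B(a + 1,\, b + j - 1)}{B(a, b)},
  \qquad j = 1, 2, 3, \ldots
```

Here J is the transaction after which the customer becomes inactive, and B(\cdot,\cdot) is the beta function.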