Tek Tok: Ashar Mairaj: 2015

Saturday, September 26, 2015

Statistical Distributions and Customer Analytics -3

What do probability distributions have to do with predictions about customers? If we want to predict a customer's chances of buying again, or of not buying at all, we have to base that prediction on some knowledge about the customer. We also need some way to judge whether our guess is likely to be right or completely out of touch with reality. Without getting into the details of why or how, we will assume that one of the leading predictors of repeat purchase is the customer's history with the company. That history can be quantified in a few ways and translated into a few variables: when the customer bought, what they bought, how many times they bought, and how. Together, these variables form an understanding of the customer that can be useful for making predictions, which in turn can feed into prioritization. One easy and basic form of analysis is RFM (recency, frequency, monetary value), which scores customers and uses those scores to predict their future value. To learn more about RFM, a simple online search will turn up countless informative resources. Some very reasonable actions can be taken just by looking at the RFM scores. Contrary to what we like to believe, not all customers are equally important, and beyond a certain level, not all customer expectations are equal either. Understanding those differences, even with simple measures, can be very useful to any organization.
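To make RFM concrete, here is a minimal sketch, assuming a hypothetical list of transactions (customer id, date, amount) and a simple rank-based 1-to-5 score per dimension. The `Transaction` class, the function names and the scoring rule are my own assumptions; real RFM tools use many different binning schemes (quintiles, fixed thresholds, and so on).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    customer_id: str
    day: date
    amount: float


def rfm_table(transactions, as_of):
    """Collapse raw transactions into (recency_days, frequency, monetary) per customer."""
    summary = {}
    for t in transactions:
        last, count, total = summary.get(t.customer_id, (t.day, 0, 0.0))
        summary[t.customer_id] = (max(last, t.day), count + 1, total + t.amount)
    return {cid: ((as_of - last).days, count, total)
            for cid, (last, count, total) in summary.items()}


def score(metric_by_customer, higher_is_better, bins=5):
    """Rank-based score from 1 (worst) up to `bins` (best); tied values share a score."""
    # Order values from worst to best; for recency (days since last purchase) lower is better.
    ordered = sorted(metric_by_customer.values(), reverse=not higher_is_better)
    n = len(ordered)
    return {cid: 1 + ordered.index(value) * bins // n
            for cid, value in metric_by_customer.items()}


def rfm_scores(transactions, as_of):
    """Return {customer_id: (R score, F score, M score)}."""
    table = rfm_table(transactions, as_of)
    r = score({c: v[0] for c, v in table.items()}, higher_is_better=False)
    f = score({c: v[1] for c, v in table.items()}, higher_is_better=True)
    m = score({c: v[2] for c, v in table.items()}, higher_is_better=True)
    return {c: (r[c], f[c], m[c]) for c in table}


# Hypothetical transactions, scored as of the date of this post.
SAMPLE = [
    Transaction("A", date(2015, 9, 1), 120.0),
    Transaction("A", date(2015, 9, 20), 80.0),
    Transaction("B", date(2015, 3, 15), 40.0),
    Transaction("C", date(2015, 8, 1), 300.0),
]
print(rfm_scores(SAMPLE, date(2015, 9, 26)))
```

With only three customers the top score of 5 is never reached; on a real customer base each score level would hold roughly a fifth of the customers, give or take ties.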

In developing an understanding of predictions and modelling, the chance of something happening (in our case, the customer buying again, and quantifying that contribution) given that we know something about the customer (in our case, their purchase history) is a conditional probability, albeit one that comes with some not-so-basic mathematical derivations. If we know a mathematical function that takes our purchase history variables as input and returns the probability of a particular customer behavior (a purchase), then we have created a model. We have assigned a numerical chance of occurrence to subsets of occurrences, and that forms the basis of our future predictions. The nature and derivation of that function can vary, but whatever function we arrive at will take different values of quantified historical customer behavior and assign a particular probability to each of them.
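The simplest possible version of such a function is a lookup table: group customers by some summary of their history and count how many in each group bought again. This is a minimal sketch of that empirical conditional probability, assuming hypothetical segment labels and a later period in which we observed who repurchased; it is a frequency estimate, not the model-based function derived in the papers discussed on this blog.

```python
from collections import Counter


def conditional_purchase_probability(records):
    """Estimate P(repurchase | segment) = repurchasers in segment / customers in segment.

    `records` holds (segment, repurchased) pairs: `segment` is any summary of the
    purchase history (hypothetical here) and `repurchased` is what we observed later.
    """
    totals = Counter(segment for segment, _ in records)
    buyers = Counter(segment for segment, bought in records if bought)
    return {segment: buyers[segment] / totals[segment] for segment in totals}


# Hypothetical history: (segment from the first year, bought again in the second year?)
HISTORY = [
    ("frequent", True), ("frequent", True), ("frequent", False),
    ("one-time", False), ("one-time", True), ("one-time", False), ("one-time", False),
]
print(conditional_purchase_probability(HISTORY))
```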

To figure out whether a customer will buy again, and if so how many purchases they will make or how much they will contribute, we assign probabilities to similar occurrences and then compute the expected purchase value, weighting each possible outcome by the probability that our mathematical model (or function) assigns to it.
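The weighting step itself is just an expected value: multiply each outcome by its probability and add everything up. A small sketch, assuming a hypothetical model output (the chance of 0, 1, 2 or 3 purchases next year) and an assumed fixed spend per purchase:

```python
def expected_value(outcomes):
    """Probability-weighted average: the sum of value * probability over all outcomes."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_probability}, not 1")
    return sum(value * p for value, p in outcomes)


# Hypothetical model output for one customer: chance of 0, 1, 2 or 3 purchases next year.
purchase_distribution = [(0, 0.50), (1, 0.30), (2, 0.15), (3, 0.05)]
AVERAGE_ORDER_VALUE = 50.0  # assumed spend per purchase

expected_purchases = expected_value(purchase_distribution)   # about 0.75 purchases
expected_spend = expected_purchases * AVERAGE_ORDER_VALUE      # about 37.5
print(expected_purchases, expected_spend)
```

Multiplying by a single average order value assumes spend per purchase does not depend on how many purchases are made; that is a simplification for the example.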

So now we know why we need probability distributions... the two probability distributions that we want to discuss for C1, C2 and C3 still await us... and we still have work to do to understand how the nature of a process ties to probability distributions and functions. Learning is a process in itself... and we can definitely call it continuous...

Saturday, August 29, 2015

Customer Analysis Continued -2

For those who are interested in something similar to C1 and C2 in an industrial setting, there is another paper by Dr. Schmittlein and Dr. Peterson, Customer Base Analysis: An Industrial Purchase Process Application, which I will hereafter refer to as C3.

Okay, so I want to continue documenting how maths helps figure out when a particular customer will make a purchase. We can't really claim to know exactly when the purchase will occur, but we can claim to know, with a reasonable sense of assurance, the chance of the customer making a purchase given that they have shown certain buying behavior.

The key quantity is defined in C1 as:

P(Customer Active|Purchasing Information)

If we are persistent enough, we might be able to answer questions like: what is the expected number of transactions in a certain time period, and what is the probability of those transactions occurring? Finally, if we are brave enough to venture further, we can unlock the potential of an individual account in terms of its expected transactions, provided we have its purchasing information: the number of transactions in a given time period, and the time of the last transaction, which gives us the recency.
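For C1's Pareto/NBD model the "customer active" probability involves the Gauss hypergeometric function, but the BG/NBD model in C2 gives it in closed form: P(alive | x, t_x, T) = 1 / (1 + [x > 0] · a/(b + x − 1) · ((α + T)/(α + t_x))^(r + x)), where x is the number of repeat transactions, t_x the time of the last one (recency) and T the length of the observation period. Below is a sketch of that formula; the parameter values are made up for illustration, whereas in practice r, α, a and b are estimated from the customer base.

```python
def bg_nbd_p_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active at the end of the observation period.

    x: number of repeat transactions observed in (0, T]
    t_x: time of the last transaction (recency), with 0 <= t_x <= T
    T: length of the observation period
    r, alpha: gamma heterogeneity in transaction rates
    a, b: beta heterogeneity in dropout probabilities
    """
    if x == 0:
        # In BG/NBD dropout can only happen right after a repeat transaction,
        # so a customer with no repeat purchases is still active.
        return 1.0
    odds_inactive = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_inactive)


# Made-up parameters for illustration only.
PARAMS = dict(r=0.25, alpha=4.0, a=0.8, b=2.4)
recent = bg_nbd_p_alive(x=3, t_x=35.0, T=39.0, **PARAMS)  # bought recently
stale = bg_nbd_p_alive(x=3, t_x=5.0, T=39.0, **PARAMS)    # same frequency, long silence
print(round(recent, 3), round(stale, 3))
```

The comparison shows the intuition behind recency: two customers with the same number of purchases get very different "alive" probabilities depending on how long they have been silent.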

Most companies nowadays have past purchases nicely stored in a database. If we are keeping transactions with Al Capone's bookkeeper in a general ledger, then this type of analysis will probably not work, but I am sure almost everybody has enough historical data to play with. The analysis works as long as we can establish a long-run transaction rate and individual customer retention/dropout rates (which are basically a function of the time period chosen for analysis), the transactions are independent, and the heterogeneity in transaction and dropout rates described in C1 and C2 is present. Here the maths starts getting thicker, because we have to assume that this heterogeneity (transaction rates differ from customer to customer, as do dropout rates) follows certain mathematical distributions: in C1, gamma distributions for both the transaction and the dropout rates; in C2, a gamma distribution for the transaction rates, with dropout modelled as a geometric process whose dropout probability varies across customers according to a beta distribution.
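Heterogeneity is easier to see in a simulation than in a formula. This sketch, assuming arbitrary values r = 2 and α = 4, gives each simulated customer their own transaction rate drawn from a gamma distribution and then draws that customer's purchase count over a period of length t. The function names are my own; the Poisson draw uses Knuth's simple multiplication method because the Python standard library has no Poisson sampler.

```python
import math
import random


def poisson_draw(mean, rng):
    """One Poisson(mean) draw via Knuth's multiplication method (fine for small means)."""
    threshold, k, product = math.exp(-mean), 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(n_customers, r, alpha, t, seed=0):
    """Transaction counts over a period of length t for a heterogeneous customer base.

    Each customer's rate lambda ~ Gamma(shape=r, rate=alpha), so the mean rate is r/alpha;
    given lambda, the customer's count is Poisson(lambda * t).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_customers):
        lam = rng.gammavariate(r, 1.0 / alpha)  # gammavariate takes (shape, scale)
        counts.append(poisson_draw(lam * t, rng))
    return counts


counts = simulate_counts(20000, r=2.0, alpha=4.0, t=10.0)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
# Theory: mean = r*t/alpha = 5, variance = mean + mean**2/r = 17.5.
print(round(mean, 2), round(variance, 2))
```

If every customer shared the same rate, the counts would be Poisson and the variance would equal the mean (5). The extra spread is exactly the heterogeneity the papers model.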

Now I gawk at these things just as much as any sane marketing professional would. Why make life complicated when a logistic regression with some basic assumptions (or something even simpler) will give us a reasonable estimate that works in most cases? But then again, accuracy and precision are an art, and connected themes make better sense than disconnected answers to connected questions (this thought is blatantly stolen from C1). More on this later; for now, let's get further into C1, C2 and C3.

The first step in predicting a customer's future purchases is to identify the mathematical process behind the customer's transactions. In practice, customers make purchases just like you and me: some by impulse, some by need, and some for reasons only God knows (peer pressure, a love of shopping, behavioral economics; we could go in many directions). In the end, it is a random process, and we can assume that each purchase is made independently of the next purchase and of what was bought earlier. Both C1 and C2 model this behavior, while the customer is active, as a Poisson process: purchases arrive at random at a constant rate, and the time between purchases is exponentially distributed.
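A Poisson process can be generated by stringing together independent exponential gaps between purchases. This sketch, assuming a hypothetical single customer who buys at 0.5 purchases per month, simulates many four-month windows and compares how often 0, 1, 2 or 3 purchases occur with the Poisson probabilities e^(−λt)(λt)^k / k!. The names and numbers are illustrative only.

```python
import math
import random


def purchase_times(rate, horizon, rng):
    """Purchase times in (0, horizon] for a Poisson process with the given rate.

    Gaps between purchases are independent exponential draws with mean 1/rate,
    so no purchase depends on when the previous one happened.
    """
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson count with mean = rate * horizon."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


rng = random.Random(42)
RATE, HORIZON, RUNS = 0.5, 4.0, 20000  # hypothetical: 0.5 purchases/month over 4 months
counts = [len(purchase_times(RATE, HORIZON, rng)) for _ in range(RUNS)]
for k in range(4):
    print(k, counts.count(k) / RUNS, round(poisson_pmf(k, RATE * HORIZON), 4))
```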

Next, we will dive into the gamma distribution and the geometric distribution and try to figure out what each of them means for the assumptions behind the solutions presented in C1 and C2.
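As a preview of the geometric part: in C2, after every transaction the customer flips a biased coin and becomes inactive with probability p, so the transaction after which they drop out follows a geometric distribution, P(J = j) = p(1 − p)^(j − 1). A small sketch, assuming a single hypothetical p = 0.2 for everyone (in C2 itself, p varies across customers according to a beta distribution):

```python
import random


def geometric_pmf(j, p):
    """P(customer drops out right after transaction j), j = 1, 2, 3, ..."""
    return p * (1 - p) ** (j - 1)


def transactions_until_dropout(p, rng):
    """After each transaction the customer drops out with probability p."""
    j = 1
    while rng.random() >= p:  # stays active with probability 1 - p
        j += 1
    return j


rng = random.Random(7)
P_DROP, RUNS = 0.2, 50000  # hypothetical dropout probability
draws = [transactions_until_dropout(P_DROP, rng) for _ in range(RUNS)]
print(sum(draws) / RUNS)  # theory: 1 / p = 5 transactions on average
for j in range(1, 4):
    print(j, draws.count(j) / RUNS, geometric_pmf(j, P_DROP))
```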

Sunday, August 9, 2015

Statistical Distributions and Customer Analytics

This blog is more like a diary: sometimes a reference, and sometimes just a scratch pad for me to get my head wrapped around a few concepts. With that, I disregard all my earlier promises about what I would write, and will delve right into the topics of interest and the complication at hand.

After listening to a few of Dr. Peter Fader's lectures, I had to dig through the articles on customer segmentation and the surrounding predictive analysis.

Non-contractual customers are my area of interest for now. There are tons of papers on customer segmentation, but they all point to one paper that started it all, and it was ahead of its time: almost all the later papers point to the difficulty of turning its theoretical framework into meaningful practice, mainly because of the computational complexity and the computing horsepower it required, which was unavailable or limited at the time. We will get into the why of that later, but that paper was by Dr. David Schmittlein, Donald Morrison, and the late Dr. Richard Colombo.

That research paper was Counting Your Customers: Who Are They and What Will They Do Next? (1987) by Dr. Schmittlein et al., which I will refer to as C1. I will skip the many papers in between that reference it and just focus on a more recent one, "Counting Your Customers" the Easy Way: An Alternative to the Pareto/NBD Model (by Dr. Fader et al.), which I will refer to as C2.

We might digress a bit into other papers that we need in order to understand a few things in the above two, but one has to have some math background to at least enjoy the story these papers tell. The implementation is yet another discussion. Still, getting a feel for why changing the practical story changes the maths behind it, and how that in turn impacts the implementation, is complex but interesting.

Each paper points to a plethora of other research papers, and after going through a few, I have come to the conclusion that we need to look at the negative binomial distribution and the beta-geometric distribution on the side while we go through these papers. That will be the story of the next few posts, until we reach Bruce Hardie's Excel implementation.
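For reference while reading, here is a sketch of the two side distributions as probability mass functions. The negative binomial (NBD) arises when Poisson(λt) counts are mixed over λ ~ Gamma(shape r, rate α): P(X(t) = x) = Γ(r + x)/(Γ(r) x!) · (α/(α + t))^r · (t/(α + t))^x. The beta-geometric arises when the geometric dropout probability p is itself Beta(a, b): P(J = j) = B(a + 1, b + j − 1)/B(a, b). The parameter values below are hypothetical, and this is not Bruce Hardie's Excel implementation, just the two formulas computed in log space to avoid overflow.

```python
import math


def nbd_pmf(x, t, r, alpha):
    """P(X(t) = x): Poisson(lambda * t) counts mixed over lambda ~ Gamma(shape=r, rate=alpha)."""
    log_p = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(alpha / (alpha + t))
             + x * math.log(t / (alpha + t)))
    return math.exp(log_p)


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_geometric_pmf(j, a, b):
    """P(dropout right after transaction j), j >= 1, when p ~ Beta(a, b)."""
    return math.exp(log_beta(a + 1, b + j - 1) - log_beta(a, b))


# Hypothetical parameters, only to show the shapes of the two distributions.
print([round(nbd_pmf(x, t=10.0, r=2.0, alpha=4.0), 4) for x in range(5)])
print([round(beta_geometric_pmf(j, a=3.0, b=2.0), 4) for j in range(1, 5)])
```

Note that P(J = 1) simplifies to a/(a + b), the average dropout probability across customers.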

Wednesday, June 10, 2015

Internet of Things...the end of the middleman...social media of machines

I know, I promised some concrete Excel and other scripting stuff...but this cannot wait...you may have already figured it out, but it just occurred to me that the Internet of Things will be the true end of the middleman. In that sense we will revert to the industrial age, where the people who created a truly magnificent product basically controlled the selling channels...sort of like Apple, but even Apple sells through distributors. With the Internet of Things, when the product informs you it's at the end of its life...it may give you retail choices...but in all earnestness...if I am the manufacturer, why would I even bother presenting other choices at that point...I will simply present myself as the viable choice for repurchase.

The Internet of Things will benefit the innovators, whose products created their own markets, because they will no longer have to depend on an intermediary. Yes, one can argue that managing the post-sales process is a hassle, but compared to the revenue sharing that must accompany a middleman, I might as well hold that relationship with my customer myself.

Now imagine the Internet of Things combined with the free flow of information and opinion. We have a social media of humans...but the machines can have a social media of their own. Imagine a schema.org style of categorization of products, where certain types of products talk to each other....Say I am a machine...I post a status update...that I am old....and I am gray...my battery is at the end of its life...and I need a replacement...or a new part...but my owner decides...aaa...I am not going to worry about it...The machine sends messages...and pleas to the owner...to his email...displays banners...follows him online...presents him choices to buy what is necessary...but the owner ignores them....Now...the machine must take matters into its own hands....and it posts on a "machine"-only social forum...."I am in dire need of stuff"...and the other machines...hear it and reach out to their human partners....who ultimately reach out to the human partner of that machine....to help his...machine partner...AND that will be radically cool...

I will probably not live to see the day...but it will be cool nonetheless....

In the end, product manufacturers can do away with distributors....because they are the ones controlling the "mind" of the machine, and they can make it part of the "collective" without having to rely on the middleman....

I think my flight of fancy.....was a nice flight...but I must land now.....

Friday, June 5, 2015

An inch closer than guessing...that's analytics

I have been away from writing for some time. If you have had a chance to read any of my previous blogs, I talked a lot about mobile, some programming stuff for finance, and then a few random thoughts. Driven by the necessity of the task at hand, my interests have changed along the way. But I am starting again, and that is a good thing. It is a bit scary and a bit work-like, but I will say I have picked up a few skills along the way, which I will share with some practical examples. There is a lot to talk about in the world of marketing, data analysis, and business in general.

I am a big fan of open source and forums. While blogging is a nice medium to share ideas and knowledge, there is something magical about finding a solution to the problem one is facing, and Stack Overflow does just that. There is probably a Stack Overflow thread for almost every popular piece of software. There are many other sites and forums that answer many a perplexing question, and I deeply admire the people who lend their subject-matter expertise for the good of others.

I will not claim such altruism as part of my natural self, but I do like the feeling of accomplishment. Achievement is executing on an idea and sharing it. Execution is key here, so after much thought I feel compelled to write posts accompanied by at least some practical snippet. It can be a Google Analytics tip, an SPSS Modeler recipe, a Google AdWords script, R code, SQL-based mining, text analytics, the practical angle of a theoretical strand, personal successes and failures with a few projects, or just my plain feelings about a book I have recently read.

My next post will be about how to use Excel for some simple tasks. We will build a few simple Excel macros and then dive into more complex stuff like grabbing data from the web and formatting it. In between, depending upon my mood, we will delve into pivots and slicers, SPSS, R, SQL and mood-enhancing customer analytics. Cheers!
