When it comes to Nutella, even those of us who count calories tend to turn a blind eye. Nothing strange about that; I mean, who wouldn’t?! What is strange is Nutella’s ability to influence the sale of a salty product. THAT SURPRISED ME!

Before I tell you which product Nutella influenced, let me tell you a story.


A couple of years ago in America, a supermarket chain named Osco (Jewel Osco) discovered a very strange buying behavior among its customers: people who bought beer also bought diapers! Yes, I was caught off guard by this too! I mean, what on earth do beer and diapers have in common (unless you’re blind drunk)? They found this pattern by checking loyalty card transaction data, which showed that people usually bought these items together on Fridays after 5 PM, once they had left the office. The explanation offered for the correlation: Friday is the night for going out, but parents with babies at home can’t, so they buy beer and stay in.

You would expect the store to have moved beer and diapers close together to increase sales, but that is not true; they never did that. What big companies (like Walmart, Osco, Coop, LIDL and others) usually do, though, is watch their customers’ behavior and arrange their stores in a way that makes them subconsciously buy more.


Since every culture is different, I wanted to see what kind of patterns we would get if I ran a similar analysis on some of the biggest supermarkets in my town. After a bunch of NDAs, I finally got the data from one of the largest supermarkets in Kosovo.

That’s all great, but the data had some issues! Some records were invalid, and about 8% of them had missing values in key attributes. WHAT TO DO?! Hmm, let’s see, research time!


The quality of the data dictates the quality of the results, so for these data to be useful, we had to pre-process them. Data pre-processing is one of the most important phases in the process of ‘Knowledge Discovery in Data’.

First, we had to check whether the missingness of the data had a pattern behind it, as happens, for example, when people refuse to tell their age or their salary.

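Basically, there are three ways in which data can be missing:

· Missing Completely at Random (MCAR): the fact that a value is missing is unrelated to any of the data, observed or not

· Missing at Random (MAR): the missingness depends only on other, observed attributes

· Missing not at Random (MNAR): the missingness depends on the missing value itself, like a high salary that people prefer not to disclose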
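One quick way to look for a pattern behind missing values is to compare how often an attribute is missing across groups of another attribute. The sketch below is only an illustration of that idea, not the check I actually ran: the helper `missing_rate_by_group` and the field names `store` and `price` are made up for the example.

```python
from collections import defaultdict

def missing_rate_by_group(records, target, group_key):
    """Share of records whose `target` is missing (None), per value of `group_key`."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec.get(group_key)
        totals[group] += 1
        if rec.get(target) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

# Hypothetical toy records; the field names are invented for illustration.
records = [
    {"store": "A", "price": 1.5},
    {"store": "A", "price": None},
    {"store": "B", "price": 2.0},
    {"store": "B", "price": 2.5},
]
rates = missing_rate_by_group(records, target="price", group_key="store")
print(rates)
```

If the rates differ a lot between groups, that is a hint that the data are not Missing Completely at Random. Similar rates do not prove the opposite, they just fail to show a pattern on that one attribute.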

I tried classifying the complete records using these algorithms: KNN (K-Nearest Neighbors), Random Forest and Naïve Bayes.

We split the data 70/20/10 (70% for training, 20% for testing and 10% for evaluation), and the results were very disappointing: the algorithms classified most of the records incorrectly.
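For the curious, a 70/20/10 split can be sketched in a few lines of plain Python. This is a minimal illustration and not the exact code used in the project; the function name `split_70_20_10` and the fixed `seed` are assumptions made for the example.

```python
import random

def split_70_20_10(rows, seed=42):
    """Shuffle the rows, then cut them into 70% / 20% / 10% parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n = len(shuffled)
    n_train = n * 70 // 100
    n_test = n * 20 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]  # whatever is left, about 10%
    return train, test, evaluation

# Stand-in data: 100 record ids instead of real transactions.
train, test, evaluation = split_70_20_10(range(100))
print(len(train), len(test), len(evaluation))
```

Shuffling before cutting matters: if the records are sorted by date or by store, slicing them in order would give the three parts very different contents.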

So, after researching some more, I decided to handle the missing values with ‘Case Deletion’ (dropping the incomplete records instead of imputing them) for two reasons:

1. The records are reduced by only 8%, which is not much compared to the 92% that’s left

2. I wanted to be sure that guessed values would not bias my model
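Case deletion itself is simple: any record that is missing a key attribute gets dropped. Here is a minimal sketch of the idea, assuming records are dictionaries; the helper `drop_incomplete` and the field names `receipt_id` and `item` are hypothetical and not taken from the real dataset.

```python
def drop_incomplete(records, key_fields):
    """Keep only records where every key field is present and non-empty."""
    return [
        rec for rec in records
        if all(rec.get(field) not in (None, "") for field in key_fields)
    ]

# Hypothetical toy records; two of the four are missing a key attribute.
records = [
    {"receipt_id": 1, "item": "Nutella"},
    {"receipt_id": 2, "item": None},
    {"receipt_id": None, "item": "bread"},
    {"receipt_id": 3, "item": "milk"},
]
clean = drop_incomplete(records, key_fields=("receipt_id", "item"))
dropped_share = 1 - len(clean) / len(records)
print(len(clean), dropped_share)
```

Keeping an eye on `dropped_share` is worthwhile: deleting 8% of the records is easy to live with, deleting half of them would not be.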


After pre-processing, I had to do one more little thing: transform the data into a format suited to the algorithm.
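I won’t go into the details of that transformation here, but a common one for this kind of analysis is to turn one-row-per-purchased-item data into one basket (a set of items) per receipt. The sketch below assumes that shape; the helper `to_baskets` and the `(receipt_id, item)` row layout are assumptions made for illustration.

```python
from collections import defaultdict

def to_baskets(rows):
    """Group (receipt_id, item) rows into one set of items per receipt."""
    baskets = defaultdict(set)
    for receipt_id, item in rows:
        baskets[receipt_id].add(item)  # a set ignores repeated items
    return list(baskets.values())

# Hypothetical toy rows: receipt 2 contains bread twice.
rows = [
    (1, "Nutella"),
    (1, "salty snack"),
    (2, "bread"),
    (2, "Nutella"),
    (2, "bread"),
]
baskets = to_baskets(rows)
print(baskets)
```

Using sets means quantities are thrown away on purpose: for finding items that are bought together, it only matters whether an item is in the basket, not how many times.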

Once everything was set up, I applied a technique called ‘Association Rule Mining’ (ARM).

Alongside Classification, Regression, and Clustering, ARM is one of the most used techniques in Data Mining.

The job of ARM is to find interesting relations in huge datasets. These relations are called rules, and they are found using just three measures: Support, Confidence and Lift.

· Support: the probability that X and Y appear together in the same transaction

· Confidence: the probability that a transaction containing X also contains Y, and

· Lift: Confidence divided by the Support of Y, i.e. how much more often X and Y occur together than they would if they were independent
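To make the three measures concrete, here is a small sketch that computes them on a toy list of baskets. It uses the standard definitions: support(X and Y) is the share of baskets containing both, confidence(X → Y) is support(X and Y) divided by support(X), and lift(X → Y) is confidence(X → Y) divided by support(Y). The baskets are invented for the example and are not from the supermarket data.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, x, y):
    """Of the baskets that contain X, the share that also contain Y."""
    return support(baskets, set(x) | set(y)) / support(baskets, x)

def lift(baskets, x, y):
    """Confidence of X -> Y relative to how common Y is overall."""
    return confidence(baskets, x, y) / support(baskets, y)

# Invented toy baskets, for illustration only.
baskets = [
    {"Nutella", "salty snack"},
    {"Nutella", "salty snack", "milk"},
    {"Nutella", "bread"},
    {"milk", "bread"},
]
x, y = {"Nutella"}, {"salty snack"}
print(support(baskets, x | y))
print(confidence(baskets, x, y))
print(lift(baskets, x, y))
```

A lift above 1 means X and Y show up together more often than chance would suggest, a lift around 1 means they look independent, and a lift below 1 means buying one makes the other less likely.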

By using this technique, we can identify the most frequent items that are bought together by our store customers.

There are a lot of algorithms out there for Association Rule Mining, but the foundation of them all, and the easiest to understand, is APRIORI.


APRIORI states that if an item-set is frequent, then all its subsets must be frequent; consequently, if an item-set is not frequent, none of its supersets can be frequent. So, to reduce the search space, it uses a technique called support-based pruning: any candidate that contains an infrequent subset is thrown out without ever being counted.
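Here is a minimal, unoptimized sketch of the APRIORI idea in plain Python. It is a teaching version written for this post, not the implementation used on the supermarket data, and the function name `apriori`, the `min_support` threshold and the toy baskets are all assumptions for the example.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: combine frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count: only the surviving candidates are checked against the data.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

# Invented toy baskets, for illustration only.
baskets = [
    {"Nutella", "salty snack"},
    {"Nutella", "salty snack", "milk"},
    {"Nutella", "bread"},
    {"milk", "bread"},
]
frequent = apriori(baskets, min_support=0.5)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```

On this toy data, with a 50% threshold, all four single items are frequent but the only frequent pair is Nutella with the salty snack. Every other pair is counted once and rejected, and no triple is ever counted, because each possible triple contains at least one infrequent pair.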


A lot of rules were generated. Some of them were really unexpected, especially the one related to Nutella.

Here are some:

And remember where we started?! Those who bought sweet and yummy Nutella almost certainly also bought a puffed, salty peanut snack!


I personally believe that everything we do defines us: the way we eat, the way we sleep, the way we wake up, the way we talk, and all the other countless things we do daily. These are the things that make me ME, and you YOU!

The Marketing ecosystem is more complex than ever, and it is almost impossible to manage with gut feeling alone. A salesperson’s ability to increase sales is closely tied to the ability to identify the prospect’s (read: customer’s) buying behavior.

Based on my findings, these results can be extremely useful in the following areas:

1. Pricing Strategy

2. Product Placement and Store Internal Design

3. Identifying hottest spots in the store

4. Identifying items that influence the sale of other items (like in our case, Nutella)

5. Identifying trends


More and more organizations are discovering ways of using Association Rule Mining to gain useful insights into associations and hidden relationships that are not obvious to the naked eye.

Although it is best known as a retail technique, Association Rule Mining is increasingly applied in many other areas.

For instance:

· FinTech: According to Financial Regulation News, the banking industry lost $2.2 billion to fraud in 2016, 58% of which was related to debit card fraud. The good news is that more and more companies are now using Association Rule Mining to detect suspicious transactions in real time, more accurately and with a lower rate of false declines. They use this kind of Machine Learning technique to automatically (without being guided by a human analyst) identify unusual patterns in datasets that can be characteristic of fraud. Blog post on this coming soon…

· Manufacturing: product design, predicting product failure

· Pharmaceutical Industry: discovering co-occurrence relationships among diagnoses and prescriptions across different patient groups

· Criminology: discovering co-occurrence of committed crimes among different cultures, genders and age groups, and predicting the potential next crime based on previous crimes



