Movie Recommendation

Developed a user-based collaborative filtering algorithm with Spark to recommend suitable movies to customers. As part of the project, I also mined frequent movie itemsets by implementing the Apriori and SON algorithms.
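As a rough illustration of the user-based idea, the sketch below predicts a rating from the ratings of similar users. It is plain Python rather than the Spark code used in the project, and the `ratings` data, the function names, and the choice of cosine similarity are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob":   {"Alien": 4, "Heat": 5, "Up": 2, "Big": 4},
    "carol": {"Alien": 1, "Heat": 2, "Up": 5, "Big": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[movie]
        den += sim
    # None means no other user has rated the movie.
    return num / den if den else None

print(predict("alice", "Big"))
```

Here the prediction for "alice" leans toward the rating given by "bob", whose tastes are closer to hers, than toward the one given by "carol".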

The Apriori algorithm is used for frequent itemset mining and association rule learning. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets, as long as those itemsets appear sufficiently often in the database.
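The level-by-level idea can be sketched in a few lines of plain Python. This is a minimal single-machine version, not the Spark implementation from the project; the function name `apriori`, the `baskets` data, and the threshold are assumptions chosen for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent individual items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
    return frequent

# Hypothetical viewing histories, one set of movie titles per customer.
baskets = [
    {"Alien", "Heat"},
    {"Alien", "Heat", "Up"},
    {"Alien", "Up"},
    {"Heat", "Up"},
    {"Alien", "Heat"},
]
result = apriori(baskets, min_support=3)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

The loop stops as soon as a level produces no frequent itemsets, since no larger itemset can be frequent after that point.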

For example, consider a database of transactions in which each transaction is an itemset:

Itemsets

{1,2,3,4}

{1,2,4}

{1,2}

{2,3,4}

{2,3}

{3,4}

{2,4}

Itemset	Support
{1}	3
{2}	6
{3}	4
{4}	5

All itemsets of size 1 have a support of at least 3, the minimum support threshold in this example, so they are all frequent.
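The single-item counts in the table above can be reproduced with a short script. This is only a sketch; the variable names are my own, and the threshold of 3 is the one used in this example.

```python
from collections import Counter

# The seven transactions from the example above.
transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]
min_support = 3  # threshold used in this example

# Support of an item = number of transactions that contain it.
item_support = Counter(item for t in transactions for item in t)
frequent_items = {i for i, s in item_support.items() if s >= min_support}

for item in sorted(item_support):
    print(item, item_support[item])
# prints:
# 1 3
# 2 6
# 3 4
# 4 5
```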

The next step is to count the support of every pair of items:

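Itemset	Support
{1,2}	3
{1,3}	1
{1,4}	2
{2,3}	3
{2,4}	4
{3,4}	3

The pairs {1,2}, {2,3}, {2,4}, and {3,4} meet the minimum support of 3, so they are frequent; {1,3} and {1,4} do not. Because every subset of a frequent itemset must itself be frequent, any triple containing {1,3} or {1,4} can be discarded without counting it. That leaves {2,3,4} as the only candidate of size 3, and its support is 2, so no itemset of size 3 is frequent.

We have thus determined the frequent itemsets in the database, and illustrated how some itemsets were never counted because one of their subsets was already known to be below the threshold.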
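The pruning step can be sketched as follows. The script recounts the pair supports directly from the seven transactions and then only considers triples whose pairs are all frequent; the helper `support` and the variable names are my own assumptions for illustration.

```python
from itertools import combinations

transactions = [
    {1, 2, 3, 4}, {1, 2, 4}, {1, 2}, {2, 3, 4}, {2, 3}, {3, 4}, {2, 4},
]
min_support = 3

def support(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Count every pair of items and keep the frequent ones.
pair_support = {p: support(p) for p in combinations([1, 2, 3, 4], 2)}
frequent_pairs = {p for p, s in pair_support.items() if s >= min_support}

# A triple is only worth counting if all three of its pairs are frequent.
candidates = [
    t for t in combinations([1, 2, 3, 4], 3)
    if all(p in frequent_pairs for p in combinations(t, 2))
]
frequent_triples = [t for t in candidates if support(t) >= min_support]

print(candidates)        # triples that survive pruning
print(frequent_triples)  # triples that also meet the threshold
```

Only (2, 3, 4) survives pruning, and it appears in just two transactions, so the list of frequent triples is empty.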

The power of this approach is that it finds items that tend to be purchased together more frequently than other items, with the ultimate goal of getting shoppers to buy more. Together, these items are called itemsets.

Skillset: Python, Spark, Hadoop

Data Mining Machine Learning