
Learning Data Science through Fun Demonstrations

As part of a ‘1 day in Python’ workshop, the capabilities of this versatile language were showcased through hands-on cases and demonstrations. Through these demonstrations we grasped the underlying logic of various data science algorithms; in other words, we got an insight into how computers think!

Natural Language Processing

Natural Language Processing (NLP) is concerned with programming computers to process and analyze large amounts of natural language data. It is used in search engines, social media feeds, speech engines and spam filters.

There are two components of NLP:

  1. Natural Language Understanding (NLU) - analyzing and interpreting different aspects of the language.
  2. Natural Language Generation (NLG) - producing meaningful phrases and sentences in natural language from some internal representation.

We were given a jumble of words, and our task was to sort and arrange them into an order that formed meaningful phrases or sentences. To do this, we structured and organized the words so that they had some meaning or connection with each other. We then assembled the sentences, analysing which parts should come before or after one another so that the story followed a logical order.

Similar stages are followed in natural language generation, and in building an ‘end-to-end’ machine learning system.
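The word-arrangement exercise maps naturally onto a toy language model. Below is a minimal, illustrative Python sketch (not the workshop's actual code): it uses a tiny hand-made bigram table, standing in for probabilities a real model would learn from a large corpus, and brute-forces every ordering of the words to pick the most "fluent" one.

```python
from itertools import permutations

# Toy bigram "language model": counts of word pairs seen in a tiny, made-up corpus.
# A real NLG system would learn these statistics from large amounts of text.
BIGRAM_SCORES = {
    ("the", "cat"): 3, ("cat", "sat"): 2, ("sat", "on"): 2,
    ("on", "the"): 3, ("the", "mat"): 2,
}

def score(sentence):
    """Sum bigram counts over consecutive word pairs; higher means more fluent."""
    return sum(BIGRAM_SCORES.get(pair, 0) for pair in zip(sentence, sentence[1:]))

def unscramble(words):
    """Try every ordering of the words and keep the highest-scoring one."""
    return max(permutations(words), key=score)

print(" ".join(unscramble(["mat", "cat", "the", "sat", "on", "the"])))
# -> "the cat sat on the mat"
```

Brute-forcing all orderings only works for a handful of words; the point here is the scoring idea, not the search strategy.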

The applications of NLG are:

  • Generating text summaries of databases and datasets
  • Automated journalism
  • Chatbots
  • Generating product descriptions for e-commerce sites

Facial Emotion Recognition

Human facial expressions can be classified into 7 basic emotions: happy, sad, surprise, fear, anger, disgust, and neutral. Facial emotions are expressed through the activation of specific sets of facial muscles, and they carry abundant information about our state of mind. Through facial emotion recognition, we can measure the effect that content and services have on the audience or users.

We were given a set of images, and our aim was to identify the emotion in each. For every face, we started from the top, observing the forehead for wrinkles, then moved on to the eyebrows, and continued the analysis down the face. These key differentiators helped us identify the emotion.

After predicting and labelling each face with an emotion such as disgust or neutral, we were shown the actual labels. We then saw which faces we had marked correctly and which we had marked incorrectly, and from these we counted the true positives, false positives, true negatives and false negatives. Writing these counts in the form of a confusion matrix, we could work out the accuracy, precision and recall. A computer algorithm for facial emotion detection works, and is evaluated, in much the same way.
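To make the metric step concrete, here is a small sketch that scores a set of predicted emotion labels against true labels and derives accuracy, precision and recall for one class. The labels are invented for illustration, not taken from the workshop.

```python
from collections import Counter

# Hypothetical labels for 8 faces; in the activity the true labels were revealed
# only after we had made our guesses.
true_labels      = ["disgust", "neutral", "disgust", "neutral", "neutral", "disgust", "neutral", "disgust"]
predicted_labels = ["disgust", "disgust", "disgust", "neutral", "neutral", "neutral", "neutral", "disgust"]

def confusion_counts(true, pred, positive="disgust"):
    """Count TP, FP, TN, FN, treating `positive` as the class of interest."""
    counts = Counter()
    for t, p in zip(true, pred):
        if p == positive and t == positive:
            counts["TP"] += 1
        elif p == positive and t != positive:
            counts["FP"] += 1
        elif p != positive and t != positive:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

c = confusion_counts(true_labels, predicted_labels)
accuracy  = (c["TP"] + c["TN"]) / sum(c.values())
precision = c["TP"] / (c["TP"] + c["FP"])
recall    = c["TP"] / (c["TP"] + c["FN"])
print(c, f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```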

Applications of facial emotion recognition include:

  • Retailers may use these metrics to evaluate customer interest.
  • Healthcare providers can provide better service by using additional information about patients’ emotional state during treatment.
  • Entertainment producers can monitor audience engagement in events to consistently create desired content.

Building a Movie Recommender System

Why are recommender systems becoming more and more popular these days? How do sites like Netflix, Amazon Prime and YouTube give us recommendations?

There are basically two types of recommender systems:

  1. Collaborative Filtering - This type of filter is based on users’ ratings: it recommends movies that we haven’t watched yet but that users similar to us have watched and liked. The algorithm can also predict a rating for a movie from a user’s past behaviour.
  2. Content-Based Filtering - This type of filter uses a series of discrete characteristics of a movie in order to recommend additional movies with similar features.

We were given an activity to recommend a movie to a user.

In the first scenario, there were 5 different movies (M1, M2, M3, M4 and M5) and 4 people, with three tables listing the movie ratings in three different universes. Through this simple depiction of a complex problem, we could examine a real-world scenario at a granular level. We started out by filling in the missing values by looking at the patterns in the data, which illustrated that data at the elementary stage requires pre-processing and cleaning. From the data provided, we could answer questions such as: if A likes action movies and B likes comedy movies, which type of movies does C like? In this way we could identify people with similar movie tastes.
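The fill-in-the-missing-rating step is essentially user-based collaborative filtering. Here is a minimal, illustrative sketch; the ratings matrix is invented, not the workshop's tables. It finds the users most similar to the target user and predicts the missing rating as a similarity-weighted average of their ratings.

```python
import numpy as np

# Invented ratings matrix: rows = users A-D, columns = movies M1-M5, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1, 2],   # A
    [4, 5, 4, 1, 1],   # B
    [1, 2, 1, 5, 4],   # C
    [2, 1, 1, 4, 5],   # D
], dtype=float)

def cosine_sim(u, v):
    """Cosine similarity computed only over movies both users have rated."""
    mask = (u > 0) & (v > 0)
    if not mask.any():
        return 0.0
    u, v = u[mask], v[mask]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def predict(user, movie):
    """Predict a missing rating as a similarity-weighted average over other users."""
    sims, vals = [], []
    for other in range(ratings.shape[0]):
        if other != user and ratings[other, movie] > 0:
            sims.append(cosine_sim(ratings[user], ratings[other]))
            vals.append(ratings[other, movie])
    return float(np.average(vals, weights=sims)) if sims else 0.0

print(predict(user=0, movie=2))   # A's predicted rating for M3
```

Because user A rates movies much like user B and unlike C and D, B's high rating for M3 dominates the prediction, which is exactly the intuition we used when filling the tables by hand.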

In the second scenario, we were given a table listing the features of movies, such as whether the film was animated, was a Marvel movie, or had particular actors, with a Yes or No label for each movie. We then had to answer questions such as: if A is a fan of animated movies and has just watched Inside Out, which movie would you recommend to him next? If B is a huge Marvel fan and has just watched the movie XYZ, which movie would you recommend she watch next?
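A content-based version of the same idea fits in a few lines. The feature table below is invented for illustration: each movie is a set of yes/no features, and we recommend the unwatched movie whose features overlap most with the one just watched.

```python
# Invented yes/no feature table, similar in spirit to the workshop's second scenario.
movies = {
    "Inside Out": {"animated": 1, "marvel": 0, "comedy": 1, "action": 0},
    "Coco":       {"animated": 1, "marvel": 0, "comedy": 1, "action": 0},
    "Avengers":   {"animated": 0, "marvel": 1, "comedy": 0, "action": 1},
    "Iron Man":   {"animated": 0, "marvel": 1, "comedy": 1, "action": 1},
}

def similarity(a, b):
    """Count the features on which two movies agree (a simple matching score)."""
    return sum(1 for f in movies[a] if movies[a][f] == movies[b][f])

def recommend(just_watched):
    """Recommend the movie most similar to the one just watched."""
    candidates = [m for m in movies if m != just_watched]
    return max(candidates, key=lambda m: similarity(just_watched, m))

print(recommend("Inside Out"))   # -> "Coco"
```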

Shape Clustering

The shape clustering problem is of practical importance in many areas where image or video data collections are used, as it can significantly facilitate the automatic labelling of objects. For example, it could outline the existing groups of pathological cells in a bank of cyto-images, the groups of species in aerial photographs, or the groups of objects observed in surveillance scenes from an office building.

For this one we were given a mixture of shapes and had to segregate them into clusters. First we did this individually, and later in groups; the task was significantly faster when performed as a group. This also illustrates that when a task is distributed among the nodes of a computing cluster, training and testing take less time than running the program on a single node.
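As a rough illustration of the same grouping idea in code, here is a minimal k-means sketch using scikit-learn. The shape "features" (number of corners and aspect ratio) are invented for the example and are not how the workshop represented its cut-outs.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented feature vectors describing shapes: (number of corners, aspect ratio).
# Circles ~ (0, 1), squares ~ (4, 1), elongated rectangles ~ (4, ~3).
shapes = np.array([
    [0, 1.0], [0, 1.1], [0, 0.9],      # circle-like
    [4, 1.0], [4, 1.05], [4, 0.95],    # square-like
    [4, 3.0], [4, 2.8], [4, 3.2],      # rectangle-like
])

# Cluster the shapes into 3 groups, much as we grouped the cut-outs by hand.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(shapes)
for shape, label in zip(shapes, labels):
    print(shape, "-> cluster", label)
```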

Story-telling

“Maybe stories are just data with a soul.” -Brené Brown

Storytelling in data science means taking data, ideas, facts and incidents and converting them into a story. Stories provoke thought and bring out insights.

We had initially been grouped based on our interests. For this activity, each group had to choose a common topic and come up with a creative story. We represented data by drawing pictures and illustrations, connecting the elements that had some relationship with each other, and used different types of visuals to make the story more engaging: timelines, pie charts, network diagrams, flowcharts and so on. Finally, each team presented its story.

“Data makes people think, emotions make them act.” -Antonio Damasio