Data Science

Conda Essentials Notes

-- Introduction Conda in an open source package management system that works on all platforms. It is a tool that helps manage packages and environments for different programming languages. Develop a high level understanding of how Conda works helped me at so many levels especially when it comes to managing environments and make my work more reproducable. Below are the notes that I wrote down during my journey of learning Conda and I always refere back to them:

K-means Clustering - Algorithm, Applications, Evaluation Methods, and Drawbacks

-- Overview Clustering Kmeans Algorithm Implementation Applications Geyser's Eruptions Segmentation Image Compression Evaluation Methods Drawbacks Conclusion Clustering Clustering is one of the most common exploratory data analysis technique used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different.

epsilon-Greedy Algorithm

A/B testing can be defined as a randomized controlled experiment that allows us to test if there is a causal relationship between a change to a website/app and the user behavior. The change can be visible such as location of a button on the homepage or invisible such as the ranking/recommendation algorithms and backend infrastructure. Web/Mobile developers and business stakeholders always face the following dilemma: Should we try out all ideas and explore all options continuously?

Predicting Loan Repayment

Introduction The two most critical questions in the lending industry are: 1) How risky is the borrower? 2) Given the borrower's risk, should we lend him/her? The answer to the first question determines the interest rate the borrower would have. Interest rate measures among other things (such as time value of money) the riskness of the borrower, i.e. the riskier the borrower, the higher the interest rate. With interest rate in mind, we can then determine if the borrower is eligible for the loan.

Predicting Employee Turnover

Employee turnover refers to the percentage of workers who leave an organization and are replaced by new employees. It is very costly for organizations, where costs include but not limited to: separation, vacancy, recruitment, training and replacement. On average, organizations invest between four weeks and three months training new employees. This investment would be a loss for the company if the new employee decided to leave the first year. Furthermore, organizations such as consulting firms would suffer from deterioration in customer satisfaction due to regular changes in Account Reps and/or Consultants that would lead to loss of businesses with clients.