Office Hours

What it takes to leverage Data Science in your company

Photo by Franki Chamaki on Unsplash

These days, many companies are trying to become data-driven or even AI-driven. Data Science, Machine Learning, and Artificial Intelligence (AI) are among the most prominent terms used with regard to the digital transformation of companies. One might think that these technologies can solve any business problem. Within corporations, Data Scientists are often considered “digital wizards” who can smoothly turn data into actionable insights and well-working recommendation systems. Yet there is a gap between expectation and reality, especially among non-technical people, when it comes to the requirements and expected results of a Data Science project. …


Photo by Karolina Grabowska on Pexels

Many data science techniques are based on measuring similarity and dissimilarity between objects. For example, K-Nearest-Neighbors uses similarity to classify new data objects. In Unsupervised Learning, K-Means is a clustering method which uses Euclidean distance to measure how far each data point is from the cluster centroids and assigns it to the closest one. Recommendation engines use neighborhood-based collaborative filtering methods which identify an individual’s neighbors based on their similarity or dissimilarity to the other users.

In this blog post I will take a look at the most relevant similarity metrics in practice. Measuring similarity between objects can be performed in a number of ways.
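As a minimal sketch of two such ways, the snippet below computes the Euclidean distance mentioned above and, as an assumption on my part about what counts as a relevant metric, the cosine similarity. The function names, including the hypothetical `nearest_centroid` helper that mimics the K-Means assignment step, are illustrative and not taken from any library.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (hypothetical helper that
    mirrors the assignment step of K-Means)."""
    return min(
        range(len(centroids)),
        key=lambda i: euclidean_distance(point, centroids[i]),
    )
```

Euclidean distance reacts to the magnitude of the vectors, while cosine similarity only looks at their direction, so the two can rank the same neighbors differently.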

Generally we…


Source: Globallinker

Predicting customer churn is a challenging and common problem that data scientists encounter these days. The ability to predict that a particular customer is at high risk of churning, while there is still time to do something about it, represents a large potential source of additional revenue for every customer-facing business.

In this post, I will guide you through the creation of a machine learning solution that predicts customer churn. This solution will be built with Apache Spark. Apache Spark is a popular distributed data processing engine which can be deployed in a variety of ways…


Introduction

In this post I will analyze AirBnB data for Munich, Germany. I will focus on the prices and customer reviews as well as on the preferred hotspots in Munich. For the purpose of this analysis, I collected the following datasets from the Inside Airbnb website (http://insideairbnb.com/get-the-data.html), an independent project that publishes AirBnB listing data:

  • calendar.csv: detailed calendar data
  • listings.csv: summary information and metrics for listings
  • reviews.csv: summary review data and listing ID.

At first, I will analyze the prices of all AirBnB apartments listed in Munich and search for patterns in the price development over the course of a calendar year.
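A first look at the price development could be sketched as follows. This is only an illustration under assumptions: the column names `date` and `price`, the `$80.00` price format, and the sample rows are made up for the example and are not taken from the actual calendar.csv, and `mean_price_by_month` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def mean_price_by_month(rows):
    """Average price per calendar month from rows with 'date' and 'price' keys.

    Assumes ISO dates (YYYY-MM-DD) and prices formatted like '$1,250.00'.
    """
    totals = defaultdict(lambda: [0.0, 0])  # month -> [price sum, row count]
    for row in rows:
        month = row["date"][:7]  # keep only 'YYYY-MM'
        price = float(row["price"].replace("$", "").replace(",", ""))
        totals[month][0] += price
        totals[month][1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}


# Made-up sample rows standing in for calendar.csv (assumed layout).
sample = io.StringIO(
    "listing_id,date,price\n"
    "1,2020-01-05,$80.00\n"
    "2,2020-01-20,$100.00\n"
    "1,2020-10-03,$150.00\n"
)
monthly = mean_price_by_month(csv.DictReader(sample))
```

With the real file, the same function could be fed `csv.DictReader(open("calendar.csv", newline=""))`, provided the columns match the assumed layout.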

Then, I will use a generative…

Marvin Lüthe

Data Scientist at Airbus, https://www.linkedin.com/in/mluethe/
