@njanakiev

All things data // data science / data engineering / data visualization / GIS

Featured Articles

Here you can find a selection of articles that are either important to me or were well received by the community. For more articles, have a look at my blog.

Querying S3 Object Stores with Presto or Trino

Getting big data queries running on Hadoop can be challenging. As an alternative, many solutions use S3 object stores, which you can access and query with Presto or Trino. In this guide you will see how to install, configure, and run Presto or Trino on Debian or Ubuntu with the S3 object store of your choice and the Hive standalone metastore.

How to Install Presto or Trino on a Cluster and Query Distributed Data on Apache Hive and HDFS

Presto is an open source distributed query engine built for Big Data, enabling high performance SQL access to a large variety of data sources including HDFS, PostgreSQL, MySQL, Cassandra, MongoDB, Elasticsearch and Kafka, among others.

Google Analytics Analytics with Python

Google Analytics is a powerful analytics tool found in an astonishing number of websites. In this tutorial, we will take a look at how to access the Google Analytics API (v4) with Python and Pandas. Additionally, we will take a look at the various ways to analyze your tracking data and create custom reports.

Installing and Running Jupyter on a Server

Jupyter Notebook is a powerful tool, but how can you use it in all its glory on a server? In this tutorial you will see how to set up Jupyter Notebook on a server hosted with Digital Ocean, AWS or most other hosting providers. Additionally, you will see how to use Jupyter notebooks over SSH tunneling or SSL with Let’s Encrypt.

Analyzing Your File System and Folder Structures with Python

Say you have an external hard drive with layers upon layers of cryptically named folders and intricate mazes of directories (like here, or here). How can you make sense of this mess? Python offers various tools in the Python standard library to deal with your file system and the folderstats module can be of additional help to gain insights into your file system.
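As a minimal sketch of what the standard library alone can do here, the function below walks a directory tree with `os.walk` and tallies file counts, total size, and file extensions. The name `summarize_tree` is hypothetical and chosen for illustration; this is not the folderstats API.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Walk `root` and return (file_count, total_bytes, extension_counts).

    `summarize_tree` is an illustrative helper, not part of folderstats.
    """
    file_count = 0
    total_bytes = 0
    extensions = Counter()

    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read (e.g. broken links)
                continue
            file_count += 1
            # Normalize the extension so ".TXT" and ".txt" are counted together
            extensions[os.path.splitext(name)[1].lower()] += 1

    return file_count, total_bytes, extensions
```

Calling `summarize_tree("/path/to/drive")` with a directory of your choice returns the three values, and `extensions.most_common(10)` then lists the ten most frequent file types.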

Where do Mayors Come From: Querying Wikidata with Python and SPARQL

In this article, we will build queries for Wikidata with Python and SPARQL by looking at where mayors in Europe are born. The tutorial builds up the knowledge needed to collect the data behind the interactive visualization in the header image, which was made with deck.gl.

Predict Economic Indicators with OpenStreetMap

OpenStreetMap (OSM) is a massive collaborative map of the world, built and maintained mostly by volunteers. Separately, there are various indicators that measure the economic growth, prosperity, and production of a country. What if we use OpenStreetMap to predict those economic indicators?

Loading Data from OpenStreetMap with Python and the Overpass API

Have you ever wondered where most Biergarten in Germany are or how many banks are hidden in Switzerland? OpenStreetMap is a great open source map of the world which can give us some insight into these and similar questions. There is a lot of data hidden in this data set, full of useful labels and geographic information, but how do we get our hands on the data?

Framing Parametric Curves

This article explores an efficient way to create tubes, ribbons and moving camera orientations based on parametric curves with the help of moving coordinate frames.

Understanding the Covariance Matrix

This article gives a geometric and intuitive explanation of the covariance matrix and the way it describes the shape of a data set. We will describe the geometric relationship of the covariance matrix with the use of linear transformations and eigendecomposition.
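As a brief sketch of the relationship in question, using standard definitions (the symbols $x_i$, $\bar{x}$, $V$, $L$ and $T$ are notation assumed here, not necessarily the article's): the sample covariance matrix of $n$ data points is symmetric, so it can be decomposed into orthonormal eigenvectors and non-negative eigenvalues, which in turn can be read as a rotation and a scaling applied to uncorrelated, unit-variance data.

```latex
% Sample covariance matrix of points x_1, ..., x_n with mean \bar{x}
C = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}

% Eigendecomposition: columns of V are orthonormal eigenvectors,
% L is the diagonal matrix of eigenvalues
C = V L V^{\top}

% Equivalent view as a linear transformation T = V \sqrt{L}
% (scale by \sqrt{L}, then rotate by V) applied to white data
C = T T^{\top}, \qquad T = V \sqrt{L}
```

Under this view, the eigenvectors give the directions along which the data set is stretched, and the square roots of the eigenvalues give the standard deviation along each of those directions.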