blog

Expansion of the portfolio: YMC repositions itself

Posted on May 14, 2013 in Blog | No Comments

YMC is strategically repositioning itself and expanding its existing Web Solutions service with two complementary areas: Big Data Analytics and Mobile Applications. Big Data: YMC has acquired the startup Sentric, which is also based in Kreuzlingen. Sentric was the first Swiss provider of services in the field of Big Data. The team has already ...

Hannibal: New Features and the Future

Posted on Apr 8, 2013 in Blog | No Comments

A few months have passed since I last worked on Hannibal, but last week I had the opportunity to do so. I worked on a few issues on GitHub and thought about new features for the tool. In this post I will demonstrate the main new features of Hannibal and, at the end, will talk ...

Hello Europe! Hadoop has landed.

Posted on Mar 28, 2013 in Blog | One Comment

Last week we were in Amsterdam at the Hadoop Summit 2013. This was the first Hadoop Summit in Europe, so things are picking up momentum over here too. As Rob Dielemans (@robdielemans) tweeted on March 21, 2013: "#HadoopSummit great that #hadoop has landed on the european mainland. Thank you @hortonworks" This event was a great opportunity to ...

Build a Better Customer Experience Model with Big Data

Posted on Mar 26, 2013 in Blog | No Comments

As a topic of digital marketing, customer experience management sits at the intersection of many different disciplines, including design, marketing, branding and interaction. Beyond the business and technology aspects of the customer experience management model, there are the customers, real people who are the ultimate judges of whether a product or service is desirable. Keeping your finger on the ...

Lambda Architecture, Part 1

Posted on Mar 8, 2013 in Blog | No Comments

We are witnessing a paradigm shift from batch-based data processing to real-time data processing using the Hadoop framework. Despite this progress, it is still a challenge to process web-scale data in real time. Many technologies can be used to create such a complete data processing system – but to choose the right tools, ...

Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 4

Posted on Feb 8, 2013 in Blog | 2 Comments

In the previous article we explained how to parse, transform and finally load data into Hive’s warehouse. Now it’s time to talk about querying the data. Before we start, here is what a sample of the data looks like:

As you can see, there is still some noise in the last column. We are ...

Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 3

Posted on Feb 1, 2013 in Blog | No Comments

In the previous article we described how to collect WiFi router logs with Flume and store them in HDFS. This article describes how we transformed, parsed and filtered the data and finally loaded it into Hive’s data warehouse. Let’s start by looking at the raw data sample on HDFS.

In order to import the raw data ...

Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 2

Posted on Jan 29, 2013 in Blog | No Comments

Following on from Jean-Pierre’s introduction to this experiment in part 1, I will now expand on the technical details of the data ingestion process using Flume. As you can see in figure 2 of the previous post, we first had to collect log data as a data source to be read by Flume ...

Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 1

Posted on Jan 25, 2013 in Blog | 12 Comments

This week we were inspired to do some research, driven by an idea: it must be possible to bring the concepts of tracking users in the online world to retail stores. We are not experts in retail, but we know that one of the most important key performance indicators is revenue per square metre. ...

Hadoop training by Cloudera

Posted on Jan 17, 2013 in Blog | 2 Comments

Last week I attended a Hadoop administration training held by Cloudera at a comfortable and well-prepared location in London. The 3-day course covers several topics of the Hadoop ecosystem in 500+ slides and some exercises. The range is from historical information, illustration of why Hadoop is needed, introduction to MapReduce and job ...