Expansion of the portfolio: YMC repositions itself
YMC is strategically repositioning itself and expanding its existing Web Solutions business with two complementary areas: Big Data Analytics and Mobile Applications. In Big Data, YMC has acquired the startup Sentric, which is also based in Kreuzlingen. Sentric was the first Swiss provider of Big Data services. The team has already ...
Hannibal: New Features and the Future
A few months have passed since I last worked on Hannibal, but last week I had the opportunity to do so again. I worked on a few issues on GitHub and thought about new features for the tool. In this post I will demonstrate the main new features of Hannibal and, at the end, will talk ...
Hello Europe! Hadoop has landed.
Last week we were in Amsterdam at Hadoop Summit 2013. This was the first Hadoop Summit in Europe, so things are picking up momentum over here too. "#HadoopSummit great that #hadoop has landed on the european mainland. Thank you @hortonworks" (Rob Dielemans, @robdielemans, March 21, 2013). This event was a great opportunity to ...
Build a Better Customer Experience Model with Big Data
As a topic of digital marketing, customer experience management sits at the intersection of many different disciplines, including design, marketing, branding and interaction. Beyond the business and technology aspects of the customer experience management model, there are the customers, real people who are the ultimate judges of whether a product or service is desirable. Keeping your finger on the ...
Lambda Architecture, Part 1
We are witnessing a paradigm shift from batch-based data processing to real-time data processing using the Hadoop framework. Despite this progress, it is still a challenge to process web-scale data in real time. Many technologies can be used to build such a complete data processing system, but to choose the right tools, ...
Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 4
In the previous article we explained how to parse, transform and finally load the data into Hive’s warehouse. Now it’s time to talk about querying the data. Before we start, here is what a sample of the data looks like:
[root@cdh-master ~]# hadoop fs -cat /user/hive/warehouse/routerlogs/part-00000
1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,MLME,MLME-AUTHENTICATE.indication(98:0c:82:dc:8b:15, OPEN_SYSTEM)
1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,MLME,MLME-DELETEKEYS.request(98:0c:82:dc:8b:15)
1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,IEEE 802.11,authenticated
1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,IEEE 802.11,association OK (aid 2)
1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,IEEE 802.11,associated (aid 2)
1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,MLME,MLME-ASSOCIATE.indication(98:0c:82:dc:8b:15)
1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,MLME,MLME-DELETEKEYS.request(98:0c:82:dc:8b:15)
1358757010,2013,1,21,9,30,10,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,IEEE 802.11,deauthenticated
As you can see, there is still some noise in the last column. We are ...
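The excerpt stops before the actual Hive/Impala queries, so the following is only a plain-Python sketch of the kind of aggregation such a query performs over the records above, not the authors' query. The column names are hypothetical, inferred from the sample rows (the real table schema is not shown here). One detail the sample does reveal: the free-text last column can itself contain commas (e.g. `OPEN_SYSTEM)`), so the split must stop before it.

```python
from collections import Counter

# Hypothetical column names, inferred from the sample rows; the actual
# Hive table schema is not shown in this excerpt.
COLUMNS = [
    "epoch", "year", "month", "day", "hour", "minute", "second",
    "tz", "router", "program", "interface", "station", "subsystem", "message",
]


def parse_row(line):
    """Split one comma-separated record into a dict.

    maxsplit keeps commas inside the free-text message, e.g.
    "MLME-AUTHENTICATE.indication(..., OPEN_SYSTEM)", in the last field.
    """
    values = line.rstrip("\n").split(",", len(COLUMNS) - 1)
    return dict(zip(COLUMNS, values))


def events_per_station(lines, subsystem="IEEE 802.11"):
    """Count events per (router, station MAC) for one subsystem.

    Roughly the plain-Python equivalent of:
    SELECT router, station, COUNT(*) ... WHERE subsystem = ... GROUP BY router, station
    """
    counts = Counter()
    for line in lines:
        row = parse_row(line)
        if row["subsystem"] == subsystem:
            counts[(row["router"], row["station"])] += 1
    return counts


sample = [
    "1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,MLME,MLME-AUTHENTICATE.indication(98:0c:82:dc:8b:15, OPEN_SYSTEM)",
    "1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,IEEE 802.11,authenticated",
    "1358757010,2013,1,21,9,30,10,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,IEEE 802.11,deauthenticated",
]
print(events_per_station(sample))
```

Filtering on the subsystem column is one simple way to set aside the MLME rows, whose last column mostly repeats the station MAC.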
Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 3
In the previous article we described how we collected WiFi router logs with Flume and stored them in HDFS. This article describes how we transformed, parsed and filtered the data and finally loaded it into Hive’s data warehouse. Let’s start by looking at a sample of the raw data on HDFS.
2013-01-17T15:50:41+01:00 192.168.201.197 dropbear[1172]: Child connection from 192.168.201.99:55001
2013-01-17T15:50:46+01:00 192.168.201.197 dropbear[1172]: Password auth succeeded for 'root' from 192.168.201.99:55001
2013-01-17T15:50:52+01:00 192.168.201.197 dropbear[1172]: Exit (root): Disconnect received
2013-01-17T15:52:14+01:00 fonera hostapd: wlan0: STA 8c:64:22:3a:74:1f IEEE 802.11: disassociated due to inactivity
2013-01-17T15:52:14+01:00 fonera hostapd: wlan0: STA 8c:64:22:3a:74:1f MLME: MLME-DISASSOCIATE.indication(8c:64:22:3a:74:1f, 4)
2013-01-17T15:52:14+01:00 fonera hostapd: wlan0: STA 8c:64:22:3a:74:1f MLME: MLME-DELETEKEYS.request(8c:64:22:3a:74:1f)
In order to import the raw data ...
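The excerpt does not say which tool performed the transformation, so the following is only a plain-Python sketch of the parsing and filtering step, not the authors' implementation. It assumes the target layout is the comma-separated record format visible in the Part 4 sample (epoch seconds, local date and time fields, UTC offset, router host, program, interface, station MAC, subsystem, message). The regular expression `HOSTAPD_LINE` and the function `to_record` are hypothetical names written against the raw sample above. Lines from other programs, such as the `dropbear` SSH entries, are dropped.

```python
import re
from datetime import datetime

# Hypothetical pattern for hostapd station events, written against the raw
# sample: "<ISO timestamp> <host> hostapd: <iface>: STA <mac> <subsystem>: <message>"
HOSTAPD_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) hostapd: "
    r"(?P<iface>[^:\s]+): STA (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) "
    r"(?P<subsystem>[^:]+): (?P<message>.*)$"
)


def to_record(line):
    """Turn one raw syslog line into a comma-separated record, or None."""
    m = HOSTAPD_LINE.match(line.strip())
    if m is None:
        return None  # not a hostapd station event (e.g. dropbear): filter out
    ts = datetime.fromisoformat(m["ts"])  # timezone-aware, e.g. +01:00
    fields = [
        str(int(ts.timestamp())),                        # Unix epoch seconds
        str(ts.year), str(ts.month), str(ts.day),        # local date
        str(ts.hour), str(ts.minute), str(ts.second),    # local time
        m["ts"][19:],  # offset after "YYYY-MM-DDTHH:MM:SS", e.g. "+01:00"
        m["host"], "hostapd", m["iface"], m["mac"],
        m["subsystem"], m["message"],
    ]
    return ",".join(fields)


raw = [
    "2013-01-17T15:50:41+01:00 192.168.201.197 dropbear[1172]: Child connection from 192.168.201.99:55001",
    "2013-01-17T15:52:14+01:00 fonera hostapd: wlan0: STA 8c:64:22:3a:74:1f IEEE 802.11: disassociated due to inactivity",
]
for line in raw:
    record = to_record(line)
    if record is not None:
        print(record)
```

Slicing the offset from the original string assumes timestamps without fractional seconds, as in the sample; a real pipeline would need to handle other syslog formats explicitly.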
Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 2
Following on from Jean-Pierre’s introduction to this experiment in part 1, I will now go into the technical details of the data ingestion process using Flume. As figure 2 in the previous post shows, we first had to collect the log data as a source for Flume to read ...
Case Study: Retail WiFi Log-file Analysis with Hadoop and Impala, Part 1
This week we were inspired to do some research, driven by an idea: it must be possible to bring the concepts of tracking users in the online world to retail stores. We are not retail experts, but we know that one of the most important key performance indicators in retail is revenue per square metre. ...
Hadoop training by Cloudera
Last week I attended a Hadoop administrator training course held by Cloudera at a comfortable, well-prepared venue in London. The 3-day course covers several topics of the Hadoop ecosystem in more than 500 slides and a number of exercises. The topics range from historical background and why Hadoop is needed to an introduction to MapReduce and job ...

