Chimpler

Deploying Hadoop on EC2 with Whirr

January 20, 2013, 3:38 pm

Apache Whirr is a set of tools to deploy cloud services. It can be used on Amazon Elastic Compute Cloud (EC2), Rackspace Cloud and many other cloud providers. Requirement: you need to have an account on Amazon...

Installing Storm on Ubuntu

January 24, 2013, 9:07 pm

Storm is an open source ETL created by Nathan Marz in late 2011. Unlike Hadoop, where data are processed offline in big batches, Storm takes another approach by aggregating streaming data on the fly so...

Faceted Search with Lucene 4

January 30, 2013, 6:02 am

Faceted search is a technique used on many e-commerce websites and search engines to let users refine their search results by narrowing the scope of their queries to a category or a sub...

Playing with Hadoop Pig

February 4, 2013, 5:34 am

Hadoop Pig is a tool to manipulate data from various sources (CSV files, MySQL, MongoDB, …) using a procedural language (Pig Latin). It can run standalone or distributed with Hadoop. Unlike Hive, it can...

Using Hadoop Pig with MongoDB

February 7, 2013, 5:38 am

In this post, we’ll see how to install MongoDB support for Pig, and we’ll illustrate it with an example where we join two MongoDB collections with Pig and store the result in a new collection....

Pushing real-time data to the browser using cometD and Spring

February 11, 2013, 5:43 am

Comet is a set of techniques that allow web applications to push data to the browser. It is also known as Ajax Push, Reverse Ajax and HTTP server push, among others. It is used in web applications to...

A Hadoop Alternative: Building a real-time data pipeline with Storm

February 16, 2013, 8:21 am

With the tremendous growth of the online advertising industry, ad networks have to process a humongous amount of data. For years, Hadoop has been the de facto technology used to aggregate...

Playing with the Mahout recommendation engine on a Hadoop cluster

February 20, 2013, 6:08 am

Apache Mahout is an open source library that implements several scalable machine learning algorithms. These can be used, among other things, to categorize data, group items into clusters, and implement a...

Playing with HazelCast, a distributed datagrid on Amazon EC2 with jclouds-cli

February 25, 2013, 5:41 am

Hazelcast is an open-source in-memory datagrid that lets you store data in memory, distributed across a cluster of servers, and execute distributed tasks. It can be used as an in-memory database that...

Playing with SOLR Cloud for responsive analytics

February 27, 2013, 6:10 am

SOLR is a popular full-text search engine based on Lucene. Not only does it search documents very efficiently, it also runs simple queries on relational data very quickly. SOLR4 was released...

Playing with Apache Hive, MongoDB and the MTA

March 6, 2013, 6:03 am

Apache Hive is a popular data warehouse system for Hadoop that lets you run SQL queries on top of Hadoop by translating them into Map/Reduce jobs. Due to the high latency incurred by Hadoop to...

Using the Mahout Naive Bayes Classifier to automatically classify Twitter...

March 13, 2013, 6:05 am

Classification algorithms can be used to automatically classify documents and images, to implement spam filters, and in many other domains. In this tutorial we are going to use Mahout to classify tweets using...

Playing with Apache Hive and SOLR

March 20, 2013, 5:56 am

As described in a previous post, Apache SOLR can perform very well to provide low latency analytics. Data logs can be pre-aggregated using Hive and then synced to SOLR. To this end, we developed a...

Generating EigenFaces with Mahout SVD to recognize person faces

April 17, 2013, 6:03 am

In this tutorial, we are going to describe how to generate and use eigenfaces to recognize people's faces. Eigenfaces are a set of eigenvectors derived from the covariance matrix of the probability...

Finding association rules with Mahout Frequent Pattern Mining

May 2, 2013, 5:50 am

Association rule learning is a method to find relations between variables in a database. For instance, using shopping receipts, we can find associations between items: bread is often purchased with...

Installing and comparing MySQL/MariaDB, MongoDB, Vertica, Hive and Impala...

May 10, 2013, 6:13 am

A common task in a data analyst's day-to-day job is running aggregations over data, generally by summing and averaging columns under different filters. When tables start to grow to hundreds of...

Using the Mahout Naive Bayes Classifier to automatically classify Twitter...

June 24, 2013, 6:06 am

In this post, we are going to categorize the tweets by distributing the classification across the Hadoop cluster. This can speed up classification when there is a huge number of tweets to classify. To...

Implementing a Java agent to instrument code

November 5, 2013, 8:12 pm

With a system running 24/7, you have to make sure that it performs well at any time of the day. Several commercial solutions exist to monitor the performance of systems: NewRelic, GraphDat and many...

Classifying documents using Naive Bayes on Apache Spark / MLlib

June 11, 2014, 6:57 pm

In recent years, Apache Spark has gained popularity as a faster alternative to Hadoop, and it reached a major milestone last month by releasing the production-ready version 1.0.0. It claims to be up...

Analyzing your audience location with Twitter Streams and Heat Maps

June 26, 2014, 4:52 am

With the democratization of GPS and IP geolocation in portable devices (laptops, tablets, phones, Internet of Things devices, …), more and more data containing geolocation information is becoming available....
