Coolthing Of Theday


Thursday, 11 July 2013

A little Hadoop, HDInsight, Mahout, some .Net and a little StackOverflow and you have...

Posted on 18:02 by Unknown

Amazedsaint's Tech Journal - Building A Recommendation Engine - Machine Learning Using Windows Azure HDInsight, Hadoop And Mahout

Feel like helping someone today?

Let us help the Stack Exchange guys suggest questions that a user can answer, based on his answering history, much like the way Amazon suggests products to you based on your previous purchase history. If you don’t know what Stack Exchange does: they run a number of Q&A sites, including the massively popular Stack Overflow.

Our objective here is to see how we can analyze the past answers of a user to predict questions that he may answer in the future. Stack Exchange’s current recommendation logic may well work better than ours, but that won’t prevent us from helping them for our own learning purposes.

We’ll be doing the following tasks.

  • Extracting the required information from Stack Exchange data set
  • Using the required information to build a Recommender
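
To make the two tasks above concrete, here is a minimal sketch in Python. It is not the article's method (the article uses Mahout on HDInsight); it only illustrates the idea with a simple co-occurrence count on a tiny, made-up data set. The sample pairs and the function names `build_cooccurrence` and `recommend` are assumptions introduced for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample: one (user_id, question_id) pair per answer,
# i.e. the kind of information the extraction step would produce.
answers = [
    (1, 101), (1, 102),
    (2, 101), (2, 102), (2, 103),
    (3, 102), (3, 103), (3, 104),
]

def build_cooccurrence(pairs):
    """Group questions by user and count how often two questions
    were answered by the same user."""
    by_user = defaultdict(set)
    for user, question in pairs:
        by_user[user].add(question)
    counts = defaultdict(int)
    for questions in by_user.values():
        for a, b in combinations(sorted(questions), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return by_user, counts

def recommend(user, by_user, counts, top_n=3):
    """Score unanswered questions by their co-occurrence with the
    questions this user has already answered."""
    seen = by_user.get(user, set())
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    # Highest score first; ties broken by question id for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [question for question, _ in ranked[:top_n]]

by_user, counts = build_cooccurrence(answers)
print(recommend(1, by_user, counts))  # [103, 104]
```

User 1 answered questions 101 and 102; users who answered those also answered 103 (most often) and 104, so those are suggested in that order. A real data set would need a distributed implementation, which is where Hadoop and Mahout come in.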

But let us start with the basics. If you are totally new to Apache Hadoop and Hadoop on Azure, I recommend reading these introductory articles before you begin, where I explain HDInsight and the MapReduce model in a bit more detail.

...

Conclusion

In this example, we did a lot of manual work to upload the required input files to HDFS and to trigger the Recommender Job. In fact, you could automate this entire workflow using the Hadoop For Azure SDK. But that is for another post, so stay tuned. Real-life analysis involves much more, including writing map/reduce jobs for extracting and dumping data to HDFS, automating the creation of Hive tables, performing operations using HiveQL or Pig, etc. However, we just examined the steps involved in doing something meaningful with Azure, Hadoop and Mahout.

You may also access this data in your Mobile App or ASP.NET Web application, either by using Sqoop to export it to SQL Server, or by loading it to a Hive table as I explained earlier. Happy Coding and Machine Learning!! Also, if you are interested in scenarios where you could tie your existing applications with HDInsight to build end-to-end workflows, get in touch with me.
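
Before exporting the recommendations anywhere, it can help to turn each output line into structured records. The sketch below assumes the text layout commonly produced by Mahout's item-based recommender job, one line per user in the form `userID<TAB>[itemID:score,itemID:score,...]`; the excerpt above does not show the output, so treat that layout, the sample line and the function name `parse_recommendation_line` as assumptions.

```python
def parse_recommendation_line(line):
    """Parse one line of the assumed form 'user<TAB>[item:score,...]'
    into (user_id, [(item_id, score), ...])."""
    user_part, items_part = line.rstrip("\n").split("\t", 1)
    items_part = items_part.strip().strip("[]")
    recommendations = []
    if items_part:  # an empty list "[]" yields no recommendations
        for entry in items_part.split(","):
            item_id, score = entry.split(":")
            recommendations.append((int(item_id), float(score)))
    return int(user_part), recommendations

# Hypothetical sample line, not taken from the article's data.
sample = "1\t[103:3.0,104:1.0]"
user_id, recs = parse_recommendation_line(sample)
print(user_id, recs)  # 1 [(103, 3.0), (104, 1.0)]
```

Rows shaped like (user, question, score) are straightforward to load into a relational table or a Hive table, whichever route you choose.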


Just the article I've been looking for. It provides a nice start to finish view of playing with HDInsight and Mahout, which is something I was pulling my hair out over a few months ago...

Posted in .Net, Azure, Data, Hadoop


