Coolthing Of Theday

Thursday, 2 May 2013

PII Problems in the Public Enron Data Set (aka "Industry Ouch")

Posted on 07:48 by Unknown

Ride The Lightning - Wow. EDRM/FERC/Enron Data Privacy Breaches

It is a startling revelation to learn that a dataset that has been public for years contains over 7,500 instances of unredacted social security numbers, credit card numbers, dates of birth, home addresses and phone numbers. But that is precisely the claim of John Martin, the CEO and founder of BeyondRecognition.

The EDRM Enron Email Data Set v2 (EDRM Data) is a collection of documents originally gathered by the Federal Energy Regulatory Commission (FERC) as part of its investigation of Enron's energy trading practices and then made public by FERC. The EDRM Data is a reworked version of the original documents that was available for download from EDRM's website for an extended period of time. It has since been transferred to Amazon Web Services for downloading, though there is a link from EDRM to the download site.

Why have so many people/teams worked with the data for years without discovering all the personally identifiable information (PII)? EDRM teams worked with it. The NIST-sponsored Text Retrieval Conference (TREC) Legal Track for 2010 and 2011 used that data set. Teams from around the world used it.

...

Putting motivation to one side, it is a real issue that publication of this data set necessarily meant that a data breach had taken place and it is astonishing that no one ever checked for PII. EDRM, in an e-mail I have seen, acknowledges that it is aware of the PII content and is working with an EDRM partner to make "a PII clean" version of the data available via EDRM.

...

Lessons from the EDRM/FERC/Enron Data Privacy Breaches

Background. The Electronic Discovery Reference Model (“EDRM”) is an e-discovery industry standards setting group, and the EDRM Enron Email Data Set v2 (“EDRM Data”) is a collection of documents originally gathered by the Federal Energy Regulatory Commission (“FERC”) as part of its investigation of Enron’s energy trading practices and then made public by it. EDRM Data is a reworked version of the original documents, with a label added to each email that reads,

“EDRM Enron Email Data Set has been produced in EML, PST and NSF format by ZL Technologies, Inc. This Data Set is licensed under a Creative Commons Attribution 3.0 United States License <http://creativecommons.org/licenses/by/3.0/us/>. To provide attribution, please cite to ZL Technologies, Inc. (http://www.zlti.com/).”

EDRM served as a direct download point for the EDRM Data for a period of time and later moved it to Amazon Web Services for downloading.

Breach Discovery. While working with the EDRM Data that we downloaded from the EDRM website, BeyondRecognition discovered that there were over 7,500 instances of unredacted social security numbers, credit card numbers, dates of birth, home addresses and phone numbers – a startling breach of privacy. Most of the data breach victims were Enron employees, but the victims also included spouses or children of the employees as well as third party contractors.

LESSONS

...

I've blogged about this data set a number of times. Many, many people in the LitSupport/eDiscovery industry use it (like, just about everyone), it's been available for almost a decade, and only now is this found? Wow. I'd bet it's been seen before, but everyone assumed it was okay because the data was "public." Anyway, if you use this data and you have a local copy, you need to think long and hard about this finding.
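If you do have a local copy, a rough first pass at checking it might look something like the sketch below. To be clear, this is not how BeyondRecognition found the PII (the posts don't say), and the names here (`SSN_RE`, `CARD_RE`, `luhn_ok`, `find_pii`) are my own made-up helpers. It only looks for US-style social security numbers written with dashes and for 13 to 16 digit card numbers that pass the Luhn checksum, so it will miss plenty (dates of birth, addresses, phone numbers, oddly formatted numbers) and will flag some false positives.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs much more.
# SSN written as ddd-dd-dddd, skipping area/group/serial values that are never issued.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for likely SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):  # drop digit runs that cannot be valid card numbers
            hits.append(("card", m.group()))
    return hits
```

You'd call `find_pii` on the text of each message body or extracted attachment and review whatever comes back by hand. The Luhn check is there just to cut down on noise from long digit runs like tracking or invoice numbers.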

Like I said... ouch

 

Related Past Post XRef:
And even more Enron (PST’s that is) We’re talking 107GB, compressed, of data…
EDRM Enron Reference Data v2 now available
Need a ton of email data (10’s of gig’s)? Need it in PST form? Need it to be public data? Want to look behind the curtain into Enron? The EDRM Data Set Project is for you…
Federal Energy Regulatory Commission (FERC) Enron Email Document Database

Posted in EDD | No comments