How to Remove Personally Identifiable Information from Retail Ebooks

Say you've taken the plunge: after reaping the benefits of being a member of this community, you decide to give something back by filling a request that is easy to fill with a retail ebook. Or maybe you just bought the latest and greatest ebook and want to share it. Whatever your reasoning, you've gone through the hassle of removing the DRM and renaming the file so that there is enough information to discern what it contains, and you're all ready to create the torrent and upload it to trackers. But then... you start to have some doubts. What if, you wonder, there is some personally identifiable information left in the file that would cause my anonymous/seedboxed upload to point right back to me, making all my paranoia for naught?



Well, you might be right to wonder. After some recent testing (admittedly not very rigorous), I've found that Amazon puts a unique identifier in each Kindle ebook that they sell, which in all likelihood can be traced back to the account used to purchase it. Also, any epub purchased from a store using Adobe's ADEPT DRM usually has a unique resource identifier. At its most benign it can be used to narrow down which store the book was purchased from, and at its most injurious it could be unique for each download from the server. (I'm leaning towards it being benign, but once I got it into my mind that I could automate the removal, I had no choice but to see it through.)

To make the removal of these things a little more appealing (and in the case of epub files a lot less tedious), I've thrown together two batch files to take care of all the dirty work. They are currently Windows only, although it shouldn't be too hard to port them to a shell script for UNIX/UNIX-like systems. Ideas, questions and concerns are welcome.

The epub obfuscator:
01. Fixes some common issues with retail epubs (a 'value' attribute instead of 'content' for a certain meta tag, 'NONE' as a date in the DC metadata, and an opf file using the ISO-8859-1 charset instead of UTF-8)
02. Replaces the unique resource identifier with an empty uuid
03. Changes the date modified/accessed on all the files in the epub to 1980-01-01
04. Outputs a new file
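
For anyone curious what step 03 amounts to, here is a minimal Python sketch of the timestamp reset. It is not the batch file's actual code, just an illustration under the assumption that the epub is an ordinary zip archive; the function name normalize_timestamps is made up for this example. 1980-01-01 is simply the earliest date the zip format can store.

```python
import zipfile

EPOCH = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can represent

def normalize_timestamps(src_path, dst_path):
    """Copy every entry of src_path into dst_path with a fixed timestamp.

    Hypothetical helper for illustration; it only touches timestamps and
    leaves the contents of each entry unchanged.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():  # keeps the original order, so 'mimetype' stays first
            clean = zipfile.ZipInfo(info.filename, date_time=EPOCH)
            clean.compress_type = info.compress_type  # 'mimetype' stays uncompressed
            clean.external_attr = info.external_attr  # keep file permission bits
            dst.writestr(clean, src.read(info.filename))
```

Something like normalize_timestamps("book.epub", "book-obfu.epub") would write the new file next to the original, leaving the original untouched.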

The kindle obfuscator:
01. Removes the unique section of the file (atv:kin:1:{base64}:{base64})
02. Outputs a new file

Caveat
These two batch files should currently be considered a rough BETA. They could conceivably cause a mess if used incorrectly, or they might not work at all in your setup. If you're uncertain of the output, PM me the original and obfuscated files.

Tutorial
01. Download http://dl.dropbox.com/u/519030/ebookobfuscation_b2.zip
An updated version, with the EPUB obfuscator covering old and new EPUB uuid strings, is available at this link: http://ifile.it/4te9nx3.
02. Unzip into a convenient directory
03. Open two directories in separate windows, one containing your unDRM'd files and the other containing the batch files, with the windows preferably not overlapping too much.
04a. If your retail ebook is an epub, drag it onto epub-obfu.bat
04b. If your retail ebook is a Kindle file (tpz,mobi,prc,azw), drag it onto kindle-obfu.bat
05. If everything went well you'll now have two files in the directory that originally contained your unDRM'd file: the original, and a new file with the suffix '-obfu'.
06. Test the new file in your favorite ebook reader. If it opens, the likelihood of something having gone wrong is fairly small.
06a. Optional/Technical Confirmation of Information Removal

For epub files: in every html file, a line similar to the one below:
<meta name="Adept.resource" value="urn:uuid:cba0d389-8f04-4f2b-b979-15ebf90f7b67"/>

Should now read:
<meta name="Inept.resource" content="urn:uuid:00000000-0000-0000-0000-000000000000"/>
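
If you'd rather not open every html file by hand, the check can be scripted. The sketch below is Python rather than batch and is only a rough illustration: it assumes the meta tag is written with the name attribute first, exactly as in the two lines above, and the function name find_resource_ids is my own invention. It lists every html file that still carries a non-zero identifier.

```python
import re
import zipfile

ZERO_URN = "urn:uuid:00000000-0000-0000-0000-000000000000"

# Assumes the attribute order shown in the example lines (name, then value/content).
META_RE = re.compile(
    r'<meta\s+name="(?:Adept|Inept)\.resource"\s+(?:value|content)="([^"]*)"',
    re.IGNORECASE,
)

def find_resource_ids(epub_path):
    """Return (entry name, identifier) pairs for identifiers that are not zeroed."""
    hits = []
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            text = epub.read(name).decode("utf-8", errors="replace")
            for urn in META_RE.findall(text):
                if urn != ZERO_URN:
                    hits.append((name, urn))
    return hits
```

An empty list from find_resource_ids("book-obfu.epub") suggests nothing was missed, within the limits of the attribute-order assumption.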


For Kindle files (in a hex viewer/editor), a string similar to the one below [ASCII]:
atv:kin:1:TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQ=:nkgYSBwZXJzZXZlcmFuY2Ugb2Yg=

Alternative Hex representation of the same string:
6174763A6B696E3A313A5457467549476C7A4947527063335270626D643161584E6F5A5751734947357664434276626D783549474A3549476870637942795A57467A6232347349474A31644342696553423061476C7A49484E70626D6431624746794948426863334E70623234675A6E4A76625342766447686C63694268626D6C745957787A4C43423361476C6A61434270637942684947783163335167623259676447686C49473170626D51734948526F5958513D3A6E6B67595342775A584A7A5A585A6C636D467559325567623259673D

Should now read (depending on your viewer/editor) [ASCII]:
...................................................................................................................................................................................................................

Alternative Hex representation of the same string:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000


If the batch file worked correctly, you can't search for any portion of the zeroed-out string itself. To find where it used to be, search for EBOOKBASE (Hex:45424f4f4b42415345) instead: the atv:kin:1:{base64}:{base64} section generally starts about 80 bytes after it.
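
The same check can be done without a hex editor. This is a small Python sketch, not part of the batch files; inspect_kindle is a hypothetical name, and the pattern assumes the identifier always has the atv:kin:1:{base64}:{base64} shape described above.

```python
import re

MARKER = b"EBOOKBASE"
# Assumed shape of the identifier: atv:kin:1:{base64}:{base64}
ID_RE = re.compile(rb"atv:kin:1:[A-Za-z0-9+/=]+:[A-Za-z0-9+/=]+")

def inspect_kindle(data):
    """Return (offset of EBOOKBASE or -1, offset of a surviving identifier or -1)."""
    marker_at = data.find(MARKER)
    match = ID_RE.search(data)
    return marker_at, (match.start() if match else -1)
```

To use it, read the file as bytes, for example with open("book-obfu.mobi", "rb"), and pass the contents in. A second value of -1 means no identifier of that shape was found; the first value tells you where to look in a hex viewer if you want to eyeball the zeroed region yourself.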

07. Delete (or archive) the original unDRM'd file, and remove the '-obfu' suffix from the new file.
08. Upload to tracker.

EDIT: [BETA2 Changelog] Removed GNU recode from the script; it was doing more harm than good (it mangled some characters, causing ADE to balk and refuse to load the obfuscated epubs).
