A Computer Scientist

"Stand on the shoulders of giants"

Extracting DRM-restricted epub from my Android

There are many reasons why I want to get hold of “my” ebooks (rather than to break DRM, which sounds scary at first glance). By the way, I like Aladin, a Korean online bookstore, because as far as I know it is the only place where I can purchase a book without installing ActiveX on my Linux machine, though I can’t pay with my VISA credit card issued outside of Korea.

The very first reason is that I can’t read ‘my’ ebook without an Internet connection, because the stupid Aladin ebook reader requires one when it opens. I love reading books outside my office, on the subway, and in the air. Initially I tried to figure out a workaround. What I found is that when you open a downloaded epub file in Aladin, the app starts an ‘activity’ to open this epub. We can request that ‘activity’ directly without launching the Aladin app; after all, the embedded epub reader itself isn’t designed by Aladin but by Haansoft.

I don’t think I need to motivate you with any further reasons; that one is big enough. Let’s start carving my epub so I can read it without Aladin involved. So, what is the plan? The first hypothesis is that the Aladin app fully decrypts my epub when I open it and keeps it in memory while I read. That is a very reasonable assumption given how the app behaves: if I disconnect from the Internet while reading, I can still display all pages, even though the app enforces an Internet connection at start. The procedure goes like this:

1. Dump memory before/after opening epub
2. Compare them at the Java object level
3. Carve the decrypted epub out of memory into a standard epub file



There are many ways to dump memory. After many different trials with my rooted gtab, the easiest way turned out to be using ddms with the emulator and converting the dumps into the standard memory analyzer format (hprof). The bad news is that we have to install Eclipse … anyway.



You can convert ddms’s hprof to MAT’s hprof with the commands below:

hprof-conv mem1.hprof mat1.hprof
hprof-conv mem2.hprof mat2.hprof

What we are interested in is the ‘difference’ between the two dumps. As you can see, the second dump (mat2.hprof, taken after opening the epub file) contains one ‘biggest’ byte array. Here is the command to extract that byte object from the dumped memory file:

dd if=mat2.hprof skip=16677160 bs=1 count=1572344 of=my.epub
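
The skip and count values above were read off the dump by hand. As a rough alternative, the offsets could be located by searching the dump for ZIP signatures. This is only a sketch under assumptions of mine: that the epub sits contiguously in the dump, that it is a plain (non-ZIP64) archive, and that its first entry is named mimetype, as the EPUB container format expects. The function name find_epub is hypothetical.

```python
import struct

LOCAL_HEADER = b"PK\x03\x04"   # ZIP local file header signature
EOCD = b"PK\x05\x06"           # ZIP end-of-central-directory signature
EOCD_FORMAT = "<4s4H2LH"       # fixed 22-byte part of the EOCD record


def find_epub(dump):
    """Return (offset, length) of the first epub-looking ZIP in dump, or None.

    Assumption: the archive is stored contiguously and is not ZIP64.
    """
    start = dump.find(LOCAL_HEADER)
    while start != -1:
        # The file name follows the 30-byte fixed part of the local header.
        if dump[start + 30:start + 38] == b"mimetype":
            eocd = dump.find(EOCD, start)
            while eocd != -1 and eocd + 22 <= len(dump):
                fields = struct.unpack_from(EOCD_FORMAT, dump, eocd)
                cd_size, cd_offset, comment_len = fields[5], fields[6], fields[7]
                # In a well-formed archive the central directory ends
                # exactly where the EOCD record begins.
                if start + cd_offset + cd_size == eocd:
                    return start, eocd + 22 + comment_len - start
                eocd = dump.find(EOCD, eocd + 1)
        start = dump.find(LOCAL_HEADER, start + 1)
    return None
```

If it finds something, the returned pair maps directly onto the skip= and count= arguments of dd.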

OK, it seems we are on the right track. If you open ‘my.epub’ with an archive manager, you can see that all content files are already decrypted! Let’s look at the difference between the carved my.epub and the original epub file.



Awesome, right? We got the decrypted epub file as it is. Unfortunately, trying this decrypted epub on the gtab doesn’t work. So I compared a DRM-free epub with the carved epub file. Interestingly, the DRM-free file has NO META-INF/encryption.xml, whereas the carved file still carries one even though the contents of all its files are already decrypted. Ah! After uncompressing the carved epub and deleting this encryption file, I just re-compressed all the files into a single epub (it’s simply a zip, and half the size!). Here we go. I finally obtained a DRM-free epub and can read it in a different ebook reader!
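
The unzip, delete, re-zip step can also be done in one pass. What follows is a minimal sketch, not the exact procedure I ran: the function name strip_encryption_manifest is made up for illustration, and it assumes the encryption file lives at META-INF/encryption.xml, which is where the EPUB container format puts it. It also writes mimetype first and uncompressed, since some readers are picky about that.

```python
import zipfile


def strip_encryption_manifest(src, dst, manifest="META-INF/encryption.xml"):
    """Copy the epub at src to dst, leaving out the encryption manifest.

    src and dst may be file names or file-like objects.
    Returns the entry names written, in order.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = [n for n in zin.namelist() if n != manifest]
        # Put mimetype first so the result looks like a regular epub.
        if "mimetype" in names:
            names.remove("mimetype")
            names.insert(0, "mimetype")
        for name in names:
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zout.writestr(name, zin.read(name), compress_type=method)
    return names
```

For example, strip_encryption_manifest("my.epub", "my-clean.epub") would write the cleaned copy next to the carved one (my-clean.epub is just a placeholder name).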

To Aladin: did you ever try reading ebooks? Or do you like reading at all? Please think about what users want to do.

Syscall Table

Sick of looking up the arguments of system calls, I made a reference table to play with. ( here )

It is actually a command-line lookup tool. Have fun!

Backup & Sync multiple ubuntu machines

I am running four Ubuntu machines in total and one Fedora server (to host blog/web/files, including this!). I have two Ubuntu desktops, in my office and dorm, one Ubuntu on my Air, and one toy box (for my kernel debugging). In my everyday life, I need three fully synchronized machines (office, dorm and laptop) and one partially synchronized toy box. As you can imagine, synchronization is a pain in my ass. To avoid this annoying situation, I often use autofs/sshfs for my music/movie/working folders. It saves my day, but do you remember the adage “don’t put all your eggs in one basket”? It is the single point of failure that we have learned about from textbooks so many times. According to this study,

The data sheets for those drives listed MTBF between 1 million to 1.5 million hours, which the study said should mean annual failure rates “of at most 0.88%.”

So the probability of a given hard drive failing is about 1% annually. I have about 10 SATA drives and one SSD, which is a terribly unreliable storage device. This disaster will very likely happen to me too. To save my life from disk failure, I recently purchased an external RAID storage and planned my strategy:
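
To put numbers on that, here is a back-of-the-envelope sketch. It uses the simple hours-per-year over MTBF approximation, which reproduces the quoted 0.88% for a 1 million hour MTBF, and it assumes (my assumption, not the study's) that drives fail independently at a flat 1% per year. The count of 11 drives is my "about 10 SATA plus one SSD".

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def annual_failure_rate(mtbf_hours):
    """Approximate annual failure rate implied by a datasheet MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


def p_any_failure(rate, drives):
    """Chance that at least one of `drives` fails within a year,
    assuming independent failures at the same annual rate."""
    return 1 - (1 - rate) ** drives


if __name__ == "__main__":
    print(f"AFR at 1.0M hours MTBF: {annual_failure_rate(1_000_000):.2%}")
    print(f"AFR at 1.5M hours MTBF: {annual_failure_rate(1_500_000):.2%}")
    print(f"At least one of 11 drives failing: {p_any_failure(0.01, 11):.1%}")
```

Under these assumptions the chance of losing at least one drive in a given year comes out at roughly 10%, which is a lot less comfortable than 1%.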

  1. maintain one latest copy of the configuration files in one folder
  2. symlink all configuration files to their proper locations from this folder
  3. git/dropbox it!
  4. sync all four machines (config files are about 1G)
  5. crontab/rsync the working directories on the two desktops (sigh .. too big to monitor with inotify)
  6. autofs/sshfs on the laptop
  7. crontab/rsync them all to my RAID storage (weekly)
  8. finally, btrfs snapshot the latest copy of them all as a revision on the RAID
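
Steps 1 and 2 could be scripted along these lines. This is a sketch only: link_configs is a hypothetical helper, and it assumes the config folder mirrors the layout of the home directory, so that a file at a given relative path in the folder belongs at the same relative path under home. Existing files are left alone rather than overwritten.

```python
from pathlib import Path


def link_configs(repo, home):
    """Symlink every file under repo into the same relative path under home.

    Returns the list of symlinks created. Anything inside .git is skipped,
    and destinations that already exist are not touched.
    """
    repo, home = Path(repo), Path(home)
    created = []
    for src in sorted(repo.rglob("*")):
        rel = src.relative_to(repo)
        if src.is_dir() or ".git" in rel.parts:
            continue
        dest = home / rel
        if dest.is_symlink() or dest.exists():
            continue  # keep whatever is already there
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.symlink_to(src)
        created.append(dest)
    return created
```

Running it a second time is harmless, since everything it linked before is skipped.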

It seems complicated, but it boils down to two things. First, sync/distribute the latest copies of the config files to all four machines with git/dropbox. Second, rsync the latest copy of the huge working directories on the desktops and take a snapshot of them on the RAID. This way I save storage on the RAID and can still sync and keep duplicate copies in multiple places.
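
The weekly job (steps 7 and 8) might look roughly like the sketch below. It only builds the command lines, so nothing runs until you pass them to subprocess. All paths and host names are placeholders, weekly_backup_commands is a made-up helper, and it assumes the RAID is formatted as btrfs with the mirror directory being a subvolume, since btrfs can only snapshot subvolumes.

```python
from datetime import date


def weekly_backup_commands(sources, raid_current, raid_snapshots, today=None):
    """Build the rsync and btrfs commands for one weekly backup run.

    sources maps a short name to an rsync source such as
    "office:/home/me/work" (placeholder). raid_current is the btrfs
    subvolume holding the latest mirror; raid_snapshots is the directory
    that collects read-only, dated snapshots of it.
    """
    today = today or date.today()
    commands = []
    for name, src in sorted(sources.items()):
        # Trailing slash: copy the directory's contents, not the directory.
        commands.append(["rsync", "-a", "--delete",
                         src.rstrip("/") + "/",
                         f"{raid_current}/{name}/"])
    commands.append(["btrfs", "subvolume", "snapshot", "-r",
                     raid_current,
                     f"{raid_snapshots}/{today.isoformat()}"])
    return commands
```

A crontab entry could then call a small script that feeds each returned list to subprocess.run(cmd, check=True), so the snapshot is only taken if every rsync succeeded.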