Computer Files as Waves

Tuesday, September 26, 2006

Few days ago, I was out for lunch with some friends, among which was Zeid. After having our meals, we settled down somewhere and started talking about various stuff, like we always do.

Zeid told me he’s been thinking about a new file compression mechanism but he’s still uncertain of its effectiveness. He said he thought of turning a computer file into a “wave”, literally transforming the bits and bytes into a waveform, then use Fourier Series to represent the wave with much less data. This small quantity of data can be used later to re-construct the wave and produce the original file again!

The idea sounded crazy at first; how can a computer file be represented by a wave? I was wondering while Zeid continued to discuss his idea and I soon began to figure what’s in his mind …

Waves or Signals, whether analogue or digital, are a primary concern of engineering students. Many engineering courses focus on waves, their applications and how to process them. One wave processing technique that all engineers know about is the “Fourier Transform”. The idea of this transform is quiet simple; you take a huge wave with a large number of points, make some calculations, and then extract a number of certain “coefficients” that is less than the number of points. Those coefficients (which are actually just numbers) can be used to reconstruct the wave again!

So if you have a wave of say 1000 numbers, you can compute special 50 numbers (coefficients), and use these anytime to get the original 1000 numbers back.

It sounds like magic, but there’s a catch that all engineers know about: The re-constructed wave is not an exact replica of the original wave; it is only a very similar wave!

During the transformation, the more coefficients you grab, the more close your generated wave will be to the actual wave. So it’s like a tradeoff between a high compression ratio and an accurate representation of the wave.

Considering our digital concerns, any series of “continuous” numbers can be called a wave. By the term “continuous” I mean “progressive”, or “gradually changing”. For example, consider the following two series of numbers:

Series A : 1 , 2 , 3 , 4 , 5 , 4 , 3 , 2 , 1
Series B : 1 , 2 , 3 , 7 , 8 , 7 , 3 , 2 , 1

Both series can be called waves! This is because there is some kind of “continuity” between the numbers. Series A, however, shows more continuity than Series B and is, therefore, a more appropriate “waveform”.

Now if you want to treat a bunch of numbers as a wave, those numbers must be acting like a wave; they must show some continuity, or else they don’t deserve to be treated like a gentle wave ;)

Therefore, the biggest debate over Zeid’s theory must be: why would bytes be continuous? i.e. who said that bytes in any computer file are continuous? For example, a simple text file with the phrase “Hello World” is represented in memory by:

72 , 101 , 108 , 108 , 111 , 32 , 87 , 111 , 114 , 108 , 100

As you can see, these bytes are not continuous, and representing them by a waveform is useless. This is the case in most computer files; bytes are not linked together!

Still, Zeid thought that there must be some continuity if we looked in the right place. Say for example a picture file! Pictures are actually series of numbers representing colours of pixels, and in most pictures those numbers change “gradually” making them a very good candidate for wave representation!

During the subsequent days of our conversation, I was investigating the proclaimed “continuity” of picture files and its effectiveness. I took a sample picture and split it into three arrays, each holding the values for one of the three basic colours (Red, Green, and Blue) for all the pixels. I found that the three arrays show a good deal of continuity and are therefore very suitable for wave representation.

So next I made a program that computes the Fourier Sine Series coefficients for a picture, and then uses them to re-construct the picture again. The results were amazing! Take a look at these:

The number N indicates the number of Fourier coefficients per line of pixels (actually, I split the picture into 96 waves per each primary colour). The last transform (N = 48) rendered very close to the original image although it dismissed an exact 50% of the data of the original file :D

There is a lot more to talk about concerning this subject but I’ll have to look deeper into it and have a discussion with Zeid. This compression mechanism may work for other computer file types as well; sounds, videos and maybe others. I’ll have to research more!

Gosh I still can’t believe I did this ...

I have successfully represented a computer file as a wave!! :D

12 Response(s) to "Computer Files as Waves"

  • Sep 27, 2006 9:25 AM

    Devil's Mind said:

    actually, I split the picture into 96 waves per each primary colour: So does this mean that you have [3*96] different waveforms, each one represented by 48 Fourier coefficients?! (Just to make sure, "coefficients" or sinusoidal waves?) - so it kinda needed [3*96*48] = 13824 coefficients... Am i making the calculation right?

    I'm not sure if you misinterpretted my original hypothesis, or just figured something on your own, but my initial suggestion to use pictures had nothing to do with continuity (within my limited classroom knowledge, Fourier Transform handles sharp edges seemlessly - but you have practical experience so correct me if im wrong)... Anyways, for the record, I suggested using pictures because The re-constructed wave is not an exact replica of the original wave; it is only a very similar wave!, so it wouldnt work on EXEs and similar stuff... This issue of continuity I don't remmember discussing it with you!!

    Finally, Im very glad that a good representation of the file was obtained using only 50% of the original file size... This is a good proof of concept example! :D


  • Sep 27, 2006 4:34 PM

    Waseem said:

    well...i never thought i would say this but your idea did work man thats amazing! c u this weekend for a chat about the subject.


  • Sep 27, 2006 5:23 PM

    Ghaith said:

    Devil’s Mind

    Your calculations are correct.

    The number of coefficients in the last render = 3*96*48 = 13824
    The number of bytes in the original file = 3*96*96 = 27648

    Regarding the continuity issue, I think the more continuity the bytes show the more effectively the wave method applies. The way I see it, continuous points can be more accurately represented by Fourier series than scattered points.

    Anyway, I’ll make sure of this by doing a test. I will also look into the 100% accuracy thing, the one you’ve been talking about, and test it on an executable or something :P

    As I told you, I personally believe we won’t be able to achieve a positive compression ratio with 100% accuracy, but who knows, we’ll try it and see :)

    Waseem

    I wish I could make more tests before our next meeting but I guess am out of time. Anyway, we already have a lot to discuss :)


  • Sep 27, 2006 9:28 PM

    Ghaith said:

    Update

    Okay guys, I just made a simple test:

    I represented two arrays, one of which is very continuous and the other is very discontinuous, in Fourier Sine series with the same number of coefficients, then regenerated them again.

    The continuous array was regenerated pretty accurately (very low cumulative error), while the regeneration of the discontinuous array was a disaster!

    This shows, like I expected, that continuity is definitely a key point in this matter.

    You can download an excel sheet showing the details of the test here:

    Simple Continuity Test.xls (16 Kb)


  • Sep 27, 2006 10:10 PM

    Lubna said:

    I loved the original idea i.e. turning a computer file into a wave! pretty unique thought Zeid, and pure neat work Ghaith!

    but if this compression method doesn't satisfy a 100% recovery for the files, I believe it would be of no use, what do you think?

    you're making me start loving the FS, and signals :)


  • Sep 27, 2006 11:21 PM

    naturalblu said:

    O-o ok :S


  • Sep 28, 2006 1:14 AM

    Ghaith said:

    Lubna

    Walla 9ayreh tfakre zayye bil zab6 :D

    Actually, this is a very good point. I told Zeid the exact thing when he told me about his idea. No “file compression” algorithm in computer history did settle for less than 100% file integrity. Actually, the term “accuracy” has never been associated with computer data before! We’re facing big questions here: what is the meaning of “inaccurate data”? And is it acceptable to have “inaccurate data”, whatever is it that is referred to by the term?

    There is no point of compressing a “Hello there” and later extracting it as a “Hello these”. The conversion is totally unacceptable although the magnitude of error in the ASCII code is tiny. The thing is even worse in other file types; a one-byte change can make a 10 MB executable file corrupted and unable to run!

    On the other hand, some file types can be represented with much less data if we settle for only little “quality” reduction. The term “quality” is a synonym for accuracy when it comes to file types like pictures, audio and video. For these file types, there are Lossy Data Compression algorithms which are based on a winning tradeoff between “quality” and storage space. However, these algorithms do what is more like an “optimized conversion” to files; they don’t offer decompression, which is a bit different from what we’re trying to do.

    naturalblu

    lol, you're free to skip over when you see a post like this next time :P


  • Sep 30, 2006 8:09 PM

    ِAbed. Hamdan said:

    Great Idea Gaith,

    I didn't read the rest of the post because It's too long, but I think I got the idea.

    I don't know if you are familiar with the term "Steganography", but steganography means hiding a message in other thing. For example, you can hide a text message in a wave file (as u converted the txt to wave) or you can hide a message in an image (this is famous, and Al-Qaeda used this in 9/11 attacks *winks*).

    I made a small software that takes two inputs: text message, and an image, then it hides the text in the image., and It can extracts it back. It's easy, and not hard, u can make use of it if u want.

    and to hide text in wave, it's also easy, there are un-used bits in any wave file, that can be over written by txt (of course u have to sample the text into bytes).


  • Oct 01, 2006 12:30 AM

    Ghaith said:

    Thanks man! It was my friend's idea, I just did the implementation :)

    Anyway, it’s the first time I hear of the term “Steganography”, though I actually did the exact same program you talked about, and many others with the same concept! I love to toy with things like this :D

    The best thing I ever came up with was a program that can embed a text message in itself (in its executable file) and later extract it! This combined many twisted concepts in addition to the manipulation of executable structures.

    But hey, people already know how to inject whole executables into other executables, not mentioning picture files, so why bother? ;)


  • Oct 02, 2006 4:50 PM

    Abed. Hamdan said:

    actually the use of Steganography is very wide. there are complex algorithms to do the job and to distrubute your secret message sometimes in multiple files. Its use is different than ecryption.

    good job you've done. :)


  • Oct 02, 2006 5:39 PM

    Ghaith said:

    Thank you Abed :)


  • Oct 07, 2006 11:44 AM

    Qwaider قويدر said:

    Adeemeh ya Shater ;-) (sorry)
    I actually did the same with real wave files in 1996 when RealAudio first came out and they employed LZH compression. Unfortunately at the time PC power wasn't that much to to do the transformation on the fly
    Then the MPEG group came out with the same idea and soon followed real audio with G2

    At any rate, that's a fantastic idea Ghaith, again, I'm so proud of you... You're very promising ... PLEASE ... I beg you ... keep it up




Write a Comment


Name:


Email:


Web Site:


Remember Me



Back to Home Page