EmbLogic's Blog

Project 01: A C Programming and Data Structures based Project

PROJECT TITLE:
Multiple Data Compression and Encryption using Iterative Technique.

Abstract:

In digital forms of data storage, data can be represented by patterns of 0s and 1s.
The more repetitive these patterns are, the more the data can be compressed. Text may be compressed to about 40% of its original size. How much a piece of text or a file can be compressed depends on the type of file and the format of the text.
Compressing a file is not useful unless it can be decompressed back to its original form for further use.

The need to compress a file arises for various reasons, some of which are:
1. Security purpose.
2. Transmission purpose (better bit rates, lower latencies, etc.).
3. Secrecy of the data or file being transferred (only some people may access, understand and manipulate data that is visible to all).
4. Storage purpose (efficient use of storage space).
etc.

When a file is compressed using an algorithm, it can be decompressed using essentially the same algorithm applied in reverse. A compression scheme can only be judged successful once decompressing the compressed file regenerates the original file exactly as it was before compression.
General Idea:
A text file is compressed according to its size and the number of distinct characters it contains. The compressed file can then be decompressed using a suitable algorithm, and the result must contain exactly the same information as the source file used for compression. The program must therefore be able to find the distinct characters in the text file and store them in an array. On typical systems (such as x86 machines), text is encoded using ASCII (American Standard Code for Information Interchange), and each character is stored in one 8-bit byte. A byte can take at most 256 (i.e., 2^8) distinct values; standard ASCII itself defines 128 of them. I used this property in my algorithm for compressing and decompressing files of any length, as long as they contain at most 256 distinct characters.
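
The key-building idea in the paragraph above can be sketched in C as follows. This is a minimal illustration, not the author's code: the function names `build_key` and `code_length` are hypothetical, the text is assumed to already be in a memory buffer, and instead of comparing each character against the whole array it uses a 256-entry "seen" table (one slot per possible byte value), which yields the same array of distinct characters in order of first appearance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: collect the distinct byte values of buf[0..len)
 * into key[] in order of first appearance. Returns how many were found,
 * which can never exceed 256 because a byte has only 2^8 values. */
size_t build_key(const unsigned char *buf, size_t len, unsigned char key[256])
{
    int seen[256] = {0};   /* seen[c] becomes 1 once byte c is in key[] */
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (!seen[c]) {    /* first occurrence: append to the key */
            seen[c] = 1;
            key[n++] = c;
        }
    }
    return n;
}

/* Hypothetical helper: smallest number of bits that gives each of the
 * n distinct characters its own index 0..n-1, i.e. ceil(log2(n)),
 * with a minimum of 1 bit. */
unsigned code_length(size_t n)
{
    assert(n >= 1 && n <= 256);
    unsigned bits = 1;
    while (((size_t)1 << bits) < n)
        bits++;
    return bits;
}
```

For the text "abracadabra" the key is a, b, r, c, d (5 entries), so each character needs 3 bits and the 11 characters fit in 33 bits, i.e. 5 bytes instead of 11.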

A simple algorithm for compressing a text file may be:

1) Open a source file that contains some text (this file will be subjected to compression and decompression).
2) Read the first character from the source file.
3) Store this character in an array.
4) Read the next character from the source file. If the end of the file has been reached, go to step 6.
5) Compare the character with the character(s) already in the array.
if the character matches any character in the array, go to step 4.
otherwise, grow the array by appending this character to it, then go to step 4.
NOTE: this array now holds all the distinct characters in the file. It is the “key” used to compress and decompress the source file.
6) Calculate the code length from the number of elements in this master array, i.e. the smallest number of bits that gives every distinct character its own index (for example, 3 bits for 5 to 8 distinct characters). This is the actual number of bits each character will occupy in the compressed file.
7) Seek the opened file back to its beginning.
8) Read a character from the file. If the end of the file has been reached, go to step 11.
9) Find this character in the array (it is always present, since the array was built from the same file) and shift its array index into a variable that collects the coded bits.
NOTE: step 9 may handle the characters differently depending on the code length and algorithm, but the working principle is the same.
10) Whenever this variable holds a complete coded byte, save that byte to a new file, which becomes the compressed file. Then go to step 8.
11) Save any remaining coded bits (padded to a full byte) and close all opened files.
12) Exit.
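
The encoding steps above (replace each character by its index in the key and pack the indices tightly) can be sketched in C as below. This is an illustrative sketch under stated assumptions, not the author's implementation: `pack_codes` is a hypothetical name, the key and code length are assumed to have been computed already, the input is held in memory rather than read from a file, and codes are packed most-significant-bit first with the last byte zero-padded. Because of that padding, the decompressor also needs to know the original number of characters, otherwise the padding zeros would decode as extra copies of the first key character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: replace every character of buf[0..len) by its
 * index in key[0..nkey), using `bits` bits per code, and pack the codes
 * MSB-first into out[]. out must hold at least (len * bits + 7) / 8
 * bytes. Returns the number of bytes written. */
size_t pack_codes(const unsigned char *buf, size_t len,
                  const unsigned char *key, size_t nkey,
                  unsigned bits, unsigned char *out)
{
    int index_of[256];         /* byte value -> index in key, or -1 */
    unsigned acc = 0;          /* bit accumulator */
    unsigned nacc = 0;         /* number of valid bits in acc */
    size_t nout = 0;

    assert(bits >= 1 && bits <= 8);
    for (int c = 0; c < 256; c++)
        index_of[c] = -1;
    for (size_t i = 0; i < nkey; i++)
        index_of[key[i]] = (int)i;

    for (size_t i = 0; i < len; i++) {
        int idx = index_of[buf[i]];
        assert(idx >= 0);                 /* every character must be in the key */
        acc = (acc << bits) | (unsigned)idx;
        nacc += bits;
        while (nacc >= 8) {               /* a full byte is ready: emit it */
            nacc -= 8;
            out[nout++] = (unsigned char)(acc >> nacc);
            acc &= (1u << nacc) - 1;      /* keep only the bits not yet written */
        }
    }
    if (nacc > 0)                         /* pad the last partial byte with zeros */
        out[nout++] = (unsigned char)(acc << (8 - nacc));
    return nout;
}
```

With the key a, b, r, c, d and 3-bit codes, "abracadabra" packs into the five bytes 0x05 0x06 0x20 0x28 0x00; the last byte holds one code bit followed by seven padding zeros.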

A simple algorithm for decompressing the file may be:

NOTE: the “key” used in compression (i.e., the array holding the distinct characters that were present in the source file) must be available.
1) Open the compressed file.
2) Read a byte from this file. If the end of the file has been reached, go to step 6.
3) Extract the coded bits of this byte, code-length bits at a time, into different variables with appropriate shifting (a code may continue into the next byte).
NOTE: step 3 may handle the byte differently depending on the code length and algorithm, but the working principle is the same.
4) Use each extracted value as an index into the key array and save the corresponding characters in a new file.
5) Go to step 2.
6) Close all the opened files.
7) Exit.
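
The decompression steps above can be sketched in C as the mirror image of MSB-first packing. Again this is a hypothetical sketch, not the author's code: `unpack_codes` is an invented name, the compressed data and the key are assumed to be in memory, and the `count` parameter (the original number of characters) is an assumption, since some such value has to be stored, for example next to the key, so that the zero padding in the last byte is not decoded as extra characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: read `bits`-bit codes MSB-first from in[0..inlen),
 * map each code back through key[0..nkey), and write the characters to
 * out[] until `count` characters have been produced. Returns the number
 * of characters written (fewer than count if the input is short or a
 * code does not match any key entry). */
size_t unpack_codes(const unsigned char *in, size_t inlen,
                    const unsigned char *key, size_t nkey,
                    unsigned bits, size_t count, unsigned char *out)
{
    unsigned acc = 0;   /* bit accumulator */
    unsigned nacc = 0;  /* number of valid bits in acc */
    size_t pos = 0;     /* next input byte to read */
    size_t nout = 0;

    assert(bits >= 1 && bits <= 8);
    while (nout < count) {
        if (nacc < bits) {              /* not enough bits: read one more byte */
            if (pos == inlen)
                break;                  /* input ran out early */
            acc = (acc << 8) | in[pos++];
            nacc += 8;
        }
        nacc -= bits;
        unsigned code = (acc >> nacc) & ((1u << bits) - 1);
        acc &= (1u << nacc) - 1;        /* drop the bits just consumed */
        if (code >= nkey)
            break;                      /* corrupt input: no such key entry */
        out[nout++] = key[code];        /* index -> original character */
    }
    return nout;
}
```

Feeding the bytes 0x05 0x06 0x20 0x28 0x00 back in with the key a, b, r, c, d, 3-bit codes and a count of 11 yields "abracadabra".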

To test the compression algorithm, compare the contents of the newly created (decompressed) file with those of the original source file; the two must be identical.
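
One way to perform this check in C is a byte-by-byte comparison of the two files. The sketch below is illustrative only; `streams_identical` and `files_identical` are hypothetical names, and a read error is treated the same as end-of-file for simplicity.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the two open streams contain exactly the same bytes from
 * their current positions to end-of-file, 0 otherwise. A read error is
 * treated like end-of-file in this sketch. */
int streams_identical(FILE *a, FILE *b)
{
    int ca, cb;

    assert(a != NULL && b != NULL);
    do {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            return 0;          /* differing byte, or one file is shorter */
    } while (ca != EOF);
    return 1;                  /* both reached EOF at the same point */
}

/* Hypothetical helper: compare the original source file with the file
 * produced by decompression. Returns 1 if identical, 0 if they differ,
 * -1 if either file cannot be opened. */
int files_identical(const char *original, const char *restored)
{
    FILE *a = fopen(original, "rb");   /* binary mode: no newline translation */
    FILE *b = fopen(restored, "rb");
    int result = -1;

    if (a != NULL && b != NULL)
        result = streams_identical(a, b);
    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

Opening both files in binary mode means the comparison sees exactly the bytes on disk. On POSIX systems, the `cmp` utility performs the same check from the shell.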

Conclusion:
I used the above algorithm to successfully compress and decompress many source files of varying text lengths.

Thank You.
HARPREET SINGH
ece.harpreetsingh@gmail.com
