In this post I implement Huffman Encoding in x86 assembly language using FASM – the Flat Assembler.
For a set of symbols, Huffman coding uses a frequency sorted binary tree to generate binary codes for each symbol. For a block of data that is built from the symbols, the frequency (or weight) for each symbol measures how often that symbol will appear in the data. For the symbols that appear more frequently the Huffman algorithm will generate binary codes that are much shorter in length than the codes that are usually used to specify those symbols. The binary codes generated by the Huffman algorithm can then be used to compress the block of data.
For example, ASCII characters are specified with 8 bits. The Huffman algorithm can be applied to a block of text and used to generate codes that are only 2 or 3 bits in length for the most frequently appearing characters (such as E, T, or the space character). However, characters that appear less frequently will have codes that are longer than 8 bits (16 bits or 20 bits for example). Significant compression of the text is still possible because the characters with the longer binary codes only rarely appear in the text.
Huffman codes are prefix codes, in that the sequence of bits at the start of a longer code cannot be the same as the sequence of bits making up any one of the shorter codes. This is guaranteed by the binary tree – each symbol has its own unique path.
A Huffman tree contains internal nodes and leaf nodes. An internal node is an intermediate node from which further pathways branch off. Each internal node can branch in two directions only, left and right. A left branch is assigned the binary code of 0 and a right branch is assigned a code of 1. At the top of the tree is the root node, which is an internal node. A pathway from the root node can pass through several internal nodes, until it reaches a leaf node. Once a leaf node is reached, there are no further branches and the path from the root node terminates. A leaf node represents a symbol. The binary code for that symbol is then given by the combination of left and right branches from the root node of the tree.
To build a Huffman tree, start with a list of the symbols and their weights. Once the tree has been built, these will all become leaf nodes.
(1) Take the two nodes with the lowest weights (w1 and w2). (2) Create an internal node. (3) Attach the two nodes to the internal node. (4) Remove the two nodes from the list. (5) Add the internal node to the list, with weight = w1 + w2. (6) Repeat from 1, until there is only one node left. |
The remaining node becomes the root node. Note that for a given set of symbols and their weights, it is possible to generate multiple Huffman trees. This is because at each stage of the process, whether a node is attached to the left or to the right is arbitrary.
Building The Huffman Tree
This is how I build Huffman trees in my program.
I use the set of ASCII characters from 0-127 and analyse a block of text to get the subset of characters and their weights. An array of 128 elements stores the frequency table, where the index is also the ASCII code for that element. Any elements with a frequency of zero are ignored.
Three other arrays are defined, all with 128 elements: a leaf array, a node array, and a node index array. All elements of these arrays are initialised to -1 (FF).
Each element in the leaf array is 2 bytes. The first byte is the ASCII character code. The second byte is an index to the node array and points to the parent (internal) node.
The node array is the array of internal nodes that are created as the tree is built. Each element is one byte and is an index within the node array that points to the parent node.
The node index array records whether an element in the frequency table has become an internal node. When greater than -1, the element is an index that points to the node array.
At the start of the tree building process, the leaf and node arrays are empty, and leaves and nodes are added as the frequency table is processed. The pointers Next_Leaf and Next_Node index the next available positions within the leaf and node arrays, and are initially set to zero.
The pseudo-code describes how the various arrays are built as the frequency table is processed:
// asc1 and asc2 are the indexes of the two lowest // weighted elements in the frequency table // with weights w1 and w2 if(Node_Index[asc1] == -1) { // add a new leaf Leaf_Arr[Next_Leaf].char = asc1; Leaf_Arr[Next_Leaf].index = Next_Node; Next_Leaf++; } else { // update the node index Node_Arr[Node_Index[asc1]] = Next_Node; } if(Node_Index[asc2] == -1) { // add a new leaf Leaf_Arr[Next_Leaf].char = asc2; Leaf_Arr[Next_Leaf].index = Next_Node; Next_Leaf++; } else { // update the node index Node_Arr[Node_Index[asc2]] = Next_Node; } // update the frequency table weight = w1 + w2; // one element is removed Freq_Table[asc1] = 0; // the other element becomes an internal node Freq_Table[asc2] = weight; Node_Index[asc2] = Next_Node; Next_Node++; |
Computing The Binary Codes
As a result of the tree building process, we now have two arrays that specify the structure of the tree, the leaf array and the node array. Each element in the leaf array has an index which points to that leafs parent node in the node array. Each element in the node array has an index which points to that nodes parent node in the same array.
For each leaf the path up to the root node can be walked, by first jumping over to its parent node in the node array, and then by jumping from node to node until the root node is reached.
To compute the binary code for each leaf, the length of the code first needs to be determined, which is why we need to be able to walk up the tree to the root node from each leaf node. The length of a path (and hence the bit code) is just the number of internal nodes (including the root node) on the path between the leaf and the root node.
Once we have the code lengths for each character in the set, we can then compute the binary codes using the method specified in RFC 1951:
https://www.ietf.org/rfc/rfc1951.txt
RFC 1951 specifies the DEFLATE compression algorithm, of which Huffman encoding is a part. Section 3.2.2 describes a method by which Huffman binary codes can be computed based only on the code lengths. Codes generated using this method will also satisfy the following conditions (quoting the RFC):
- All codes of a given bit length have lexicographically consecutive values, in the same order as the symbols they represent. - Shorter codes lexicographically precede longer codes. |
This means that if the codes are sorted first by length and then by ASCII code, they will be in numerical order.
The method to compute the codes is:
- Define a length counts array. - For each length, count the number of leafs that have a code of that length. - Define a next code array, which stores the numerically smallest code for each code length. - In my program the maximum length is 23 bits: // compute the numerically smallest code for each length code = 0; for(len=1;len<24;len++) { // shift left and add code = code + SHL(Length_Counts[len-1]); Next_Code[len] = code; } - Finally, assign codes for all of the symbols. - For each code length, sort the symbols with that code length by ASCII value. - Assign the code in Next_Code[len] to the first symbol. - Increment Next_Code[len] by one and assign that code to the next symbol. Repeat for the rest of the symbols. |
So now we have the binary codes, which starting from the most significant bit, describe how to walk down the tree from the root node to a leaf node.
Building A Lookup Table
Encoding bytes with a Huffman tree is done simply by substituting a byte value with the corresponding bit code. Decoding on the other hand means using some kind of data structure that will enable a string of input bits to be matched to a path in the tree.
The leaf and node arrays that were built previously are not suited to this purpose. They can’t be used to walk from the root node down to any given leaf.
My program uses a lookup table, which as the input bits are read will jump from table entry to table entry, based on whether the branch is to the left or to the right.
Each table entry is 4 bytes, and represents either an internal node or a leaf. If the entry is a node then the first 2 bytes are for the left branch, and the last 2 bytes for the right branch. For each 2 byte field, the first byte gives the index of the next entry in the table that is reached by taking that branch. The second byte is always -1.
Note that the first entry in the table is always the root node.
If the entry represents a leaf, then bytes 1 and 3 are always -1, because there are no further branches. The entry stores the ASCII character value of the leaf. If the table entry was reached by taking a left branch, then byte 2 stores the ASCII value. If it was a right branch, then byte 4 stores the value. This means that if the input bits describe a path that does not exist, a value of -1 will be returned.
Here is a simple example:
Binary Codes: 63 c 0 61 a 10 62 b 110 64 d 111 Lookup Table: 00 05FF 01FF 01 02FF 03FF 02 FF61 FFFF 03 04FF 06FF 04 FF62 FFFF 05 FF63 FFFF 06 FFFF FF64 To decode 110: - Branch 1 from the root (Table[0]) leads to Table[1]. - Branch 1 from Table[1] leads to Table[3]. - Branch 0 from Table[3] leads to Table[4]. - Table[4] contains the ascii value 0x62. |
Encoding Data
When my program writes Huffman encoded data to file, the first 128 bytes of that file always specify the code lengths for each of the ASCII characters from 0 to 127.
When the binary codes are written to file, they are written in reverse bit order. Thus the most significant bit of a code is written first.
Decoding Data
To decode a compressed file, the Huffman tree needs to be generated. Since the first 128 bytes specify the code lengths for each of the ASCII characters from 0-127, the method from RFC 1951 can be applied to compute the binary codes. Then the first 128 file bytes can be discarded and the encoded data in the rest of the file processed.
Since the codes were written to file in reverse bit order (msb first), when the file is read into a buffer, the very first bit (lsb) at the beginning of the buffer (lowest memory address) will be the msb of the first Huffman code. Therefore, bits can be processed and the tree walked simply by right shifting bits out of the buffer.
Program Usage
The program menus:
Tree From Text
A Huffman tree is built from the text that is currently visible in the Edit window. The text can either be typed in, pasted in, or loaded from file via File|Open.
Tree From Text: Use the text in the Edit window to build a Huffman tree and calculate binary codes.
Display Frequency Table: Display a table of symbol frequencies for all characters that appear in the input text. Note that the last column shows the index within the leaf array for each of the characters. This should not be FF (-1) for any character.
Display Tree: Display a hex dump of the leaf and node arrays.
Display Code Table: Display a table of the Huffman binary codes. The third column gives the code lengths, and the fourth column is the binary code. These codes were computed using the algorithm from RFC 1951 and are sorted lexicographically.
Display Lookup Table: Show the lookup table.
Display Lengths: Show a comma separated list of the code lengths ordered by ASCII value for all ASCII values 0-127.
Tree From Lengths
The text in the Edit window must be a comma separated list of the codes lengths (in decimal) for all of the ASCII characters in the range 0-127, sorted by ASCII value. The method from RFC 1951 is used to compute the bit codes.
Tree From Lengths: Use a CSV list of code lengths as the input text to generate the Huffman binary codes.
Display Code Table: Display a table of the Huffman binary codes. The third column gives the code lengths, and the fourth column is the binary code. These codes were computed using the algorithm from RFC 1951 and are sorted lexicographically.
Display Lookup Table: Show the lookup table.
Compression
Encode and decode text containing ASCII characters in the range 0-127.
Encode Text: Compute a Huffman tree for the text currently visible in the Edit window, and then encode that text, writing the output to file. The first 128 bytes in the output file always specify the code lengths for the ASCII values 0-127. A text file can be encoded simply by using File|Open to display its contents in the Edit window.
Decode File: Decode a file that was previously encoded using the Encode Text menu option. The decoded text is written to another file. Two file dialogs appear consecutively, the first to open the input file, the second to set the output file. The first 128 bytes of the input file must specify code lengths, so that Huffman codes can be computed and the file decoded.
The FASM Code
To grab this code, select with the mouse and then copy and paste directly into the FASM IDE.
; ------------------------------------------------------------------------------------- format PE GUI 4.0 entry win_main include 'win32a.inc' ; ------------------------------------------------------------------------------------- IDR_MY_MENU = 1 IDM_FILE = 100 IDM_NEW = 101 IDM_OPEN = 102 IDM_SAVE = 103 IDM_SAVE_AS = 104 IDM_EXIT = 105 IDR_ACCEL = 106 ID_SELECT_ALL = 107 IDM_TREE_TEXT = 108 IDM_TREE_FROM_TEXT = 109 IDM_PRINT_TREE = 110 IDM_PRINT_CODES = 111 IDM_PRINT_FREQS = 112 IDM_PRINT_LENGTHS = 113 IDM_TREE_LENGTH = 114 IDM_TREE_FROM_LENGTHS = 115 IDM_PRINT_LOOKUP = 116 IDM_COMPRESS = 117 IDM_ENCODE_TEXT = 118 IDM_DECODE_FILE = 119 ; ------------------------------------------------------------------------------------- section '.code' code readable executable win_main: ; initialise the members of the wcex structure ; -------------------------------------------- ; WNDCLASSEX ; UINT cbSize ; UINT style ; WNDPROC lpfnWndProc ; int cbClsExtra ; int cbWndExtra ; HINSTANCE hInstance ; HICON hIcon ; HCURSOR hCursor ; HBRUSH hbrBackground ; LPCTSTR lpszMenuName ; LPCTSTR lpszClassName ; HICON hIconSm ; -------------------------------------------- ; the instance handle invoke GetModuleHandle,0 mov [wcex.hInstance],eax ; cbSize mov eax,sizeof.WNDCLASSEX mov [wcex.cbSize],eax ; the windows proc mov [wcex.lpfnWndProc],WndProc ; the windows style mov [wcex.style],CS_HREDRAW+CS_VREDRAW ; load the icons invoke LoadIcon,[wcex.hInstance],IDI_APPLICATION mov [wcex.hIcon],eax mov [wcex.hIconSm],eax ; load the cursor invoke LoadCursor,NULL,IDC_ARROW mov [wcex.hCursor],eax ; the brush for the background mov [wcex.hbrBackground],COLOR_BTNFACE+1 ; the windows class name mov dword [wcex.lpszClassName],szClass ; set to NULL mov [wcex.cbClsExtra],0 mov [wcex.cbWndExtra],0 mov dword [wcex.lpszMenuName],NULL ; register wcex ; RegisterClassEx(&wcex) invoke RegisterClassEx,wcex test eax,eax jz reg_error ; load the menu invoke LoadMenu,[wcex.hInstance],IDR_MY_MENU mov [h_menu],eax ; create the main window invoke CreateWindowEx,0,szClass,szTitle,WS_OVERLAPPEDWINDOW,\ CW_USEDEFAULT,CW_USEDEFAULT,\ 700,500,NULL,[h_menu],[wcex.hInstance],NULL ; save the windows handle: mov [h_wnd],eax test eax,eax jz cr_error ; load the accelerator table invoke LoadAccelerators,[wcex.hInstance],IDR_ACCEL test eax,eax jz acc_error mov [h_accel],eax ; show and update the window ; ShowWindow(hWnd,SW_SHOWNORMAL) invoke ShowWindow,[h_wnd],SW_MAXIMIZE ; UpdateWindow(hWnd) invoke UpdateWindow,[h_wnd] msg_loop: ; the main message loop ; GetMessage(&msg,NULL,0,0) invoke GetMessage,msg,NULL,0,0 cmp eax,1 jb exit jne msg_loop ; TranslateAccelerator(hwnd,h_accel,&msg) invoke TranslateAccelerator,[h_wnd],[h_accel],msg test eax,eax ; no need to call TranslateMessage and DispatchMessage, ; if an accelerator is successfully translated jnz msg_loop ; TranslateMessage(&msg) invoke TranslateMessage,msg ; DispatchMessage(&msg) invoke DispatchMessage,msg jmp msg_loop reg_error: invoke MessageBox,NULL,szRegError,szTitle,MB_ICONERROR+MB_OK jmp exit cr_error: invoke MessageBox,NULL,szCreateError,szTitle,MB_ICONERROR+MB_OK jmp exit acc_error: invoke MessageBox,NULL,szAccelError,szTitle,MB_ICONERROR+MB_OK exit: ; return msg.wParam invoke ExitProcess,[msg.wParam] ; ------------------------------------------------------------------------------------- proc WndProc uses ebx esi edi,hwnd,wmsg,wparam,lparam ; WndProc(hwnd,wmsg,wparam,lparam) ; callback function to process messages for the main window cmp [wmsg],WM_CREATE je .ON_CREATE cmp [wmsg],WM_SIZE je .SIZE cmp [wmsg],WM_SETFOCUS je .SET_FOCUS cmp [wmsg],WM_COMMAND je .COMMAND cmp [wmsg],WM_DESTROY je .DESTROY .DEFAULT: ; DefWindowProc(hWnd,wmsg,wParam,lParam) invoke DefWindowProc,[hwnd],[wmsg],[wparam],[lparam] jmp .DONE .ON_CREATE: ; create the child EDIT control invoke CreateWindowEx,WS_EX_CLIENTEDGE,szEdit,0,\ WS_VISIBLE+WS_CHILD+WS_VSCROLL+ES_AUTOVSCROLL+ES_MULTILINE,\ 0,0,0,0,[hwnd],0,[wcex.hInstance],NULL test eax,eax jz .EDIT_FAILED mov [h_wnd_edit],eax ; create a font for the edit control invoke CreateFont,18,0,0,0,FW_NORMAL,FALSE,FALSE,FALSE,ANSI_CHARSET,\ OUT_DEVICE_PRECIS,CLIP_DEFAULT_PRECIS,PROOF_QUALITY,FIXED_PITCH+FF_DONTCARE,szFontFace test eax,eax jz .FONT_FAILED mov [h_font],eax invoke SendMessage,[h_wnd_edit],WM_SETFONT,[h_font],FALSE stdcall Reset_Arrays xor eax,eax jmp .DONE .EDIT_FAILED: invoke MessageBox,NULL,szCreateError,szTitle,MB_ICONERROR+MB_OK jmp .DONE .FONT_FAILED: invoke MessageBox,NULL,szFontError,szTitle,MB_ICONERROR+MB_OK jmp .DONE .SIZE: ; resize the child window ; new width = LOWORD(lParam) ; new height = HIWORD(lParam) mov eax,[lparam] and eax,0xffff mov edx,[lparam] shr edx,16 invoke MoveWindow,[h_wnd_edit],0,0,eax,edx,TRUE xor eax,eax jmp .DONE .SET_FOCUS: ; set the focus to the child EDIT control invoke SetFocus,[h_wnd_edit] xor eax,eax jmp .DONE .COMMAND: mov eax,[wparam] and eax,0xffff cmp eax,IDM_NEW je .NEW cmp eax,IDM_OPEN je .OPEN cmp eax,IDM_SAVE je .SAVE cmp eax,IDM_SAVE_AS je .SAVE_AS cmp eax,IDM_EXIT je .DESTROY cmp eax,ID_SELECT_ALL je .SELECT_ALL cmp eax,IDM_TREE_FROM_TEXT je .BUILD_TREE cmp eax,IDM_PRINT_TREE je .PRINT_TREE cmp eax,IDM_PRINT_CODES je .PRINT_CODES cmp eax,IDM_PRINT_FREQS je .PRINT_FREQS cmp eax,IDM_PRINT_LENGTHS je .PRINT_LENGTHS cmp eax,IDM_TREE_FROM_LENGTHS je .TREE_FROM_LENGTHS cmp eax,IDM_PRINT_LOOKUP je .PRINT_LOOKUP cmp eax,IDM_ENCODE_TEXT je .ENCODE_TEXT cmp eax,IDM_DECODE_FILE je .DECODE_FILE jmp .DEFAULT .NEW: ; File|New or CTRL+N ; clear the Edit window invoke SendMessage,[h_wnd_edit],WM_SETTEXT,0,0 invoke SetWindowText,[hwnd],szTitle ; reset the file name lea esi,[szFileName] mov [esi],byte 0 xor eax,eax jmp .DONE .OPEN: ; File|Open or CTRL+O stdcall File_Open,[hwnd] test eax,eax jz .DONE ; load the file contents into the Edit window stdcall Text_From_File invoke SetWindowText,[hwnd],szFileName xor eax,eax jmp .DONE .SAVE: ; File|Save or CTRL+S ; if there isn't a valid file name, ; call the File Save As dialog cmp [szFileName],byte 0 jnz .TXT_TO_FILE stdcall File_Save_As,[hwnd] test eax,eax jz .DONE invoke SetWindowText,[hwnd],szFileName .TXT_TO_FILE: ; save the Edit window text to the file stdcall Text_To_File xor eax,eax jmp .DONE .SAVE_AS: ; File|Save As stdcall File_Save_As,[hwnd] test eax,eax jz .DONE invoke SetWindowText,[hwnd],szFileName ; save the Edit window text to the file stdcall Text_To_File xor eax,eax jmp .DONE .SELECT_ALL: ; CTRL+A (via Accerlerator Table) invoke SendMessage,[h_wnd_edit],EM_SETSEL,0,-1 xor eax,eax jmp .DONE .BUILD_TREE: ; reset the file name lea esi,[szFileName] mov [esi],byte 0 invoke SetWindowText,[hwnd],szTitle ; build the tree from the text in the Edit window stdcall Tree_From_Text stdcall Print_Code_Table invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .TREE_FROM_LENGTHS: ; reset the file name lea esi,[szFileName] mov [esi],byte 0 invoke SetWindowText,[hwnd],szTitle ; capture a comma delimited list of lengths ; from the Edit window to build the tree invoke GetWindowText,[h_wnd_edit],Print_Buffer,8000 stdcall Tree_From_CSV_Lengths stdcall Print_Code_Table invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .ENCODE_TEXT: ; reset the file name lea esi,[szFileName] mov [esi],byte 0 invoke SetWindowText,[hwnd],szTitle ; the compressed text is saved to file ; get the output file name stdcall File_Save_As,[hwnd] test eax,eax jz .DONE ; build the tree from the text in ; the Edit window stdcall Tree_From_Text stdcall Build_Encode_Table ; compress the text and save to file stdcall Encode_Text stdcall Print_Code_Table invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .DECODE_FILE: ; reset the file name lea esi,[szFileName] mov [esi],byte 0 invoke SetWindowText,[hwnd],szTitle ; get the input file name stdcall File_Open,[hwnd] test eax,eax jz .DONE stdcall GetInputFileName ; get the output file name stdcall File_Save_As,[hwnd] test eax,eax jz .DONE stdcall GetOutputFileName ; decode the file stdcall Reset_Arrays stdcall Decode_File stdcall Print_Code_Table invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .PRINT_TREE: stdcall Print_Tree invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .PRINT_CODES: stdcall Print_Code_Table invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .PRINT_FREQS: stdcall Print_Freq_Table invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .PRINT_LENGTHS: stdcall Print_Lengths invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .PRINT_LOOKUP: stdcall Print_Lookup_Table invoke SetWindowText,[h_wnd_edit],Print_Buffer xor eax,eax jmp .DONE .DESTROY: ; destroy the font invoke DeleteObject,[h_font] ; PostQuitMessage(0) invoke PostQuitMessage,0 xor eax,eax .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Tree_From_Text ; build the tree and code tables from the text in the edit window stdcall Calc_Freq_Table stdcall Build_Tree_From_Text stdcall Calc_Code_Lengths stdcall Codes_From_Lengths stdcall Build_Lookup_Table ret endp ; ------------------------------------------------------------------------------------- proc Tree_From_CSV_Lengths ; capture a comma delimited list of lengths ; from the Edit window to build the tree stdcall Reset_Arrays stdcall Get_Lengths stdcall Codes_From_Lengths stdcall Build_Lookup_Table ret endp ; ------------------------------------------------------------------------------------- proc Build_Tree_From_Text ; build the Huffman Tree mov [Next_Leaf],byte 0 mov [Next_Node],byte 0 lea edi,[Node_Index] cinvoke memset,edi,0xff,128 lea edi,[Leaf_Arr] cinvoke memset,edi,0xff,256 lea edi,[Node_Arr] cinvoke memset,edi,0xff,128 .LOOP: ; find the 2 lowest weights in the freq table stdcall Get_Two_Lowest test al,al jz .DONE test ah,ah jz .DONE ; add nodes to the tree stdcall Update_Tree jmp .LOOP .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Update_Tree ; add new nodes and leaves to the tree ; AL and AH index the two lowest freqs ; add the nodes stdcall Add_Node xchg al,ah stdcall Add_Node ; update the frequency table ; calculate the weight by adding the ; freqs indexed by AL and AH lea esi,[Freq_Table] xor ecx,ecx mov cl,al shl cx,2 mov edx,[esi+ecx] ; zero the freq indexed by AL mov [esi+ecx],dword 0 xor ecx,ecx mov cl,ah shl cx,2 ; add to get the weight add edx,[esi+ecx] ; save the weight to the freq table ; at the location indexed by AH mov [esi+ecx],edx ; the index AH now points to a node ; update the node index lea edi,[Node_Index] xor ecx,ecx mov cl,ah mov dl,[Next_Node] mov [edi+ecx],dl ; increment the next node counter inc byte [Next_Node] ret endp ; ------------------------------------------------------------------------------------- proc Add_Node ; add a new node or leaf to the tree ; the index is passed in AL ; check if AL points to an internal node xor ecx,ecx lea esi,[Node_Index] mov cl,al mov cl,[esi+ecx] cmp cl,0xff je .LEAF .LINK: ; link the node indexed by CL ; to the rest of the tree lea edi,[Node_Arr] ; write the index of the parent node mov dl,[Next_Node] mov [edi+ecx],dl jmp .DONE .LEAF: ; create a new leaf lea edi,[Leaf_Arr] xor ecx,ecx mov cl,[Next_Leaf] shl ecx,1 add edi,ecx ; the first byte is the char ASCII value mov [edi],al ; the second byte is the index of the parent node mov dl,[Next_Node] mov [edi+1],dl inc byte [Next_Leaf] .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Calc_Code_Lengths ; for each leaf, walk the path up to the root node ; to calculate the code length ; save the lengths to the code table lea edi,[Code_Table] cinvoke memset,edi,0,512 ; count the lengths in the Length_Counts array lea edi,[Length_Counts] cinvoke memset,edi,0,24 locals Length db 0 endl lea edi,[Code_Table] xor ecx,ecx .LEAVES: ; for each leaf: mov byte [Length],0 lea esi,[Leaf_Arr] ; set AL to the ASCII value xor eax,eax mov al,[esi+ecx] cmp al,0xff je .DONE xor edx,edx ; set DL to the index of the parent node mov dl,[esi+ecx+1] .NODES: ; walk up the nodes to the root node lea esi,[Node_Arr] .NEXT: inc byte [Length] mov dl,[esi+edx] cmp dl,0xff je .CODE jmp .NEXT .CODE: ; write the length to the code table mov dl,[Length] shl eax,2 mov [edi+eax+3],dl mov al,dl ; update the length counts array stdcall Add_Count add cx,2 test cx,256 jz .LEAVES .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Encode_Text ; encode the text in the Edit window ; get a handle to the default heap for this process invoke GetProcessHeap test eax,eax jz .HEAP_FAILED mov [h_heap],eax ; get the number of bytes in the Edit window invoke GetWindowTextLength,[h_wnd_edit] ; +1 for the null terminator inc eax mov [nBytes],eax ; allocate nBytes from the heap for the buffer invoke HeapAlloc,[h_heap],NULL,eax test eax,eax jz .ALLOC_FAILED ; point pFileBuffer to the buffer mov [pFileBuffer],eax ; read the contents of the Edit window into the buffer invoke GetWindowText,[h_wnd_edit],[pFileBuffer],[nBytes] ; allocate memory for the encode buffer stdcall Calc_Encoded_Size mov [nEncodeBytes],eax inc eax invoke HeapAlloc,[h_heap],NULL,eax test eax,eax jz .ALLOC_FAILED ; point pEncodeBuffer to the buffer mov [pEncodeBuffer],eax ; zero the buffer cinvoke memset,eax,0,[nEncodeBytes] ; do the huffman encoding mov esi,[pFileBuffer] mov edi,[pEncodeBuffer] xor ecx,ecx .LOOP: ; get a byte mov al,[esi] cmp al,0 je .PAD test al,128 jnz .NEXT stdcall Get_Code ; the code is now in EAX shl eax,cl ; OR the bits into EDI or [edi],eax ; add the code length to CL add cl,dl ; compute the number of bytes to ; increment EDI by = floor(CL/8) xor edx,edx mov dl,cl shr dl,3 add edi,edx and cl,7 .NEXT: ; get the next byte inc esi jmp .LOOP .PAD: ; pad the buffer mov al,0xff shl al,cl or [edi],al .WRITE: stdcall Write_Encoded_Bytes .FREE: invoke HeapFree,[h_heap],NULL,[pFileBuffer] invoke HeapFree,[h_heap],NULL,[pEncodeBuffer] jmp .DONE .ALLOC_FAILED: invoke MessageBox,NULL,szAllocError,szTitle,MB_ICONERROR+MB_OK jmp .DONE .HEAP_FAILED: invoke MessageBox,NULL,szHeapError,szTitle,MB_ICONERROR+MB_OK .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Write_Encoded_Bytes ; write the encoded bytes to file ; open the file, create it if it doesn't exist invoke CreateFile,szFileName,GENERIC_WRITE,0,NULL,\ CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL test eax,eax jz .CRF_FAILED ; get the file handle mov [h_file],eax ; write the code lengths array to file stdcall Fill_Lengths_Arr mov eax,128 lea esi,[Lengths_Arr] invoke WriteFile,[h_file],esi,eax,nBytes,NULL ; write the encode buffer to file mov eax,[nEncodeBytes] invoke WriteFile,[h_file],[pEncodeBuffer],eax,nBytes,NULL .HCLOSE: invoke CloseHandle,[h_file] jmp .DONE .CRF_FAILED: invoke MessageBox,NULL,szFileError,szTitle,MB_ICONERROR+MB_OK .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Decode_File ; decode a huffman encoded file ; get a handle to the default heap for this process invoke GetProcessHeap test eax,eax jz .HEAP_FAILED mov [h_heap],eax ; open the input file invoke CreateFile,szInputFile,GENERIC_READ,FILE_SHARE_READ,NULL,\ OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL test eax,eax jz .CRF_FAILED ; get the file handle mov [h_file],eax ; get the file size invoke GetFileSize,[h_file],NULL cmp eax,0xffffffff je .INVALID_SIZE cmp eax,128 jl .INVALID_SIZE mov [FileSize],eax ; allocate EAX bytes from the heap for the buffer invoke HeapAlloc,[h_heap],NULL,eax test eax,eax jz .ALLOC_FAILED ; point pFileBuffer to the buffer mov [pFileBuffer],eax ; read the contents of the file into the buffer invoke ReadFile,[h_file],[pFileBuffer],[FileSize],nBytes,NULL test eax,eax jz .READ_FAILED ; don't need the file anymore, close it invoke CloseHandle,[h_file] ; decode the buffer: pFileBuffer stdcall Decode_Buffer ; free the buffer invoke HeapFree,[h_heap],NULL,[pFileBuffer] jmp .DONE .ERR_CLEAN_UP: ; something went wrong - free the buffer invoke HeapFree,[h_heap],NULL,[pFileBuffer] .ERR_HCLOSE: ; something went wrong - close the file invoke CloseHandle,[h_file] jmp .DONE .READ_FAILED: invoke MessageBox,NULL,szReadError,szTitle,MB_ICONERROR+MB_OK jmp .ERR_CLEAN_UP .INVALID_SIZE: invoke MessageBox,NULL,szSizeError,szTitle,MB_ICONERROR+MB_OK jmp .ERR_HCLOSE .ALLOC_FAILED: invoke MessageBox,NULL,szAllocError,szTitle,MB_ICONERROR+MB_OK jmp .ERR_HCLOSE .CRF_FAILED: invoke MessageBox,NULL,szFileError,szTitle,MB_ICONERROR+MB_OK jmp .DONE .HEAP_FAILED: invoke MessageBox,NULL,szHeapError,szTitle,MB_ICONERROR+MB_OK .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Decode_Buffer ; decode the encoded data in pFileBuffer ; read the code lengths from the first ; 128 bytes of the buffer stdcall Read_Lengths ; compute the codes stdcall Codes_From_Lengths ; build the lookup table for the decoding stdcall Build_Lookup_Table ; open the output file invoke CreateFile,szOutputFile,GENERIC_WRITE,0,NULL,\ CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL test eax,eax jz .CRF_FAILED ; get the file handle mov [h_file],eax ; count the bits and bytes locals nbits db 0 bcount dd 0 endl ; start reading after the first 128 bytes mov esi,[pFileBuffer] add esi,128 lea edi,[Print_Buffer] lea ebx,[Lookup_Table] ; the number of bytes to process from the buffer mov edx,[FileSize] sub edx,128 mov [bcount],edx ; EAX counts the number of output bytes xor eax,eax ; start decoding mov dl,[esi] .BITS: ; get a bit shr dl,1 jc .BRANCH_1 .BRANCH_0: ; jump to the next entry in the lookup table xor ecx,ecx mov cl,[ebx] shl ecx,2 lea ebx,[Lookup_Table] ; set EBX to the next table entry add ebx,ecx ; test if a leaf has been reached mov cx,[ebx] cmp cl,0xff je .LEAF jmp .NEXT_BIT .BRANCH_1: ; jump to the next entry in the lookup table xor ecx,ecx mov cl,[ebx+2] shl ecx,2 lea ebx,[Lookup_Table] ; set EBX to the next table entry add ebx,ecx ; test if a leaf has been reached mov cx,[ebx+2] cmp cl,0xff je .LEAF jmp .NEXT_BIT .LEAF: ; add the decoded ASCII char to the output mov [edi],ch inc edi inc eax cmp eax,8000 jl .NEXT ; write the buffer to file lea edi,[Print_Buffer] invoke WriteFile,[h_file],edi,eax,nBytes,NULL xor eax,eax lea edi,[Print_Buffer] .NEXT: ; reset the lookup table lea ebx,[Lookup_Table] .NEXT_BIT: ; check the bit counter inc byte [nbits] test byte [nbits],8 jz .BITS mov byte [nbits],0 ; get the next byte inc esi mov dl,[esi] dec dword [bcount] cmp dword [bcount],0 jg .BITS .FINISH: ; write the buffer to file lea edi,[Print_Buffer] invoke WriteFile,[h_file],edi,eax,nBytes,NULL .HCLOSE: invoke CloseHandle,[h_file] jmp .DONE .CRF_FAILED: invoke MessageBox,NULL,szFileError,szTitle,MB_ICONERROR+MB_OK .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Build_Encode_Table ; use the code table to build the encode table ; the encode table is used for the actual encoding ; of ASCII chars ; a code in the code table is translated to a ; code in the encode table by reversing the bit order ; such that the MSB becomes the LSB ; when ASCII chars are encoded and stored in a buffer, ; the very first code will be stored at the lowest ; memory address in that buffer, with the very first ; bit representing the root node of the huffman tree ; then the tree can be walked and chars decoded just ; by right shifting bits out of the buffer lea edi,[Encode_Table] cinvoke memset,edi,0,512 lea esi,[Code_Table] lea edi,[Encode_Table] xor ecx,ecx .LOOP: mov al,[esi+3] cmp al,0 je .NEXT ; copy the length mov [edi+3],al mov dl,al mov eax,[esi] xor ebx,ebx .BITS: ; reverse the bit order by ; rotating through the carry bit rcr eax,1 rcl ebx,1 dec dl jnz .BITS ; OR the code into the encode table or [edi],ebx .NEXT: add esi,4 add edi,4 inc cl test cl,128 jz .LOOP ret endp ; ------------------------------------------------------------------------------------- proc Build_Lookup_Table uses esi edi ; use the codes in the code table ; to build the lookup table lea edi,[Lookup_Table] cinvoke memset,edi,0xff,1024 mov [Next_Node],byte 1 lea esi,[Code_Table] xor ecx,ecx .LOOP: ; for each code in the code table mov edx,ecx shl edx,2 ; get the code length mov al,[esi+edx+3] ; valid lengths are 1-23 bits cmp al,1 jl .NEXT cmp al,23 jg .NEXT ; add a path to the lookup table ; pass the code and length in EAX mov eax,[esi+edx] ; pass the char ascii value in DL mov dl,cl stdcall Add_Path .NEXT: inc cl test cl,128 jz .LOOP ret endp ; ------------------------------------------------------------------------------------- proc Add_Path uses ecx ; add a path to the lookup table ; each 4 byte entry in the table is a node ; bytes 0-1 are for the 0 branch ; bytes 2-3 are for the 1 branch ; byte 0 gives the index of the next table ; entry (node) that is reached by the 0 branch ; byte 2 gives the index of the next table ; entry (node) that is reached by the 1 branch ; if there are no further branches, then these ; bytes are set to -1 (FF) ; if the node is not a leaf, bytes 1 and 3 ; are set to -1 (FF) ; if the node is a leaf then either byte 1 or ; byte 3 gives the ASCII char value (0-127) ; and there are no further branches locals Index db 0 asc db 0 endl ; the ASCII value is passed in DL mov [asc],dl ; get the bit length into ECX mov ecx,eax bswap ecx and ecx,0xff ; minus 1 to get the offset of the msb dec cl lea edi,[Lookup_Table] ; test all of the bits in the code stored ; in EAX to add a path to the lookup table .LOOP: xor edx,edx mov dl,[Index] shl edx,2 ; test the bit at the offset given by ECX bt eax,ecx jc .BRANCH_1 .BRANCH_0: ; the next node is reached by branching ; to the left (0) mov bl,[edi+edx] ; is there an index for the next node? cmp bl,0xff ; if there isn't, add an entry to the table je .ADD_0 ; store the index of the next table entry mov [Index],bl jmp .NEXT .BRANCH_1: ; the next node is reached by branching ; to the right (1) mov bl,[edi+edx+2] ; is there an index for the next node? cmp bl,0xff ; if there isn't, add an entry to the table je .ADD_1 ; store the index of the next table entry mov [Index],bl jmp .NEXT .ADD_0: ; create an entry in the table for the next node ; which is on the branch 0 from the current node mov bl,[Next_Node] ; link the nodes mov [edi+edx],bl ; jump to the next node in the next iteration mov [Index],bl inc byte [Next_Node] jmp .NEXT .ADD_1: ; create an entry in the table for the next node ; which is on the branch 1 from the current node mov bl,[Next_Node] ; link the nodes mov [edi+edx+2],bl ; jump to the next node in the next iteration mov [Index],bl inc byte [Next_Node] .NEXT: ; the lsb in the code is a leaf cmp ecx,0 je .LEAF ; update the bit offset counter dec ecx jmp .LOOP .LEAF: xor edx,edx mov dl,[Index] shl edx,2 ; a leaf has been reached ; test the last bit in the code ; to get the branch (0 or 1) bt eax,0 jc .ASC_1 .ASC_0: ; write the ASCII value mov al,[asc] mov [edi+edx+1],al jmp .DONE .ASC_1: ; write the ASCII value mov al,[asc] mov [edi+edx+3],al .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Codes_From_Lengths uses esi edi ; use the algorithm in RFC 1951 (3.2.2) ; to generate codes from the bit lengths ; Next_Code holds the smallest code ; for each code length lea edi,[Next_Code] cinvoke memset,edi,0,96 ; Length_Counts stores the number of ; codes for each code length lea esi,[Length_Counts] lea edi,[Next_Code] ; EAX stores the code xor eax,eax xor ecx,ecx .LOOP: xor edx,edx mov dl,[esi+ecx] ; code = (code + Length_Counts[CL]) add eax,edx ; code = SHL(code,1) shl eax,1 mov edx,ecx inc edx shl edx,2 ; Next_Code[CL+1] = code mov [edi+edx],eax inc cl cmp cl,23 jl .LOOP ; use Next_Code to generate codes ; store the codes in Code_Table lea esi,[Next_Code] lea edi,[Code_Table] ; CL is the index into Code_Table ; CL is the ASCII value (0-127) xor ecx,ecx ; each entry in Code_Table is 4 bytes ; first 3 bytes is the code ; 4th byte is the length .CALC_CODES: mov edx,ecx shl edx,2 add edx,3 ; for CL, get the length of the code mov al,[edi+edx] ; a valid code length is 1-23 bits cmp al,0 je .NEXT cmp al,23 jg .NEXT ; look up the next code xor edx,edx mov dl,al shl edx,2 ; code = Next_Code[length] mov eax,[esi+edx] ; Next_Code[length]++ inc dword [esi+edx] mov edx,ecx shl edx,2 ; OR the code into Code_Table[CL] ; upper byte of code is always zero ; so length stored in 4th byte is unchanged or [edi+edx],eax .NEXT: inc cl test cl,128 jz .CALC_CODES ret endp ; ------------------------------------------------------------------------------------- proc Get_Lengths ; extract the lengths from the string captured ; from the Edit window (in the print buffer) ; store the lengths in Code_Table lea edi,[Code_Table] cinvoke memset,edi,0,512 ; count the lengths in the Length_Counts array lea edi,[Length_Counts] cinvoke memset,edi,0,24 lea esi,[Print_Buffer] lea edi,[Code_Table] locals Length db 0 endl xor ecx,ecx ; the string is a comma delimited ; list of decimal numbers .LOOP: mov al,[esi] cmp al,0 je .NEXT cmp al,',' je .NEXT cmp al,'0' jl .SKIP cmp al,'9' jg .SKIP ; a decimal digit has been read sub al,'0' ; multiply the previous Length by 10 ; and then add the new digit mov ah,[Length] shl ah,3 add al,ah mov ah,[Length] shl ah,1 add al,ah mov [Length],al .SKIP: inc esi jmp .LOOP .NEXT: ; max length is 23 bits cmp byte [Length],24 jl .OK mov [Length],byte 0 .OK: mov al,byte [Length] ; store the length in Code_Table ; each entry is 4 bytes ; the first 3 bytes store the code ; the 4th byte is the length ; CL is the index mov edx,ecx shl edx,2 add edx,3 ; write the length mov [edi+edx],al ; update the Length_Counts array stdcall Add_Count mov [Length],byte 0 ; test for null terminator mov al,[esi] cmp al,0 je .DONE inc esi inc cl test cl,128 jz .LOOP .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Fill_Lengths_Arr ; copy the code lengths from the encode table ; to the lengths array lea esi,[Encode_Table] lea edi,[Lengths_Arr] xor ecx,ecx .LOOP: mov al,[esi+3] mov [edi],al add esi,4 inc edi inc cl test cl,128 jz .LOOP ret endp ; ------------------------------------------------------------------------------------- proc Add_Count uses ecx edi ; the length is passed in AL ; increment the element indexed by AL lea edi,[Length_Counts] ; valid bit lengths = 1-23 cmp al,0 je .DONE cmp al,24 jge .DONE xor ecx,ecx mov cl,al inc byte [edi+ecx] .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Read_Lengths ; the first 128 bytes of a compressed file ; contain the code lengths from which the ; huffman tree can be built ; store the lengths in Code_Table lea edi,[Code_Table] cinvoke memset,edi,0,512 ; count the lengths in the Length_Counts array lea edi,[Length_Counts] cinvoke memset,edi,0,24 mov esi,[pFileBuffer] lea edi,[Code_Table] xor ecx,ecx .LOOP: mov al,[esi] mov [edi+3],al ; update the Length_Counts array stdcall Add_Count inc esi add edi,4 inc cl test cl,128 jz .LOOP ret endp ; ------------------------------------------------------------------------------------- proc Get_Code uses esi ecx ; look up the code in the encode table ; AL is the index ; return the code in EAX ; return the length in DL lea esi,[Encode_Table] xor ecx,ecx mov cl,al shl ecx,2 mov eax,[esi+ecx] mov edx,eax and eax,0xffffff bswap edx ret endp ; ------------------------------------------------------------------------------------- proc Calc_Freq_Table ; for the text in the Edit window ; count the frequency of each ASCII char (0-127) lea edi,[Freq_Table] cinvoke memset,edi,0,512 lea edi,[Freq_Copy] cinvoke memset,edi,0,512 ; get a handle to the default heap for this process invoke GetProcessHeap test eax,eax jz .HEAP_FAILED mov [h_heap],eax ; get the number of bytes in the Edit window invoke GetWindowTextLength,[h_wnd_edit] ; +1 for the null terminator inc eax mov [nBytes],eax ; allocate nBytes from the heap for the buffer invoke HeapAlloc,[h_heap],NULL,eax test eax,eax jz .ALLOC_FAILED ; point pFileBuffer to the buffer mov [pFileBuffer],eax ; read the contents of the Edit window into the buffer invoke GetWindowText,[h_wnd_edit],[pFileBuffer],[nBytes] ; start counting mov esi,[pFileBuffer] lea edi,[Freq_Table] xor ecx,ecx .LOOP: xor eax,eax mov al,[esi] cmp al,0 je .COPY test al,128 jnz .SKIP shl eax,2 inc dword [edi+eax] .SKIP: inc esi inc ecx cmp ecx,[nBytes] jl .LOOP .COPY: ; copy the freq table lea esi,[Freq_Table] lea edi,[Freq_Copy] xor cl,cl .NEXT: mov eax,[esi] mov [edi],eax add esi,4 add edi,4 inc cl test cl,128 jz .NEXT .FREE: invoke HeapFree,[h_heap],NULL,[pFileBuffer] jmp .DONE .ALLOC_FAILED: invoke MessageBox,NULL,szAllocError,szTitle,MB_ICONERROR+MB_OK jmp .DONE .HEAP_FAILED: invoke MessageBox,NULL,szHeapError,szTitle,MB_ICONERROR+MB_OK .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Get_Two_Lowest ; scan the frequency table ; get the two lowest frequencies (greater than zero) ; return indexes in AX locals index db 1 endl ; ECX and EDX store the 2 lowest frequencies ; with EDX greater than or equal to ECX mov ecx,0x7fffffff mov edx,0x7fffffff ; set AL and AH to zero xor ax,ax lea esi,[Freq_Table] add esi,4 .SCAN: ; if the freq is zero, do nothing test [esi],dword 0xffffffff jz .NEXT ; compare freq against EDX cmp edx,[esi] jle .NEXT mov edx,[esi] mov ah,[index] ; make sure that EDX is still GTE ECX cmp edx,ecx jge .NEXT ; swap ECX and EDX xchg ecx,edx ; swap AL and AH xchg al,ah .NEXT: add esi,4 inc byte [index] test byte [index],128 jz .SCAN .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Calc_Encoded_Size ; compute the size in bits of the encoded data ; for each ASCII char, multiply the frequency ; by the bit length of the huffman code lea esi,[Freq_Copy] lea edi,[Encode_Table] xor ebx,ebx xor ecx,ecx .LOOP: xor eax,eax mov al,[edi+3] ; skip if the code length is zero cmp al,0 je .NEXT ; get the frequency mov edx,[esi] ; multiply by the accumulator EAX mul edx ; the total is stored in EBX add ebx,eax .NEXT: add esi,4 add edi,4 inc cl test cl,128 jz .LOOP ; convert to bytes mov eax,ebx test eax,7 jz .DONE and eax,0xfffffff8 inc eax .DONE: shr eax,3 ret endp ; ------------------------------------------------------------------------------------- proc Dword_To_Hex uses ecx ; convert 4 bytes to 8 hex digits ; dword passed in EAX ; result returned in EDX and EAX ; given EAX = [b1,b2,b3,b4] in little endian ; then the 4 bytes are printed as a number ; b4 b3 b2 b1 mov ecx,eax bswap ecx mov al,ch stdcall Byte_To_Hex mov dx,ax shl edx,16 mov al,cl stdcall Byte_To_Hex mov dx,ax shr ecx,16 mov al,ch stdcall Byte_To_Hex shl eax,16 mov al,cl stdcall Byte_To_Hex ret endp ; ------------------------------------------------------------------------------------- proc DD_To_Hex uses ecx ; print 4 bytes as 8 hex digits ; dword passed in EAX ; result returned in EDX and EAX ; given EAX = [b1,b2,b3,b4] in little endian ; then the 4 bytes are printed as a hex dump ; b1 b2 b3 b4 mov ecx,eax mov al,ch stdcall Byte_To_Hex mov dx,ax shl edx,16 mov al,cl stdcall Byte_To_Hex mov dx,ax shr ecx,16 mov al,ch stdcall Byte_To_Hex shl eax,16 mov al,cl stdcall Byte_To_Hex ret endp ; ------------------------------------------------------------------------------------- proc Byte_To_Decimal uses ecx ; print the byte in AL as a decimal number ; AL must be less than 100 cmp al,99 jge .99 xor ecx,ecx xor edx,edx .10: cmp al,10 jl .D1 sub al,10 inc cl jmp .10 .D1: mov dl,cl add dl,'0' xor ecx,ecx .1: cmp al,0 je .D2 sub al,1 inc cl jmp .1 .D2: mov dh,cl add dh,'0' mov ax,dx jmp .DONE .99: mov ax,0x3939 .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Byte_To_Hex ; convert 1 byte to 2 hex digits ; byte passed in AL ; result returned in AX mov ah,al shr al,4 and ah,0xf cmp al,9 jg .AF0 add al,'0' jmp .AH .AF0: sub al,10 add al,'A' .AH: cmp ah,9 jg .AF1 add ah,'0' jmp .DONE .AF1: sub ah,10 add ah,'A' .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Print_Lengths ; print the code lengths as a comma delimited ; string of decimal values lea esi,[Code_Table] lea edi,[Print_Buffer] xor ecx,ecx .LOOP: ; each entry in the code table is 4 bytes ; the 4th byte is the length in bits mov edx,ecx shl edx,2 add edx,3 mov al,[esi+edx] stdcall Byte_To_Decimal mov [edi],ax add edi,2 mov [edi],byte ',' inc edi inc cl test cl,128 jz .LOOP dec edi mov [edi],byte 0 ret endp ; ------------------------------------------------------------------------------------- proc Print_Lookup_Table ; print the lookup table ; 256 4 byte entries lea esi,[Lookup_Table] lea edi,[Print_Buffer] xor ecx,ecx xor edx,edx .LOOP: ; print the index (0-255) mov al,cl stdcall Byte_To_Hex mov [edi],ax add edi,2 mov [edi],byte 32 inc edi mov [edi],byte 32 inc edi mov edx,ecx shl edx,2 mov eax,[esi+edx] ; print the 4 byte lookup table entry stdcall DD_To_Hex ; first 2 bytes mov [edi],edx add edi,4 mov [edi],byte 32 inc edi ; next 2 bytes mov [edi],eax add edi,4 mov [edi],byte 13 inc edi mov [edi],byte 10 inc edi inc cx test cx,256 jz .LOOP .DONE: mov [edi],byte 0 ret endp ; ------------------------------------------------------------------------------------- proc Print_Code_Table ; display the Code_Table lea esi,[Code_Table] lea edi,[Print_Buffer] locals Length db 1 endl ; order the codes by length, then by ASCII value ; valid lengths are 1-23 bits .NEXT_LENGTH: ; CL is the ASCII value (0-127) xor ecx,ecx .LOOP: ; each code table entry is 4 bytes ; 4th byte is the length mov eax,ecx shl eax,2 mov al,[esi+eax+3] cmp al,[Length] jne .NEXT mov al,cl stdcall Byte_To_Hex mov [edi],ax add edi,2 mov [edi],byte 32 inc edi mov [edi],byte 32 inc edi mov al,cl cmp al,31 jg .CHAR mov [edi],byte 32 inc edi jmp .CODE .CHAR: mov [edi],al inc edi .CODE: mov [edi],byte 32 inc edi mov [edi],byte 32 inc edi mov eax,ecx shl eax,2 mov eax,[esi+eax] stdcall Print_Code mov [edi],byte 13 inc edi mov [edi],byte 10 inc edi .NEXT: inc cl test cl,128 jz .LOOP ; increment the length and then repeat the ; loop to print any codes of that length inc byte [Length] cmp byte [Length],24 jl .NEXT_LENGTH .DONE: mov [edi],byte 0 ret endp ; ------------------------------------------------------------------------------------- proc Print_Code uses ecx ; code table entry passed in EAX ; write binary digits to EDI mov edx,eax ; first print the length bswap eax stdcall Byte_To_Hex mov [edi],ax add edi,2 mov [edi],byte 32 inc edi mov [edi],byte 32 inc edi mov eax,edx bswap edx ; the length is in DL ; shift left EAX by (32-DL) bits mov cl,32 sub cl,dl shl eax,cl ; print the bits in the code mov cl,dl .LOOP: shl eax,1 jc .1 mov [edi],byte '0' jmp .NEXT .1: mov [edi],byte '1' .NEXT: inc edi dec cl cmp cl,0 je .DONE jmp .LOOP .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Print_Tree ; print the Leaf_Arr and Node_Arr data structures lea esi,[Leaf_Arr] lea edi,[Print_Buffer] xor ecx,ecx .LEAVES: ; print value mov al,[esi] stdcall Byte_To_Hex mov [edi],ax add edi,2 inc esi ; print parent node index mov al,[esi] stdcall Byte_To_Hex mov [edi],ax add edi,2 inc esi ; print a space mov [edi],byte 32 inc edi inc cl test cl,128 jz .LEAVES ; print CRLF mov [edi],byte 13 inc edi mov [edi],byte 10 inc edi ; print CRLF mov [edi],byte 13 inc edi mov [edi],byte 10 inc edi lea esi,[Node_Arr] xor ecx,ecx .NODES: ; print parent node index mov al,[esi] stdcall Byte_To_Hex mov [edi],ax add edi,2 inc esi ; print a space mov [edi],byte 32 inc edi inc cl test cl,128 jz .NODES mov [edi],byte 0 ret endp ; ------------------------------------------------------------------------------------- proc Print_Freq_Table ; print the frequency table lea esi,[Freq_Copy] lea edi,[Print_Buffer] xor ecx,ecx .LOOP: mov eax,ecx shl eax,2 mov eax,[esi+eax] ; skip if the frequency is zero cmp eax,0 je .NEXT ; print the ASCII value mov al,cl stdcall Byte_To_Hex mov [edi],ax add edi,2 mov [edi],byte 32 inc edi mov [edi],byte 32 inc edi mov al,cl cmp al,31 jg .CHAR mov [edi],byte 32 inc edi jmp .COUNT .CHAR: ; print the char mov [edi],al inc edi .COUNT: mov [edi],byte 32 inc edi mov [edi],byte 32 inc edi mov eax,ecx shl eax,2 mov eax,[esi+eax] ; print the freq as a 4 byte hex number stdcall Dword_To_Hex ; first 2 bytes mov [edi],edx add edi,4 mov [edi],byte 32 inc edi ; next 2 bytes mov [edi],eax add edi,4 mov [edi],byte 32 inc edi mov [edi],byte 32 inc edi ; print the index of the leaf mov al,cl stdcall Find_Leaf stdcall Byte_To_Hex mov [edi],ax add edi,2 mov [edi],byte 13 inc edi mov [edi],byte 10 inc edi .NEXT: inc cl test cl,128 jz .LOOP mov [edi],byte 0 .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Find_Leaf uses ecx esi ; ASCII char value passed in AL ; find the index of the leaf for this char lea esi,[Leaf_Arr] xor ecx,ecx .FIND: cmp al,[esi] je .OK add esi,2 inc cl test cl,128 jz .FIND ; no leaf for the char ; return -1 mov al,0xff jmp .DONE .OK: ; return the index in AL mov al,cl .DONE: ret endp ; ------------------------------------------------------------------------------------- proc GetInputFileName ; string copy to the input file name lea esi,[szFileName] lea edi,[szInputFile] .LOOP: mov al,[esi] mov [edi],al cmp al,0 je .DONE inc esi inc edi jmp .LOOP .DONE: ret endp ; ------------------------------------------------------------------------------------- proc GetOutputFileName ; string copy to the output file name lea esi,[szFileName] lea edi,[szOutputFile] .LOOP: mov al,[esi] mov [edi],al cmp al,0 je .DONE inc esi inc edi jmp .LOOP .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Reset_Arrays ; when building a tree from the code lengths ; reset the arrays that are not needed lea edi,[Freq_Table] cinvoke memset,edi,0,512 lea edi,[Freq_Copy] cinvoke memset,edi,0,512 lea edi,[Leaf_Arr] cinvoke memset,edi,0xff,256 lea edi,[Node_Arr] cinvoke memset,edi,0xff,128 lea edi,[Node_Index] cinvoke memset,edi,0xff,128 ret endp ; ------------------------------------------------------------------------------------- proc Text_From_File uses ebx esi edi ; load the contents of the file into the Edit window ; get a handle to the default heap for this process invoke GetProcessHeap test eax,eax jz .HEAP_FAILED mov [h_heap],eax ; open the file invoke CreateFile,szFileName,GENERIC_READ,FILE_SHARE_READ,NULL,\ OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL test eax,eax jz .CRF_FAILED ; get the file handle mov [h_file],eax ; get the file size invoke GetFileSize,[h_file],NULL cmp eax,0xffffffff je .INVALID_SIZE mov [FileSize],eax ; allocate (FileSize + 1) bytes, with the extra byte for the null terminator inc eax ; allocate nBytes from the heap for the buffer invoke HeapAlloc,[h_heap],NULL,eax test eax,eax jz .ALLOC_FAILED ; point pFileBuffer to the buffer mov [pFileBuffer],eax ; read the contents of the file into the buffer invoke ReadFile,[h_file],[pFileBuffer],[FileSize],nBytes,NULL test eax,eax jz .READ_FAILED ; null terminate the file buffer mov esi,[pFileBuffer] add esi,[FileSize] mov [esi],byte 0 ; set the text of the Edit window invoke SetWindowText,[h_wnd_edit],[pFileBuffer] .FREE: invoke HeapFree,[h_heap],NULL,[pFileBuffer] .HCLOSE: invoke CloseHandle,[h_file] jmp .DONE .READ_FAILED: invoke MessageBox,NULL,szReadError,szTitle,MB_ICONERROR+MB_OK jmp .FREE .INVALID_SIZE: invoke MessageBox,NULL,szSizeError,szTitle,MB_ICONERROR+MB_OK jmp .HCLOSE .ALLOC_FAILED: invoke MessageBox,NULL,szAllocError,szTitle,MB_ICONERROR+MB_OK jmp .HCLOSE .CRF_FAILED: invoke MessageBox,NULL,szFileError,szTitle,MB_ICONERROR+MB_OK jmp .DONE .HEAP_FAILED: invoke MessageBox,NULL,szHeapError,szTitle,MB_ICONERROR+MB_OK .DONE: ret endp ; ------------------------------------------------------------------------------------- proc Text_To_File uses ebx esi edi ; save the contents of the Edit window to the file ; get a handle to the default heap for this process invoke GetProcessHeap test eax,eax jz .HEAP_FAILED mov [h_heap],eax ; get the number of bytes in the Edit window invoke GetWindowTextLength,[h_wnd_edit] ; +1 for the null terminator inc eax mov [nBytes],eax ; allocate nBytes from the heap for the buffer invoke HeapAlloc,[h_heap],NULL,eax test eax,eax jz .ALLOC_FAILED ; point pFileBuffer to the buffer mov [pFileBuffer],eax ; open the file, create it if it doesn't exist invoke CreateFile,szFileName,GENERIC_WRITE,0,NULL,\ CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL test eax,eax jz .CRF_FAILED ; get the file handle mov [h_file],eax ; read the contents of the Edit window into the buffer invoke GetWindowText,[h_wnd_edit],[pFileBuffer],[nBytes] ; EAX now contains the size of pFileBuffer, not including the null terminator ; write the buffer to the file invoke WriteFile,[h_file],[pFileBuffer],eax,nBytes,NULL .HCLOSE: invoke CloseHandle,[h_file] .FREE: invoke HeapFree,[h_heap],NULL,[pFileBuffer] jmp .DONE .ALLOC_FAILED: invoke MessageBox,NULL,szAllocError,szTitle,MB_ICONERROR+MB_OK jmp .DONE .CRF_FAILED: invoke MessageBox,NULL,szFileError,szTitle,MB_ICONERROR+MB_OK jmp .FREE .HEAP_FAILED: invoke MessageBox,NULL,szHeapError,szTitle,MB_ICONERROR+MB_OK .DONE: ret endp ; ------------------------------------------------------------------------------------- proc File_Open,hwnd ; display the GetOpenFileName dialog to get the file name ; zero ofn - the OPENFILENAME struct mov eax,sizeof.OPENFILENAME cinvoke memset,ofn,0,eax ; initialise the members of ofn mov eax,sizeof.OPENFILENAME mov [ofn.lStructSize],eax mov eax,[hwnd] mov [ofn.hwndOwner],eax ; the filter mov eax,szFilter mov [ofn.lpstrFilter],eax lea eax,[szFileName] mov [ofn.lpstrFile],eax mov eax,MAX_PATH mov [ofn.nMaxFile],eax mov eax,OFN_EXPLORER or eax,OFN_FILEMUSTEXIST or eax,OFN_HIDEREADONLY mov [ofn.Flags],eax mov eax,szDefFileExtension mov [ofn.lpstrDefExt],eax ; display the dialog to get szFileName invoke GetOpenFileName,ofn ret endp ; ------------------------------------------------------------------------------------- proc File_Save_As,hwnd ; display the GetSaveFileName dialog to get the file name ; zero ofn - the OPENFILENAME struct mov eax,sizeof.OPENFILENAME cinvoke memset,ofn,0,eax ; initialise the members of ofn mov eax,sizeof.OPENFILENAME mov [ofn.lStructSize],eax mov eax,[hwnd] mov [ofn.hwndOwner],eax ; the filter mov eax,szFilter mov [ofn.lpstrFilter],eax lea eax,[szFileName] mov [ofn.lpstrFile],eax mov eax,MAX_PATH mov [ofn.nMaxFile],eax mov eax,OFN_EXPLORER or eax,OFN_PATHMUSTEXIST or eax,OFN_HIDEREADONLY or eax,OFN_OVERWRITEPROMPT mov [ofn.Flags],eax mov eax,szDefFileExtension mov [ofn.lpstrDefExt],eax ; display the dialog to get szFileName invoke GetSaveFileName,ofn ret endp ; ------------------------------------------------------------------------------------- section '.idata' import data readable writeable library kernel,'KERNEL32.DLL',\ user,'USER32.DLL',\ comdlg32,'COMDLG32.DLL',\ gdi,'GDI32.DLL',\ msvcrt,'MSVCRT.DLL' import kernel,\ GetModuleHandle,'GetModuleHandleA',\ ExitProcess,'ExitProcess',\ GetFileSize,'GetFileSize',\ CloseHandle,'CloseHandle',\ CreateFile,'CreateFileA',\ ReadFile,'ReadFile',\ WriteFile,'WriteFile',\ GetProcessHeap,'GetProcessHeap',\ HeapAlloc,'HeapAlloc',\ HeapFree,'HeapFree' import user,\ RegisterClassEx,'RegisterClassExA',\ CreateWindowEx,'CreateWindowExA',\ ShowWindow,'ShowWindow',\ UpdateWindow,'UpdateWindow',\ MoveWindow,'MoveWindow',\ GetMessage,'GetMessageA',\ SendMessage,'SendMessageA',\ TranslateMessage,'TranslateMessage',\ DispatchMessage,'DispatchMessageA',\ MessageBox,'MessageBoxA',\ DefWindowProc,'DefWindowProcA',\ PostQuitMessage,'PostQuitMessage',\ SetFocus,'SetFocus',\ LoadIcon,'LoadIconA',\ LoadMenu,'LoadMenuA',\ LoadCursor,'LoadCursorA',\ SetWindowText,'SetWindowTextA',\ GetWindowText,'GetWindowTextA',\ GetWindowTextLength,'GetWindowTextLengthA',\ LoadAccelerators,'LoadAcceleratorsA',\ TranslateAccelerator,'TranslateAcceleratorA' import comdlg32,\ GetOpenFileName,'GetOpenFileNameA',\ GetSaveFileName,'GetSaveFileNameA' import gdi,\ CreateFont,'CreateFontA',\ DeleteObject,'DeleteObject' import msvcrt,\ memset,'memset' ; ------------------------------------------------------------------------------------- section '.data' readable writeable szClass TCHAR "Win32app",0 szTitle TCHAR "Huffman Encoding",0 szEdit TCHAR "EDIT",0 szRegError TCHAR "Call to RegisterClassEx failed!",0 szCreateError TCHAR "Call to CreateWindowEx failed!",0 szFontError TCHAR "Call to CreateFont failed!",0 szFileError TCHAR "Call to CreateFile failed!",0 szReadError TCHAR "Call to ReadFile failed!",0 szAllocError TCHAR "Call to HeapAlloc failed!",0 szHeapError TCHAR "Call to GetProcessHeap failed!",0 szSizeError TCHAR "Call to GetFileSize failed!",0 szAccelError TCHAR "Call to LoadAccelerators failed!",0 szFontFace TCHAR "Lucida Console",0 szDefFileExtension TCHAR "txt",0 ; Note that the filter string is double null terminated: szFilter TCHAR "Text Files (*.txt)",0,"*.txt",0,"All Files (*.*)",0,"*.*",0,0 szFileName rb MAX_PATH szInputFile rb MAX_PATH szOutputFile rb MAX_PATH ofn OPENFILENAME wcex WNDCLASSEX msg MSG h_wnd dd 0 h_wnd_edit dd 0 h_font dd 0 h_menu dd 0 h_file dd 0 h_heap dd 0 h_accel dd 0 FileSize dd 0 nBytes dd 0 nEncodeBytes dd 0 pFileBuffer dd 0 pEncodeBuffer dd 0 Print_Buffer rb 8192 Freq_Table rb 512 Freq_Copy rb 512 Code_Table rb 512 Encode_Table rb 512 Lengths_Arr rb 128 Node_Index rb 128 Leaf_Arr rb 256 Node_Arr rb 128 Lookup_Table rb 1024 Length_Counts rb 24 Next_Code rb 96 Next_Leaf db 0 Next_Node db 0 ; ------------------------------------------------------------------------------------- section '.rc' resource data readable directory RT_MENU,menus,\ RT_ACCELERATOR,accelerators resource menus,IDR_MY_MENU,LANG_ENGLISH+SUBLANG_DEFAULT,my_menu resource accelerators,IDR_ACCEL,LANG_ENGLISH+SUBLANG_DEFAULT,accel_keys menu my_menu menuitem 'File',IDM_FILE,MFR_POPUP menuitem 'New',IDM_NEW menuitem 'Open',IDM_OPEN menuitem 'Save',IDM_SAVE menuitem 'Save As',IDM_SAVE_AS menuseparator menuitem 'Exit',IDM_EXIT,MFR_END menuitem 'Tree From Text',IDM_TREE_TEXT,MFR_POPUP menuitem 'Tree From Text',IDM_TREE_FROM_TEXT menuitem 'Display Frequency Table',IDM_PRINT_FREQS menuitem 'Display Tree',IDM_PRINT_TREE menuitem 'Display Code Table',IDM_PRINT_CODES menuitem 'Display Lookup Table',IDM_PRINT_LOOKUP menuitem 'Display Lengths',IDM_PRINT_LENGTHS,MFR_END menuitem 'Tree From Lengths',IDM_TREE_LENGTH,MFR_POPUP menuitem 'Tree From Lengths',IDM_TREE_FROM_LENGTHS menuitem 'Display Code Table',IDM_PRINT_CODES menuitem 'Display Lookup Table',IDM_PRINT_LOOKUP,MFR_END menuitem 'Compression',IDM_COMPRESS,MFR_POPUP+MFR_END menuitem 'Encode Text',IDM_ENCODE_TEXT menuitem 'Decode File',IDM_DECODE_FILE,MFR_END accelerator accel_keys,\ FVIRTKEY+FNOINVERT+FCONTROL,'N',IDM_NEW,\ FVIRTKEY+FNOINVERT+FCONTROL,'O',IDM_OPEN,\ FVIRTKEY+FNOINVERT+FCONTROL,'S',IDM_SAVE,\ FVIRTKEY+FNOINVERT+FCONTROL,'A',ID_SELECT_ALL ; ------------------------------------------------------------------------------------- |