Nakamichi 'Hayabusa' a.k.a. 'Peregrine'
Falconinely fast ( ͡⚆ ͜ʖ ͡⚆) textual decompressor

/Photo by Masaki/
高速伸長
Above four kanjis were written as a reply to paying my respects to Prof. Okumura, they are the motto/banner of this page.
Fastest textual decompression is targeted, here comes Nakamichi.
Update, 2015-Feb-12:
Couldn't resist and finally wrote 'Hayabusa' 隼 (peregrine falcon), a lightweighted 'Tengu' with Match Lengths 8 and 4.
Nakamichi 'Tengu' is a native XMM (128bit) decompression etude, 1MB Sliding Window is in use.
Nakamichi 'Hayabusa' is a native GP (64bit) decompression etude, 2MB Sliding Window is in use.
The beauty of this last Nakamichi variant is in its:
- licenselessness (100% FREE); ( ͡⚆ ͜ʖ ͡⚆)
- falconine textual decompression speed;
- potential ability to remove in future the second reload (in the literal branch), exploiting the fact of 7 bytes MAX long literals.
Since 'Hayabusa' targets only Enlgish texts, in my opinion, the most relevant such benchmark has to be executed on the queen of English prose:
02/13/2015 03:45 AM 33,258,496 Agatha_Christie_89-ebooks_TXT.tar
02/13/2015 03:46 AM 13,365,137 Agatha_Christie_89-ebooks_TXT.tar.lz4
02/13/2015 04:09 AM 13,148,167 Agatha_Christie_89-ebooks_TXT.tar.Nakamichi
D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 Agatha_Christie_89-ebooks_TXT.tar
Nb of threads = 1 ; Compression Level = 9
Agatha_Christie : 33258496 -> 13365090 ( 40.19%), 12.6 MB/s , 908.7 MB/s
D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe Agatha_Christie_89-ebooks_TXT.tar.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 13148167 bytes ...
RAM-to-RAM performance: 896 MB/s.
Memory pool starting address: 00000000011D0080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193533 clocks or 2.709MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 33%
'Hayabusa' happened to outspeed one of the two LZ kings (LZ4) on OSHO.TXT reaching 960 MB/s vs 947.4 MB/s,
but this is the exception that validates the rule, heh-heh.
unsigned int Decompress (char* ret, char* src, unsigned int srcSize) {
char* retLOCAL = ret;
char* srcLOCAL = src;
char* srcEndLOCAL = src+srcSize;
unsigned int DWORDtrio;
unsigned int DWORDtrioDumbo;
while (srcLOCAL < srcEndLOCAL) {
DWORDtrio = *(unsigned int*)srcLOCAL;
#ifndef _N_GP
#ifdef _N_prefetch_64
_mm_prefetch((char*)(srcLOCAL + 64), _MM_HINT_T0);
#endif
#ifdef _N_prefetch_128
_mm_prefetch((char*)(srcLOCAL + 64*2), _MM_HINT_T0);
#endif
#ifdef _N_prefetch_4096
_mm_prefetch((char*)(srcLOCAL + 64*64), _MM_HINT_T0);
#endif
#endif
// |1stLSB |2ndLSB |3rdLSB |
// -------------------------------
// |OO|L|xxxxx|xxxxxxxx|xxxxxx|xx|
// -------------------------------
// [1bit 16bit] 24bit]
// L = 0b means Long MatchLength, (8-L) or 8
// L = 1b means Long MatchLength, (8-L) or 4
// OO = 00b means Literal
// OO = 01b MatchOffset, 0xFFFFFFFF>>OO, 3 bytes long i.e. Sliding Window is 3*8-L-OO=3*8-3=21 or 2MB
// OO = 10b MatchOffset, 0xFFFFFFFF>>OO, 2 bytes long i.e. Sliding Window is 2*8-L-OO=2*8-3=13 or 8KB
// OO = 11b MatchOffset, 0xFFFFFFFF>>OO, 1 byte long i.e. Sliding Window is 1*8-L-OO=1*8-3=5 or 32B
if ( (DWORDtrio & 0x03) == 0x00 ) {
#ifdef _N_GP
memcpy(retLOCAL, (const char *)( (uint64_t)(srcLOCAL+1) ), 8);
#endif
#ifdef _N_XMM
SlowCopy128bit( (const char *)( (uint64_t)(srcLOCAL+1) ), retLOCAL );
#endif
retLOCAL+= ((DWORDtrio & 0xFF)>>3);
srcLOCAL+= ((DWORDtrio & 0xFF)>>3)+1;
} else {
DWORDtrioDumbo = (DWORDtrio & 0x03)<<3; // To avoid 'LEA'
#ifdef _N_GP
memcpy(retLOCAL, (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFFFF>>DWORDtrioDumbo))>>3)) ), 8);
#endif
#ifdef _N_XMM
SlowCopy128bit( (const char *)( (uint64_t)(retLOCAL-((DWORDtrio&(0xFFFFFFFF>>DWORDtrioDumbo))>>3)) ), retLOCAL );
#endif
retLOCAL+= 8-(DWORDtrio&0x04);
srcLOCAL+= 4-(DWORDtrioDumbo>>3);
}
}
return (unsigned int)(retLOCAL - ret);
}
; 'Hayabusa' decompression loop, 6d-14+2=91 bytes long:
; mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140";
; mark_description "-O3 -D_N_GP -FAcs";
.B6.3::
00014 41 8b 01 mov eax, DWORD PTR [r9]
00017 89 c1 mov ecx, eax
00019 83 e1 03 and ecx, 3
0001c 75 17 jne .B6.5
.B6.4::
0001e 0f b6 c0 movzx eax, al
00021 c1 e8 03 shr eax, 3
00024 49 8b 49 01 mov rcx, QWORD PTR [1+r9]
00028 49 89 0a mov QWORD PTR [r10], rcx
0002b 4c 03 d0 add r10, rax
0002e ff c0 inc eax
00030 4c 03 c8 add r9, rax
00033 eb 35 jmp .B6.6
.B6.5::
00035 c1 e1 03 shl ecx, 3
00038 41 bb ff ff ff
ff mov r11d, -1
0003e 41 d3 eb shr r11d, cl
00041 44 23 d8 and r11d, eax
00044 83 e0 04 and eax, 4
00047 41 c1 eb 03 shr r11d, 3
0004b f7 d8 neg eax
0004d 83 c0 08 add eax, 8
00050 49 f7 db neg r11
00053 4d 03 da add r11, r10
00056 c1 e9 03 shr ecx, 3
00059 f7 d9 neg ecx
0005b 83 c1 04 add ecx, 4
0005e 4d 8b 1b mov r11, QWORD PTR [r11]
00061 4d 89 1a mov QWORD PTR [r10], r11
00064 4c 03 d0 add r10, rax
00067 4c 03 c9 add r9, rcx
.B6.6::
0006a 4d 3b c8 cmp r9, r8
0006d 72 a5 jb .B6.3
The package (181KB) containing the source:
http://www.sanmayce.com/Hayabusa/Nakamichi_Hayabusa_Intel15.zip

And the performance with epub2txt 'Autobiography_411-ebooks_Collection.tar' corpus:
10/05/2014 03:23 AM 273,401,856 Autobiography_411-ebooks_Collection.tar
10/05/2014 06:54 AM 117,126,206 Autobiography_411-ebooks_Collection.tar.lz4 ! -9 !
10/06/2014 07:43 PM 115,558,538 Autobiography_411-ebooks_Collection.tar.v1.2_19.lzt ! -19 !
10/05/2014 03:23 AM 113,511,873 Autobiography_411-ebooks_Collection.tar.Z
02/12/2015 08:17 AM 113,073,442 Autobiography_411-ebooks_Collection.tar.Hayabusa.Nakamichi
10/05/2014 09:12 AM 107,237,997 Autobiography_411-ebooks_Collection.tar.Tengu.Nakamichi
10/06/2014 04:06 AM 102,514,628 Autobiography_411-ebooks_Collection.tar.lzh / LHA32 version 2.67 /
10/06/2014 02:23 AM 101,966,054 Autobiography_411-ebooks_Collection.tar.method1.zpaq / zpaq 6.50 /
10/05/2014 06:47 AM 97,569,200 Autobiography_411-ebooks_Collection.tar.zip ! -tzip -mx9 !
10/06/2014 02:25 AM 87,439,698 Autobiography_411-ebooks_Collection.tar.method2.zpaq / zpaq 6.50 /
10/06/2014 02:21 AM 82,724,804 Autobiography_411-ebooks_Collection.tar.sr2
10/06/2014 07:18 PM 77,194,175 Autobiography_411-ebooks_Collection.tar.lzhds.nz ! -cD !
10/06/2014 03:34 AM 75,388,190 Autobiography_411-ebooks_Collection.tar.ST3_block256.bsc ! -m0 -Tt -b256 !
10/05/2014 06:53 AM 69,832,119 Autobiography_411-ebooks_Collection.tar.7z ! -t7z -mx9 !
10/06/2014 07:57 PM 69,398,819 Autobiography_411-ebooks_Collection.tar.v1.2_39_block256.lzt ! -39 -b256 -p1 !
10/06/2014 08:36 PM 67,946,358 Autobiography_411-ebooks_Collection.tar.v1.2_49_block256.lzt ! -49 -b256 -p1 !
10/05/2014 06:56 AM 66,517,316 Autobiography_411-ebooks_Collection.tar.order04.PPMonstr ! -m1024 -o4 !
10/06/2014 03:35 AM 65,416,196 Autobiography_411-ebooks_Collection.tar.ST4_block256.bsc ! -m1 -Tt -b256 !
10/06/2014 03:36 AM 61,018,244 Autobiography_411-ebooks_Collection.tar.ST5_block256.bsc ! -m2 -Tt -b256 !
10/06/2014 02:27 AM 59,401,801 Autobiography_411-ebooks_Collection.tar.method3.zpaq / zpaq 6.50 /
10/05/2014 07:00 AM 58,736,456 Autobiography_411-ebooks_Collection.tar.order06.PPMonstr ! -m1024 -o6 !
10/05/2014 02:55 AM 58,266,061 Autobiography_411-ebooks_Collection.tar.tangelo / Version 2.3 /
10/06/2014 02:32 AM 57,962,141 Autobiography_411-ebooks_Collection.tar.method4.zpaq / zpaq 6.50 /
10/05/2014 07:41 AM 56,862,830 Autobiography_411-ebooks_Collection.tar.bbb ! cfm128 !
10/05/2014 07:05 AM 55,962,342 Autobiography_411-ebooks_Collection.tar.order08.PPMonstr ! -m1024 -o8 !
10/06/2014 03:33 AM 55,464,039 Autobiography_411-ebooks_Collection.tar.BWT_block256.bsc ! -m3 -Tt -b256 !
10/06/2014 03:02 AM 55,117,057 Autobiography_411-ebooks_Collection.tar.method5.zpaq / zpaq 6.50 /
10/05/2014 07:26 AM 55,011,526 Autobiography_411-ebooks_Collection.tar.order32.PPMonstr ! -m1024 -o32 !
10/05/2014 07:15 AM 54,635,921 Autobiography_411-ebooks_Collection.tar.order16.PPMonstr ! -m1024 -o16 !
10/06/2014 07:16 PM 52,631,060 Autobiography_411-ebooks_Collection.tar.cm.nz ! -cc !
10/06/2014 02:06 AM 48,194,287 Autobiography_411-ebooks_Collection.tar.paq8hp12 ! -8 !
Benchmarks below are for laptop with Core 2 Q9550s @2833MHz.
D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 Autobiography_411-ebooks_Collection.tar
Nb of threads = 1 ; Compression Level = 9
Autobiography_4 : 273401856 -> 117125927 ( 42.84%), 12.7 MB/s , 902.3 MB/s
D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe Autobiography_411-ebooks_Collection.tar.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 113073442 bytes ...
RAM-to-RAM performance: 832 MB/s.
Memory pool starting address: 0000000007130080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193289 clocks or 2.712MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 30%
D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 enwik8
Nb of threads = 1 ; Compression Level = 9
enwik8 : 100000000 -> 42283793 ( 42.28%), 18.1 MB/s , 929.5 MB/s
D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe enwik8.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 43010309 bytes ...
RAM-to-RAM performance: 768 MB/s.
Memory pool starting address: 0000000002EA0080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193336 clocks or 2.712MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 28%
D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 OSHO.TXT
Nb of threads = 1 ; Compression Level = 9
OSHO.TXT : 206908949 -> 71399094 ( 34.51%), 14.2 MB/s , 947.4 MB/s
D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe OSHO.TXT.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 75006396 bytes ...
RAM-to-RAM performance: 960 MB/s.
Memory pool starting address: 0000000004C60080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193505 clocks or 2.709MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 35%
D:\_KAZE\EBOOKs_for_benching\z>..\lz4 -9 -Sx -b -T1 silesia.tar
Nb of threads = 1 ; Compression Level = 9
silesia.tar : 211948544 -> 78036331 ( 36.82%), 16.5 MB/s , 1132.3 MB/s
D:\_KAZE\EBOOKs_for_benching\z>..\Nakamichi_Hayabusa_GP.exe silesia.tar.Nakamichi /benchmark
Nakamichi 'Hayabusa', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Decompressing 92192717 bytes ...
RAM-to-RAM performance: 832 MB/s.
Memory pool starting address: 0000000005E00080 ... 64 byte aligned, OK
Copying a 512MB block 1024 times i.e. 512GB READ + 512GB WRITTEN ...
memcpy(): (512MB block); 524288MB copied in 193482 clocks or 2.710MB per clock
RAM-to-RAM performance vs memcpy() ratio (bigger-the-better): 30%
And quick comparison with 'LZ4 -9':
02/12/2015 04:18 AM 152,089 alice29.txt
02/12/2015 05:35 PM 63,705 alice29.txt.lz4
09/30/2014 06:31 AM 73,787 alice29.txt.Tengu.Nakamichi
02/12/2015 04:18 AM 74,782 alice29.txt.Hayabusa.Nakamichi
02/12/2015 03:34 AM 273,401,856 Autobiography_411-ebooks_Collection.tar
02/12/2015 05:36 PM 117,126,206 Autobiography_411-ebooks_Collection.tar.lz4
10/05/2014 09:12 AM 107,237,997 Autobiography_411-ebooks_Collection.tar.Tengu.Nakamichi
02/12/2015 08:17 AM 113,073,442 Autobiography_411-ebooks_Collection.tar.Hayabusa.Nakamichi
02/12/2015 03:22 AM 3,153,408 CalgaryCorpus.tar
02/12/2015 05:35 PM 1,195,853 CalgaryCorpus.tar.lz4
09/30/2014 07:15 AM 1,296,773 CalgaryCorpus.tar.Tengu.Nakamichi
02/12/2015 09:34 AM 1,388,997 CalgaryCorpus.tar.Hayabusa.Nakamichi
02/12/2015 03:24 AM 10,192,446 dickens
02/12/2015 05:35 PM 4,442,992 dickens.lz4
09/30/2014 07:49 AM 3,993,467 dickens.Tengu.Nakamichi
02/12/2015 09:45 AM 4,197,881 dickens.Hayabusa.Nakamichi
02/12/2015 02:48 AM 100,000,000 enwik8
02/12/2015 05:36 PM 42,283,904 enwik8.lz4
09/30/2014 12:06 PM 40,860,927 enwik8.Tengu.Nakamichi
02/12/2015 11:47 AM 43,010,309 enwik8.Hayabusa.Nakamichi
02/12/2015 03:15 AM 3,903,143 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd
02/12/2015 05:34 PM 1,539,449 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.lz4
09/30/2014 06:56 AM 1,308,271 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Tengu.Nakamichi
02/12/2015 09:28 AM 1,269,686 MASAKARI_General-Purpose_Grade_English_Wordlist.wrd.Hayabusa.Nakamichi ( ͡⚆ ͜ʖ ͡⚆)
02/12/2015 03:26 AM 5,582,655 shaks12.txt
02/12/2015 05:35 PM 2,315,036 shaks12.txt.lz4
09/30/2014 06:45 AM 2,211,086 shaks12.txt.Tengu.Nakamichi
02/12/2015 09:37 AM 2,297,940 shaks12.txt.Hayabusa.Nakamichi
02/12/2015 03:22 AM 211,948,544 silesia.tar
02/12/2015 10:02 PM 78,036,550 silesia.tar.lz4
10/01/2014 12:52 PM 82,918,238 silesia.tar.Tengu.Nakamichi
02/12/2015 09:19 PM 92,192,717 silesia.tar.Hayabusa.Nakamichi

/Photo by Masaki/
Update, 2015-Feb-20:
The French artist Rachelle Bartel, a.k.a Lillycat inspired me to ... dream of a new variant called 'Loonette' a.k.a. 'The Rabbit Girl',
probably it will be XMMized 'Hayabusa' i.e. Match Lengths will be 12/8 and the Sliding Windows will be ... slid by one order i.e.
(2*8-3)bit/(3*8-3)bit/(4*8-3)bit or 13bit/21bit/29bit or 8KB/2MB/512MB.
Well, here it comes:





How it behaves is still an open question. 'Loonette' has its future in the future.
Copyleft Sanmayce, 2015 Feb 20; character encoding: charset=utf-8; for contacts: sanmayce@sanmayce.com