SOFTWARE
Graph Compression
This software allows to compress graphs. In particular it is suitable for the Web Graph
since it is able to exploit all the redundacies tipical of this graph.
The compression format is based on the scheme we have proposed in Graph Compression by BFS
(2009) by Alberto Apostolico and me. This implementation (in Java
) presents some
minor differences from the one (in C
) used for tests shown in the paper cited above,
however it almost achieves the same compression results and performance.
- new command 'x': mixed parsing and compression phase (no huge temporary file)
- new option '-s' suggested by Susana Ladra González: less memory used in the compression phase, no exploitation of the BFS it uses orginial indexes
- new option '-sim': does not write the compressed file, only gives bits/link
- bug fix
- Graph Compression - Version 0.3.2
- Jar (with BV extension) - 54 KB
- Dependencies: WebGraph
- Jar - 51 KB
- Dependencies: none
- API (in progress)
- Graph Compression - Version 0.2.1
- Jar (with BV extension) - 47 KB
- Dependencies: WebGraph
- API
- Graph Compression - Version 0.1
- Jar (with BV extension) - 43 KB
- Dependencies: WebGraph
- Jar - 40 KB
- API
Please use the following BibTeX
entry if you would like to cite this software.
@Article{ad-gcbfs-09, AUTHOR = {Apostolico, Alberto and Drovandi, Guido}, TITLE = {Graph Compression by BFS}, JOURNAL = {Algorithms}, VOLUME = {2}, YEAR = {2009}, NUMBER = {3}, PAGES = {1031--1044}, URL = {http://www.mdpi.com/1999-4893/2/3/1031}, ISSN = {1999-4893}, DOI = {10.3390/a2031031} }
Here you can find some compression results on datasets from http://law.dsi.unimi.it/. The compression level is set to 1000 for all the tests. The BFS Root (randomly chosen) is the index with respect to the original graph downloaded from the previous URL.
Graph | Nodes | Links | Bits/Link | BFS Root |
---|---|---|---|---|
arabic-2005 | 22 744 080 | 639 999 458 | 1.462 | 20 703 058 |
cnr-2000 | 325 557 | 3 216 152 | 1.838 | 130 582 |
eu-2005 | 862 664 | 19 235 140 | 2.699 | 232 241 |
in-2004 | 1 382 908 | 16 917 053 | 1.446 | 1 288 532 |
indochina-2004 | 7 414 866 | 194 109 311 | 0.936 | 1 710 995 |
it-2004 | 41 291 594 | 1 150 725 436 | 1.536 | 30 132 872 |
sk-2005 | 50 636 154 | 1 949 412 601 | 1.969 | 10 274 367 |
uk-2002 | 18 520 486 | 298 113 762 | 1.693 | 11 353 878 |
uk-2005 | 39 459 925 | 936 364 282 | 1.420 | 4 392 160 |
uk-2006-05 | 77 741 046 | 2 965 197 340 | 1.410 | 39 633 072 |
uk-2006-06 | 80 644 902 | 2 481 281 617 | 1.671 | 45 295 894 |
uk-2006-07 | 96 395 298 | 3 030 665 444 | 1.506 | 70 680 309 |
uk-2006-08 | 100 751 978 | 3 250 153 746 | 1.617 | 14 827 948 |
uk-2006-09 | 106 288 541 | 3 871 625 613 | 1.472 | 31 005 541 |
uk-2006-10 | 93 463 772 | 3 130 910 405 | 1.528 | 59 983 443 |
uk-2006-11 | 106 783 458 | 3 479 400 938 | 1.562 | 102 080 805 |
uk-2006-12 | 103 098 631 | 3 768 836 665 | 1.369 | 39 137 185 |
uk-2007-01 | 108 563 230 | 3 929 837 236 | 1.268 | 104 537 817 |
uk-2007-02 | 110 123 614 | 3 944 932 566 | 1.435 | 23 821 898 |
uk-2007-03 | 107 565 084 | 3 642 701 825 | 1.551 | 79 095 635 |
uk-2007-04 | 106 867 191 | 3 790 305 474 | 1.459 | 35 266 768 |
uk-2007-05 | 105 896 555 | 3 738 733 648 | 1.440 | 103 033 803 |
uk-union-2006-06-2007-05 | 133 633 040 | 5 507 679 822 | 1.830 | 40 996 664 |
webbase-2001 | 118 142 155 | 1 019 903 190 | 2.870 | 51 779 370 |
Bioinformatic
DMB (http://dmb.iasi.cnr.it) is a powerful data mining tool. It allows to classify many biological stuff (species, viruses, etc.) giving a set of logical formulas that are able to distinguish between the different classes given as input. Please refer to the DMB web-site for more information and the software.