What Java version should I use?
Jacksum 3 runs on JDK 11 and any later release.
Download the JRE from java.com and install it, for example to
/opt/jdk/latest11/
Run update-alternatives. See also https://help.ubuntu.com/community/Java
$ sudo update-alternatives --install "/usr/bin/java" "java" "/opt/jdk/latest11/bin/java" 1
$ sudo update-alternatives --config java
If you want to be as flexible as possible, you could create /opt/jdk/latest11 as a symbolic link to the actual Java folder. Then, each time a new JDK comes out, you only have to update the symlink.
In addition to the standard command line interface, the Jacksum project also supports integration into popular file browsers such as Windows Explorer, KDE Konqueror, GNOME Nautilus, ROX-Filer, and Finder on macOS.
Jacksum's primary user interface is, and will remain, the command line, as it provides compatibility with other popular command line tools available on Unix, such as cksum, sha1sum, sum, md5sum, and b2sum. The command line interface also makes it easy to use the tool in cron jobs, in scripts, in file browsers, in web interfaces, and of course in combination with pipes and other useful tools such as grep, sort, uniq, tar, bzip2, gzip, and zip.
What's the meaning of the Jacksum output?
For example:
599770357 23560 foo.txt
The first number represents the CRC, checksum, or hash. Its value depends on the algorithm that you use (option -a) and on the encoding that you have specified for it (option -x, -X, or -E). The second number represents the file size, usually in bytes. The exceptions are the algorithms BSD sum and UNIX System V sum, which print the file size in blocks. The file size is not printed for any cryptographic one-way hash algorithm (e.g. MD5, SHA-1, SHA-2, SHA-3, Skein), because the file size is already incorporated into the message digest. The third column (the second one if you have specified a one-way hash algorithm) represents the filename. The filename is not printed if standard input is used.
Starting with Jacksum 1.3.0 you have the option to print the timestamp of files. In this case an additional column appears in each line (just before the filename). The meaning of the timestamp is dependent on the format that you can specify (option -t).
599770357 23560 20031027140042 foo.txt
Starting with Jacksum 1.5.0 you have full control over the output by using option -F, and starting with Jacksum 3, you can use compatibility files in order to be compatible with popular output formats.
With option -a you can choose one or even multiple algorithms. If you got a file from a trusted source and you only want to know whether a file transfer was successful, many algorithms can be used. However, not all algorithms are suitable for a really safe file integrity check and verification of thousands of files on today's hard disks (or SSDs).
Don't use
From a cryptographic point of view, it is recommended not to use the
following algorithms, because they are insensitive to the arrangement of
bytes in a file: sum8, sum16, sum24, sum32 and xor8. Furthermore, you
shouldn't use MD2 or MD4, because RSA no longer recommends them.
SHA-0 is also weak; it was replaced by SHA-1 in 1995.
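To illustrate why such sums are insensitive to the arrangement of bytes, here is a minimal Java sketch. It assumes that sum8 simply adds all bytes modulo 256 and that xor8 XORs all bytes together; the helper methods are hypothetical illustrations and not Jacksum's own code. CRC32 from the standard Java API is used for comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class OrderSensitivityDemo {

    // Hypothetical helper: adds all bytes and keeps only the lowest 8 bits.
    static int sum8(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum = (sum + (b & 0xFF)) & 0xFF;
        }
        return sum;
    }

    // Hypothetical helper: XORs all bytes together.
    static int xor8(byte[] data) {
        int x = 0;
        for (byte b : data) {
            x ^= (b & 0xFF);
        }
        return x;
    }

    // CRC32 as provided by the standard Java API.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Same bytes, different arrangement.
        byte[] a = "stop".getBytes(StandardCharsets.US_ASCII);
        byte[] b = "pots".getBytes(StandardCharsets.US_ASCII);
        System.out.println("sum8 equal:  " + (sum8(a) == sum8(b)));   // true
        System.out.println("xor8 equal:  " + (xor8(a) == xor8(b)));   // true
        System.out.println("crc32 equal: " + (crc32(a) == crc32(b))); // false
    }
}
```

Addition and XOR are commutative, so any reordering of the same bytes yields the same value, while the CRC32 changes.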
Not really recommended
All checksum and CRC based algorithms simply do not have the bit strength
and the mathematical power to guarantee strong fingerprints for thousands
of files on today's hard disks (or SSDs). See also the CRC-Faker
and read the article "CRC and how to Reverse it" (http://surf.to/anarchriz).
Viruses could fake files; with a CRC based algorithm you cannot detect
such intruders. Although Jacksum is able to produce SFV on demand, it is
highly recommended to avoid using SFV files. By default, they are based on
CRC32 values and SFV doesn't take the file size into account.
ELF-32 is a very simple hash based algorithm and probably does not have the bit strength for safe file integrity checks. Whirlpool has been replaced twice by a newer version of Whirlpool. The eMule/eDonkey algorithm (MD4 based, but improved) would have enough bits, but even for the widely used MD5 and HAVAL_3_128, real collisions were found in August 2004 (http://eprint.iacr.org/2004/199/) with a non-brute-force method. The paper also mentions collisions for the full RIPEMD-128; however, the hexdump example in the paper doesn't generate a collision (all the other examples in the paper have been verified, so the RIPEMD hexdump example seems to contain a typo). For an excellent example of an MD5 collision go to http://www.cits.rub.de/MD5Collisions. Note that RIPEMD-256 is only as secure as RIPEMD-128, so it is also not recommended. Tiger/128 and Tiger/160 are truncated versions of Tiger/192. Although they seem to be safe today, the non-truncated hash functions are more secure.
SHA-1
In February 2005, Bruce Schneier reported an attack by Xiaoyun Wang, Yiqun
Lisa Yin, and Hongbo Yu. The attack is outlined in a brief note by the
authors. The authors assert that their attacks can find collisions in the
full version of SHA-1, requiring less than 2^69 operations (a
brute-force search would require 2^80). In academic
cryptography, any attack that has less computational complexity than the
expected time needed for brute force is considered a break. This does not,
however, necessarily mean that the attack can be practically exploited. So
in the real world, SHA-1 is still secure, at least for verifying
integrity. For the latest cryptanalysis of SHA-1, read http://en.wikipedia.org/wiki/SHA1#Cryptanalysis_of_SHA-1.
See also https://shattered.io/
Recommendation
From today's cryptographic point of view (September 2021), all
non-broken cryptographic one-way hash algorithms are safe for performing
reliable file integrity checks. Since Jacksum 3, the algorithm SHA3-256 is
the default.
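For readers who want to cross-check a SHA3-256 value outside of Jacksum, here is a minimal Java sketch that uses only the standard Java API (the "SHA3-256" MessageDigest is available in JDK 9 and later). The class and method names are hypothetical, and the output line of main is a simple "hash filename" layout chosen for this example, not necessarily Jacksum's format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha3FileHash {

    // Creates a SHA3-256 digest object from the JDK.
    static MessageDigest newSha3() {
        try {
            return MessageDigest.getInstance("SHA3-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA3-256 requires JDK 9 or later", e);
        }
    }

    // Encodes bytes as lowercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hashes a byte array held in memory.
    static String hash(byte[] data) {
        return toHex(newSha3().digest(data));
    }

    // Hashes a file by streaming it in chunks, so large files do not
    // have to fit into memory.
    static String hash(Path file) {
        MessageDigest md = newSha3();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(md.digest());
    }

    public static void main(String[] args) {
        for (String name : args) {
            System.out.println(hash(Path.of(name)) + " " + name);
        }
    }
}
```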
Why are there alternate implementations available?
A few algorithms (adler32, crc32, md5, sha-1, sha-256, sha-384, sha-512, sha3) are part of the standard Java API. If an algorithm is provided by your JDK, Jacksum simply calls the API and uses the implementation offered by the JDK. Some JDK vendors call native code, which usually results in good performance. Note that this is an implementation detail of the JDK vendor and can vary from vendor to vendor and from version to version. Due to several requests I also provide alternate, pure Java implementations for all of the algorithms which are usually covered by the Java API. On some systems an alternate implementation can perform better, but again this depends on your computer and on the Java Runtime Environment you use. Usually it is a good idea to use the latest JDK and rely on the performance the JDK can offer. Tests have shown that the newer your JDK, the better the performance. In some rare cases you can get better performance with option -A.
Are CRCs obsolete today?
No, CRCs are still used in hardware, firmware, protocols, simple software installers (NSIS for example) and today's filesystems (ZFS for example). Usually they are fast (and not as complex as one-way hashes), and they are useful for verifying small amounts of data. Furthermore, they are a very good addition to hash based algorithms. Jacksum supports the "Rocksoft (tm) Model CRC Algorithm", which makes it possible to calculate any customized CRC.
How is a CRC calculated?
Many people have asked me this question. There are some good websites explaining how a CRC works. Go to http://surf.to/anarchriz or ftp://ftp.rocksoft.com/papers/crc_v3.txt. You might also have a look at the source code of Jacksum.
Many references define the CRC-16/CCITT as
crc:16,1021,FFFF,false,false,0
The notation "crc:width,poly,init,refIn,refOut,xorOut" is used since Jacksum 1.7.0 to specify a CRC algorithm according to the Rocksoft(TM) Model CRC Algorithm. Using the message 123456789, a CRC algorithm with the parameters above returns 0x29B1:
jacksum -q txt:123456789 -a crc:16,1021,FFFF,false,false,0 -X
29B1
However, other sources claim that according to the CCITT standard (see above), a message must be prepended with 16 zero bits before the calculation. This interpretation of the standard can be expressed as
crc:16,1021,1D0F,false,false,0
Notice that in this case only the init value is different: for a message of 16 zero bits, a CRC with the init parameter 0xFFFF returns 0x1D0F, which is the init value for the alternate interpretation of the algorithm.
Using the message 123456789, a CRC algorithm with the parameters above returns 0xE5CC:
jacksum -q txt:123456789 -a crc:16,1021,1D0F,false,false,0 -X
E5CC
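The two parameter sets above can be reproduced with a small bitwise CRC routine. The following Java sketch is an illustration only and not Jacksum's implementation: it is restricted to width 16 with refIn=false and refOut=false, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Crc16Sketch {

    /**
     * Bitwise CRC for width 16 with refIn=false and refOut=false,
     * i.e. bits are processed most significant bit first.
     */
    static int crc16(byte[] data, int poly, int init, int xorOut) {
        int reg = init & 0xFFFF;
        for (byte b : data) {
            // Feed the next byte into the top 8 bits of the register.
            reg ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((reg & 0x8000) != 0) {
                    // Top bit set: shift it out and subtract (XOR) the polynomial.
                    reg = ((reg << 1) ^ poly) & 0xFFFF;
                } else {
                    reg = (reg << 1) & 0xFFFF;
                }
            }
        }
        return (reg ^ xorOut) & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] msg = "123456789".getBytes(StandardCharsets.US_ASCII);
        // crc:16,1021,FFFF,false,false,0
        System.out.printf("%04X%n", crc16(msg, 0x1021, 0xFFFF, 0));         // 29B1
        // crc:16,1021,1D0F,false,false,0
        System.out.printf("%04X%n", crc16(msg, 0x1021, 0x1D0F, 0));         // E5CC
        // 16 zero bits with init FFFF yield the alternate init value
        System.out.printf("%04X%n", crc16(new byte[2], 0x1021, 0xFFFF, 0)); // 1D0F
    }
}
```

The third call shows where the init value 0x1D0F comes from: it is the register content after 16 zero bits have been processed with init 0xFFFF.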
What algorithms are currently not supported by Jacksum?
Although Jacksum supports hundreds of algorithms, there are still many algorithms not (yet) supported by Jacksum. Many of those are cryptographic hashes or checksums that have weaknesses, are broken, or have usage restrictions due to patents. Nonetheless they are on my wishlist for educational and completeness purposes.
If you know of source code for any of those algorithms (a GPL3+ compatible license is required, Java source code is preferred), please just send me the link and I will test, optimize and include them in a future release of Jacksum.
How do I sync two folders on two different computers?
With Jacksum, you can perform a unidirectional synchronization. For example, if you want to sync two folders on two different computers, even without a connection in between, Jacksum can help you to solve that kind of problem. Let's imagine that you have a good and a faulty computer.
Example for Windows:
1. Change to the good folder on the good computer and execute the command
cd good
jacksum -a sha3-256 . > c:\temp\check.jacksum
2. The file called check.jacksum represents a snapshot of all files in
the good folder on the good computer. Transfer check.jacksum to the
faulty computer.
3. Change to the folder on the faulty computer and execute the command
cd faulty
jacksum -a sha3-256 -E hex -c c:\temp\check.jacksum --list --list-filter bad > c:\temp\files.list
4. The file called files.list now contains a list of the files that
differ. Transfer files.list to the good computer.
5. Change to the good folder on the good computer again and execute the
command
cd good
type files.list | zip -@ patch.zip
In case you prefer .tar.bz2 rather than .zip:
cd good
tar cfv patch.tar -I files.list
bzip2 -9 patch.tar
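Conceptually, step 3 compares the snapshot from the good computer with the state of the faulty folder and lists every file that does not match. The following Java sketch shows that idea on two in-memory maps (filename to hash). It is only an illustration under that assumption, not Jacksum's implementation; the class name, the method name and the hash values are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SnapshotDiff {

    /**
     * Returns the names of all files in the reference snapshot that are
     * missing from the other snapshot or have a different hash there.
     * The result is sorted by filename.
     */
    static List<String> filesToTransfer(Map<String, String> reference,
                                        Map<String, String> other) {
        List<String> result = new ArrayList<>();
        // A TreeMap copy gives a stable, sorted iteration order.
        for (Map.Entry<String, String> e : new TreeMap<>(reference).entrySet()) {
            String otherHash = other.get(e.getKey()); // null if the file is missing
            if (!e.getValue().equalsIgnoreCase(otherHash)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder hash values, for illustration only.
        Map<String, String> good = Map.of("a.txt", "11", "b.txt", "22", "c.txt", "33");
        Map<String, String> faulty = Map.of("a.txt", "11", "b.txt", "99");
        // b.txt differs and c.txt is missing on the faulty side.
        System.out.println(filesToTransfer(good, faulty)); // [b.txt, c.txt]
    }
}
```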
How do I create a patch with Jacksum?
As a developer, you might want to provide a patch to your loyal customers so they can upgrade more easily. Jacksum can help you in this case.
Example for Unix, and GNU/Linux
1. Change to your new version's folder
cd ~/newversion
jacksum -a sha3-256 . > /tmp/check.jacksum
2. Change to your old version's folder
cd ~/oldversion
jacksum -a sha3-256 -E hex -c /tmp/check.jacksum --list --list-filter bad > /tmp/files.list
3. Change to your new version's folder again and zip the original files
cd ~/newversion
tar cfv patch.tar -T /tmp/files.list (GNU/Linux, Mac OS X, PC-BSD)
tar cfv patch.tar -I /tmp/files.list (PC-BSD, Solaris)
bzip2 -9 patch.tar
In case you prefer .zip rather than .tar.bz2:
cd ~/newversion
cat /tmp/files.list | zip -@ patch.zip (GNU/Linux, Mac OS X, Solaris)
Is it possible to compare two directory trees or even discs?
Yes, there are multiple ways to do that. I recommend the first way.
1. Since Jacksum 1.3.0 you can use the option -c to check files against a given list, so you will know exactly which files have been modified or deleted. See the following example (a batch file for Windows). Notice that you can also compare two directories or even discs if they are on two different computers without a connection in between! See also How do I sync two folders on two different computers?
Example for Windows:
@echo off
:: usage: dircmp dir1 dir2
:: dir2 may be also on a different drive
set ERRORLEVEL=
if not exist %1 goto error
if not exist %2 goto error
set BACKUP=%CD%
call jacksum -a sha1 -r -f -O %TEMP%\tmp.jacksum -U nul -m -w %2
cd %1
call jacksum -c %TEMP%\tmp.jacksum
rem echo %ERRORLEVEL%
cd %BACKUP%
goto end
:error
echo directory does not exist
:end
2. Execute the following commands on both directories (it is important to change the directory with cd first):
cd <the directory you want to check>
jacksum -r -f -a sha1 . | jacksum -a sha1 -
If both checksums are equal, you know that both directories (including subdirectories) are equal.
3. Since Jacksum 1.5.0 there is the option -S to get just one checksum back. You don't need the pipe mechanism anymore.
cd <the directory you want to check>
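The idea behind the second approach, a hash over the list of per-file hashes, can be sketched in Java as follows. This is an illustration only and does not reproduce Jacksum's output or its resulting checksum: the line layout "hash relative/path", the use of SHA-1 for both levels, and all class and method names are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TreeDigest {

    static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Returns one hash that covers the content and relative path of every file below root. */
    static String digestOfTree(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            // Relative names with '/' as separator, sorted, so that the result
            // does not depend on the location of root or on traversal order.
            List<String> names = walk.filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
            MessageDigest outer = newSha1();
            for (String name : names) {
                // Reads the whole file into memory; fine for a sketch.
                byte[] content = Files.readAllBytes(root.resolve(name));
                String line = toHex(newSha1().digest(content)) + " " + name + "\n";
                outer.update(line.getBytes(StandardCharsets.UTF_8));
            }
            return toHex(outer.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String dir : args) {
            System.out.println(digestOfTree(Path.of(dir)) + " " + dir);
        }
    }
}
```

Two directory trees produce the same value only if they contain the same relative filenames with the same content, which is why changing into the directory first matters in the command line variant as well.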
Let's assume that you would like to get informed about any modifications on this web page (http://www.jonelo.de/java/jacksum).
1. Preparation: Get the web page with the wget command, store it on your
hard disk under a suitable name (rather than index.html), and calculate a
hash from the file:
wget http://www.jonelo.de/java/jacksum/index.html -O MyFavorite
jacksum -a sha1 -m MyFavorite > websites.jacksum
2. Check frequently: Download the page again and check the checksum of its
content against the checksum you have stored previously.
wget http://www.jonelo.de/java/jacksum/index.html -O MyFavorite
jacksum -a sha1 -E hex -c websites.jacksum
[OK] MyFavorite
3. To refresh the checksum in the file websites.jacksum, just repeat step 1.
As Jacksum can store multiple checksums in a .jacksum file, you can build your own website content change detection system for all your favorite websites on the net. Use scripts and cron jobs to make life easier. If your wget download itself triggers a change on a web page (a text counter for example), you will be informed about a change that you have caused yourself. To avoid this problem, you must remove this information from the file before you calculate a checksum from it (use grep -v or regular expressions for this task).
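The filtering step mentioned above (the equivalent of grep -v before hashing) can be sketched in Java as follows. This is a minimal illustration, not part of Jacksum: the class name, the method name, and the sample "Visitors:" counter line are hypothetical, and SHA-1 is used only to mirror the commands in this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class ChangeDetector {

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Hashes the page content after dropping every line for which
     * isVolatile returns true (comparable to "grep -v").
     */
    static String hashIgnoring(String content, Predicate<String> isVolatile) {
        String stable = content.lines()
                .filter(isVolatile.negate())
                .collect(Collectors.joining("\n"));
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return toHex(md.digest(stable.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical page content with a visitor counter that changes on every download.
        Predicate<String> counter = line -> line.contains("Visitors:");
        String first = "<h1>News</h1>\nVisitors: 41\n<p>text</p>";
        String second = "<h1>News</h1>\nVisitors: 42\n<p>text</p>";
        boolean unchanged = hashIgnoring(first, counter).equals(hashIgnoring(second, counter));
        System.out.println(unchanged ? "[OK]" : "[CHANGED]");
    }
}
```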
[GNU/Linux, Unix] I got bash: !": event not found
jacksum -a crc32 -q "txt:Hello World!" gives me:
bash: !": event not found
Actually this is a bash shell issue. The ! in bash has a special meaning by default, it is the history expansion character. You can reproduce it also with echo.
$ echo "!"
bash: !: event not found
$ echo "\!"
\!
$ echo \!
!
Therefore the correct usage is
jacksum -a crc32 -q "txt:Hello World"\!
472456355 12
However, this syntax is not very convenient. Fortunately, using single quotes also solves the problem:
$ echo '!'
!
jacksum -a crc32 -q 'txt:Hello World!'
472456355 12
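If you want to cross-check this value without any shell quoting involved, the CRC32 class of the standard Java API can be used. The following is a minimal sketch; the class and method names are hypothetical, and the "checksum length" layout simply mirrors the output shown above.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class HelloCrc32 {

    // Returns "<crc32 as decimal> <number of bytes>" for the given text.
    static String crc32Line(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue() + " " + data.length;
    }

    public static void main(String[] args) {
        // The exclamation mark has no special meaning inside a Java string literal.
        System.out.println(crc32Line("Hello World!")); // 472456355 12
    }
}
```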
Yes, the quintessential snippet is here:
import java.io.*;
import java.security.*;
import jonelo.jacksum.*;
import jonelo.jacksum.algorithm.*;
// ...
AbstractChecksum checksum = null;
// updates the checksum with the content of a file
To update the checksum with bytes or bytearrays, you can use the update
methods:
// reset the object for reuse (any formatting rules remain)
checksum.reset();
// update the checksum with a single byte
checksum.update(abyte);
// update the checksum with a bytearray
checksum.update(bytearray);
// update the checksum with the first 10 bytes in the bytearray
checksum.update(bytearray, 0, 10);
System.out.println(checksum);
To get the information what algorithms are supported, use the following
snippet. If you use JSE 5.0 or above, you can use Generics to avoid the
castings.
Map map = JacksumAPI.getAvailableAlgorithms();
Iterator iterator = map.entrySet().iterator();
while (iterator.hasNext()) {
    Map.Entry entry = (Map.Entry)iterator.next();
    String description = (String)entry.getValue();
    AbstractChecksum checksum = JacksumAPI.getChecksumInstance((String)entry.getKey());
}
To control the format of the checksum only, you can perform methods on
the checksum object:
// checksum is printed in hex form (see also option -x)
// example: 7d93370d5ef94450151826ca20c6e512
checksum.setEncoding(HEX);
// checksum is printed in uppercase hex form (see also option -X)
// checksum is printed in uppercase hex and grouped form (see also options -g, -G)
// checksum is printed in Base 64 form (see also option -E)
// checksum is printed in BubbleBabble form (see also option -E)
To control the format of a complete line, you can use the format method:
// print the checksum, the default output format depends on the algorithm
System.out.println(checksum);
// print the checksum, the default output format depends on the algorithm
// print the checksum value only (see also option -F)
// print the checksum value with a customized format (see also option -F)
The license gives you the freedom to run the program for any purpose, to study how the program works and adapt it to your needs, to redistribute copies, and to improve the program and release your improvements to the public. The GPL guarantees those freedoms, because you are required to license your work under the same license if you adapt the work.
Can I incorporate Jacksum into my project?
Yes, but Jacksum is not "public domain", and Jacksum is not "freeware". Jacksum is free software in the sense defined by the Free Software Foundation. Jacksum is distributed under the terms of the GNU General Public License (GPL), as published by the Free Software Foundation (FSF), GPLv3 or later. If you develop a program that is based on Jacksum, the program must be compatible with the GPL. It is against the license to take Jacksum's source code and put it into a closed source project, and/or to claim it as yours. Jacksum's sources are free and must remain free forever, and you should respect this freedom. Once you have understood the advantages of openness, you have taken a step into a greater world.
Can I use the name Jacksum for my own project/website?
No. The name Jacksum is not part of any dictionary; it is my own creation. The name was published in July 2002. The name Jacksum is copyright protected by German law and I encourage everyone to respect that law. Jacksum's source code, however, has been published under the conditions of the GPL, so everybody can use the code.