File Comparison in C# – Part 2


In my previous article, File Comparison in C# – Part 1, I showed you how to create a simple method that compares the sizes of two files using the System.IO.FileInfo class. In this article, we will build on that method, so I suggest you read Part 1 before continuing.

Now we are going to create a file comparison method that compares files byte by byte. To do this, we will use the System.IO.FileStream class to read our files. The code below is one way of implementing this comparison technique:

using System.IO;

private static bool CompareFileBytes(string fileName1, string fileName2)
{
    // Compare file sizes before continuing.
    // If the sizes are equal, compare the bytes.
    if (CompareFileSizes(fileName1, fileName2))
    {
        int file1byte = 0;
        int file2byte = 0;

        // Open a read-only System.IO.FileStream for each file.
        // FileAccess.Read lets the comparison work on read-only files
        // and on the same file passed in twice.
        // Note: With the 'using' keyword the streams
        // are closed automatically.
        using (FileStream fileStream1 = new FileStream(fileName1, FileMode.Open, FileAccess.Read),
                          fileStream2 = new FileStream(fileName2, FileMode.Open, FileAccess.Read))
        {
            // Read and compare a byte from each file until a
            // non-matching pair of bytes is found or the end of
            // the file is reached (ReadByte returns -1).
            do
            {
                file1byte = fileStream1.ReadByte();
                file2byte = fileStream2.ReadByte();
            }
            while ((file1byte == file2byte) && (file1byte != -1));
        }

        // The files match only if the loop reached the end of
        // both files without finding a difference.
        return file1byte == file2byte;
    }
    else
    {
        return false;
    }
}

As you can see, this method accepts two parameters: the full paths of the two files we want to compare. The first thing it does is call the CompareFileSizes method we created in part 1 of this series. This is a cheap optimization. Files that differ often differ in size as well, and comparing sizes only requires reading each file's metadata, not its contents. If the sizes differ, we know immediately that the files are different, and we skip the expensive byte-by-byte comparison. If the sizes are identical, we go on to compare every byte.
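Part 1 isn't reproduced here. For readers who skipped it, the sketch below shows what CompareFileSizes might look like. It is an assumption, not the exact code from Part 1: it simply compares the FileInfo.Length values of the two files, which only reads file-system metadata. The FileComparer class is a hypothetical wrapper that lets the sketch compile on its own.

```csharp
using System.IO;

public static class FileComparer
{
    // Hypothetical reconstruction of Part 1's helper: assumes it compares
    // FileInfo.Length, which reads file metadata rather than file contents.
    public static bool CompareFileSizes(string fileName1, string fileName2)
    {
        FileInfo fileInfo1 = new FileInfo(fileName1);
        FileInfo fileInfo2 = new FileInfo(fileName2);

        // Length throws FileNotFoundException if a file does not exist.
        return fileInfo1.Length == fileInfo2.Length;
    }
}
```

FileInfo.Length throws a FileNotFoundException for a missing file. If your callers might pass in paths that don't exist, you may want to check File.Exists first.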

This code uses a do-while loop to read the files' bytes and compare them one by one. If you need to refresh your memory on do-while loops, you can check out my article on loops, Control Structures – The Loop. The loop keeps iterating until it finds a pair of bytes that don't match or reaches the end of the files, which FileStream.ReadByte signals by returning -1. The method then returns true if the files match and false if they don't.
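The loop doesn't actually depend on files. ReadByte is defined on the base System.IO.Stream class, so you can try the same do-while logic on in-memory data. The sketch below is my own generalization for illustration, and the StreamComparer class and CompareStreamBytes method are hypothetical names. Note that it skips the size check, so it relies on the final comparison to catch streams of different lengths.

```csharp
using System.IO;

public static class StreamComparer
{
    // Same do-while technique as CompareFileBytes, generalized (for
    // illustration) to any readable Stream, such as a MemoryStream.
    public static bool CompareStreamBytes(Stream stream1, Stream stream2)
    {
        int byte1;
        int byte2;

        do
        {
            // ReadByte returns the next byte as 0..255, or -1 at end of stream.
            byte1 = stream1.ReadByte();
            byte2 = stream2.ReadByte();
        }
        // Keep going while the bytes match and neither stream has ended.
        while (byte1 == byte2 && byte1 != -1);

        // Equal only if both streams ended together (-1 == -1).
        return byte1 == byte2;
    }
}
```

For example, comparing new MemoryStream(new byte[] { 1, 2, 3 }) with new MemoryStream(new byte[] { 1, 2, 4 }) returns false. Comparing two streams over { 1, 2, 3 } returns true. If one stream is a shorter prefix of the other, the method also returns false, because one side reaches -1 while the other is still returning data.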

This technique is as accurate as file comparison gets, since every byte of both files is checked. It is also slow, because whenever the sizes match it has to read both files in full, so its running time grows with the size of the files. When you are comparing a few relatively small files, the performance hit to your application might be acceptable. If, on the other hand, you are comparing hundreds or thousands of files, or very large ones, this technique will probably be too slow. It all depends on what you are trying to build.
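If byte-by-byte comparison turns out to be too slow but you still need an exact answer, one common variation is to read both files in larger blocks and compare the blocks. This is a different technique from the one above, and the sketch below is only one way to do it. It rests on a few assumptions. The 64 KB buffer size is arbitrary, and the ChunkedFileComparer and CompareFileChunks names are hypothetical. ReadOnlySpan<byte>.SequenceEqual also requires .NET Core 2.1 or later. FileStream already buffers ReadByte internally, so any speed-up comes from making far fewer method calls per byte. It is worth measuring on your own files rather than assuming a gain.

```csharp
using System;
using System.IO;

public static class ChunkedFileComparer
{
    // Arbitrary buffer size chosen for this sketch, not a tuned value.
    private const int BufferSize = 64 * 1024;

    public static bool CompareFileChunks(string fileName1, string fileName2)
    {
        // Cheap size check first, the same idea as CompareFileSizes.
        if (new FileInfo(fileName1).Length != new FileInfo(fileName2).Length)
        {
            return false;
        }

        byte[] buffer1 = new byte[BufferSize];
        byte[] buffer2 = new byte[BufferSize];

        using (FileStream stream1 = new FileStream(fileName1, FileMode.Open, FileAccess.Read))
        using (FileStream stream2 = new FileStream(fileName2, FileMode.Open, FileAccess.Read))
        {
            while (true)
            {
                int count1 = ReadFully(stream1, buffer1);
                int count2 = ReadFully(stream2, buffer2);

                if (count1 != count2)
                {
                    return false; // a file changed length while being read
                }
                if (count1 == 0)
                {
                    return true; // both files ended and every chunk matched
                }

                // Compare only the bytes actually read in this chunk.
                ReadOnlySpan<byte> chunk1 = new ReadOnlySpan<byte>(buffer1, 0, count1);
                ReadOnlySpan<byte> chunk2 = new ReadOnlySpan<byte>(buffer2, 0, count2);
                if (!chunk1.SequenceEqual(chunk2))
                {
                    return false;
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the end of the stream is reached.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
            {
                break;
            }
            total += read;
        }
        return total;
    }
}
```

The ReadFully helper matters here. Without it, a short read on one stream would make two identical files look different, because the chunks would no longer line up.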

I hope you enjoyed part 2 of this series. Stay tuned for part 3, where I will show you how to compute and compare file hashes, and then compare the performance of the two comparison methods.

Dave
