File Comparison in C# – Part 1

As a developer you might need to compare files for equality, maybe because you want to verify that your backups are valid, or maybe because you want to search for duplicate files in a folder. Either way you will need to compare two files with each other, and I am going to show you how.

There are a number of different ways to compare files – some are good and some are not so good. Below is a list of some of the different techniques that can be used, either on their own, or together, to compare files.

  1. Compare file sizes;
  2. Compare file creation dates;
  3. Compare bytes from beginning and end of files;
  4. Compare whole file byte by byte;
  5. Generate a hash for each file and compare hashes;

All these comparison techniques have advantages and disadvantages. For example, technique number 1 is very fast to execute, but it is unreliable on its own because two files with different contents can easily have the same size. Technique number 4, on the other hand, is very accurate but can be slow for large files. There is no single comparison technique which is good for all situations, so I recommend using multiple techniques in sequence.

In this short series of articles, we are going to build an accurate file comparison method which uses techniques 1 and 4 listed above. Then we are going to build another file comparison method which instead uses techniques 1 and 5. Finally, we will compare both methods and select the better of the two. We are going to use technique number 1 both times because it executes fast and immediately catches files of different sizes, so it can filter out the majority of files when doing comparisons. Only files which have an identical size will be compared using the slower, more accurate techniques 4 and 5.
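To make the "cheap check first, expensive check second" idea concrete, here is a minimal sketch of how the two stages could fit together. It is not the method we will build in part 2: the names StagedComparisonSketch, FilesAreEqual and CompareAllBytes are made up for this illustration, and the slow check simply loads both files into memory, which is an assumption that only suits small files.

```csharp
using System;
using System.IO;
using System.Linq;

public static class StagedComparisonSketch
{
    // Hypothetical helper: runs the fast size check first and only calls
    // the slower comparison when both files have the same size.
    public static bool FilesAreEqual(string fileName1, string fileName2,
                                     Func<string, string, bool> slowCompare)
    {
        long length1 = new FileInfo(fileName1).Length;
        long length2 = new FileInfo(fileName2).Length;

        if (length1 != length2)
        {
            // Different sizes, so the slow check is never needed
            return false;
        }

        return slowCompare(fileName1, fileName2);
    }

    // Placeholder slow check (assumption): reads both files fully into
    // memory and compares the byte sequences.
    public static bool CompareAllBytes(string fileName1, string fileName2)
    {
        return File.ReadAllBytes(fileName1).SequenceEqual(File.ReadAllBytes(fileName2));
    }

    public static void Main()
    {
        string a = Path.GetTempFileName();
        string b = Path.GetTempFileName();
        try
        {
            File.WriteAllText(a, "hello");
            File.WriteAllText(b, "hello world");
            Console.WriteLine(FilesAreEqual(a, b, CompareAllBytes)); // False (sizes differ)

            File.WriteAllText(b, "hello");
            Console.WriteLine(FilesAreEqual(a, b, CompareAllBytes)); // True
        }
        finally
        {
            File.Delete(a);
            File.Delete(b);
        }
    }
}
```

Passing the slow check in as a delegate is only a convenience for this sketch; it lets the same size filter sit in front of either technique 4 or technique 5.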

Since both comparison techniques will be comparing file sizes we can start off by coding a CompareFileSizes method.

Compare File Sizes

Before starting, we will need the System.IO namespace, so after you create a Windows Console Application in Visual Studio, add a using directive for System.IO to the top of your main class file.

Now let’s create a method which compares two files by comparing their file sizes.

private static bool CompareFileSizes(string fileName1, string fileName2)
{
    bool fileSizeEqual = true;

    // Create System.IO.FileInfo objects for both files
    FileInfo fileInfo1 = new FileInfo(fileName1);
    FileInfo fileInfo2 = new FileInfo(fileName2);

    // Compare file sizes
    if (fileInfo1.Length != fileInfo2.Length)
    {
        // File sizes are not equal therefore files are not identical
        fileSizeEqual = false;
    }

    return fileSizeEqual;
}

The above CompareFileSizes method accepts two file names as parameters. Each file name should include the path to the file; a relative path is resolved against the current working directory. These file names are passed to the FileInfo class, which gives us access to metadata about our files. Once we have created both FileInfo objects, we use the Length property, which gets the size of the file in bytes, to compare the two files. If the lengths of the files are not identical, the files are definitely not equal. If the lengths are identical, the files may still differ in content, which is why a second, more accurate check is needed.
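If you want to try the method out, the following is a small sketch of a complete console program around it. The class name SizeCheckDemo and the temporary files are assumptions made for this example; the CompareFileSizes method is the one from this article.

```csharp
using System;
using System.IO;

public class SizeCheckDemo
{
    private static bool CompareFileSizes(string fileName1, string fileName2)
    {
        bool fileSizeEqual = true;

        // Create System.IO.FileInfo objects for both files
        FileInfo fileInfo1 = new FileInfo(fileName1);
        FileInfo fileInfo2 = new FileInfo(fileName2);

        // Compare file sizes
        if (fileInfo1.Length != fileInfo2.Length)
        {
            // File sizes are not equal therefore files are not identical
            fileSizeEqual = false;
        }

        return fileSizeEqual;
    }

    public static void Main()
    {
        // Temporary files stand in for real files (assumption for the demo)
        string file1 = Path.GetTempFileName();
        string file2 = Path.GetTempFileName();
        try
        {
            File.WriteAllText(file1, "abc");
            File.WriteAllText(file2, "abcdef");
            Console.WriteLine(CompareFileSizes(file1, file2)); // False

            // Same size but different content still returns true
            File.WriteAllText(file2, "xyz");
            Console.WriteLine(CompareFileSizes(file1, file2)); // True
        }
        finally
        {
            File.Delete(file1);
            File.Delete(file2);
        }
    }
}
```

The second call shows why the size check alone is not enough: both files are three bytes long, so the method returns true even though their contents differ.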

The CompareFileSizes method we just created is not very robust. In fact it will crash if the files named in the parameters do not exist. This is because reading the Length property of a FileInfo object for a file which does not exist throws a FileNotFoundException. To fix this problem you could check that both files exist by wrapping the code in the following if statement.

if (File.Exists(fileName1) && File.Exists(fileName2))
{
    // Both files exist, so the size comparison code goes here
}

I am not going to validate the file names because it is not in the scope of this article series, but when developing software for the real world it is of utmost importance that you add good validation code to your applications.
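For readers who do want the check in place, here is one way it could be combined with the size comparison. This is only a sketch: the names ValidatedSizeCheck and CompareFileSizesSafe are made up for this example, and returning false when either file is missing is an assumption. You may prefer to throw an exception or report the problem to the user instead.

```csharp
using System;
using System.IO;

public static class ValidatedSizeCheck
{
    // Hypothetical variant of CompareFileSizes which returns false
    // instead of throwing when either file does not exist.
    public static bool CompareFileSizesSafe(string fileName1, string fileName2)
    {
        if (File.Exists(fileName1) && File.Exists(fileName2))
        {
            // Both files exist, so it is safe to read their lengths
            return new FileInfo(fileName1).Length == new FileInfo(fileName2).Length;
        }

        // Assumption: a missing file is treated as "not equal"
        return false;
    }

    public static void Main()
    {
        string existing = Path.GetTempFileName();
        string missing = Path.Combine(Path.GetTempPath(),
                                      Guid.NewGuid().ToString("N") + ".missing");
        try
        {
            Console.WriteLine(CompareFileSizesSafe(existing, missing));  // False
            Console.WriteLine(CompareFileSizesSafe(existing, existing)); // True
        }
        finally
        {
            File.Delete(existing);
        }
    }
}
```

Keep in mind that a file can still be deleted between the File.Exists call and the Length read, so real-world code would normally also handle the exception.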

I hope you enjoyed part 1 of this article series. In part 2 we will create a method which first makes a call to the CompareFileSizes method we just created, and then if true is returned, it will compare the files byte by byte. Then, in part 3 of this article series we will be creating a method which compares file hashes, and finally we will analyze the performance of both methods together.

Dave

  • anthony

    Hi Dave, thanks for this article. The examples were precise and very educational. I'm very new to C#, but I'm learning pretty fast thanks to articles like yours. My question is: what if I want to output all equal files in a directory or folder as a series of groups? What do I need to do? Please help.
