appli.diff
Class Diff

java.lang.Object
  extended by appli.diff.Diff

public class Diff
extends java.lang.Object

diff - Text file difference utility.

Copyright 1987, 1989 by Donald C. Lindsay, School of Computer Science, Carnegie Mellon University. Copyright 1982 by Symbionics. Use without fee is permitted when not for direct commercial advantage, and when credit to the source is given. Other uses require specific permission.

Converted from C to Java by Ian F. Darwin, ian@darwinsys.com, January, 1997. Copyright 1997, Ian F. Darwin. Conversion is NOT FULLY TESTED.

USAGE: diff oldfile newfile

This program assumes that "oldfile" and "newfile" are text files. The program writes to stdout a description of the changes which would transform "oldfile" into "newfile".

The printout is in the form of commands, each followed by a block of text. The text is delimited by the commands, which are:

    DELETE AT n
    ..deleted lines

    INSERT BEFORE n
    ..inserted lines

    n MOVED TO BEFORE n
    ..moved lines

    n CHANGED FROM
    ..old lines
    CHANGED TO
    ..newer lines

The line numbers all refer to the lines of the oldfile, as they are numbered before any commands are applied. The text lines are printed as-is, without indentation or prefixing. The commands are printed in upper case, with a prefix of ">>>>", so that they will stand out. Other schemes may be preferred.

Files which contain more than MAXLINECOUNT lines cannot be processed. This can be fixed by changing "symbol" to a Vector.

The algorithm is taken from Communications of the ACM, Apr78 (21, 4, 264-), "A Technique for Isolating Differences Between Files." Ignoring I/O, and ignoring the symbol table, it should take O(N) time. This implementation takes fixed space, plus O(U) space for the symbol table (where U is the number of unique lines). Methods exist to change the fixed space to O(N) space.

Note that this is not the only interesting file-difference algorithm. In general, different algorithms draw different conclusions about the changes that have been made to the oldfile. This algorithm is sometimes "more right", particularly since it does not consider a block move to be an insertion and a (separate) deletion. However, on some files it will be "less right". This is a consequence of the fact that files may contain many identical lines (particularly if they are program source). Each algorithm resolves the ambiguity in its own way, and the resolution is never guaranteed to be "right". However, it is often excellent.

This program is intended to be pedagogic. Specifically, this program was the basis of the Literate Programming column which appeared in the Communications of the ACM (CACM), in the June 1989 issue (32, 6, 740-755). By "pedagogic", I do not mean that the program is gracefully worded, or that it showcases language features or its algorithm. I also do not mean that it is highly accessible to beginners, or that it is intended to be read in full, or in a particular order. Rather, this program is an example of one professional's style of keeping things organized and maintainable.

The program would be better if the "print" variables were wrapped into a struct. In general, grouping related variables in this way improves documentation, and adds the ability to pass the group in argument lists.

This program is a de-engineered version of a program which uses less memory and less time. The article points out that the "symbol" arrays can be implemented as arrays of pointers to arrays, with dynamic allocation of the subarrays. (In C, macros are very useful for hiding the two-level accesses.) In Java, a Vector would be used. This allows an extremely large value for MAXLINECOUNT, without dedicating fixed arrays. (The "other" array can be allocated after the input phase, when the exact sizes are known.) The only slow piece of code is the "strcmp" in the tree descent: it can be speeded up by keeping a hash in the tree node, and only using "strcmp" when two hashes happen to be equal.

Change Log
----------

1Jan97 Ian F. Darwin: first working rewrite in Java, based entirely on D.C.Lindsay's reasonable C version. Changed comments from /***************** to /**, shortened, added whitespace, used tabs more, etc.

6jul89 D.C.Lindsay, CMU: fixed portability bug. Thanks, Gregg Wonderly. Just changed "char ch" to "int ch". Also added comment about way to improve code.

10jun89 D.C.Lindsay, CMU: posted version created. Copyright notice changed to ACM style, and Dept. is now School. ACM article referenced in docn.

26sep87 D.C.Lindsay, CMU: publication version created. Condensed all 1982/83 change log entries. Removed all command line options, and supporting code. This simplified the input code (no case reduction etc). It also simplified the symbol table, which was capable of remembering offsets into files (instead of strings), and trusting (!) hash values to be unique. Removed dynamic allocation of arrays: now fixed static arrays. Removed speed optimizations in symtab package. Removed string compression/decompression code. Recoded to Unix standards from old Lattice/MSDOS standards. (This affected only the #include's and the IO.) Some renaming of variables, and rewording of comments.

1982/83 D.C.Lindsay, Symbionics: created.
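
The symbol-table bookkeeping mentioned in the class description can be illustrated with a short sketch. It is not the code of this class: the names UniqueLineSketch, Entry and matchUnique are hypothetical, lines are numbered from 0, -1 is an assumed marker for "no match", and a java.util.HashMap stands in for the tree-based symbol table. The sketch counts how often each distinct line occurs in each file and pairs up the lines that occur exactly once in both.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the unique-line matching step; not the Diff class itself. */
final class UniqueLineSketch {

    /** One symbol-table entry per distinct line of text. */
    private static final class Entry {
        int oldCount;   // occurrences in the old file
        int newCount;   // occurrences in the new file
        int oldLine;    // 0-based index of the last occurrence in the old file
    }

    /**
     * Returns an array in which element i is the 0-based old line paired with
     * new line i, or -1 when new line i does not occur exactly once in both files.
     */
    static int[] matchUnique(List<String> oldLines, List<String> newLines) {
        Map<String, Entry> table = new HashMap<>();
        for (int i = 0; i < oldLines.size(); i++) {
            Entry e = table.computeIfAbsent(oldLines.get(i), k -> new Entry());
            e.oldCount++;
            e.oldLine = i;
        }
        for (String line : newLines) {
            table.computeIfAbsent(line, k -> new Entry()).newCount++;
        }
        int[] other = new int[newLines.size()];
        for (int i = 0; i < other.length; i++) {
            Entry e = table.get(newLines.get(i));
            other[i] = (e.oldCount == 1 && e.newCount == 1) ? e.oldLine : -1;
        }
        return other;
    }

    public static void main(String[] args) {
        int[] other = matchUnique(
                Arrays.asList("a", "b", "c", "b"),
                Arrays.asList("c", "a", "x", "b"));
        System.out.println(Arrays.toString(other)); // prints [2, 0, -1, -1]
    }
}
```

The table holds one entry per distinct line, which corresponds to the O(U) space noted above. A hash map also compares hash codes before falling back to a full string comparison, which is the same idea as the suggested hash-in-the-tree-node speedup.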

Version:
Java version 0.9, 1997
Author:
Ian F. Darwin, Java version, D. C. Lindsay, C version (1982-1987)

Field Summary
(package private)  boolean anyprinted
           
private  int[] blocklen
          blocklen is the info about found blocks.
private static int change
           
private static boolean DEBUG
           
private static int delete
           
private  java.util.ArrayList diffList
          List of Diff items
private static int idle
           
private static int insert
           
private static int movenew
           
private static int moveold
           
(package private)  FileInfo newinfo
          Keeps track of information about file1 and file2
(package private)  FileInfo oldinfo
          Keeps track of information about file1 and file2
private  int printnewline
           
private  int printoldline
           
private  int printstatus
           
(package private)  java.io.PrintStream prt
           
private static int same
           
private static int UNREAL
          Block length greater than any possible real block length.
 
Constructor Summary
(package private) Diff()
          Construct an empty Diff object.
 
Method Summary
private  void computeDiff(FileInfo oldinfo, FileInfo newinfo)
          Computes the differences
 void doDiff(java.io.File oldFile, java.io.File newFile)
          Do one file comparison.
 void doDiff(java.lang.String src1, java.lang.String src2)
          Do one file comparison.
 java.util.ArrayList getDiffList()
          Returns the list of differences.
 boolean hasDifferences()
          Says if there were differences between the files
(package private)  void inputscan(FileInfo pinfo)
          inputscan - Reads the file specified by pinfo.file. Places the lines of that file in the symbol table.
(package private)  void newconsume()
           
(package private)  void oldconsume()
          oldconsume - Part of printout.
 void println(java.lang.String s)
          Convenience wrapper for println.
(package private)  void printout()
          printout - Prints summary to stdout.
(package private)  void scanafter()
           
(package private)  void scanbefore()
          scanbefore - As scanafter, except scans towards file fronts.
(package private)  void scanblocks()
          scanblocks - Finds the beginnings and lengths of blocks of matches.
private  void scanunique()
           
 void setOutputStream(java.io.PrintStream p)
          Sets the output print stream.
(package private)  void showchange()
          showchange - Part of printout.
(package private)  void showdelete()
          showdelete - Part of printout.
(package private)  void showinsert()
           
(package private)  void showmove()
          showmove - Part of printout.
(package private)  void showsame()
          showsame - Part of printout.
(package private)  void skipnew()
          skipnew - Part of printout.
(package private)  void skipold()
          skipold - Part of printout.
private  void storeline(java.lang.String linebuffer, FileInfo pinfo)
          storeline - Places line into symbol table. Expects pinfo.maxLine to be initialized, and increments it.
private  void transform()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEBUG

private static final boolean DEBUG
See Also:
Constant Field Values

UNREAL

private static final int UNREAL
Block length greater than any possible real block length.

See Also:
Constant Field Values

oldinfo

FileInfo oldinfo
Keeps track of information about file1 and file2


newinfo

FileInfo newinfo
Keeps track of information about file1 and file2


blocklen

private int[] blocklen
blocklen holds the information about found blocks. Each entry is 0, except at the line numbers where a block starts in the old file; there it holds the number of lines in the block. During printout, this number is reset to -1 if the block is printed as a MOVE block (because the printout phase encounters the block twice, but must print it only once). The arrays are declared with MAXLINECOUNT+2 entries so that there can be two extra lines (pseudolines) at line 0 and at line MAXLINECOUNT+1 (or less).
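
The layout described above can be illustrated with a small sketch. It is not the body of scanblocks: the class and method names are hypothetical, the input array other is assumed to hold, for each old line (numbered from 1), the matching new line number or a negative value when there is no match, and the UNREAL sentinel and the -1 reset during printout are left out.

```java
import java.util.Arrays;

/** Hypothetical sketch of filling a blocklen-style array; not the Diff class itself. */
final class BlockLenSketch {

    /**
     * other[oldLine] is the new line number matched to oldLine (1-based),
     * or a negative value if oldLine has no match. Index 0 is unused here.
     * The result has maxLine + 2 entries, leaving room for the two pseudolines.
     */
    static int[] scanBlocks(int[] other, int maxLine) {
        int[] blocklen = new int[maxLine + 2];   // all entries start at 0
        int blockStart = 0;                      // old line where the current block began; 0 = not in a block
        int prevNew = -1;                        // new line matched by the previous old line
        for (int oldLine = 1; oldLine <= maxLine; oldLine++) {
            int newLine = other[oldLine];
            if (newLine < 0) {
                blockStart = 0;                  // unmatched line: any open block ends here
            } else {
                if (blockStart == 0 || newLine != prevNew + 1) {
                    blockStart = oldLine;        // match is not contiguous: a new block starts
                }
                blocklen[blockStart]++;          // the length is stored only at the block's first line
            }
            prevNew = newLine;
        }
        return blocklen;
    }

    public static void main(String[] args) {
        // Old lines 1-2 match new lines 3-4, old line 3 is unmatched,
        // and old lines 4-5 match new lines 1-2.
        int[] other = {0, 3, 4, -1, 1, 2};
        System.out.println(Arrays.toString(scanBlocks(other, 5))); // prints [0, 2, 0, 0, 2, 0, 0]
    }
}
```

In the example the two blocks of length 2 are recorded at old lines 1 and 4, and every other entry stays 0.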


diffList

private java.util.ArrayList diffList
List of Diff items


prt

java.io.PrintStream prt

idle

private static final int idle
See Also:
Constant Field Values

delete

private static final int delete
See Also:
Constant Field Values

insert

private static final int insert
See Also:
Constant Field Values

movenew

private static final int movenew
See Also:
Constant Field Values

moveold

private static final int moveold
See Also:
Constant Field Values

same

private static final int same
See Also:
Constant Field Values

change

private static final int change
See Also:
Constant Field Values

printstatus

private int printstatus

anyprinted

boolean anyprinted

printoldline

private int printoldline

printnewline

private int printnewline
Constructor Detail

Diff

Diff()
Construct an empty Diff object.

Method Detail

setOutputStream

public void setOutputStream(java.io.PrintStream p)
Sets the output print stream. If null, no output is displayed.

Parameters:
p - PrintStream or null
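
A minimal sketch of the behaviour described here, in which a null stream suppresses output. The class name QuietPrinterSketch is hypothetical, and defaulting to System.out is an assumption based on the statement that the program writes to stdout; the actual Diff class may be implemented differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical sketch of a null-tolerant output stream; not the Diff class itself. */
final class QuietPrinterSketch {

    // Assumption: output goes to stdout unless another stream is set.
    private PrintStream prt = System.out;

    /** Sets the output stream; passing null turns output off. */
    void setOutputStream(PrintStream p) {
        prt = p;
    }

    /** Prints one line, or does nothing when no stream is set. */
    void println(String s) {
        if (prt != null) {
            prt.println(s);
        }
    }

    public static void main(String[] args) {
        QuietPrinterSketch printer = new QuietPrinterSketch();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        printer.setOutputStream(new PrintStream(buffer, true));
        printer.println(">>>> DELETE AT 3");   // captured in the buffer
        printer.setOutputStream(null);
        printer.println("not shown");          // silently dropped
        System.out.print(buffer.toString());
    }
}
```

Routing every write through one wrapper keeps the null check in a single place, so callers that only need the computed difference list can switch printing off without touching the printing methods.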

doDiff

public void doDiff(java.io.File oldFile,
                   java.io.File newFile)
Do one file comparison. Called with both File objects.

Parameters:
oldFile - first file in the comparison
newFile - second file in the comparison

doDiff

public void doDiff(java.lang.String src1,
                   java.lang.String src2)
Do one file comparison. Called with both filenames.


computeDiff

private void computeDiff(FileInfo oldinfo,
                         FileInfo newinfo)
Computes the differences

Parameters:
oldinfo - first file information handler
newinfo - second file information handler

inputscan

void inputscan(FileInfo pinfo)
         throws java.io.IOException
inputscan - Reads the file specified by pinfo.file. Places the lines of that file in the symbol table. Sets pinfo.maxLine to the number of lines found.

Throws:
java.io.IOException

storeline

private void storeline(java.lang.String linebuffer,
                       FileInfo pinfo)
storeline - Places line into symbol table. Expects pinfo.maxLine to be initialized, and increments it. Places the symbol table handle in pinfo.symbol. Expects pinfo to be either oldinfo or newinfo.


transform

private void transform()

scanunique

private void scanunique()

scanafter

void scanafter()

scanbefore

void scanbefore()
scanbefore - As scanafter, except scans towards file fronts. Assumes the off-end lines have been marked as a match.
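
The pair of scans can be illustrated with a sketch of how the cited CACM technique is commonly described: starting from lines that are already paired, neighbouring lines with identical text are paired as well, first towards the ends of the files and then towards the fronts. Everything here is an assumption for illustration: the names are hypothetical, lines are numbered from 0, -1 means "no match", and the off-end pseudolines are omitted, so unlike the class itself this sketch does not begin scanning from the file boundaries.

```java
import java.util.List;

/** Hypothetical sketch of the forward and backward extension scans; not the Diff class itself. */
final class ScanSketch {

    /** If new line n pairs with old line o and the following lines are equal and unpaired, pairs them too. */
    static void scanAfter(List<String> oldLines, List<String> newLines,
                          int[] oldOther, int[] newOther) {
        for (int n = 0; n + 1 < newLines.size(); n++) {
            int o = newOther[n];
            if (o >= 0 && o + 1 < oldLines.size()
                    && newOther[n + 1] < 0 && oldOther[o + 1] < 0
                    && newLines.get(n + 1).equals(oldLines.get(o + 1))) {
                newOther[n + 1] = o + 1;
                oldOther[o + 1] = n + 1;
            }
        }
    }

    /** Same as scanAfter, but walks from the end of the new file towards its front. */
    static void scanBefore(List<String> oldLines, List<String> newLines,
                           int[] oldOther, int[] newOther) {
        for (int n = newLines.size() - 1; n > 0; n--) {
            int o = newOther[n];
            if (o > 0
                    && newOther[n - 1] < 0 && oldOther[o - 1] < 0
                    && newLines.get(n - 1).equals(oldLines.get(o - 1))) {
                newOther[n - 1] = o - 1;
                oldOther[o - 1] = n - 1;
            }
        }
    }
}
```

For example, with old lines a, b, b, c and new lines a, b, b, x, where only the two "a" lines are paired beforehand, scanAfter also pairs both "b" lines, even though "b" is not unique in either file.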


scanblocks

void scanblocks()
scanblocks - Finds the beginnings and lengths of blocks of matches. Sets the blocklen array (see definition). Expects oldinfo valid.


printout

void printout()
printout - Prints summary to stdout. Expects all data structures have been filled out.


newconsume

void newconsume()

oldconsume

void oldconsume()
oldconsume - Part of printout. Have run out of new file. Process the rest of the old file, printing any parts which were deletes or moves.


showdelete

void showdelete()
showdelete - Part of printout. Expects printoldline is at a deletion.


showinsert

void showinsert()

showchange

void showchange()
showchange - Part of printout. Expects printnewline is at an insertion. Expects printoldline is at a deletion.


skipold

void skipold()
skipold - Part of printout. Expects printoldline is at start of an old block that has already been announced as a move. Skips over the old block.


skipnew

void skipnew()
skipnew - Part of printout. Expects printnewline is at start of a new block that has already been announced as a move. Skips over the new block.


showsame

void showsame()
showsame - Part of printout. Expects printnewline and printoldline at start of two blocks that aren't to be displayed.


showmove

void showmove()
showmove - Part of printout. Expects printoldline, printnewline at start of two different blocks (a move was done).


println

public void println(java.lang.String s)
Convenience wrapper for println.

Parameters:
s - String to print

getDiffList

public java.util.ArrayList getDiffList()
Returns the list of differences.

Returns:
the ArrayList of DiffItem objects

hasDifferences

public boolean hasDifferences()
Says if there were differences between the files

Returns:
true if there were any differences.