A word difference finder (and others)



GNU wdiff

These Info pages document wdiff, a word difference finder, together with some other diff related tools. Note that some tools documented here are considered experimental and may not be part of every wdiff installation. See Experimental.

This is for release 1.0.1.

--- The Detailed Node Listing ---

The word difference finder

The multi-difference finder

The diff format converter

How mdiff differs

Experimental programs

Table of Contents

  • GNU wdiff
  • 1 Overview
  • 2 The word difference finder
    • 2.1 Invoking wdiff
    • 2.2 Actual examples of wdiff usage
  • 3 The multi-difference finder
    • 3.1 Invoking mdiff
    • 3.2 Resource considerations and efficiency
  • 4 The diff format converter
    • 4.1 Invoking unify
  • 5 How mdiff differs
    • 5.1 Differences with diff
    • 5.2 Differences with wdiff
  • 6 Experimental programs
    • 6.1 History of the Experimental programs



1 Overview

wdiff is a front end to diff for comparing files on a word per word basis. It works by creating two temporary files, one word per line, and then executes diff on these files. It collects the diff output and uses it to produce a nicer display of word differences between the original files.
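
The one-word-per-line idea can be tried by hand with standard tools. The following is only a rough sketch of the principle, not wdiff's actual implementation; the helper names words_per_line and word_diff are made up for illustration, and mktemp is assumed to be available.

```shell
# Hypothetical helper: print the whitespace-delimited words of file $1,
# one per line, dropping the empty line that leading whitespace leaves.
words_per_line() {
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d'
}

# Hypothetical helper: line-diff the two word lists. Like diff, it
# returns 0 when the word sequences match and 1 when they differ.
word_diff() {
    old_words=$(mktemp) || return 2
    new_words=$(mktemp) || return 2
    words_per_line "$1" > "$old_words"
    words_per_line "$2" > "$new_words"
    diff "$old_words" "$new_words"
    status=$?
    rm -f "$old_words" "$new_words"
    return "$status"
}
```

Two files that differ only by refilled paragraphs yield identical word lists, so word_diff reports nothing for them. What wdiff adds on top is the merging of the diff output back into a readable copy of the text.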

mdiff studies one or more input files altogether, and discovers blocks of items which repeat at more than one place. Items may be lines, words, or units defined by the user. When in word mode, mdiff compares two files, finding which words have been deleted or added to the first in order to create the second, which is useful when two texts differ only by a few words and paragraphs have been refilled. The program has many output formats and interacts well with terminals and pagers (notably with less).

unify is able to convert context diffs to unidiff format, or the other way around. Some people just prefer one format and despise the other; it is a religious issue. This program brings peace back to Earth.

wdiff2 is intended as a replacement for wdiff. It aims at supporting the same set of options, but uses mdiff instead of diff as its backend.

wdiff, mdiff and wdiff2 were written by François Pinard, while unify has been contributed by Wayne Davison. Please report bugs to wdiff-bugs@gnu.org. Include the version number, which you can find by running the program with --version. Please include in your message sufficient input to reproduce what you got, the output you indeed expected, and careful explanations about the nature of the problem.



2 The word difference finder

There are actually two programs for comparing files on a word per word basis. wdiff is a front end to diff as found in the GNU diffutils package. It is quite mature. Its planned successor, wdiff2, is a front end to mdiff, and as experimental as mdiff itself. See Experimental.

A word is anything between whitespace. This is useful for comparing two texts in which a few words have been changed and for which paragraphs have been refilled.



2.1 Invoking wdiff

The programs wdiff and wdiff2 aim at providing the same set of command line options. They are described below. See wdiff Compatibility, for a list of differences.

     wdiff option ... old_file new_file
     wdiff option ... -d [diff_file]

wdiff compares files old_file and new_file and produces an annotated copy of new_file on standard output. The empty string or the string ‘-’ denotes standard input, but standard input cannot be used twice in the same invocation. The complete path of a file should be given; a directory name is not accepted. wdiff will exit with a status of 0 if no differences were found, a status of 1 if any differences were found, or a status of 2 for any error.
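
In a script, the three exit statuses can be told apart with a case statement. This is a minimal sketch; describe_wdiff_status is a hypothetical helper, not something shipped with wdiff.

```shell
# Hypothetical helper: turn a wdiff exit status into a short message.
describe_wdiff_status() {
    case "$1" in
        0) echo "no differences" ;;
        1) echo "differences found" ;;
        2) echo "error" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Typical use (assumes wdiff is installed and both files exist):
#   wdiff old_file new_file > annotated.txt
#   describe_wdiff_status $?
```

Note that a status of 1 is not a failure here, so scripts running under `set -e` need to test the status explicitly rather than let the shell abort.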

In this documentation, deleted text refers to text in old_file which is not in new_file, while inserted text refers to text in new_file which is not in old_file.

wdiff supports the following command line options:

--help -h
Print an informative help message describing the options.
--version -v
Print the version number of wdiff on the standard error output.
--no-deleted -1
Avoid producing deleted words on the output. If neither -1 nor -2 is selected, the original right margin may be exceeded for some lines.
--no-inserted -2
Avoid producing inserted words on the output. When this flag is given, the whitespace in the output is taken from old_file instead of new_file. If neither -1 nor -2 is selected, the original right margin may be exceeded for some lines.
--no-common -3
Avoid producing common words on the output. When this option is not selected, common words and whitespace are taken from new_file, unless option -2 is given, in which case common words and whitespace are rather taken from old_file. When selected, differences are separated from one another by lines of dashes. Moreover, if this option is selected at the same time as -1 or -2, then none of the output will have any emphasis, i.e. no bold or underlining. Finally, if this option is not selected, but both -1 and -2 are, then sections of common words between differences are segregated by lines of dashes.
--ignore-case -i
Do not consider case difference while comparing words. Each lower case letter is seen as identical to its upper case equivalent for the purpose of deciding if two words are the same.
--statistics -s
On completion, for each file, the total number of words, the number of common words between the files, the number of words deleted or inserted and the number of words that have changed is output. (A changed word is one that has been replaced or is part of a replacement.) Except for the total number of words, all of the numbers are followed by a percentage relative to the total number of words in the file.
--auto-pager -a
Some initiatives which were automatically taken in previous versions of wdiff are now put under the control of this option. By using it, a pager is interposed whenever the wdiff output is directed to the user's terminal. Without this option, no pager will be called; the user is then responsible for explicitly piping wdiff output into a pager, if required.

The pager is selected by the value of the PAGER environment variable when wdiff is run. If PAGER is not defined at run time, then a default pager, selected at installation time, will be used instead. A defined but empty value of PAGER means no pager at all.

When a pager is interposed through the use of this option, one of the options -l or -t is also selected, depending on whether the string ‘less’ appears in the pager's name or not.

It is often useful to define ‘wdiff’ as an alias for ‘wdiff -a’. However, this hides the normal wdiff behaviour. The default behaviour may be restored simply by piping the output from wdiff through cat. This dissociates the output from the user's terminal.

--printer -p
Use over-striking to emphasize parts of the output. Each character of the deleted text is underlined by writing an underscore ‘_’ first, then a backspace and then the letter to be underlined. Each character of the inserted text is emboldened by writing it twice, with a backspace in between. This option is not selected by default.
--less-mode -l
Use over-striking to emphasize parts of output. This option works as option -p, but also over-strikes whitespace associated with inserted text. less shows such whitespace using reverse video. This option is not selected by default. However, it is automatically turned on whenever wdiff launches the pager less. See option -a.

This option is commonly used in conjunction with less:

          wdiff -l old_file new_file | less

--terminal -t
Force the production of termcap strings for emphasising parts of output, even if the standard output is not associated with a terminal. The TERM environment variable must contain the name of a valid termcap entry. If the terminal description permits, underlining is used for marking deleted text, while bold or reverse video is used for marking inserted text. This option is not selected by default. However, it is automatically turned on whenever wdiff launches a pager, and it is known that the pager is not less. See option -a.

This option is commonly used when wdiff output is not redirected, but sent directly to the user terminal, as in:

          wdiff -t old_file new_file

A common kludge uses wdiff together with the pager more, as in:

          wdiff -t old_file new_file | more

However, some versions of more use termcap emphasis for their own purposes, so strange interactions are possible.

--start-delete argument -w argument
Use argument as the start delete string. This string will be output prior to any sequence of deleted text, to mark where it starts. By default, no start delete string is used unless there is no other means of distinguishing where such text starts; in this case the default start delete string is ‘[-’.
--end-delete argument -x argument
Use argument as the end delete string. This string will be output after any sequence of deleted text, to mark where it ends. By default, no end delete string is used unless there is no other means of distinguishing where such text ends; in this case the default end delete string is ‘-]’.
--start-insert argument -y argument
Use argument as the start insert string. This string will be output prior to any sequence of inserted text, to mark where it starts. By default, no start insert string is used unless there is no other means of distinguishing where such text starts; in this case the default start insert string is ‘{+’.
--end-insert argument -z argument
Use argument as the end insert string. This string will be output after any sequence of inserted text, to mark where it ends. By default, no end insert string is used unless there is no other means of distinguishing where such text ends; in this case the default end insert string is ‘+}’.
--avoid-wraps -n
Avoid spanning the end of line while showing deleted or inserted text. Any single fragment of deleted or inserted text spanning many lines will be considered as being made up of many smaller fragments not containing a newline. So deleted text, for example, will have an end delete string at the end of each line, just before the new line, and a start delete string at the beginning of the next line. A long paragraph of inserted text will have each line bracketed between start insert and end insert strings. This behaviour is not selected by default.
--diff-input -d
Use a single unified diff as input. If no input file is specified, standard input is used instead. This can be used to post-process diffs generated from other applications, like version control systems:
          svn diff | wdiff -d

Note that options -p, -t, and -[wxyz] are not mutually exclusive. If you use a combination of them, you will merely accumulate the effect of each. Option -l is a variant of option -p.
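
The over-striking described for options -p and -l is easy to reproduce by hand, which helps when checking how a given pager or printer renders it. This is an illustrative sketch only; the helper names are made up, and the byte sequences follow the description of option -p above.

```shell
# A literal backspace character, used to back up over the previous column.
bs=$(printf '\b')

# Hypothetical helper: underline each character of $1 by emitting an
# underscore, a backspace, then the character itself.
overstrike_underline() {
    printf '%s\n' "$1" | sed "s/./_${bs}&/g"
}

# Hypothetical helper: embolden each character of $1 by emitting it
# twice with a backspace in between.
overstrike_bold() {
    printf '%s\n' "$1" | sed "s/./&${bs}&/g"
}
```

Piping `overstrike_bold inserted` or `overstrike_underline deleted` through less should show the words in bold and underlined respectively, since less recognises these sequences.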



2.2 Actual examples of wdiff usage

This section presents a few examples of usage; most of them have been contributed by wdiff users.

  • Change bars example.
    • This example comes from a discussion with Joe Wells.

      The following command produces a copy of new_file, shifted right two columns to accommodate change bars since the last revision, ignoring those changes coming only from paragraph refilling. Any line with new or changed text will get a ‘|’ in column 1. However, deleted text is not shown nor marked.

                     wdiff -1n old_file new_file |
                       sed -e 's/^/  /;/{+/s/^ /|/;s/{+//g;s/+}//g'
      

      Here is how it works. Word differences are found, paying attention only to additions, as requested by option -1. For bigger changes which span line boundaries, the insert bracket strings are repeated on each output line, as requested by option -n. This output is then reformatted with a sed script which shifts the text right two columns, turns the initial space into a bar only if there is some new text on that line, then removes all insert bracket strings.

  • LaTeX example.
    • This example has been provided by Steve Fisk.

      The following uses LaTeX to put deleted text in boxes, and new text in double boxes:

                     wdiff -w "\fbox{" -x "}" -y "\fbox{\fbox{" -z "}}" ...
      

      works nicely.

  • troff example.
    • This example comes from Paul Fox.

      Using wdiff with some troff-specific delimiters gives much better output. The delimiters I used:

                     wdiff -w'\s-5' -x'\s0' -y'\fB' -z'\fP' ...
      

      This makes the point size of deletions 5 points smaller than normal, and emboldens insertions. Fantastic!

      I experimented with:

                     wdiff -w'\fI' -x'\fP' -y'\fB' -z'\fP'
      

      since that's more like the defaults you use for terminals or printers, but since I actually use italics for emphasis in my documents, I thought the point size thing was clearer.

      I tried it on code, and it works surprisingly well there, too...

    • Marty Leisner says:

      In the previous example, you had smaller text being taken out and bold face inserted. I had smaller text being taken out and larger text being inserted; I'm using bold face for other things, so this is more clear.

                     wdiff -w '\s-3' -x'\s0' -y'\s+3' -z'\s0'
      

  • Colored output example.
    • This example comes from Martin von Gagern.

      If you like colored output, and your terminal supports ANSI escape sequences, you can use this invocation:

                     wdiff -n \
                       -w $'\033[30;41m' -x $'\033[0m' \
                       -y $'\033[30;42m' -z $'\033[0m' \
                       ... | less -R
      

      This will print deleted text black on red, and inserted text black on green, assuming that your normal terminal colors are white on black. Of course you can choose different colors if you prefer.

      The ‘$'...'’ notation is supported by GNU bash, and maybe other shells as well. If your shell doesn't support it, you might need some more tricks to generate these escape sequences as command line arguments.
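
The sed script from the change bars example can be tried without wdiff installed by feeding it text shaped like `wdiff -1n` output. In this sketch the two sample lines are made up for illustration, and change_bars is just a hypothetical wrapper name around the script quoted above.

```shell
# The sed script from the change bars example, wrapped as a filter.
change_bars() {
    sed -e 's/^/  /;/{+/s/^ /|/;s/{+//g;s/+}//g'
}

# Sample input shaped like `wdiff -1n` output (made up for illustration).
printf '%s\n' \
    'An unchanged line of text.' \
    'A line with {+two new+} words.' |
    change_bars
```

The first line comes out indented by two spaces, while the second starts with ‘| ’ and has its insert brackets removed.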

On a related note, GNU Emacs users might notice that the interactive function compare-windows ignores changes in whitespace, if it is given a numeric argument. If the variable compare-ignore-case is non-nil, it ignores differences in case as well. So, in a way, this offers a kind of incremental version of wdiff.



3 The multi-difference finder

The name mdiff stands for multi-diff, and has the purpose of encompassing the functionality of a few other diff-type programs. The prefix multi- also stands for the fact the program is often able to study more than two input files at once.

The theory of operation is simple. The program splits all input files into a sequence of items, which may be lines or words. mdiff is then said to operate either in line mode or in word mode. It then tries to find sequences of items which are repeated in the input files. Such common sequences are called clusters of items, and each occurrence of a repetition is called a cluster member. What remains, once all cluster members are conceptually removed from all input files, is a set of differences. The role of mdiff is to conveniently list cluster members and differences.
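
To make the vocabulary concrete, here is a crude approximation of cluster finding in line mode. It is not mdiff's algorithm: it only looks at fixed-size windows of consecutive lines, does not ignore blank lines or leading whitespace, and the helper name repeated_runs is made up for illustration.

```shell
# Hypothetical helper: report every run of $1 consecutive lines that
# occurs more than once in the remaining file arguments.
repeated_runs() {
    size=$1
    shift
    awk -v size="$size" '
        FNR == 1 { held = 0 }    # a run may not span a file boundary
        {
            window[held % size] = $0
            held++
            if (held >= size) {
                key = ""
                for (i = held - size; i < held; i++)
                    key = key window[i % size] "\n"
                members[key]++
            }
        }
        END {
            for (key in members)
                if (members[key] > 1)
                    printf "%d members:\n%s", members[key], key
        }
    ' "$@"
}
```

Each reported run plays the role of a cluster, and the count is its number of members. The order in which awk lists several clusters is unspecified, and unlike mdiff this sketch does not merge adjacent windows into longer runs.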

When input files are very similar, it is likely that clusters will encompass many items (lines or words) and differences will be small. So, most listing options inhibit the printing of cluster members. However, one may ask for the few beginning or ending items of cluster members to be printed nevertheless, as a way to provide a kind of feedback or context of the difference. Those context items are sometimes said to be at the horizon of the difference. In merged listings, cluster members may just not be printed, except maybe for a few context items at the beginning of the member (just after a difference), and a few context items at the end of the member (just before a difference).

When cluster members are short, or if you prefer, when the differences are not far away from each other, it is quite possible that the required context items often cover the full extent of the cluster members, which then are not inhibited anymore when this happens. A run of differences intermixed with such non-suppressed members is called a hunk. Some reports produced by mdiff are shown as a list of hunks, and it is to be understood that common items are elided between hunks. However, each hunk in itself has no item missing, and each item of the hunk is analysed as pertaining either to only one of the input files or to many of them. Each hunk is preceded by a header, which explains the line position of all input files prior to the hunk itself. By comparing a hunk header with the previous hunk header, the user can have a hint about how much printing was spared.

When two input files are quite similar, clusters are usually presented in the same order in all files. If a cluster member A in the first file corresponds to a cluster member A in the second file, it is likely that another cluster member B which appears after A in the first file will correspond to a cluster member B in the second file which appears after A as well. So, in many cases, while producing a merged listing of files, cluster members may be made to naturally correspond to one another. However, this is not always true, in particular when the second file has been produced from the first by moving a big chunk of code away from its original position. In such cases, we say that members have crossed. When members are crossed and mdiff has to make a merged listing, it selects one cluster member as being naturally associated with its correspondent (either the pair of A's or the pair of B's) and then considers the other cluster as being part of a difference. The crossed nature of the member may still be analysed and reported, or it may be ignored.

The standard diff program is meant for when there are exactly two input files, for which crossed members should be ignored. mdiff output format has been designed in such a way that it should resemble diff output for this precise case. However, diff formats are not sufficient for representing all cases which mdiff may address, and this is not mature yet. That is why mdiff, in its current state, still experiments with output formats, which are subject to change.

When the input files are not very similar, or rather different, merged listings are not very significant nor useful, and may even be rather confusing. The best thing to do in such cases is to use mdiff for making an annotated relisting of all input files, in which cluster members are properly identified and referred to one another.

Statistics.

     Read summary: 137 files, 41975 lines
     Work summary: 439 clusters, 1608 members, 8837 duplicate lines

The summary lines, triggered by the -s option, say that about 8837 non-ignorable lines could be removed over the 41975 which have been read, by using functions, #include, #define, or similar devices.

If one manages to execute mdiff within GNU Emacs so the output described above is collected into the *compilation* buffer, the command C-x ` (‘M-x next-error’) will proceed to the next cluster member in the other window, and similarly for other compilation mode commands. This is a useful way for handling mdiff output.

Each line in the hunk, after the header, comes from the compared files, but is shifted right so the first column (or the first few columns) of each line gives information about where the line is coming from. A space indicates a line which is common to all files. In case there are only two input files, a minus sign indicates a line from the first file and a plus sign a line from the second file. Else, a letter from ‘a’ to ‘z’, or more than one letter if there are more than 26 files, indicates to which file the line pertains. If a line or a block of lines pertains to many files but not to all of them, the first column holds a vertical bar, and the line or block of lines is bracketed between ‘@/’ and ‘@’ lines, which are kind of comments within the hunk. The initial bracket lists all file letters that are related to the incoming line.

I initially wrote mdiff specifically to help cleaning a C++ project which was a bit large, and in which many big monolithic classes were derived from each other most probably by rough copying followed by local modifications. I intended to fragment most common clusters and segregate the parts into virtual methods in outer classes, and override these methods, as appropriate, with less common variants within inner classes. mdiff was good at pointing me to exactly where I should look. Of course, it never did the cleanup for me, but it helped doing the research about what should be done. Reusing mdiff over the half-cleaned project gave me more fine grained analysis of what was left to consider.



3.1 Invoking mdiff

The format for running the mdiff program is:

     mdiff option ... file ...

mdiff reads all input files and produces its results on standard output. Optionally, standard error might receive a progress report or a few statistics.

wdiff compares files old_file and new_file and produces an annotated copy of new_file on standard output. The empty string or the string ‘-’ denotes standard input, but standard input cannot be used twice in the same invocation. The complete path of a file should be given; a directory name is not accepted. wdiff will exit with a status of 0 if no differences were found, a status of 1 if any differences were found, or a status of 2 for any error.

In this documentation, deleted text refers to text in old_file which is not in new_file, while inserted text refers to text in new_file which is not in old_file.

mdiff supports the following command line options:

--version
Merely prints the version numbers on standard output, and exits without doing anything else.
--help
Merely prints a page of help on standard output, and exits without doing anything else.
--threshold=number -t number
Specifies the minimum number of non-ignorable lines which are required for two runs of lines to compare as equal. No cluster member may ever have fewer than number lines. By default, clusters have 4 lines or more.
--no-deleted -1
Avoid producing deleted words on the output. If neither -1 nor -2 is selected, the original right margin may be exceeded for some lines.
--no-inserted -2
Avoid producing inserted words on the output. When this flag is given, the whitespace in the output is taken from old_file instead of new_file. If neither -1 nor -2 is selected, the original right margin may be exceeded for some lines.
--no-common -3
Avoid producing common words on the output. When this option is not selected, common words and whitespace are taken from new_file, unless option -2 is given, in which case common words and whitespace are rather taken from old_file. When selected, differences are separated from one another by lines of dashes. Moreover, if this option is selected at the same time as -1 or -2, then none of the output will have any emphasis, i.e. no bold or underlining. Finally, if this option is not selected, but both -1 and -2 are, then sections of common words between differences are segregated by lines of dashes.
--ignore-case -i
Do not consider case difference while comparing words. Each lower case letter is seen as identical to its upper case equivalent for the purpose of deciding if two words are the same.
--auto-pager -A
Some initiatives which were automatically taken in previous versions of wdiff are now put under the control of this option. By using it, a pager is interposed whenever the wdiff output is directed to the user's terminal. Without this option, no pager will be called; the user is then responsible for explicitly piping wdiff output into a pager, if required.

The pager is selected by the value of the PAGER environment variable when wdiff is run. If PAGER is not defined at run time, then a default pager, selected at installation time, will be used instead. A defined but empty value of PAGER means no pager at all.

When a pager is interposed through the use of this option, one of the options -l or -t is also selected, depending on whether the string ‘less’ appears in the pager's name or not.

It is often useful to define ‘wdiff’ as an alias for ‘wdiff -a’. However, this hides the normal wdiff behaviour. The default behaviour may be restored simply by piping the output from wdiff through cat. This dissociates the output from the user's terminal.

--printer -p
Use over-striking to emphasize parts of the output. Each character of the deleted text is underlined by writing an underscore ‘_’ first, then a backspace and then the letter to be underlined. Each character of the inserted text is emboldened by writing it twice, with a backspace in between. This option is not selected by default.
--less-mode -l
Use over-striking to emphasize parts of output. This option works as option -p, but also over-strikes whitespace associated with inserted text. less shows such whitespace using reverse video. This option is not selected by default. However, it is automatically turned on whenever wdiff launches the pager less. See option -a.

This option is commonly used in conjunction with less:

          wdiff -l old_file new_file | less

--terminal -t
Force the production of termcap strings for emphasising parts of output, even if the standard output is not associated with a terminal. The TERM environment variable must contain the name of a valid termcap entry. If the terminal description permits, underlining is used for marking deleted text, while bold or reverse video is used for marking inserted text. This option is not selected by default. However, it is automatically turned on whenever wdiff launches a pager, and it is known that the pager is not less. See option -a.

This option is commonly used when wdiff output is not redirected, but sent directly to the user terminal, as in:

          wdiff -t old_file new_file

A common kludge uses wdiff together with the pager more, as in:

          wdiff -t old_file new_file | more

However, some versions of more use termcap emphasis for their own purposes, so strange interactions are possible.

--start-delete argument -w argument
Use argument as the start delete string. This string will be output prior to any sequence of deleted text, to mark where it starts. By default, no start delete string is used unless there is no other means of distinguishing where such text starts; in this case the default start delete string is ‘[-’.
--end-delete argument -x argument
Use argument as the end delete string. This string will be output after any sequence of deleted text, to mark where it ends. By default, no end delete string is used unless there is no other means of distinguishing where such text ends; in this case the default end delete string is ‘-]’.
--start-insert argument -y argument
Use argument as the start insert string. This string will be output prior to any sequence of inserted text, to mark where it starts. By default, no start insert string is used unless there is no other means of distinguishing where such text starts; in this case the default start insert string is ‘{+’.
--end-insert argument -z argument
Use argument as the end insert string. This string will be output after any sequence of inserted text, to mark where it ends. By default, no end insert string is used unless there is no other means of distinguishing where such text ends; in this case the default end insert string is ‘+}’.
--avoid-wraps -n
Avoid spanning the end of line while showing deleted or inserted text. Any single fragment of deleted or inserted text spanning many lines will be considered as being made up of many smaller fragments not containing a newline. So deleted text, for example, will have an end delete string at the end of each line, just before the new line, and a start delete string at the beginning of the next line. A long paragraph of inserted text will have each line bracketed between start insert and end insert strings. This behaviour is not selected by default.

Some choices are hard-wired into the program, but might well become options in later releases. For example:

  • No cluster may span a file boundary, that is, start near the end of one input file and continue at the beginning of the next file.
  • A cluster may have many members from the same file.
  • White space is ignored between the beginning of a line and the first non-white character.
  • White space is significant when embedded in a line, or when ending a line.
  • Lines having no significant part (only white lines for now) are ignorable. Such ignorable lines are logically considered as not being part of the input files for the sake of comparisons.
  • Comments from the C language are not especially ignored. Unless ignored for other reasons (being white lines), they are indeed significant lines.
  • No cluster member may ever directly start nor end with ignorable lines. However, ignorable lines may still be embedded within a cluster member.
  • In the generated output, clusters containing the biggest number of ignorable lines are output first, while smaller clusters appear last. All lines pertaining to a single cluster are output together. Within a cluster, members are listed in the order of the initial reading of input files.

Note that options -p, -t, and -[wxyz] are not mutually exclusive. If you use a combination of them, you will merely accumulate the effect of each. Option -l is a variant of option -p.



3.2 Resource considerations and efficiency

Memory consumption
mdiff can easily handle medium-sized projects. For a 32-bit architecture, the memory requirements may be computed like this:
  • 8 bytes per file
  • 8 bytes per line
  • 4 bytes per cluster
  • 8 bytes per cluster member

Time consumption
To evaluate the speed, consider the example shown above (see mdiff), which yields these statistics:
          Read summary: 137 files, 41975 lines
          Work summary: 439 clusters, 1608 members ...

With the files already in the memory cache, and the output redirected to /dev/null, the processing takes 3 seconds of real time on an Intel 486/100, which looks good. I was indeed afraid of some hidden O(n^2) behaviour, even if the program is mostly O(n*log(n)). Maybe one will discover or construct cases bringing mdiff to its knees. So far, mdiff seemingly behaves well for the little problems given to it. If we devise and generate a more traditional diff-like output, in which all input files are relisted, this will add some time to the processing, but it will be only linear with regard to the total length of input files.

There is a clever optimized sorting algorithm for all substrings of a file, which might be generalised to handle words or lines for mdiff. But since the program is already faster than we initially expected, there is no emergency to resort to using such an algorithm.

Trading complexity for clarity
When lines repeat a lot, there are surprisingly many ways to relate blocks of lines, and reporting them all can make very hairy listings. Any choice about reporting similarities, or not, is somewhat arbitrary, but we ought to make some of such choices for the program to be practical. Some of these choices are detailed here.

If all members of a given cluster A are proper subsets of all members of another given cluster B, then cluster A is wholly forgotten. However, let's presume for example that there are more members in A than in B. Then, some members of A necessarily appear unrelated to any member of B. In such case, it has been decided more useful to report all occurrences of A members, even those embedded within occurrences of B members. When only interested in members of B, annotations pertaining to A may be perceived as clutter. However, when interested in members of A, getting all of them is probably the most useful choice.

It sometimes happens that members of a very same cluster overlap. In the string ‘a a a’, there are two overlapping members for the cluster represented by the string ‘a a’, one from the first two ‘a’, another from the last two ‘a’. In such cases, one member of such an overlap is automatically chopped so the overlap does not occur.

White lines and items containing only delimiters are a possible source of a lot of complexity, if these are fully taken as significant. Since this does not add much to clarity, they are usually better ignored, by using --ignore-blank-lines (-B) or --ignore-delimiters (-j). Increasing the value of the --minimum-size=items (‘-J items’) option also cuts down on complexity in favor of clarity, yet some small matches may then go unnoticed. Exactly how to best adjust the items value is left for the user to decide.



4 The diff format converter

The program unify has the purpose of manipulating context diffs and unified context diffs. unify will accept either a regular context diff (old- or new-style) or a unified context diff as input, and generate either a unified diff or a new-style context diff as output.

Various other options allow you to echo the non-diff (comment) lines to stderr, modify the diff by removing the comment lines, and/or tweak the diff into a format that is good for releasing patches.

I think most people prefer unified context diffs in general. But some of us just have trouble reading unidiffs, unless they are very simple. Usual context diffs show how the code was before, and then how the code is after. Some people just prefer understanding twice thoroughly to understanding once fuzzily. The tool is useful for those who handle a lot of diffs from various sources, and want them in a uniform format.



4.1 Invoking unify

The format for running the unify program is:

     unify option ... [file]

The program reads the diff to convert from file, or if the source file is not mentioned, it will be read from the standard input. The default is to output the diff in the opposite style of whatever was input, that is, regular context diffs will become unified context diffs, and unified context diffs will become regular context diffs, but this can be overridden by options.
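The default "opposite style" rule can be pictured with a small sketch. This is a simplified heuristic under stated assumptions, not how unify itself inspects its input: the function names are hypothetical, and the style is guessed from the first hunk marker only (‘@@ ’ for unified diffs, a row of fifteen asterisks for context diffs).

```python
def detect_style(lines):
    """Guess whether a diff is 'context' or 'unified' from its hunk markers."""
    for line in lines:
        if line.startswith("@@ "):
            return "unified"
        if line.startswith("***************"):
            return "context"
    return None   # no hunk marker seen


def default_output_style(input_style):
    """The default output is the opposite style of the input."""
    return {"context": "unified", "unified": "context"}.get(input_style)


unified = ["--- a/file", "+++ b/file", "@@ -1,2 +1,2 @@", " same", "-old", "+new"]
print(default_output_style(detect_style(unified)))   # context
```

Options such as -c and -u, described below, would simply replace the value returned by default_output_style.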

unify supports the following command line options:

--version
Merely prints the version numbers on standard output, and exits without doing anything else.
--help
Merely prints a page of help on standard output, and exits without doing anything else.
--context-diffs -c
Forces context diff output.
--echo-comments -e
Echoes non-diff (comment) lines to stderr. If a comment line is being stripped via the -p option, it is echoed with a preceding ‘ !!!’. If all comments are being stripped (via the -s option), no special designation is given.
--old-diffs -o
Is used to force a context diff to be interpreted as being of the old-style even if it has the extra trailing asterisks that normally mark the new-style. This is only needed if unify fails to work with your version of diff.
--patch-format -p
Turns on patch-output mode. This will do two things:
  1. Transform a header like:
                   *** orig/file	Sat May  5 02:59:37 1990
                   --- ./file	Sat May  5 03:00:08 1990
    

    into a line of ‘Index: file’; we choose the shorter name and strip a leading ‘./’ sequence if present.

  2. Strip lines that begin with ‘Only in ’, ‘Common subdir’, ‘Binary files’ or ‘diff -’.


-P
Is the same as -p.
--strip-comments -s
Strips non-diff lines (comments).
--unidiffs -u
Forces unified diff output.
-U
Is the same as -up.
--use-equals -=
Will use a ‘ =’ prefix in a unified diff for lines that are common to both files instead of using a leading space. Though this is harder to read, it is less likely to be mangled by trailing-space-stripping sites when posted to Usenet.
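The two steps of patch-output mode (-p) described in the list above can be sketched as follows. This is an illustration only, with hypothetical function names. It assumes that a tab separates the file name from the timestamp in the header lines, that name lengths are compared before the ‘./’ is stripped, and that a tie goes to the first name; the manual does not state any of these details.

```python
STRIP_PREFIXES = ("Only in ", "Common subdir", "Binary files", "diff -")


def header_name(line):
    """Extract the name from a '*** name<TAB>date' or '--- name<TAB>date' line."""
    return line[4:].split("\t", 1)[0]


def index_line(old_header, new_header):
    """Build the 'Index:' line: the shorter name wins, a leading './' is stripped."""
    name = min(header_name(old_header), header_name(new_header), key=len)
    if name.startswith("./"):
        name = name[2:]
    return "Index: " + name


def keep_line(line):
    """Return False for the comment lines that patch-output mode strips."""
    return not line.startswith(STRIP_PREFIXES)


print(index_line("*** orig/file\tSat May  5 02:59:37 1990",
                 "--- ./file\tSat May  5 03:00:08 1990"))   # Index: file
print(keep_line("Only in orig: README"))                    # False
```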



5 How mdiff differs

The GNU project already has a diff program which is part of the GNU diffutils package. There are also various non-GNU diff programs provided by various vendors.

There is also the well-established wdiff which uses diff under the hood. It differs slightly from wdiff2, its intended mdiff-based successor.

The following sections compare mdiff specifications with both GNU diff and with wdiff.



5.1 Differences with diff

GNU diff is a program which has matured for a long while, and whose algorithms are based on computer science literature. It is a fast program. By comparison, mdiff is no more than a program kludged up rapidly to satisfy a few precise needs. It only tries not to be inordinately slow.

Most diff options are accepted by mdiff under the same short and long option names, and mdiff is able to produce similar output, which makes it easier to learn and less surprising to users. Yet, some differences exist in option decoding and output format. Since diff and mdiff use different matching algorithms, it is very likely that the two programs will not analyze differences identically.

  • A few diff options, which either accept no argument or require a mandatory one, are implemented in mdiff as options accepting an optional argument. This may yield some surprises, for example, -c4bir would be accepted by diff and rejected by mdiff, yet it may be rewritten -birc4 for both. See below.
  • Options -c and -u in diff ask for regular context and unified context output, respectively, without specifying the number of lines in the context. diff has ‘-C number’ and ‘-U number’ options for asking for regular or unified context diffs with number context lines. If -c4 asks for four lines of context, the ‘4’ is not really an argument of -c, and this is really interpreted as ‘-c -4’, where -number is a deprecated option for choosing the number of context lines, an option which mdiff does not implement. In mdiff, -c and -u are really two options which are allowed to receive an optional argument, so the number of lines may or may not be given, at the choice of the user. In mdiff, options -C and -U are completely equivalent to -c and -u, and are provided only for the sake of compatibility.
  • Option -v in diff means --version, while it means --verbose in mdiff. There is no short form for --version in mdiff.
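The -c4bir surprise from the first two items can be reproduced with a toy parser. It is a simplified model of getopt-style cluster decoding with hypothetical function names, not code from either program, and it assumes that only -c and -u take an optional argument.

```python
def parse_cluster(cluster, optional_arg_options):
    """Split a short-option cluster such as '-birc4' into (option, argument) pairs.

    An option listed in optional_arg_options swallows the rest of the cluster
    as its argument, which is how getopt-style parsers treat optional arguments.
    """
    body = cluster[1:]
    parsed = []
    for position, letter in enumerate(body):
        if letter in optional_arg_options:
            parsed.append((letter, body[position + 1:] or None))
            break
        parsed.append((letter, None))
    return parsed


def context_is_valid(parsed):
    """Reject a -c or -u argument that is not a plain number of lines."""
    for letter, argument in parsed:
        if letter in "cu" and argument is not None and not argument.isdigit():
            return False
    return True


# diff-like decoding: -c takes no argument, so '4' is read as an option of its own.
print(parse_cluster("-c4bir", ""))
# mdiff-like decoding: -c swallows '4bir', which is not a number.
print(context_is_valid(parse_cluster("-c4bir", "cu")))   # False
# The rewritten cluster works in both models.
print(context_is_valid(parse_cluster("-birc4", "cu")))   # True
```

Placing -c last in the cluster lets it take exactly the digits as its argument, which is why -birc4 is acceptable either way.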



5.2 Differences with wdiff

Even if mdiff is meant to fully support wdiff, options have been shuffled around so mdiff could better merge both diff and wdiff options in a common scheme. diff habits were almost always favored in this option reorganisation.

wdiff2 is now a mere front-end to mdiff that only rewrites the options. The following notes apply.

  • Some options are just transmitted unchanged, these are -1, -2, -3 and -i.
  • Option -c also gets turned into -i, to be compatible with wdiff versions up to ‘0.4’.
  • Simple option -a in wdiff becomes -A in mdiff, -l becomes -k, -n becomes -m, -p becomes -o, -s becomes -v and -t becomes -z.
  • Options introducing strings, which are -w, -x, -y and -z in wdiff, respectively become -Y, -Z, -Q and -R in mdiff.
  • Options -C, -h and -v are processed directly by wdiff and are not transmitted to mdiff.
  • Further, the -C option of wdiff has no equivalent in mdiff.
  • A new option -q inhibits the message which explains how mdiff might have been directly called.
  • The option --diff-input (-d) from wdiff isn't supported by wdiff2 (yet).
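The rewriting rules in the list above amount to a lookup table, as the following sketch shows. It is not wdiff2's actual code: the names are hypothetical, only short options are handled, each string option is assumed to be followed by its string as a separate argument, and -q is assumed to be consumed by the front-end rather than passed on.

```python
# Options -1, -2, -3 and -i need no entry: they pass through unchanged.
RENAMED = {
    "-c": "-i",
    "-a": "-A", "-l": "-k", "-n": "-m", "-p": "-o", "-s": "-v", "-t": "-z",
}
STRING_OPTIONS = {"-w": "-Y", "-x": "-Z", "-y": "-Q", "-z": "-R"}
HANDLED_LOCALLY = {"-C", "-h", "-v", "-q"}   # -q: assumed consumed by the front-end
UNSUPPORTED = {"-d"}


def translate(args):
    """Rewrite a list of wdiff short options into their mdiff equivalents."""
    result = []
    args = iter(args)
    for arg in args:
        if arg in UNSUPPORTED:
            raise ValueError(arg + " is not supported")
        if arg in HANDLED_LOCALLY:
            continue                      # not transmitted to mdiff
        if arg in STRING_OPTIONS:
            # Assumes the string follows as the next argument.
            result += [STRING_OPTIONS[arg], next(args)]
        elif arg in RENAMED:
            result.append(RENAMED[arg])
        else:
            result.append(arg)            # unchanged options and file names
    return result


print(translate(["-n", "-w", "[", "-x", "]", "old.txt", "new.txt"]))
# ['-m', '-Y', '[', '-Z', ']', 'old.txt', 'new.txt']
```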



6 Experimental programs

The GNU wdiff source package contains sources for a number of tools besides wdiff itself. These are considered experimental: they might work for you, but they might just as well fail. The following programs are considered experimental:

  • mdiff
  • wdiff2
  • unify

Building these applications can be enabled at build time by passing --with-experimental to the configure script.

For this build, they have been enabled. If you encounter a bug in an experimental program, the maintainers would still like to learn about it, but there is a greater chance that they will decide not to fix such issues unless you provide a patch as well.



6.1 History of the Experimental programs

Many users suggested features, which in turn invited the integration of wdiff into GNU diffutils. Collaboration proved to be rather difficult. After a few years, the wdiff author finally gave in and created mdiff as a way to break out of the situation and to be able to proceed with users' suggestions.

Before mdiff and the new wdiff2 based on it were officially released, the original author resigned maintainership. The new maintainers had little experience with the code, and therefore decided to mark it experimental. That way, the code wouldn't be lost, but it would be clear that it wasn't as well tested as the good old wdiff command.


Footnotes

[1] n is the total number of lines.

