A Brief Look at OCR: Tesseract



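Optical character recognition (OCR) is the process of scanning printed material and then analyzing the resulting image to extract its text and layout information. OCR is a specialized technology, used mostly by people in the printing industry to convert paper documents into electronic ones quickly. For Chinese OCR, the leading domestic products are Tsinghua Wintone (清华文通), Hanwang (汉王), and Shangshu (尚书); each has its strengths, and none is cheap. OCR developed earlier abroad: large companies such as IBM, Microsoft, and HP, even where they have not released a standalone OCR product, have long had the core technology in-house and have built OCR into their own software. As programmers we rarely need anything that advanced; being able to integrate basic OCR into an application is usually enough. Over the past couple of days I looked through many free OCR programs and libraries and decided to write up what I found. This post covers Tesseract; the next one will discuss the OCR API in OneNote 2010. A brief history of OCR technology is linked from the original post.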


When reposting, please credit the source: http://www.cnblogs.com/brooks-dotnet/archive/2010/10/05/1844203.html

1. Tesseract overview

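The Tesseract OCR engine was first developed at HP Labs, starting in 1985, and by 1995 it had become one of the three most accurate recognition engines in the OCR industry. Soon afterwards, however, HP decided to leave the OCR business, and Tesseract was shelved.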

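Some years later HP concluded that, rather than leaving Tesseract on the shelf, it would be better to contribute it to the open-source community and give it a new life. In 2005 Tesseract was obtained by the Information Science Research Institute at the University of Nevada, Las Vegas, which asked Google to improve it, fix its bugs, and optimize it.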

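Tesseract is now published as an open-source project on Google Code, which also hosts its project home page. Its latest version, 3.0, supports Chinese OCR and ships with a command-line tool. Here we test Tesseract 3.0. Because a command line is not very friendly to end users, I wrapped it in a small WPF application so that Chinese OCR is easy to run.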

1.1 First, download the command-line tool, the source code, and the Chinese language pack from the Tesseract project home page:

1.2 After extraction, the command-line tool folder looks like this (1.jpg and 1.txt are not part of the package):

1.3 To run Chinese OCR, copy the Simplified Chinese language pack into the tessdata directory:

1.4 In a DOS prompt, change to the Tesseract command-line directory and check the command format of tesseract.exe:

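imagename is the image to run OCR on; outputbase is the output file for the OCR result, which is a text file (.txt) by default; lang is the language pack to use; configfile is a configuration file.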

1.5 Now for a test. Prepare an image in JPG format; here I placed it in the same directory as Tesseract:

Type tesseract.exe 1.jpg 1 -l chi_sim and press Enter; the OCR finishes in a few seconds:

Note the command format: imagename must include the .jpg extension, while the output file and the language pack are given without extensions.
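The command format described above can be sketched as a small helper that assembles the argument string. This is only an illustration: the class name TesseractCommand and the method BuildArguments are hypothetical and not part of Tesseract or of the original test code, and quoting the two paths is an added precaution for paths that contain spaces.

```csharp
using System;

// Hypothetical helper (not from the original post): builds the argument
// string for "tesseract.exe imagename outputbase -l lang".
public static class TesseractCommand
{
    public static string BuildArguments(string imagePath, string outputBase, string lang)
    {
        if (string.IsNullOrEmpty(imagePath)) throw new ArgumentException("imagePath is required");
        if (string.IsNullOrEmpty(outputBase)) throw new ArgumentException("outputBase is required");
        if (string.IsNullOrEmpty(lang)) throw new ArgumentException("lang is required");

        // The image keeps its extension; the output base and the language
        // pack name are passed without extensions.
        return Quote(imagePath) + " " + Quote(outputBase) + " -l " + lang;
    }

    // Wraps a path in double quotes so that spaces do not split it.
    private static string Quote(string path)
    {
        return "\"" + path + "\"";
    }
}
```

For the example in this section, BuildArguments("1.jpg", "1", "chi_sim") yields the same three parts as the command typed by hand, with the two file names quoted.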

OCR result:

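As you can see, the result is not ideal: Chinese recognition is passable, but most English text and digits come out garbled. Still, for a veteran OCR engine this is quite respectable, and I look forward to further improvements from Google.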

2. Wrapping the Tesseract command line with WPF

2.1 Because command lines are easy to mistype and unfriendly to end users, I wrote a small WPF application that wraps the Tesseract command line:

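The left side selects and previews the image; the right side selects the output directory and displays the OCR result. Both local and network images can be previewed.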

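2.2 To let the image preview support zooming and panning, I originally planned to use Microsoft's Zoom It API, but it does not support WPF, so I used a third-party class instead: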

Image pan-and-zoom helper class

using System;
using System.Windows.Controls;
using System.Windows.Input;
using System.Windows.Media.Animation;
using System.Windows;
using System.Windows.Media;

namespace PanAndZoom
{
    public class PanAndZoomViewer : ContentControl
    {
        public double DefaultZoomFactor { get; set; }
        private FrameworkElement source;
        private Point ScreenStartPoint = new Point(0, 0);
        private TranslateTransform translateTransform;
        private ScaleTransform zoomTransform;
        private TransformGroup transformGroup;
        private Point startOffset;

        public PanAndZoomViewer()
        {
            this.DefaultZoomFactor = 1.4;
        }

        public override void OnApplyTemplate()
        {
            base.OnApplyTemplate();
            Setup(this);
        }

        void Setup(FrameworkElement control)
        {
            this.source = VisualTreeHelper.GetChild(this, 0) as FrameworkElement;

            // Scale first, then translate: screen = logical * zoom + offset.
            this.translateTransform = new TranslateTransform();
            this.zoomTransform = new ScaleTransform();
            this.transformGroup = new TransformGroup();
            this.transformGroup.Children.Add(this.zoomTransform);
            this.transformGroup.Children.Add(this.translateTransform);
            this.source.RenderTransform = this.transformGroup;
            this.Focusable = true;
            this.KeyDown += new KeyEventHandler(source_KeyDown);
            this.MouseMove += new MouseEventHandler(control_MouseMove);
            this.MouseDown += new MouseButtonEventHandler(source_MouseDown);
            this.MouseUp += new MouseButtonEventHandler(source_MouseUp);
            this.MouseWheel += new MouseWheelEventHandler(source_MouseWheel);
        }

        void source_KeyDown(object sender, KeyEventArgs e)
        {
            // hit escape to reset everything
            if (e.Key == Key.Escape) Reset();
        }

        void source_MouseWheel(object sender, MouseWheelEventArgs e)
        {
            // zoom into the content.  Calculate the zoom factor based on the direction of the mouse wheel.
            double zoomFactor = this.DefaultZoomFactor;
            if (e.Delta <= 0) zoomFactor = 1.0 / this.DefaultZoomFactor;
            // DoZoom requires both the logical and physical location of the mouse pointer
            var physicalPoint = e.GetPosition(this);
            DoZoom(zoomFactor, this.transformGroup.Inverse.Transform(physicalPoint), physicalPoint);
        }

        void source_MouseUp(object sender, MouseButtonEventArgs e)
        {
            if (this.IsMouseCaptured)
            {
                // we're done.  reset the cursor and release the mouse pointer
                this.Cursor = Cursors.Arrow;
                this.ReleaseMouseCapture();
            }
        }

        void source_MouseDown(object sender, MouseButtonEventArgs e)
        {
            // Save starting point, used later when determining how much to scroll.
            this.ScreenStartPoint = e.GetPosition(this);
            this.startOffset = new Point(this.translateTransform.X, this.translateTransform.Y);
            this.CaptureMouse();
            this.Cursor = Cursors.ScrollAll;
        }

        void control_MouseMove(object sender, MouseEventArgs e)
        {
            if (this.IsMouseCaptured)
            {
                // if the mouse is captured then move the content by changing the translate transform.
                // use the Pan Animation to animate to the new location based on the delta between the
                // starting point of the mouse and the current point.
                var physicalPoint = e.GetPosition(this);
                this.translateTransform.BeginAnimation(TranslateTransform.XProperty, CreatePanAnimation(physicalPoint.X - this.ScreenStartPoint.X + this.startOffset.X), HandoffBehavior.Compose);
                this.translateTransform.BeginAnimation(TranslateTransform.YProperty, CreatePanAnimation(physicalPoint.Y - this.ScreenStartPoint.Y + this.startOffset.Y), HandoffBehavior.Compose);
            }
        }

        /// <summary> Helper to create the panning animation for x,y coordinates. </summary>
        /// <param name="toValue"> New value of the coordinate. </param>
        /// <returns> Double animation </returns>
        private DoubleAnimation CreatePanAnimation(double toValue)
        {
            var da = new DoubleAnimation(toValue, new Duration(TimeSpan.FromMilliseconds(300)));
            da.AccelerationRatio = 0.1;
            da.DecelerationRatio = 0.9;
            da.FillBehavior = FillBehavior.HoldEnd;
            da.Freeze();
            return da;
        }

        /// <summary> Helper to create the zoom double animation for scaling. </summary>
        /// <param name="toValue"> Value to animate to. </param>
        /// <returns> Double animation. </returns>
        private DoubleAnimation CreateZoomAnimation(double toValue)
        {
            var da = new DoubleAnimation(toValue, new Duration(TimeSpan.FromMilliseconds(500)));
            da.AccelerationRatio = 0.1;
            da.DecelerationRatio = 0.9;
            da.FillBehavior = FillBehavior.HoldEnd;
            da.Freeze();
            return da;
        }

        /// <summary> Zoom into or out of the content. </summary>
        /// <param name="deltaZoom"> Factor to multiply the zoom level by. </param>
        /// <param name="mousePosition"> Logical mouse position relative to the original content. </param>
        /// <param name="physicalPosition"> Actual mouse position on the screen (relative to the parent window) </param>
        public void DoZoom(double deltaZoom, Point mousePosition, Point physicalPosition)
        {
            double currentZoom = this.zoomTransform.ScaleX;
            currentZoom *= deltaZoom;
            this.translateTransform.BeginAnimation(TranslateTransform.XProperty, CreateZoomAnimation(-1 * (mousePosition.X * currentZoom - physicalPosition.X)));
            this.translateTransform.BeginAnimation(TranslateTransform.YProperty, CreateZoomAnimation(-1 * (mousePosition.Y * currentZoom - physicalPosition.Y)));
            this.zoomTransform.BeginAnimation(ScaleTransform.ScaleXProperty, CreateZoomAnimation(currentZoom));
            this.zoomTransform.BeginAnimation(ScaleTransform.ScaleYProperty, CreateZoomAnimation(currentZoom));
        }

        /// <summary> Reset to default zoom level and centered content. </summary>
        public void Reset()
        {
            this.translateTransform.BeginAnimation(TranslateTransform.XProperty, CreateZoomAnimation(0));
            this.translateTransform.BeginAnimation(TranslateTransform.YProperty, CreateZoomAnimation(0));
            this.zoomTransform.BeginAnimation(ScaleTransform.ScaleXProperty, CreateZoomAnimation(1));
            this.zoomTransform.BeginAnimation(ScaleTransform.ScaleYProperty, CreateZoomAnimation(1));
        }
    }
}
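The offset that DoZoom animates to can be checked without WPF. The sketch below assumes, as in the class above, that the scale is applied before the translation, so a logical coordinate maps to the screen as logical * zoom + offset. The class ZoomMath and its method names are hypothetical and exist only to illustrate the arithmetic for one axis.

```csharp
using System;

// Hypothetical, WPF-free illustration of the arithmetic used by DoZoom.
public static class ZoomMath
{
    // Offset that keeps the logical point under the mouse at the same
    // physical position once the new zoom level is applied.
    public static double OffsetFor(double logical, double physical, double zoom)
    {
        return -1 * (logical * zoom - physical);
    }

    // Scale first, then translate, matching the TransformGroup order above.
    public static double ToScreen(double logical, double zoom, double offset)
    {
        return logical * zoom + offset;
    }
}
```

Substituting the offset into the mapping gives logical * zoom - (logical * zoom - physical) = physical, which is why the point under the cursor appears to stay in place while the rest of the image grows or shrinks around it.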

2.3 Besides the mouse, sliders can also be used to adjust the image preview:

Data binding

            <WrapPanel Grid.Row="2" Grid.Column="0">
                <Label Name="lab长度" Content="Width:" Margin="3" />
                <Slider Name="sl长度" MinWidth="50" Margin="3" VerticalAlignment="Center" Maximum="400" Value="{Binding ElementName=img图片, Path=Width, Mode=TwoWay}" />

                <Label Name="lab宽度" Content="Height:" Margin="3" />
                <Slider Name="sl宽度" MinWidth="50" Margin="3" VerticalAlignment="Center" Maximum="400" Value="{Binding ElementName=img图片, Path=Height, Mode=TwoWay}" />

                <Label Name="lab透明度" Content="Opacity:" Margin="3" />
                <Slider Name="sl透明度" MinWidth="50" Margin="3" VerticalAlignment="Center" Maximum="1" Value="{Binding ElementName=img图片, Path=Opacity, Mode=TwoWay}" />

                <Label Name="lab拉伸方式" Content="Stretch:" Margin="3" />
                <ComboBox Name="txt拉伸方式" Margin="3" MinWidth="85">
                    <ComboBoxItem Content="Fill" />
                    <ComboBoxItem Content="None" IsSelected="True" />
                    <ComboBoxItem Content="Uniform" />
                    <ComboBoxItem Content="UniformToFill" />
                </ComboBox>
            </WrapPanel>

            <local:PanAndZoomViewer Grid.Row="3" Grid.Column="0" Height="300" Margin="3">
                <Image Name="img图片" Stretch="{Binding ElementName=txt拉伸方式, Path=Text, Mode=TwoWay}" />
            </local:PanAndZoomViewer>

2.4 Because the Tesseract command line cannot run OCR on a network image directly, the image is downloaded first:

Image download

        private void fnStartDownload(string v_strImgPath, string v_strOutputDir, out string v_strTmpPath)
        {
            // Everything after the last '/' in the URL is the file name, e.g. "1.jpg".
            int n = v_strImgPath.LastIndexOf('/');
            string fileName = v_strImgPath.Substring(n + 1);
            // Output base for Tesseract: output directory plus the file name without its extension.
            this.__OutputFileName = Path.Combine(v_strOutputDir, Path.GetFileNameWithoutExtension(fileName));

            // Make sure the temporary download directory configured under "tmpPath" exists.
            string Dir = System.Configuration.ConfigurationManager.AppSettings["tmpPath"];
            if (!Directory.Exists(Dir))
            {
                Directory.CreateDirectory(Dir);
            }

            v_strTmpPath = Path.Combine(Dir, fileName);

            // client is presumably a WebClient field declared elsewhere in the class (not shown in this excerpt).
            client.DownloadFile(v_strImgPath, v_strTmpPath);

             // Stream str = client.OpenRead(v_strImgPath);
             // StreamReader reader = new StreamReader(str);
             // byte[] mbyte = new byte[Int32.Parse(System.Configuration.ConfigurationManager.AppSettings["MaxDownloadImgLength"])];
             // int allmybyte = (int)mbyte.Length;
             // int startmbyte = 0;
             // while (allmybyte > 0)
             // {
             //     int m = str.Read(mbyte, startmbyte, allmybyte);
             //     if (m == 0)
             //     {
             //         break;
             //     }

             //     startmbyte += m;
             //     allmybyte -= m;
             // }

             // FileStream fstr = new FileStream(v_strTmpPath, FileMode.Create, FileAccess.Write);
             // fstr.Write(mbyte, 0, startmbyte);
             // str.Close();
             // fstr.Close();
        }
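The path handling in the download step can also be sketched on its own. The version below is an illustration under assumptions, not the author's code: the class DownloadPaths and its methods are hypothetical, and it uses System.Uri so that a query string after the file name is ignored, which the LastIndexOf approach above does not handle.

```csharp
using System;
using System.IO;

// Hypothetical helper (not from the original post): derives local paths
// from the URL of a network image.
public static class DownloadPaths
{
    // "http://host/img/1.jpg?size=big" -> "1.jpg"
    public static string FileNameFromUrl(string imageUrl)
    {
        var uri = new Uri(imageUrl);
        // AbsolutePath excludes the query string; unescape sequences such as %20.
        return Path.GetFileName(Uri.UnescapeDataString(uri.AbsolutePath));
    }

    // Where the downloaded copy is stored before OCR.
    public static string TempPathFor(string imageUrl, string tmpDir)
    {
        return Path.Combine(tmpDir, FileNameFromUrl(imageUrl));
    }

    // Output base passed to Tesseract: no extension, as section 1.5 notes.
    public static string OutputBaseFor(string imageUrl, string outputDir)
    {
        return Path.Combine(outputDir, Path.GetFileNameWithoutExtension(FileNameFromUrl(imageUrl)));
    }
}
```

Keeping these as pure functions makes the naming rules easy to check without touching the network or the file system.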

2.5 Use Process to invoke the Tesseract command line:

Invoking the Tesseract command line

        private void fnOCR(string v_strTesseractPath, string v_strSourceImgPath, string v_strOutputPath, string v_strLangPath)
        {
            using (Process process = new System.Diagnostics.Process())
            {
                process.StartInfo.FileName = v_strTesseractPath;
                // Quote the paths so that directories containing spaces are passed as single arguments.
                process.StartInfo.Arguments = "\"" + v_strSourceImgPath + "\" \"" + v_strOutputPath + "\" -l " + v_strLangPath;
                process.StartInfo.UseShellExecute = false;
                process.StartInfo.CreateNoWindow = true;
                process.StartInfo.RedirectStandardOutput = true;
                process.Start();
                // Drain the redirected output before waiting; a full pipe buffer could otherwise block the child process.
                process.StandardOutput.ReadToEnd();
                process.WaitForExit();
            }
        }
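Once the process exits, the recognized text still has to be loaded into the UI. A minimal sketch of that step follows, assuming the default behavior described in section 1.4 (the result is written to the output base plus .txt) and that the file is UTF-8 encoded. The class OcrOutput and its methods are hypothetical and not taken from the original test code.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not from the original post): loads the text file
// that Tesseract writes next to the given output base.
public static class OcrOutput
{
    // Tesseract appends ".txt" to the output base by default.
    public static string ResultPath(string outputBase)
    {
        return outputBase + ".txt";
    }

    public static string ReadResult(string outputBase)
    {
        string path = ResultPath(outputBase);
        if (!File.Exists(path))
        {
            // A missing file usually means the OCR run failed or wrote elsewhere.
            throw new FileNotFoundException("OCR output not found", path);
        }
        return File.ReadAllText(path, Encoding.UTF8);
    }
}
```

Checking for the file explicitly gives a clearer error than letting the read fail, which helps because fnOCR above does not inspect the exit code of the process.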

2.6 Testing a local image:

2.7 Testing a network image:

Summary:

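This post briefly covered how to use Tesseract. For a free, open-source OCR engine, Chinese support is a rare thing. Its recognition quality is not ideal, but it is good enough for small and medium-sized projects with modest requirements. A list of free OCR tools is linked from the original post for anyone who wants to explore further. Next time I will test the OCR feature in OneNote 2010 and show how to call its API from your own projects.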

Reposted from: https://www.cnblogs.com/brooks-dotnet/archive/2010/10/05/1844203.html