OCR
Written by grashdur on 28 Feb 08 at 19:58.
Global category: Office.
Won't implement
I love the fact that I can plug in my existing scanner and start scanning. What I don't like is the lack of OCR capability to convert images of text into text data. From the support forums I understand that there are OCR options available on the internet, but for someone who is not a programmer, they are not usable at all.
(Also, for whatever reason, the XSane program has not worked since my upgrade to Feisty Fawn.)
Solution #2:
OCR within OOO
Why does OpenOffice.org not contain an OCR section or plugin?
Solution #3:
New front-end for Tesseract
Create a graphical front-end for the Tesseract OCR engine, which is already in the repositories.
Canonical does not provide updates for Tesseract. Is this a problem in itself?
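Most of such a front-end would be a wrapper around the tesseract command-line tool. A minimal sketch of that core, assuming the `tesseract IMAGE OUTPUTBASE -l LANG` invocation, which writes the text to OUTPUTBASE.txt; the function names are hypothetical and the GUI itself is left out.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, output_base: Path, lang: str = "eng") -> list[str]:
    """Build the tesseract argv: tesseract IMAGE OUTPUTBASE -l LANG.

    Tesseract writes the recognised text to OUTPUTBASE.txt."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_image(image: Path, lang: str = "eng") -> str:
    """Run tesseract on one image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    output_base = image.parent / image.stem  # scan.png -> scan
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    # Append ".txt" rather than using with_suffix, so "page.1" stays intact.
    text_file = output_base.parent / (output_base.name + ".txt")
    return text_file.read_text(encoding="utf-8")
```

A GUI button handler could then simply call `ocr_image(Path.home() / "OCR" / "1.png")` and show the returned string in a text view.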
Solution #4:
EASY-OCR
Dear friends,
easy-ocr is a compilation of the best free OCR engines for Linux. It shows 99% accuracy, and even a blind person can scan and read books very easily.
You can download it from http://code.google.com/p/easy-ocr
Easy installation
In this mode, after entering the folder, select "easy installation" and press Enter, then Tab to "Run" and press Enter. Type your password and wait; the system reboots automatically after installation.
Text mode installation
After entering the easyocr folder, select "text mode" and press Enter. Then Tab to "Run in Terminal", press Enter, and follow the instructions.
Steps to be followed.
Be careful to choose a scanner that is fully supported by XSane.
1 After rebooting and connecting the scanner to the computer, press Super+X to open XSane.
2 Tab to the directory field, /home/username/OCR/1.png, and replace the word "username" with your own username.
3 Change the colour mode to Lineart, Binary or Gray (if none of these options is available, you can proceed with the Color option).
4 Change the brightness and resolution as needed; the resolution is usually 300 dpi.
5 Set the rotation. If the book or letter lies on the scanner at 90 or 270 degrees, press Alt+Tab to go to the preview window, press Shift+Tab to reach the "000" combo box, and change it to 90 or 270 as needed.
A caution for visually impaired users: select 90, which comes just below 000. Although the four settings above are kept, you will have to set the rotation each time you open XSane.
6 Press Alt+Tab again to return to the scanner window, then press Ctrl+Enter to start scanning. You can now scan as many pages as you wish.
7 To convert and read, press Super+F9 and enter the first and last page numbers when the program asks for them. Once the text appears, you can use the reading key to read it. Note that these are not the page numbers of the book but the numbers of the image files in the directory /home/username/OCR/ (for example, 1.png).
8 To read the document later, press Super+F and open the output folder, which contains the individual pages as well as the entire document.
9 You can clear the output folder by pressing Super+Delete.
10 Text can also be converted to WAV format: press Super+A, and the output will appear on the Desktop.
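The page range requested in step 7 maps directly onto file names in the scan directory. A minimal sketch, assuming XSane saves pages as 1.png, 2.png and so on under ~/OCR as set in step 2; `page_images` and `missing_pages` are hypothetical helpers, not part of easy-ocr.

```python
from pathlib import Path


def page_images(first: int, last: int, ocr_dir: Path = Path.home() / "OCR") -> list[Path]:
    """Return the image files first.png .. last.png (inclusive) in ocr_dir.

    The numbers are file numbers in the scan directory, not the page
    numbers printed in the book."""
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first}-{last}")
    return [ocr_dir / f"{n}.png" for n in range(first, last + 1)]


def missing_pages(paths: list[Path]) -> list[Path]:
    """List requested pages that have not been scanned yet."""
    return [p for p in paths if not p.exists()]
```

Checking `missing_pages` before starting conversion lets a front-end report a mistyped range instead of failing halfway through.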
Special features
Two engines.
easyocr 1.5 has two engines. You can select engine 1 by pressing Super+F1 (Windows key+F1) and engine 2 by pressing Super+F2. Engine 1 is good for fast text conversion and skips pictures; engine 2 is good for layout analysis. Both engines are almost 99 percent accurate, and there is no limit on the number of pages that can be converted.
After scanning, you can convert the text with the following steps:
1 Press Super+F9.
2 easyocr asks for the number of the first page and then the number of the last page; type each number and press Enter. Conversion then starts, and the Orca screen reader announces the number of each page as it is converted.
After conversion, the text appears and you can press the add button to read it.
The output folder is now clean. At any time you can reach your text material by pressing Super+F, which opens the output folder. There you can select any page by typing its number, or select the full text by typing the first page number followed by a dash.
You can clean the output folder by pressing Super+Delete.
Reading letters or checking output quality.
Super+1 always reads the first page in the directory. After opening XSane with Super+X, tab to the directory field and change the file name to 1.png, then tab to the "+1" combo box, press Space and set it to 0. You will then stay on the same page even after scanning.
WAV conversion: you can convert the text into WAV by pressing Super+A. As with text conversion, you enter the page numbers, and the output is saved on the Desktop.
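The post does not say which speech engine produces the WAV file. Assuming eSpeak, whose `-f` option reads text from a file and `-w` writes a WAV file, the conversion could look roughly like this; `wav_command` and `pages_to_wav` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def wav_command(text_file: Path, wav_file: Path) -> list[str]:
    # Assumption: eSpeak is the speech engine. "-f" reads the text from a
    # file and "-w" writes the speech to a WAV file instead of playing it.
    return ["espeak", "-f", str(text_file), "-w", str(wav_file)]


def pages_to_wav(page_files: list[Path], wav_file: Path) -> None:
    """Join the selected text pages into one file and render it as WAV."""
    if shutil.which("espeak") is None:
        raise RuntimeError("espeak is not installed or not on PATH")
    combined = wav_file.with_suffix(".txt")
    combined.write_text(
        "\n".join(p.read_text(encoding="utf-8") for p in page_files),
        encoding="utf-8",
    )
    subprocess.run(wav_command(combined, wav_file), check=True)
```

To mirror the behaviour described above, the target would be something like `Path.home() / "Desktop" / "book.wav"`.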
easyocr has been made as user-friendly as possible, and your suggestions can make it friendlier still. Please contact saatyan.kfb@gmail.com and nalin4linux77@gmail.com.
You can download it from http://code.google.com/p/easy-ocr
Solution #5:
Linux-intelligent-ocr-solution
LIOS is free and open source software for converting print into text using either a scanner or a camera. It can also produce text from images scanned from other sources. The program is fully accessible to visually impaired users. LIOS is written in Python, and we release it under the GPL3 license. LIOS works on Debian-based operating systems and is an effort from the easy-ocr development team. There are a great many possibilities for this program, and feedback is the key to them; we look forward to yours at nalin4linux77@gmail.com and sath.linux@gmail.com.
HOW TO INSTALL
Download the latest .deb package from http://linux-intelligent-ocr-solution.googlecode.com/ and install it.
What is new in LIOS-1.2
1 Cam-Scan,
2 Cam-Reader,
3 Scan-to-image-only,
4 Scan-to-images-repeatedly,
5 Introduction of py-sane and the Glaid library, which make the program faster and more efficient,
6 Multiple arguments are handled effectively,
7 OCR a single image,
8 Artha shortcut (alt+control+W),
9 Beta version of spell-checker,
10 Provision for submitting issues in the About Dialog.
Features
1 Single scan & repeated scanning,
2 OCR folder,
3 OCR PDF,
4 OCR image only,
5 Cam-Scan and Cam-Reader,
6 Scan-for-image-only & repeatedly,
7 24-language support (given at the end),
8 Full GUI environment,
9 Selection of starting page number, page numbering mode and number of pages to scan,
10 Selection of scan area, brightness, resolution and time between repeated scans,
11 Full auto rotation,
12 Brightness optimizer,
13 Audio converter,
14 Easily accessible Preferences window,
15 5 OCR engines (OCRopus, Cuneiform, Tesseract, GOCR, Ocrad),
16 Good text manipulation with Find, Go-To-Page, Go-To-Line, Append File, Punch File,
17 Display preferences for low vision,
18 Dictionary support for English (Artha),
19 Beta version of spell-checker,
20 Provision for submitting issues,
21 And more features in the preferences.
How to start using LIOS.
1. Scanning.
To start a new scan, first press Ctrl+N and then press F9 for a single scan or Ctrl+F9 for repeated scanning. To set the scanning preferences, press Ctrl+P and set the starting page number, the page numbering mode, double-page mode (if you intend to place two pages at a time), and the rotation mode, which controls how the program rotates the images before conversion. In full automatic rotation mode, the book can be placed at 0, 90, 180 or 270 degrees. In partial rotation mode, the program scans once to find the position of the book and then keeps that rotation. In manual mode, you select the angle yourself. Partial and manual modes make the OCR process faster than full automatic rotation. For repeated scanning, you can set the number of pages to scan in one stretch. All scanning can be stopped by pressing Ctrl+F4.
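The speed difference between the rotation modes comes from how often orientation has to be detected. The sketch below is not LIOS code; it assumes a hypothetical `score(page, angle)` callback (for example, the number of words an OCR engine finds after rotating the page by `angle`) and shows why partial and manual modes do less work.

```python
from typing import Callable, Sequence

ANGLES = (0, 90, 180, 270)

# Hypothetical scorer: higher means the page reads better at that angle,
# e.g. the number of words an OCR engine recognises after rotating by angle.
Scorer = Callable[[object, int], float]


def best_angle(page: object, score: Scorer) -> int:
    """Try all four orientations and keep the best-scoring one."""
    return max(ANGLES, key=lambda angle: score(page, angle))


def choose_rotations(pages: Sequence[object], mode: str, score: Scorer,
                     manual_angle: int = 0) -> list[int]:
    """Return one rotation angle per page for the given rotation mode."""
    if mode == "full":
        # Full auto: every page is tested at all four angles (slowest).
        return [best_angle(page, score) for page in pages]
    if mode == "partial":
        # Partial: detect the orientation once, then reuse it for every page.
        if not pages:
            return []
        angle = best_angle(pages[0], score)
        return [angle] * len(pages)
    if mode == "manual":
        # Manual: the user supplies the angle, so no detection is needed.
        if manual_angle not in ANGLES:
            raise ValueError(f"angle must be one of {ANGLES}")
        return [manual_angle] * len(pages)
    raise ValueError(f"unknown rotation mode: {mode!r}")
```

For n pages, full mode calls the scorer 4n times, partial mode 4 times and manual mode never; the price of partial mode is that a page placed differently from the first one is rotated wrongly.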
2. Cam-scan.
You can now use a HoverCam or a webcam to produce text in LIOS. These devices can be adjusted using LIOS-cam-preferences in the Edit menu. This feature helps in reading books and other printed material such as visiting cards and currency, and it makes the OCR process very fast and accurate. Be sure to use devices with autofocus. Remember that this utility has no automatic rotation, so a stand for the webcam is highly recommended.
3. Cam-reader.
Cam-reader is a utility that gives continuous output as you move the webcam. It first captures an image, then produces the text and starts reading it. When reading is complete, it repeats the process automatically. In Cam-scan, by contrast, you have to take the photo yourself, and it is then converted into text.
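The Cam-reader cycle (capture, recognise, read aloud, repeat) is essentially a loop. A minimal sketch with hypothetical callbacks standing in for the webcam, the OCR engine and the speech output; none of these names come from LIOS.

```python
from typing import Callable


def cam_reader(capture: Callable[[], bytes],
               ocr: Callable[[bytes], str],
               speak: Callable[[str], None],
               should_stop: Callable[[], bool]) -> int:
    """Repeat capture -> OCR -> read aloud until should_stop() is true.

    Returns the number of completed cycles."""
    cycles = 0
    while not should_stop():
        image = capture()      # take a frame from the camera
        text = ocr(image)      # convert the frame to text
        if text.strip():
            speak(text)        # assumed to block until reading finishes
        cycles += 1            # then the loop starts the next capture
    return cycles
```

Because `speak` blocks, a new image is only taken after the previous one has been read, which matches the "repeat after reading" behaviour described above.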
4. OCR image.
LIOS can convert an image file in JPG, TIF, PNG, PNM or BMP format to text.
5. OCR folder.
LIOS can convert images scanned from other sources in JPG, JPEG, TIF, TIFF, PNG and PNM formats. To convert the images in a folder, select the scan-from-folder option in the Scan menu and then select the input folder.
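When a whole folder is converted, the images need to be picked out and put in page order. A sketch, assuming the extensions listed above and a "natural" sort so that 2.png comes before 10.png; the helper names are hypothetical and not taken from LIOS.

```python
import re
from pathlib import Path
from typing import Iterable

# Extensions listed for the folder option.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".pnm"}


def natural_key(name: str) -> list:
    # re.split with a capturing group alternates text, number, text, ...,
    # so odd positions are always digit runs and compare as integers.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def select_images(names: Iterable[str]) -> list[str]:
    """Keep supported image names and sort them in natural page order."""
    keep = [n for n in names if Path(n).suffix.lower() in SUPPORTED_EXTENSIONS]
    return sorted(keep, key=natural_key)


def images_in_folder(folder: Path) -> list[Path]:
    """Return the supported image files in folder, in page order."""
    names = [p.name for p in folder.iterdir() if p.is_file()]
    return [folder / n for n in select_images(names)]
```

A plain alphabetical sort would put 10.png before 2.png, which would scramble the page order of the converted text.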
6. OCR PDF file.
Select OCR PDF from the Scan menu and then select the input file. OCRopus is recommended as the engine for PDF conversion, as it works more efficiently there.
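The post does not say how LIOS turns PDF pages into images before OCR. One common approach, assumed here, is poppler's `pdftoppm`, where `-r` sets the resolution in DPI and `-png` writes one PNG per page named PREFIX-N.png; the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_to_images_command(pdf: Path, out_prefix: Path, dpi: int = 300) -> list[str]:
    # Assumption: poppler-utils' pdftoppm; -r is the resolution in DPI and
    # -png writes PNG files named <out_prefix>-<page>.png, one per page.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


def rasterize_pdf(pdf: Path, out_dir: Path, dpi: int = 300) -> list[Path]:
    """Render every page of pdf to PNG in out_dir and return the files."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / pdf.stem
    subprocess.run(pdf_to_images_command(pdf, prefix, dpi), check=True)
    # pdftoppm zero-pads page numbers within one run, so a plain sort
    # keeps the pages in order.
    return sorted(out_dir.glob(f"{pdf.stem}-*.png"))
```

The resulting images can then go through the same per-image OCR step as scanned pages, using whichever engine is selected.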
7. Scan for image only and scan for images only repeatedly.
These options scan images only, which lets you try different OCR engines on them conveniently. They also avoid the delay between scans if you do not want to listen to the output. Images are saved in LIOS, or you can choose your own destination; the conversion can then be done with the folder option.
8. Brightness checker.
Setting an exact brightness or threshold value is the best way to get maximum efficiency out of the OCR engines. To find the best value, go to the Tools menu and select Brightness checker. This utility scans 15 or 17 times to complete the process. Afterwards, the number of words detected at each value is shown in tabs. If