Rendering text question (context is MSWin UI Automation)

Boris Borcic

Hello,

I am trying to use UI Automation to drive an MS Windows app (with pywinauto).

I need to scrape the app's window contents and use some form of OCR to get at
the texts (pywinauto can't get at them).

As an alternative to integrating an OCR engine, and since I know the fonts and
sizes used to write on the app's windows, I reasoned that I could base a simple
text recognition module on the ability to drive MS Windows text rendering, e.g.
to generate pixmaps of the texts I expect to find in the driven app's windows,
exact to the pixel.

The advantage of that approach would be exactitude and self-containment.
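
As a rough illustration of the matching step this approach relies on, here is a minimal pure-Python sketch. It assumes that both the rendered text and the captured window are available as lists of rows of (r, g, b) tuples; `find_pixmap` and that representation are hypothetical choices made for the example, not part of pywinauto or any toolkit.

```python
def find_pixmap(haystack, needle):
    """Return (x, y) of the first exact occurrence of needle in haystack, or None.

    Both arguments are lists of rows; each row is a list of (r, g, b) tuples.
    """
    needle_h, needle_w = len(needle), len(needle[0])
    hay_h, hay_w = len(haystack), len(haystack[0])
    for y in range(hay_h - needle_h + 1):
        for x in range(hay_w - needle_w + 1):
            # Compare row slices; any differing pixel rejects this position.
            if all(haystack[y + dy][x:x + needle_w] == needle[dy]
                   for dy in range(needle_h)):
                return (x, y)
    return None


WHITE, BLACK = (255, 255, 255), (0, 0, 0)

# Toy 5x4 "screen capture" with a 2x2 checker pattern at x=2, y=1.
screen = [[WHITE] * 5 for _ in range(4)]
screen[1][2], screen[1][3] = BLACK, WHITE
screen[2][2], screen[2][3] = WHITE, BLACK

pattern = [[BLACK, WHITE],
           [WHITE, BLACK]]

print(find_pixmap(screen, pattern))
```

Because the comparison is exact, a single differing pixel (for instance from antialiasing) makes the search return None, which is why pixel-exact rendering matters for this idea.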

I've verified manually, inside an IDLE window, that I can indeed produce
pixmaps of expected app texts, exact to the pixel (with Tkinter + screen capture
at least).

I could use help to turn this into a programmable capability, i.e. a simple way
(with Tkinter or otherwise) to wrap access to the MS Windows UI text
rendering engine as a function that would return a picture of rendered text,
given a string, a font, a size and colors?

And ideally, without interfering with screen contents?

Thanks in advance for any guidance,

Boris Borcic
 
Chris Mellon

There are actually several different text rendering methods (and 2 or
more totally different engines) and they will give different results,
so if you want a fully generic solution that could be quite difficult.
However, it sounds like this is for a specific purpose.

Using the pywin32 modules to directly access the appropriate Windows
API calls will be the most accurate. It will be fairly complicated and
you'll require knowledge of the Win32 API to do it. You could also use
wxPython, which uses what will probably be the right API and will take
less code than win32 will. I'd suggest this if you aren't familiar
with the Win32 API.

PyQt uses its own text rendering engine, as far as I know, so it is
less likely to generate correct bitmaps. I'm not sure at what level
Tkinter's text drawing is done.

Using either win32 or wxPython you will be able to produce bitmaps
directly, without needing to create a visible window.


Some quick & dirty wxPython code

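import wx  # a wx.App must exist before any of this is called

def getTextBitmap(text, font, fgcolor, bgcolor):
    # Measure the text on a memory DC; nothing is drawn on screen.
    dc = wx.MemoryDC()
    dc.SetFont(font)
    width, height = dc.GetTextExtent(text)
    # wx.EmptyBitmap is the classic spelling; newer wxPython
    # versions spell it wx.Bitmap(width, height).
    bmp = wx.EmptyBitmap(width, height)
    dc.SelectObject(bmp)
    # Fill the bitmap with the background color, then draw the text.
    dc.SetBackground(wx.Brush(bgcolor))
    dc.Clear()
    dc.SetTextBackground(bgcolor)
    dc.SetTextForeground(fgcolor)
    dc.DrawText(text, 0, 0)
    # Release the bitmap from the DC before handing it back.
    dc.SelectObject(wx.NullBitmap)
    return bmp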


Raw win32 code will look similar but will be much more verbose.
 
imageguy

I was looking for (and am still searching for) similar functionality.
Specifically, I would like to be able to capture a small area of the
screen (a number or a code) and convert this to text that can be used
in my application.

When I asked my question, I was directed to the Microsoft Accessibility
toolkit.
Search this list for the post titled
"Reading text labels from a Win32 window"

I work with wxPython and Win32 applications exclusively.

So if I can be of any help or assistance, please let me know.

Geoff.
 
Chris Mellon

The OP stated that pywinauto couldn't get at the text, so it's
probably drawn directly with GDI methods rather than being a static
text control. The accessibility toolkit only works if it's a static
text control or the application goes to some lengths to expose the
text to screen readers.
 
Boris Borcic

imageguy said:
> I was looking for (and am still searching for) similar functionality.
> Specifically, I would like to be able to capture a small area of the
> screen (a number or a code) and convert this to text that can be used
> in my application.

There is a windows executable version of gnu ocr at
http://jocr.sourceforge.net/download.html that (in combination with screen
capture capability that pywinauto distributes) sort of can do that. An issue is
that it's not exceedingly accurate, for instance it recognizes "2" as "1" (in
the font that er, counts for me). I could probably manage such imprecisions but
I would rather have an exact solution.
....
> I work with wxPython and Win32 applications exclusively.
>
> So if I can be of any help or assistance, please let me know.
>
> Geoff.

Thanks for the offer, I will keep it in mind,

Boris Borcic
 
Boris Borcic

Chris said:
....
>
> There are actually several different text rendering methods (and 2 or
> more totally different engines) and they will give different results,
> so if you want a fully generic solution that could be quite difficult.
> However, it sounds like this is for a specific purpose.
Indeed.

>
> ...You could also use
> wxPython, which uses what will probably be the right API and will take
> less code than win32 will. I'd suggest this if you aren't familiar
> with the win32 API.

Thanks for your guidance and quick code, I am going to try that.

Boris Borcic
 
Boris Borcic

Chris said:
> Using either win32 or wxPython you will be able to produce bitmaps
> directly, without needing to create a visible window.
>
> Some quick & dirty wxPython code
....
> Raw win32 code will look similar but will be much more verbose.

Thx again for this base.

Quickly testing this, it appears that the result is rendered half a pixel off in
the x-direction. Does this make sense? Is it possible to position text with
subpixel accuracy?

Regards, Boris Borcic
 
Chris Mellon

The GDI text API, which is what wx is wrapping here, only provides
pixel accuracy. You are probably seeing a kerning effect from your
chosen font and perhaps the effects of ClearType.
 
Dennis Lee Bieber

Boris Borcic said:
> Thx again for this base.
>
> Quickly testing this, it appears that the result is rendered half a pixel off in
> the x-direction. Does this make sense ? Is it possible to position text with
> subpixel accuracy ?
Well... as I recall, the smallest addressable graphics unit on
Windows is the TWIP... Twenty "twips" per (integer) point; 72 points per
inch (on paper, at least) -- 1440 twips per inch.
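
For reference, the unit arithmetic above can be written out as follows. The sketch assumes a 96 DPI display, which is a common Windows default but is not stated anywhere in this thread; `twips_to_pixels` is a hypothetical helper.

```python
TWIPS_PER_POINT = 20
POINTS_PER_INCH = 72
TWIPS_PER_INCH = TWIPS_PER_POINT * POINTS_PER_INCH  # 1440


def twips_to_pixels(twips, dpi=96):
    """Convert a length in twips to (possibly fractional) pixels at the given DPI."""
    return twips * dpi / TWIPS_PER_INCH


print(TWIPS_PER_INCH)          # 1440
print(twips_to_pixels(15))     # 1.0 at 96 DPI: one pixel is 15 twips
print(twips_to_pixels(1440))   # 96.0: one inch
```

Under that assumption a twip is 1/15 of a pixel, so it is a logical measuring unit and not a finer grid of device pixels.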

There may also be anti-aliasing active in the text rendering phase.
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
Boris Borcic

Chris said:
> The GDI text API, which is what wx is wrapping here, only provides
> pixel accuracy. You are probably seeing a kerning effect from your
> chosen font and perhaps the effects of ClearType.

I am not. Turning antialiasing off (as a desktop setting) changes the rendering,
but wx._gdi_ still insists that horizontal coordinates fall between pixels
(unlike vertical coordinates). This means thin black vertical lines are
rendered as two pixel columns, the left one red, the right one cyan.
Non-antialiased, 90-degree rotated text is still smeared left-to-right on
the screen, which becomes top-to-bottom relative to the text. Setting the scales
to 0.5 and drawing the text one pixel off (to express a half-pixel shift)
doesn't work. A long, almost vertical thin black line that's one pixel off
from top to bottom results in two parallel, uniformly colored vertical pixel
columns, red and cyan, broken in the middle.

In short, wx._gdi_ fights quite hard to enforce what I am trying to avoid :( I
might admire its consistency if it extended to treating both axes similarly...
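
One possible reading of the red and cyan columns, offered as an assumption and not as a diagnosis of wx or GDI: with ClearType-style rendering on an RGB-striped panel, each pixel is treated as three side-by-side subpixels, so a line that straddles a pixel boundary darkens different color channels in the two neighboring pixels. The sketch below is a simplified model without the color filtering that real ClearType applies, and `pack_subpixels` is a hypothetical helper.

```python
def pack_subpixels(coverage):
    """Pack a row of subpixel ink coverage into RGB pixels.

    coverage holds one value per subpixel (1 = black ink, 0 = white paper),
    ordered R, G, B, R, G, B, ... from left to right.
    """
    assert len(coverage) % 3 == 0
    pixels = []
    for i in range(0, len(coverage), 3):
        # An inked subpixel switches its channel off; a clear one stays at 255.
        pixels.append(tuple(255 * (1 - ink) for ink in coverage[i:i + 3]))
    return pixels


# A one-pixel-wide (three-subpixel) black line, aligned to the pixel grid:
aligned = [0, 0, 0,  1, 1, 1,  0, 0, 0]
# The same line shifted right by one subpixel (a third of a pixel):
shifted = [0, 0, 0,  0, 1, 1,  1, 0, 0]

print(pack_subpixels(aligned))  # middle pixel is pure black
print(pack_subpixels(shifted))  # middle pixel red, right pixel cyan
```

Since the subpixels sit side by side, a model like this only affects the horizontal axis, which would fit the asymmetry between the two axes described above.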

Regards, Boris Borcic
 
Chris Mellon

I have not recently had a need to examine drawn text output this
closely, but I am familiar with the C++ code that implements the
drawing and it's a direct wrapping of win32 GDI calls. If it's not
matching your source text, then the source may be drawn using a
different method or using one of the alternate engines, like GDI+.
 
Boris Borcic

Chris said:
> I have not recently had a need to examine drawn text output this
> closely, but I am familiar with the C++ code that implements the
> drawing and it's a direct wrapping of win32 GDI calls. If it's not
> matching your source text, then the source may be drawn using a
> different method or using one of the alternate engines, like GDI+.

Maybe. In any case, color separation solves my (sub)problem: the blue layer
from the wx-generated model matches the green layer from the app's window, pixel
for pixel (at least with antialiasing and ClearType on, while writing black on
white).

Best, Boris Borcic
 
Chris Mellon

That's... extremely interesting. If it works for you, go for it! If
you're interested in some other things to try, wx.EmptyBitmap takes a
depth parameter you can use to eliminate color.
 
Boris Borcic

Chris said:
> That's... extremely interesting.

Difficult to believe, you mean :) Well, you are right, somehow I mixed up layer
colors; in the end I compare just the blue layers and it does what I wanted.
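
The per-channel comparison described here could look roughly like the following. It is a sketch under the assumption that both images are lists of rows of (r, g, b) tuples of identical size; `channel` and `same_channel` are hypothetical helpers, not wx or pywinauto functions.

```python
BLUE = 2  # index of the blue component in an (r, g, b) tuple


def channel(pixmap, index):
    """Extract one color layer from a pixmap given as rows of (r, g, b) tuples."""
    return [[pixel[index] for pixel in row] for row in pixmap]


def same_channel(a, b, index=BLUE):
    """True if the two pixmaps agree pixel for pixel on the chosen layer."""
    return channel(a, index) == channel(b, index)


# Two renderings of the same edge that differ only in their red fringe.
model = [[(255, 255, 255), (255, 0, 0), (0, 0, 0)]]
captured = [[(255, 255, 255), (128, 0, 0), (0, 0, 0)]]

print(model == captured)              # False: full-color comparison fails
print(same_channel(model, captured))  # True: the blue layers are identical
```

Comparing a single layer discards the color fringes while still requiring an exact match on the remaining values.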

Cheers, BB
 
