
About Text Object Model


The top-level Text Object Model (TOM) object is defined by the ITextDocument interface, which has methods for creating and retrieving objects lower in the object hierarchy. For simple, plain text processing, you can obtain an ITextRange object from an ITextDocument object.

If you need to add rich-text formatting, you can obtain ITextFont and ITextPara objects from an ITextRange object:

  • ITextFont provides the programming equivalent of the Microsoft® Word Font dialog box.
  • ITextPara provides the equivalent of the Microsoft Word Paragraph dialog box.

In addition to these three lower-level objects, TOM has a selection object ITextSelection, which is just an ITextRange object with selection highlighting and additional UI-oriented methods. The range and selection objects include screen-oriented methods that enable programs to examine text on the screen or text that could be scrolled onto the screen. These capabilities aid in making text accessible to the visually impaired, for example.

These objects can be illustrated by the following directory tree:

ITextDocument             Top-level editing object
    ITextStoryRanges      Enumerator for stories in document
    ITextRange            Primary text interface: range of text
        ITextFont         Character-attribute interface
        ITextPara         Paragraph-attribute interface
    ITextSelection        Screen highlighted text range that
                          inherits all ITextRange methods

A story is a contiguous range of text. An ITextDocument object describes one or more stories. In Microsoft Word, a story contains one of the various parts of a document, such as the main text of a document, headers and footers, footnotes, or annotations. In rich edit controls, there is only one story per document, although a client can use multiple documents to represent multiple stories.

An ITextRange object is defined by its start and end character position (cp) offsets and a story object. As such, it does not exist independently of its parent story object, although its text can be copied to the Clipboard or to other targets. A text range object is different from spreadsheet and other range objects, which are defined by other kinds of offsets, like row/column or graphics position (x,y). A text range object can:

  • Modify itself in various ways.
  • Return a duplicate of itself.
  • Copy its start and end character positions and its story pointer to the current selection.

Note that an explicit story object is not needed, since an ITextRange object can always be created to represent any given story. In particular, the ITextDocument object can create an ITextStoryRanges object to enumerate the stories in the document in terms of ranges with start and end cp values that describe complete stories (such as, 0 and tomForward).

The following topics are discussed in this section.

TOM RTF

In Text Object Model (TOM) 1.0, rich-text exchange can be accomplished by sets of explicit method calls or by transfers of rich text in the Rich Text Format (RTF) format. This section gives tables of RTF control words for paragraph properties and for character properties.

TOM RTF Paragraph Control Words
Control word    Meaning
\fi n           First-line indent (the default is zero).
\keep           Keep paragraph intact.
\keepn          Keep with the next paragraph.
\li n           Left indent (the default is zero).
\noline         No line numbering.
\nowidctlpar    Turn off widow/orphan control.
\pagebb         Break page before paragraph.
\par            New paragraph.
\pard           Resets to default paragraph properties.
\ql             Left aligned (the default).
\qr             Right aligned.
\qj             Justified.
\qc             Centered.
\ri n           Right indent (the default is zero).
\s n            Style n.
\sa n           Space after (the default is zero).
\sb n           Space before (the default is zero).
\sl n           If missing or if n = 1000, line spacing is determined by the
                tallest character in the line (single-line spacing); if n is
                > zero, at least this size is used; if n is < zero, exactly
                |n| is used. The line spacing is multiple-line spacing if
                \slmult 1 follows.
\slmult m       Follows \sl. m = zero: At Least or Exactly line spacing as
                described by \sl n. m = 1: line spacing = n/240 times
                single-line spacing.
\tb n           Bar tab position, in twips, from the left margin.
\tldot          Tab leader dots.
\tleq           Tab leader equal sign.
\tlhyph         Tab leader hyphens.
\tlth           Tab leader thick line.
\tlul           Tab leader underline.
\tqc            Centered tab.
\tqdec          Decimal tab.
\tqr            Flush-right tab.
\tx n           Tab position, in twips, from the left margin.
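
The \sl and \slmult rules in the table can be sketched as a small decision function. This is an illustration only: the type and function names are hypothetical, the twips unit for \sl is taken from the RTF specification rather than from this section, and the sketch assumes that \slmult 1 takes precedence over the n = 1000 case and that n = 0 is treated like single-line spacing, neither of which the table states.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical names: these types are illustrative and not part of TOM or RTF.
enum class LineRule { Single, AtLeast, Exactly, Multiple };

struct LineSpacing {
    LineRule rule;
    double   amount;  // twips for AtLeast/Exactly, a multiple of single spacing otherwise
};

// hasSl: whether \sl n is present at all.
// n:     the argument of \sl (ignored when hasSl is false).
// m:     the argument of \slmult; pass 0 when \slmult is absent.
inline LineSpacing InterpretLineSpacing(bool hasSl, long n, int m)
{
    assert(m == 0 || m == 1);
    if (!hasSl)
        return LineSpacing{LineRule::Single, 1.0};
    if (m == 1)                       // assumption: \slmult 1 wins over n = 1000
        return LineSpacing{LineRule::Multiple, n / 240.0};
    if (n == 1000 || n == 0)          // assumption: n = 0 behaves like "missing"
        return LineSpacing{LineRule::Single, 1.0};
    if (n > 0)
        return LineSpacing{LineRule::AtLeast, static_cast<double>(n)};
    return LineSpacing{LineRule::Exactly, static_cast<double>(std::labs(n))};
}
```

Under these assumptions, \sl480\slmult1 is read as double spacing (480/240 = 2), and \sl-360 is read as exactly 360 twips.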

TOM RTF Character Format Control Words
Control word    Meaning
\animation n    Sets animation type to n.
\b              Bold.
\caps           All capitals.
\cf n           Foreground color (the default is tomAutocolor).
\cs n           Character style n.
\dn n           Subscript position in half-points (the default is 6).
\embo           Embossed.
\f n            Font number, n refers to an entry in the font table.
\fs n           Font size in half-points (the default is 24).
\highlight n    Background color (the default is tomAutocolor).
\i              Italic.
\impr           Imprint.
\lang n         Applies a language to a character. n is a number
                corresponding to a language. The \plain control word resets
                the language property to the language defined by \deflang n
                in the document properties.
\nosupersub     Turns off superscript or subscript.
\outl           Outline.
\plain          Resets character formatting properties to a default value
                defined by the application. The associated
                character-formatting properties (described in the section
                Associated Character Properties in the RTF specification)
                are also reset.
\scaps          Small capitals.
\shad           Shadow.
\strike         Strikethrough.
\sub            Applies subscript to text and reduces point size according
                to font information.
\super          Applies superscript to text and reduces point size according
                to font information.
\ul             Continuous underline. \ul0 turns off all underlining.
\uld            Dotted underline.
\uldb           Double underline.
\ulnone         Stops all underlining.
\ulw            Word underline.
\up n           Superscript position in half-points (the default is 6).
\v              Hidden text.
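
As a rough illustration of how a few of these control words combine, the following sketch builds one formatted run of RTF text. The function names are hypothetical. The enclosing braces (an RTF group that limits the scope of the formatting) and the backslash escaping come from general RTF syntax rather than from this section, and the sketch handles only ASCII text.

```cpp
#include <cassert>
#include <string>

// Escapes the three characters that have syntactic meaning in RTF.
inline std::string RtfEscape(const std::string& text)
{
    std::string out;
    for (char c : text) {
        if (c == '\\' || c == '{' || c == '}')
            out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: wraps text in a group that sets bold, italic, and size.
// points is the font size in points; \fs expects half-points, so it is doubled.
inline std::string RtfRun(const std::string& text, bool bold, bool italic, int points)
{
    assert(points > 0);
    std::string out = "{";
    if (bold)
        out += "\\b";
    if (italic)
        out += "\\i";
    out += "\\fs" + std::to_string(points * 2);
    out += ' ';                  // a space ends the last control word
    out += RtfEscape(text);
    out += '}';
    return out;
}
```

For example, RtfRun("Hi", true, false, 14) yields the string {\b\fs28 Hi}. A complete RTF document also needs a header and font table, which this sketch does not produce.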

Finding Rich Text

TOM methods can be used fairly easily to find rich text as defined by a range of text. The Microsoft Word Find command can find plain text with uniform formatting, such as Arial 11-point, but it cannot find text with a combination of formatting, such as a2. Finding such rich text exactly is often needed in word processing, although no "what you see is what you get" (WYSIWYG) word processor has ever met this need. There is clearly a larger domain of rich-text matching that allows for some character formatting properties to be ignored (or to include paragraph formatting and/or object content), but such generalizations are beyond the scope of this section.

One purpose for this functionality is to use a rich-text Find dialog box to define the rich text you want to locate in a Word document. The dialog box would be implemented using a rich edit control and TOM methods would be used to carry out the search through the Word document. You could either copy the desired rich text from a Word document into the Find dialog box or enter and format it directly in the Find dialog box.

The following program uses TOM methods to find text containing combinations of exact character formatting. The algorithm searches for the plain text in the match range, which is named pr1. If the plain text is found, it is pointed to by a trial range, which is named pr2. Then, two insertion-point ranges (prip1 and prip2) are used to walk through the trial range comparing its character formatting to that of pr1. If they match exactly, the input range (given by ppr) is updated to point at the trial range's text and the function returns the count of characters in the matched range. Two ITextFont objects, pf1 and pf2, are used in the character-formatting comparison. They are attached to the insertion-point ranges prip1 and prip2, respectively.

LONG FindRichText (
    ITextRange **ppr,             // Ptr to range to search
    ITextRange *pr1)              // Range with rich text to find
{
    BSTR        bstr;             // pr1 plain-text to search for
    LONG        cch;              // Text string count
    LONG        cch1, cch2;       // tomCharFormat run char counts
    LONG        cchMatch = 0;     // Nothing matched yet
    LONG        cp;               // Handy char position
    LONG        cpFirst1;         // pr1 cpFirst
    LONG        cpFirst2;         // pr2 cpFirst
    ITextFont  *pf1, *pf2;        // Fonts corresponding to IPs prip1 and prip2
    ITextRange *pr2;              // Range duplicate to search with
    ITextRange *prip1, *prip2;    // Insertion points to walk pr1, pr2

    if (!ppr || !*ppr || !pr1)
        return E_INVALIDARG;

    // Initialize range and font objects used in search
    if ((*ppr)->GetDuplicate(&pr2)    != NOERROR ||
        pr1->GetDuplicate(&prip1)     != NOERROR ||
        pr2->GetDuplicate(&prip2)     != NOERROR ||
        prip1->GetFont(&pf1)          != NOERROR ||
        prip2->GetFont(&pf2)          != NOERROR ||
        pr1->GetText(&bstr)           != NOERROR )
    {
        return E_OUTOFMEMORY;
    }

    pr1->GetStart(&cpFirst1);

    // Keep searching till rich text is matched or no more plain-text hits
    while(!cchMatch && pr2->FindText(bstr, tomForward, 0, &cch) == NOERROR)
    {
        pr2->GetStart(&cpFirst2);                 // pr2 is a new trial range
        prip1->SetRange(cpFirst1, cpFirst1);      // Set up IPs to scan match
        prip2->SetRange(cpFirst2, cpFirst2);      //  and trial ranges

        while(cch > 0 &&
            pf1->IsEqual(pf2, NULL) == NOERROR)   // Walk match & trial ranges
        {                                         //  together comparing font
            prip1->GetStart(&cch1);               //  properties
            prip1->Move(tomCharFormat, 1, NULL);
            prip1->GetStart(&cp);
            cch1 = cp - cch1;                     // cch of next match font run

            prip2->GetStart(&cch2);
            prip2->Move(tomCharFormat, 1, NULL);
            prip2->GetStart(&cp);
            cch2 = cp - cch2;                      // cch of next trial font run

            if(cch1 < cch)                         // There is more to compare
            {
                if(cch1 != cch2)                   // Different run lengths:
                    break;                         //  no formatting match
                cch = cch - cch1;                  // Matched format run
            }
            else if(cch2 < cch)                    // Trial range format run too
                break;                             //  short

            else                                   // Both match and trial runs
            {                                      //  reach at least to match
                pr2->GetEnd(&cp);                  //  text end: rich-text match
                (*ppr)->SetRange(cpFirst2, cp);    // Set input range to hit
                cchMatch = cp - cpFirst2;          //  coordinates and return
                break;                             //  length of matched string
            }
        }
    }
    pr2->Release();
    prip1->Release();
    prip2->Release();
    pf1->Release();
    pf2->Release();
    SysFreeString(bstr);

    return cchMatch;
}
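
The two-step idea behind the listing (find a plain-text hit, then accept it only if the formatting also matches) can be tried without a rich edit control by using a simplified in-memory model. Everything in this sketch is hypothetical: RichChar, AppendRun, and FindRichTextModel are illustrative names, an integer stands in for the full set of character-formatting properties, and the comparison is done character by character instead of by tomCharFormat runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: one entry per character.
struct RichChar {
    char ch;      // the plain-text character
    int  format;  // stand-in for the full set of character-formatting properties
};
using RichText = std::vector<RichChar>;

// Appends text to rich, giving every character the same format id.
inline void AppendRun(RichText& rich, const std::string& text, int format)
{
    for (char c : text)
        rich.push_back(RichChar{c, format});
}

// Returns the index of the first position at or after 'from' where needle
// matches haystack in both text and formatting, or -1 if there is none.
inline long FindRichTextModel(const RichText& haystack, const RichText& needle,
                              std::size_t from = 0)
{
    assert(from <= haystack.size());
    if (needle.empty() || needle.size() > haystack.size())
        return -1;

    for (std::size_t i = from; i + needle.size() <= haystack.size(); ++i) {
        // Step 1: plain-text comparison (the role FindText plays in the listing).
        bool textHit = true;
        for (std::size_t j = 0; j < needle.size() && textHit; ++j)
            textHit = haystack[i + j].ch == needle[j].ch;
        if (!textHit)
            continue;

        // Step 2: formatting comparison over the trial range.
        bool formatHit = true;
        for (std::size_t j = 0; j < needle.size() && formatHit; ++j)
            formatHit = haystack[i + j].format == needle[j].format;
        if (formatHit)
            return static_cast<long>(i);
    }
    return -1;
}
```

With a document built from "a2 " in format 0 followed by "a" in format 1 and "2" in format 2, searching for "a" (format 1) plus "2" (format 2) skips the plain-text hit at index 0 and returns 3.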

TOM Accessibility

TOM provides accessibility support through the ITextSelection and ITextRange interfaces. This section describes methods that are useful for accessibility as well as how a program can determine the x, y screen position of an object.

Since UI-based accessibility programs typically work with the screen and the mouse, a common concern is to find the corresponding ITextDocument interface for the current mouse location (in screen coordinates). The following sections present two ways to determine the proper interface:

For more information, see the Microsoft Active Accessibility® specification. After you obtain an object from a screen position, you can use QueryInterface to obtain an ITextDocument interface and then call the ITextDocument::RangeFromPoint method to get an empty range object at the cp corresponding to the screen position.

Interface from Running Object Table

A running object table (ROT) tells what object instances are active. By querying this table, you can accelerate the process of connecting a client to an object when the object is already running. Before programs can access TOM interfaces through the running object table, a TOM instance with a window needs to register in the ROT using a moniker. You construct the moniker from a string containing the hexadecimal value of its HWND. The following code sample shows how to do this.

// This TOM implementation code is executed when a new windowed 
// instance starts up. 
// Variables with leading underscores are members of this class.

HRESULT hr;
OLECHAR szBuf[10];            // Place to put moniker
IMoniker *pmk;

// OLECHAR is a wide character type, so use the wide-character variant.
hr = StringCchPrintfW(szBuf, 10, L"%x", (UINT)(UINT_PTR)_hwnd);
if (FAILED(hr))
{
	//
	// TODO: write error handler
	//
}
CreateFileMoniker(szBuf, &pmk);
OleStdRegisterAsRunning(this, pmk, &_dwROTcookie);
....................
 
// Accessibility Client: 
//    Find hwnd for window pointed to by mouse cursor.

GetCursorPos(&pt);
hwnd = WindowFromPoint(pt);

// Look in ROT (running object table) for an object attached to hwnd

hr = StringCchPrintfW(szBuf, 10, L"%x", (UINT)(UINT_PTR)hwnd);
if (FAILED(hr))
{
	//
	// TODO: write error handler
	//
}
CreateFileMoniker(szBuf, &pmk);
CreateBindCtx(0, &pbc);
pmk->BindToObject(pbc, NULL, IID_ITextDocument, (void **)&pDoc);
pbc->Release();

if( pDoc )
{
    pDoc->RangeFromPoint(pt.x, pt.y, &pRange);
    // ...now do whatever with the range pRange
}

Interface from Window Messages

The EM_GETOLEINTERFACE message provides another way to obtain an IUnknown interface for an object at a given screen position. As described in the Interface from Running Object Table topic, you get an HWND for the screen position and then send this message to that HWND. The EM_GETOLEINTERFACE message is rich edit-specific and returns a pointer to an IRichEditOle interface in the variable addressed by lParam.

Tip: Set the variable to which lParam points to NULL before sending the message. If a pointer is returned, you can call its IUnknown::QueryInterface method to obtain an ITextDocument interface. The following code sample illustrates this approach.

    HWND    hwnd;
    ITextDocument *pDoc;
    ITextRange *pRange;
    POINT    pt;
    IUnknown *pUnk = NULL;
	
    GetCursorPos(&pt);
    hwnd = WindowFromPoint(pt);
    SendMessage(hwnd, EM_GETOLEINTERFACE, 0, (LPARAM)&pUnk);
    if(pUnk && 
        pUnk->QueryInterface(IID_ITextDocument, (void **)&pDoc) == NOERROR)
    {
        pDoc->RangeFromPoint(pt.x, pt.y, &pRange);
        //  ... continue with rest of program
    }

Accessibility Oriented Methods

Some TOM methods are particularly useful for navigating around the screen, while other TOM methods enhance what you can do when you arrive at places of interest. The following table describes the most useful methods.

Method                          How it promotes accessibility
ITextDocument::GetSelection     This method gets the active selection that can be
                                used for a variety of view-oriented purposes, such
                                as highlighting text and scrolling.
ITextDocument::RangeFromPoint   When used on an active selection, this method is
                                guaranteed to get a range associated with a
                                particular view.
ITextRange::Expand              Enlarges a text range so that any partial units it
                                contains are completely contained. For example,
                                Expand(tomWindow) expands the range to include the
                                visible portion of the range's story.
ITextRange::GetDuplicate        When used on an active selection, this method is
                                guaranteed to get a range associated with a
                                particular view. See the description of
                                ITextDocument::RangeFromPoint.
ITextRange::GetPoint            Gets the screen coordinates for the start or end
                                character position in the text range.
ITextRange::ScrollIntoView      Scrolls a text range into view.
ITextRange::SetPoint            Selects text at or up through a specified point.

Character Match Sets

The variant parameter of the various Move* methods in ITextRange, for example, ITextRange::MoveWhile and ITextRange::MoveUntil, can take an explicit string or a character-match set 32-bit index. The indexes are defined by either Unicode ranges or GetStringTypeEx character sets. The Unicode range starting at n and of length l (< 32768) is given by the index n + (l << 16) + 0x80000000. For example, basic Greek letters are defined by CR_Greek = 0x805f0370 and printable ASCII characters are defined by CR_ASCIIPrint = 0x805e0020. In addition, the ITextRange::MoveWhile and ITextRange::MoveUntil methods let you rapidly bypass a span of characters in any GetStringTypeEx character set or a span of characters not in any one of these character sets, respectively. In spite of their power, these methods are very efficient, requiring only 150 lines of C++ to implement.
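
The index formula can be checked against the two constants quoted above. Note that both constants work out only if l is taken as the last code point minus the first (one less than the number of code points in the range): 0x7e - 0x20 = 0x5e for printable ASCII. This reading is inferred from the examples and is not stated in the text. The helper names below are hypothetical, and treating the high bit as the marker for a Unicode-range index is likewise an assumption drawn from the formula.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; only the formula and the two constants come from the text.

// Builds the 32-bit character-match index for the Unicode range first..last.
constexpr std::uint32_t UnicodeRangeIndex(std::uint32_t first, std::uint32_t last)
{
    return first + ((last - first) << 16) + 0x80000000u;
}

static_assert(UnicodeRangeIndex(0x0370, 0x03cf) == 0x805f0370u, "matches CR_Greek");
static_assert(UnicodeRangeIndex(0x0020, 0x007e) == 0x805e0020u, "matches CR_ASCIIPrint");

// Decodes an index built as above and tests whether ch falls in its range.
inline bool InUnicodeRange(std::uint32_t index, std::uint32_t ch)
{
    assert((index & 0x80000000u) != 0);   // assumption: high bit marks a range index
    const std::uint32_t first = index & 0xffffu;
    const std::uint32_t span  = (index >> 16) & 0x7fffu;
    return ch >= first && ch - first <= span;
}
```

For example, InUnicodeRange(0x805f0370, 0x03b1) is true for the Greek small letter alpha, while the same index rejects the Latin letter A (0x0041).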

The GetStringTypeEx sets are specified by the values for Ctype1, Ctype2, and Ctype3 and are defined as follows.

Cset                  Meaning
Ctype1                Combination of CT_CTYPE1 types.
Ctype2 + tomCType2    Any CT_CTYPE2 type.
Ctype3 + tomCType3    Combination of CT_CTYPE3 types.

Specifically, Ctype1 can be any combination of the following.

Ctype1 name    Value     Meaning
C1_UPPER       0x0001    Uppercase.
C1_LOWER       0x0002    Lowercase.
C1_DIGIT       0x0004    Decimal digits.
C1_SPACE       0x0008    Space characters.
C1_PUNCT       0x0010    Punctuation.
C1_CNTRL       0x0020    Control characters.
C1_BLANK       0x0040    Blank characters.
C1_XDIGIT      0x0080    Hexadecimal digits.
C1_ALPHA       0x0100    Any linguistic character (alphabetic, syllabary, or
                         ideographic).
C1_DEFINED     0x0200    A defined character, but not one of the other C1_*
                         types.
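
Because each Ctype1 value occupies its own bit, values combine with bitwise OR and are tested with bitwise AND. The sketch below copies a few values from the table under locally defined names (the kC1 prefix is hypothetical, chosen so the sketch does not collide with the Windows header definitions). The type word used in the example is illustrative and is not claimed to be what GetStringTypeEx returns for any particular character.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the Ctype1 table; the kC1* names are local to this sketch.
constexpr std::uint16_t kC1Upper  = 0x0001;
constexpr std::uint16_t kC1Lower  = 0x0002;
constexpr std::uint16_t kC1Digit  = 0x0004;
constexpr std::uint16_t kC1Space  = 0x0008;
constexpr std::uint16_t kC1Punct  = 0x0010;
constexpr std::uint16_t kC1XDigit = 0x0080;
constexpr std::uint16_t kC1Alpha  = 0x0100;

// True if charType has at least one of the bits in mask.
inline bool HasAnyType(std::uint16_t charType, std::uint16_t mask)
{
    assert(mask != 0);
    return (charType & mask) != 0;
}

// True if charType has every bit in mask.
inline bool HasAllTypes(std::uint16_t charType, std::uint16_t mask)
{
    assert(mask != 0);
    return (charType & mask) == mask;
}
```

For instance, a type word of kC1Upper | kC1Alpha | kC1XDigit passes HasAnyType against kC1Alpha | kC1Digit but fails HasAllTypes against kC1Upper | kC1Lower.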

The Ctype2 types support proper layout of Unicode text. The direction attributes are assigned so that the bidirectional layout algorithm standardized by Unicode produces accurate results. These types are mutually exclusive. For more information about the use of these attributes, see The Unicode Standard: Worldwide Character Encoding, Volumes 1 and 2, Addison-Wesley Publishing Company: 1991, 1992.

CType2 name            Value    Meaning
Strong:
C2_LEFTTORIGHT         0x1      Left to right.
C2_RIGHTTOLEFT         0x2      Right to left.
Weak:
C2_EUROPENUMBER        0x3      European number, European digit.
C2_EUROPESEPARATOR     0x4      European numeric separator.
C2_EUROPETERMINATOR    0x5      European numeric terminator.
C2_ARABICNUMBER        0x6      Arabic number.
C2_COMMONSEPARATOR     0x7      Common numeric separator.
Neutral:
C2_BLOCKSEPARATOR      0x8      Block separator.
C2_SEGMENTSEPARATOR    0x9      Segment separator.
C2_WHITESPACE          0xA      White space.
C2_OTHERNEUTRAL        0xB      Other neutrals.
Not applicable:
C2_NOTAPPLICABLE       0x0      No implicit direction.

The Ctype3 types are intended to be placeholders for extensions to the POSIX types required for general text processing or for the standard C library functions. These types are supported in Microsoft Windows NT®, Microsoft Windows® 2000, and Windows XP.

CType3 name         Value     Meaning
C3_NONSPACING       0x1       Nonspacing mark.
C3_DIACRITIC        0x2       Diacritic nonspacing mark.
C3_VOWELMARK        0x4       Vowel nonspacing mark.
C3_SYMBOL           0x8       Symbol.
C3_KATAKANA         0x10      Katakana character.
C3_HIRAGANA         0x20      Hiragana character.
C3_HALFWIDTH        0x40      Half-width character.
C3_FULLWIDTH        0x80      Full-width character.
C3_IDEOGRAPH        0x100     Ideographic character.
C3_KASHIDA          0x200     Arabic Kashida character.
C3_ALPHA            0x8000    All linguistic characters (alphabetic, syllabary,
                              and ideographic).
C3_NOTAPPLICABLE    0x0       Not applicable.

An Edit Development Kit (EDK) could include pVar index #defines for the following ranges described in the Unicode Standard.

Character set   Unicode range    Character set     Unicode range
ASCII           0x0-0x7f         ANSI              0x0-0xff
ASCIIPrint      0x20-0x7e        Latin1            0x20-0xff
Latin1Supp      0xa0-0xff        LatinXA           0x100-0x17f
LatinXB         0x180-0x24f      IPAX              0x250-0x2af
SpaceMod        0x2b0-0x2ff      Combining         0x300-0x36f
Greek           0x370-0x3ff      BasicGreek        0x370-0x3cf
GreekSymbols    0x3d0-0x3ff      Cyrillic          0x400-0x4ff
Armenian        0x530-0x58f      Hebrew            0x590-0x5ff
BasicHebrew     0x5d0-0x5ea      HebrewXA          0x590-0x5cf
HebrewXB        0x5eb-0x5ff      Arabic            0x600-0x6ff
BasicArabic     0x600-0x652      ArabicX           0x653-0x6ff
Devangari       0x900-0x97f      Bengali           0x980-0x9ff
Gurmukhi        0xa00-0xa7f      Gujarati          0xa80-0xaff
Oriya           0xb00-0xb7f      Tamil             0xb80-0xbff
Teluga          0xc00-0xc7f      Kannada           0xc80-0xcff
Malayalam       0xd00-0xd7f      Thai              0xe00-0xe7f
Lao             0xe80-0xeff      GeorgianX         0x10a0-0x10cf
BascGeorgian    0x10d0-0x10ff    Jamo              0x1100-0x11ff
LatinXAdd       0x1e00-0x1eff    GreekX            0x1f00-0x1fff
GenPunct        0x2000-0x206f    Superscript       0x2070-0x207f
Subscript       0x2080-0x208f    SuperSubscript    0x2070-0x209f
Currency        0x20a0-0x20cf    CombMarkSym       0x20d0-0x20ff
LetterLike      0x2100-0x214f    NumberForms       0x2150-0x218f
Arrows          0x2190-0x21ff    MathOps           0x2200-0x22ff
MiscTech        0x2300-0x23ff    CtrlPictures      0x2400-0x243f
OptCharRecog    0x2440-0x245f    EnclAlphaNum      0x2460-0x24ff
BoxDrawing      0x2500-0x257f    BlockElement      0x2580-0x259f
GeometShapes    0x25a0-0x25ff    MiscSymbols       0x2600-0x26ff
Dingbats        0x2700-0x27bf    CJKSymPunct       0x3000-0x303f
Hiragana        0x3040-0x309f    Katakana          0x30a0-0x30ff
Bopomofo        0x3100-0x312f    HangulJamo        0x3130-0x318f
CJLMisc         0x3190-0x319f    EnclCJK           0x3200-0x32ff
CJKCompatibl    0x3300-0x33ff    Han               0x3400-0xabff
Hangul          0xac00-0xd7ff    UTF16Lead         0xd800-0xdbff
UTF16Trail      0xdc00-0xdfff    PrivateUse        0xe000-0xf800
CJKCompIdeog    0xf900-0xfaff    AlphaPres         0xfb00-0xfb4f
ArabicPresA     0xfb50-0xfdff    CombHalfMark      0xfe20-0xfe2f
CJKCompForm     0xfe30-0xfe4f    SmallFormVar      0xfe50-0xfe6f
ArabicPresB     0xfe70-0xfefe    HalfFullForm      0xff00-0xffef
Specials        0xfff0-0xfffd

TOM Interface Conventions

In using Microsoft Visual Basic® for Applications (VBA)-compatible dual interfaces, all TOM methods return HRESULT values. In general, TOM uses standard values, namely:

  • E_OUTOFMEMORY
  • E_INVALIDARG
  • E_NOTIMPL
  • E_FILENOTFOUND
  • E_ACCESSDENIED
  • E_FAIL
  • CO_E_RELEASED
  • NOERROR (this is the same as S_OK)
  • S_FALSE

While it is possible to use more specific, custom values, they would complicate documentation and usage. Note that if the editing instance associated with a TOM object such as ITextRange is deleted, then the TOM object becomes useless, and all its methods return CO_E_RELEASED.

In addition to the HRESULT return values, many methods include out parameters, which are pointers used to return values. In particular, all VBA property-get methods have such an out parameter. As for all interfaces, all pointer (ptr) parameters must be checked to be nonzero before use. Required pointers passed with null values cause the method to return E_INVALIDARG. Optional out pointers with null values are ignored. Such optional out pointers include the pDelta and pB parameters of ITextRange.
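
The pointer conventions described above can be sketched with a small stand-in class. This is not a TOM interface: CounterSketch, HResultSketch, and the k-prefixed constants are hypothetical names defined locally so the sketch compiles without the Windows headers. Only the behavior (required null pointer gives E_INVALIDARG, optional null pointer is ignored) is taken from the text.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for HRESULT, NOERROR, and E_INVALIDARG.
using HResultSketch = std::int32_t;
constexpr HResultSketch kNoError    = 0;
constexpr HResultSketch kInvalidArg = static_cast<HResultSketch>(0x80070057u);

class CounterSketch {
public:
    // Property-get style method: pValue is a required out pointer.
    HResultSketch GetValue(long* pValue) const
    {
        if (!pValue)
            return kInvalidArg;   // required pointer passed as null
        *pValue = value_;
        return kNoError;
    }

    // Move-style method: pDelta is an optional out pointer.
    HResultSketch Move(long count, long* pDelta)
    {
        value_ += count;
        if (pDelta)               // a null optional pointer is simply ignored
            *pDelta = count;
        return kNoError;
    }

private:
    long value_ = 0;
};

// Illustrative use of both conventions.
inline void ExampleUse()
{
    CounterSketch c;
    assert(c.Move(2, nullptr) == kNoError);       // caller does not need the delta
    assert(c.GetValue(nullptr) == kInvalidArg);   // caller error is reported
}
```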

Use methods with the Get and Set prefixes to get and set properties, respectively. Get and Set are preferred from a C/C++ perspective. Boolean variables use the explicit values tomFalse = 0 for FALSE and tomTrue = -1 for TRUE (to agree with VBA).
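
One practical consequence of the Boolean convention is that a TOM Boolean should not be compared with the Win32 TRUE value of 1. The following sketch uses locally defined constants with the values stated above; the names kTomFalse, kTomTrue, ToTomBool, and IsTomTrue are hypothetical. It compares against the explicit constant rather than testing for nonzero, on the assumption that a property may also carry other special values.

```cpp
#include <cassert>

// Values as stated in the text; the real constants come from the TOM headers.
constexpr long kTomFalse = 0;
constexpr long kTomTrue  = -1;

// Converts a C++ bool to the TOM convention.
constexpr long ToTomBool(bool b)
{
    return b ? kTomTrue : kTomFalse;
}

// Tests explicitly for tomTrue instead of comparing with 1 or testing nonzero.
constexpr bool IsTomTrue(long value)
{
    return value == kTomTrue;
}

// Illustrative use: a value meant for a call such as ITextFont::SetBold.
inline void ExampleTomBool()
{
    const long bold = ToTomBool(true);
    assert(IsTomTrue(bold));
    assert(bold != 1);   // comparing with TRUE (1) would give the wrong answer
}
```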

Reference pages for some TOM interface methods use the phrases "property-put method" and "property-get method" to identify methods that set and retrieve a VBA property. VBA code can use these methods or access the property directly. For example, if tf is an ITextFont object, the following VBA code is equivalent to calling the ITextFont::SetBold method:

    tf.Bold = tomTrue

TOM constants all begin with the prefix tom, for example tomWord, and where possible, agree with the corresponding Word constants, which begin with the prefix WD. Please see the Tom.odh file for Object Description Language (ODL) prototypes of the TOM methods as well as a complete listing of the TOM constants. Also, the files Tomstub.cpp and Tomstub.h give the headers and dummy bodies for all TOM methods needed to facilitate a TOM implementation.

To obtain clean C/C++ interfaces and to simplify TOM implementations, take advantage of the VBA strongly typed optional (missing) arguments. For example, an optional Count argument can be declared to be a long value and assigned the default value of 1 in the event that the VBA code omits the Count argument. Since previous versions of VBA need to use VARIANT * for optional arguments, this means that all TOM arguments must be given for use with these older versions. Linear dimensions are given in floating-point points, which is the VBA standard for Word and Microsoft Excel. Use the C language float data type.

© 2003 Microsoft Corporation. All rights reserved.