Wildcard* - matches zero or more characters

* * *

A simple implementation of wildcard "*" (asterisk) in Visual Basic.

Examples:

In the word "Gegensatz":

a*tz matches atz

g*tz matches gensatz

G*tz matches Gegensatz
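The three matches above can be reproduced with a short stand-alone function. This is only a minimal sketch, not the macro's own code: the name WildcardFind and its parameters are made up for illustration, it assumes the pattern contains at most one "*", and it compares case-sensitively.

```vb
' Hypothetical helper for illustration; not part of the original macro.
' Looks for the first part of Text matched by Pattern, where Pattern may
' contain one "*" standing for zero or more characters.
' Returns True on a match and places the matched text in Matched.
Function WildcardFind(Text As String, Pattern As String, _
                      ByRef Matched As String) As Boolean
    Dim StarPos As Long, FrontPos As Long, BackPos As Long
    Dim Front As String, Back As String

    Matched = ""
    WildcardFind = False
    StarPos = InStr(1, Pattern, "*", vbBinaryCompare)

    If StarPos = 0 Then
        ' No wildcard: plain case-sensitive substring search
        If Len(Pattern) > 0 And InStr(1, Text, Pattern, vbBinaryCompare) > 0 Then
            Matched = Pattern
            WildcardFind = True
        End If
        Exit Function
    End If

    Front = Left(Pattern, StarPos - 1)
    Back = Mid(Pattern, StarPos + 1)

    ' Locate the front characters (an empty front matches at position 1)
    FrontPos = 1
    If Len(Front) > 0 Then
        FrontPos = InStr(1, Text, Front, vbBinaryCompare)
        If FrontPos = 0 Then Exit Function
    End If

    ' Locate the back characters, searching only after the front characters
    BackPos = FrontPos + Len(Front)
    If Len(Back) > 0 Then
        BackPos = InStr(BackPos, Text, Back, vbBinaryCompare)
        If BackPos = 0 Then Exit Function
    End If

    Matched = Mid(Text, FrontPos, BackPos + Len(Back) - FrontPos)
    WildcardFind = True
End Function
```

With a String variable m, WildcardFind("Gegensatz", "g*tz", m) returns True and leaves "gensatz" in m. The lower-case "g" does not match the initial "G" because the comparison is binary.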

Wildcard "*" is very useful in parsing text for machine translation. When German is the source text, for example, it can be used to link two parts of separable verbs that can have any number of words between them, in order to translate the verb correctly.

For example:

Die Situation in (1) stellt einen selbstverstärkenden Prozess dar.

stellt*dar. matches stellt einen selbstverstärkenden Prozess dar.

Die Situation in (1) stellt einen Prozess dar.

stellt*dar. matches stellt einen Prozess dar.

In both cases, "stellt*dar." can be translated as "is" in English, regardless of what stands between "stellt" and "dar", as the examples show. Picking up "stellt" on its own is no help in translating it, because many separable verbs contain "stellt", e.g. "stellt . . . aus" (exhibits), "stellt . . . ein" (adjusts), "stellt . . . hin" (places), "stellt . . . ab" (shuts off), etc. Notice that the phrase searched for is "stellt*dar." and not "stellt*dar". The period is needed; otherwise "dar" could be part of another word and not the "dar" that is coupled with "stellt".
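The idea can be tried out with Visual Basic's built-in Like operator, which also treats "*" as zero or more characters. The sketch below is an illustration only: the function name TranslateSeparableVerb and the small pattern table are assumptions, built from the verbs listed above, and the module is assumed to use the default Option Compare Binary.

```vb
' Hypothetical helper for illustration; not part of the original macro.
' Returns the English gloss of the first separable-verb pattern found
' in Sentence, or "" if none of the patterns matches.
Function TranslateSeparableVerb(Sentence As String) As String
    Dim Patterns As Variant, Glosses As Variant
    Dim i As Long

    ' Each pattern ends with the period, so that "dar", "aus", etc.
    ' cannot be the beginning of a longer word
    Patterns = Array("stellt*dar.", "stellt*aus.", "stellt*ein.", _
                     "stellt*hin.", "stellt*ab.")
    Glosses = Array("is", "exhibits", "adjusts", "places", "shuts off")

    TranslateSeparableVerb = ""
    For i = LBound(Patterns) To UBound(Patterns)
        ' Leading and trailing "*" allow the pattern anywhere in the sentence
        If Sentence Like "*" & Patterns(i) & "*" Then
            TranslateSeparableVerb = Glosses(i)
            Exit Function
        End If
    Next i
End Function
```

For "Die Situation in (1) stellt einen Prozess dar." the function returns "is". The period matters: "Er stellt die Kiste darauf" Like "*stellt*dar*" is True, although "dar" here is only the start of "darauf".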

Visual Basic Code

' This is the test heading

' Public Sub MAIN()

' End Sub

' This is normal heading

Sub Context_f(IDOutgoing, IDTermSelected)

' Context_f macro

' Macro to handle word context and subject IDs

' Macro to select terms in glossaries during a search run

' according to the context of the terms and the

' subject hierarchy of the text using subject IDs for terms in the glossaries.

' Passed variables: IDOutgoing holds the tagged English terms on entry

' and the selected term on return; IDTermSelected is set to True if a term was selected

' Modified 10 Oct 2005

' to handle multiple rep terms with context and subject IDs

' Modified June 2006

' to introduce the escape character "/",

' Quote Next Char, to blot out "," in a context word/phrase, i.e.

' "an," -> "an/," so that "," is searched and not seen as a control char,

' i.e. the end of a context word/phrase.

' The mods are all to the "Context_e" routine.

' Modified May 2007

' to introduce the wildcard "*" asterisk

' matches zero or more characters

' E.g. Gegensatz

' a*tz matches atz

' g*tz matches gensatz

' G*tz matches Gegensatz

' stellt*dar matches stellt in Gegensatz zu dar

' Init vars

Dim Test1, Test2

' Copy IDOutgoing to Output

Output = IDOutgoing

' Clear term selected flag

IDTermSelected = False

' Create Output for test purposes

' Output = "[S01]processing, [S02][S04]conditioning, [S03]working"

' Output = "[W:Garantie, Anspruch][S41]void, [W:LED, Lampe][S01][S23][S02]goes out"

' Output = "[W: stellt*dar/,, stellt*dar.]is, [S02][S04]conditioning, [S03]working"

' Output = "[W: stellt, stellt*dar, stellt*dar.]is, [S02][S04]conditioning, [S03]working"

' Output = "[W:tzzz*, stellt, stellt*dar, stellt*dar.]is, [S02][S04]conditioning, [S03]working"

' Output = "[W:tz*, stellt, stellt*dar, stellt*dar.]is, [S02][S04]conditioning, [S03]working"

LengthOfOutput = Len(Output)

' Mark context phrase

' Test to see if another Extend instr. is needed

Selection.Extend

Marking1 = Selection.Text

Selection.Extend

Marking2 = Selection.Text

If Len(Marking2) - Len(Marking1) < 2 Then

Selection.Extend

End If

' Selection.Extend

' Save Context phrase

ContextPhrase = Selection.Text

' For test purposes:

' ContextPhrase = "Es stellt in Gegensatz zu xyz dar, was es ist."

' ContextPhrase = "und es stellt in Gegensatz zu xyz dar."

' ContextPhrase = "und es stellt in Gegensatz zu xyz dar und war."

' remove marking in text again

Selection.Collapse

Selection.EscapeKey

Selection.MoveLeft Unit:=wdCharacter, Count:=1

' Open batch file for subject ID in "C:\Wechseln"

Open "C:\Wechseln\SubjectIDBatch.txt" For Input As #4

' Any tag words "[W:" in output string?

' Default values

WordNotInPhrase = True

NoMoreContextWords = False

If Output Like "*[[]W:*" = False Then

' No tag words "[W:" in output string

GoTo NoTagWords:

End If

' Default

CurrentStringPos = 1

' Yes, Save position of this word tag "[W:"

WordTagPos = InStr(CurrentStringPos, Output, "[W:")

While WordTagPos <> 0 And WordNotInPhrase = True ' (3)

' Wrong ********* always saves pos of 1st "[W" in string

' and should have exited loop if word found in phrase

' use InStr(Startpos, etc.)

' More context words in output for this word tag?

' Set default values

PreviousChar = "Y"

CurrentChar = "X"

CurrentStringPos = WordTagPos + 3

StartOfContextWord = WordTagPos + 3

While NoMoreContextWords = False And WordNotInPhrase = True ' (2)

' Get next context word

While (CurrentChar <> "]" And CurrentChar <> ",") Or (CurrentChar <> "]" And CurrentChar = "," And PreviousChar = "/") ' (1)

' Loop thru' context word, updating its variables

' This is wrong if > 1 tag word in output

CurrentStringPos = CurrentStringPos + 1

CurrentChar = Mid(Output, CurrentStringPos, 1)

' Store previous char

PreviousChar = Mid(Output, CurrentStringPos - 1, 1)

Wend ' Get context word (1)

' Store length of context word

LengthOfContextWord = (CurrentStringPos) - StartOfContextWord

' Extract context word from input search string

' Store context word

ContextWord = Mid(Output, StartOfContextWord, LengthOfContextWord)

' Check if context word contains "/". If so, remove it, get position

' of "/" in context word, and adjust length of context word

SlashPos = InStr(ContextWord, "/")

If SlashPos <> 0 Then

' "/" found

' Save string up to "/"

FirstPart = Left(ContextWord, SlashPos - 1)

' Save string after "/"

SecondPart = Right(ContextWord, LengthOfContextWord - SlashPos)

' Combine both parts to make new context word

' i.e. without "/"

ContextWord = FirstPart + SecondPart

' Store length of new context word

LengthOfContextWord = Len(ContextWord)

End If

' Does context word contain "*"?

AsteriskPos = InStr(ContextWord, "*")

If AsteriskPos = 0 Then

' "*" not found

' Is this context word in context phrase?

WordInPhrase = InStr(ContextPhrase, ContextWord)

If WordInPhrase = 0 Then

' No, set Word not in phrase flag

WordNotInPhrase = True

' Go to common logic for context word not in context phrase

GoTo CharsNotFound:

Else

GoTo MatchFound: ' exit loop

End If ' End of context word in phrase question

End If ' End of Context word contains "*" question

' Default: no chars before "*", so back chars are searched for

' from the start of the context phrase

PosOfFrontCharsInPhrase = 1

' Chars before "*"?

If AsteriskPos > 1 Then

' Yes, chars before "*"

' Store these chars in FrontChars string

FrontChars = Left(ContextWord, AsteriskPos - 1)

' FrontChars found in context phrase?

FrontCharsInPhrase = InStr(ContextPhrase, FrontChars)

If FrontCharsInPhrase = 0 Then

' No, go to common logic for context word, front chars or back chars not found in context phrase

GoTo CharsNotFound:

End If

' Store position just after front chars in context phrase,

' so that back chars are only searched for after them

PosOfFrontCharsInPhrase = FrontCharsInPhrase + Len(FrontChars)

End If

' Are there chars after "*"

If AsteriskPos < LengthOfContextWord Then

' ThisChar = Mid(ContextWord, AsteriskPos + 1, 1)

' If ThisChar <> "," And ThisChar <> "/" Then

' Yes, store chars after "*" in BackChars string

BackChars = Right(ContextWord, LengthOfContextWord - AsteriskPos)

Else

GoTo MatchFound: ' exit loop

End If

' Back chars found after front chars in context phrase?

BackCharsInPhrase = InStr(PosOfFrontCharsInPhrase, ContextPhrase, BackChars)

If BackCharsInPhrase <> 0 Then

' Yes

GoTo MatchFound: ' exit loop

End If

' Context word, front chars or back chars not found in context phrase

CharsNotFound:

' Store position (Pos0) of tag word in output string

Pos0 = WordTagPos + 3

' Set Word not in phrase flag

WordNotInPhrase = True

' If current char is comma, first see if the previous char is "/"

' (Quote Next Char), then comma is part of context word/phrase, else

' point to next char and set CurrentChar = default value

If CurrentChar = "," And PreviousChar <> "/" Then

CurrentStringPos = CurrentStringPos + 1

CurrentChar = "X"

' Next context word starts after the comma and the space that follows it

StartOfContextWord = CurrentStringPos + 1

ElseIf CurrentChar = "]" Then

NoMoreContextWords = True

End If

Wend ' Search for more context words (2)

WordTagPos = InStr(CurrentStringPos, Output, "[W:")

NoMoreContextWords = False

Wend ' More tag words in output (3)

GoTo NoTagWords:

' ???? problem here - this is also branch 1 ???

' Context word in context phrase

' Get term associated with this context word,

' merge with common logic for context word and subject ID

' Get position of next "]", i.e. end of context word tag section

' (PosWordTagEnd)

MatchFound:

' Start the search for "]" from the current word tag

Pos0 = WordTagPos + 3

n = 0

char = "X"

While char <> "]"

' Get next char from Output until ] found

n = n + 1

char = Mid(Output, Pos0 + n, 1)

Wend

' Pos0 + n is position of next "]"

' Store in PosWordTagEnd

PosWordTagEnd = Pos0 + n

' Check if next char <> "["

If Mid(Output, PosWordTagEnd + 1, 1) <> "[" Then

' This is the term

' Store this position in Pos1

Pos1 = PosWordTagEnd + 1

' Goto CommonlogicB: (with subject ID logic)

GoTo CommonlogicB:

Else

' Next char = "[" ' this is a subject ID.

' Merge with Common logic A (subject ID logic)

Pos0 = PosWordTagEnd + 1

GoTo CommonlogicA:

End If

' No tag words "[W:" found in output string or

' tag words not in context phrase

NoTagWords:

' Look for a subject ID in Output, i.e. [S04]

Test1 = Output Like "*[[]S##[]]*"

' a subject ID found?

If Test1 = False Then

' No

Close #4

Exit Sub

End If

' Yes

' Look for a subject ID in the batch file, i.e. S01, S02, etc.

' End of batch file reached?

Test3 = False

While EOF(4) = False

' No , get next subject ID from batch file, e.g. S03

BatchID = Input(3, #4)

ModBatchID = "*" + "[[]" + BatchID + "[]]" + "*"

Test3 = Output Like ModBatchID

' Read "," from batch file

' Need to check here that not EOF batch file, Feb 21, 2004

Dummy = Input(1, #4)

' If comma not read in, and Subject ID not found

' then EOF Batch file and exit

If Dummy <> "," And Test3 = False Then

Close #4

Exit Sub

End If

If Test3 = True Then

GoTo BatchIDFound:

End If

Wend

' No BatchID from file found in Output and EOF(4)

' simply exit sub

Close #4

Exit Sub

' Batch file subject ID found in Output string

' Look for [ID]

BatchIDFound:

' Create [ID]

ID = "[" + BatchID + "]"

' Store position (Pos0) of ID in Output string

Pos0 = InStr(Output, ID)

' Get term associated with this ID from Output

' Merge with context word logic

CommonlogicA:

' Check if there is more than one ID for this term

' Get position of next "]" not followed by "[",

' i.e. start of term

' Init vars

n = 0

char = "["

While char = "["

' Get next char from Output until not "["

n = n + 5

char = Mid(Output, Pos0 + n, 1)

Wend

' Pos0 + n is position of start of term

' Store in Pos1

Pos1 = Pos0 + n

CommonlogicB:

' Get position of word tag or ID for subsequent term

' in output string (Pos2)- look for "[" or ","

PosX = InStr(Pos1 + 1, Output, "[")

PosY = InStr(Pos1 + 1, Output, ",")

If PosX = 0 And PosY = 0 Then

' no "[" or "," found, i.e. no more word tags/term IDs

' or loose terms in output string after this one

' Then use length of string

Pos2 = 0

' Pos2 = InStr(Pos1 + 1, Output, "[")

' Danger of bad syntax here if there is no "]" for example!

' There may be no more word tags/IDs, then use string length

' Check if there are any more word tags/IDs in output

' after this one

' If Pos2 = 0 Then

' "[" not found, i.e. no more word tags/term IDs in string

' after this one - then use length of string

' Calculate term length

Termlength = LengthOfOutput - (Pos1 - 1)

' Get term and store in selected variable

IDOutgoing = Mid(Output, Pos1, Termlength)

' Add "?, " for test

IDOutgoing = "?, " + IDOutgoing

' Set term selected flag

IDTermSelected = True

' End of subroutine

Close #4

Exit Sub

End If

' There are more new word tags/term IDs in output string

' after this one

' Calculate Pos2 - use whichever of "[" or "," comes first

If PosX <> 0 Then

Pos2 = PosX

End If

If PosY <> 0 And (PosX = 0 Or PosY < PosX) Then

Pos2 = PosY

End If

' Point to end of term for this word tag/ID (Pos3)

' Pos3 = Pos2 - 2

' Set up length of term

' Termlength = Pos3 - Pos1

Termlength = Pos2 - Pos1

' Get term and store in selected variable

IDOutgoing = Mid(Output, Pos1, Termlength)

' Add "?, " for test

IDOutgoing = "?, " + IDOutgoing

' Set term selected flag

IDTermSelected = True

Close #4

End Sub
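The comments in the macro describe "/" as a Quote Next Char marker: "/," is a literal comma inside a context word, while a bare comma ends the word. The following sketch shows that rule in isolation. It is not the macro's own parsing loop; the name SplitContextWords is hypothetical, and it assumes that the input is the text between "[W:" and "]" and that one space after a separating comma is not part of the next word.

```vb
' Hypothetical helper for illustration; not part of the original macro.
' Splits the body of a "[W:...]" tag into context words. A comma ends a
' word unless it is preceded by "/"; the "/" itself is dropped.
Function SplitContextWords(TagBody As String) As Collection
    Dim Words As New Collection
    Dim Current As String, Ch As String
    Dim i As Long

    i = 1
    Do While i <= Len(TagBody)
        Ch = Mid(TagBody, i, 1)
        If Ch = "/" And i < Len(TagBody) Then
            ' Quote Next Char: keep the following char literally
            Current = Current & Mid(TagBody, i + 1, 1)
            i = i + 2
        ElseIf Ch = "," Then
            ' Unquoted comma: end of the current context word
            Words.Add Current
            Current = ""
            i = i + 1
            ' Skip one space after the comma (assumed separator ", ")
            If Mid(TagBody, i, 1) = " " Then i = i + 1
        Else
            Current = Current & Ch
            i = i + 1
        End If
    Loop

    ' Last context word (no trailing comma)
    Words.Add Current
    Set SplitContextWords = Words
End Function
```

For the test string " stellt*dar/,, stellt*dar." this gives two context words, " stellt*dar," and "stellt*dar.".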
