Python Spell Checker (By Peter Norvig)

If you are doing any kind of work with text in Blender, you have probably wished you had a built-in spell checker.

I had started working on a simple text checker, to tell whether a word was a real word or an error, before I read this excellent blog post by Peter Norvig. I then abandoned my own efforts, picked up his script, and decided to work with that.


It’s pretty simple. I’ll be working on it more as part of my project, hopefully making something that is friendlier to use.

For starters here’s my basic rewrite of Peter’s script to work in Blender:
text_checker_01.blend (400 KB)

You need to download the “big.txt” file he links in his blog, or use any large text file. Your top 5 favorite books combined would be a good choice, since they are more likely to use words which you know and use regularly (in your own dialect). The text file needs to be in the same folder as your .blend file.

Just run the file and start typing. There may be a slight delay while it builds the dictionary object on the first tick, and then it takes a few milliseconds to look up each word, so typing is a little slow, but it works great.

Some lines had to be changed since we don’t want to run the trainer every time we write a new word, just once on initialization. I wanted to save the created dictionary to disk, but it uses a function that makes it impossible (AFAIK) to pickle it.
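
The pickling problem can be reproduced outside Blender. Peter's original train() builds its model with collections.defaultdict(lambda: 1), and pickle cannot serialise a lambda default factory. The following is only a minimal sketch with a made-up word list (sample_words is an assumption, not part of the script). It shows the failure and one possible workaround: copying into a plain dict, which pickles fine but loses the default count for unseen words.

```python
import collections
import pickle

# Hypothetical sample data; the real model is trained on big.txt.
sample_words = ["the", "spell", "the"]

# Same shape as the original train(): unseen words start at 1.
model = collections.defaultdict(lambda: 1)
for w in sample_words:
    model[w] += 1

try:
    pickle.dumps(model)
    lambda_model_pickles = True
except (pickle.PicklingError, AttributeError, TypeError):
    # The lambda default factory is what pickle refuses to serialise.
    lambda_model_pickles = False

# Workaround sketch: a plain dict copy pickles, but has no default value.
plain_model = dict(model)
restored = pickle.loads(pickle.dumps(plain_model))
```

With a plain dict, lookups for unknown words need restored.get(word, 1) instead of restored[word].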

I’d like to see if anyone else can find a use for it, you can use it to suggest a single word, or break up a string and check each word in turn. It really needs some kind of UI to allow you to choose when to replace a word or not. That’s something I’m going to be working on as part of my current project.
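
As a rough illustration of the "break up a string and check each word in turn" idea, here is a cut-down sketch that runs outside Blender. The tiny CORPUS string, the check_sentence() helper and the one-edit-only search are all assumptions made to keep it short; the real script trains on big.txt and also looks two edits away.

```python
import collections
import re

# Tiny stand-in corpus (assumption); the real script trains on big.txt.
CORPUS = "the quick brown fox jumps over the lazy dog and the spelling checker"
NWORDS = collections.Counter(re.findall("[a-z]+", CORPUS.lower()))
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings exactly one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Most frequent known word within one edit, else the word unchanged."""
    if word in NWORDS:
        return word
    candidates = [w for w in edits1(word) if w in NWORDS]
    return max(candidates, key=NWORDS.get) if candidates else word


def check_sentence(sentence):
    """Hypothetical helper: (original, suggestion) pairs, one per word."""
    return [(w, correct(w.lower())) for w in sentence.split()]


pairs = check_sentence("teh quick borwn fox")
```

Returning pairs rather than a finished sentence leaves the decision about whether to replace a word to whatever UI sits on top.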

Feel free to modify it to your own use, or clean up my own mess to get it to look nicer if you have the time. :slight_smile:

All acknowledgments to Peter Norvig for his excellent script.


I tried the spellchecker you posted and it works pretty well.

Thanks :slight_smile: I’m an English teacher, so I’d like to think people cared about getting their spelling and things correct, even though I don’t most of the time. :stuck_out_tongue:

Quite fun, works really fast.

http://puu.sh/eOLLP/7ef9fded92.png

Setting the default_factory to int makes the defaultdict useful for counting (like a bag or multiset in other languages).
( https://docs.python.org/2/library/collections.html#defaultdict-examples )


def train(features):
    model = collections.defaultdict( int )
    for f in features:
        model[f] += 1
    return model




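def build_model(own):
    import pickle

    source_name = bge.logic.expandPath("//big.txt")

    # Open in text mode. Calling str() on bytes read with 'rb' gives the
    # "b'...'" repr, where each newline becomes a literal backslash + n
    # and the stray "n" ends up glued to the next word.
    # The utf-8 encoding is an assumption about big.txt.
    with open(source_name, 'r', encoding='utf-8', errors='ignore') as source_file:
        source_text = source_file.read()

    NWORDS = train(words(source_text))

    # Round trip through pickle to show that the defaultdict(int)
    # model can now be serialised.
    dump = pickle.dumps(NWORDS)

    own['NWORDS'] = pickle.loads(dump)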

Next task would be to compress it…
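
One way the compression step might look, as a sketch only: pickle the model straight into a gzip file from the standard library. The save_model() and load_model() names, the file name and the small stand-in model are assumptions; inside Blender the path would more likely come from bge.logic.expandPath.

```python
import collections
import gzip
import os
import pickle
import tempfile


def save_model(model, path):
    """Pickle the model and gzip-compress it in one step."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Inverse of save_model()."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for NWORDS, built like train() above.
model = collections.defaultdict(int)
for word in "the cat and the hat".split():
    model[word] += 1

# The file name is an assumption made for this example.
path = os.path.join(tempfile.gettempdir(), "nwords_example.pkl.gz")
save_model(model, path)
restored = load_model(path)
```

Only load pickle files you created yourself, since unpickling can run arbitrary code.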

You can also use the more concise collections.Counter object :slight_smile:

Added conservation of word case (title-case words are also left unmodified, as they are assumed to be proper nouns), as well as punctuation.


import string
import re, collections, bge




def copy_nature(word, corrected_word):


    fixed_capitals_and_punctuation = []
    source_chars = iter(word)
    correction_chars = iter(corrected_word)
    print(word, corrected_word)
    
    try:
        source = next(source_chars)
        replacement = next(correction_chars)
        
        while True:
            if source.isupper():
                fixed_capitals_and_punctuation.append(replacement.upper())
            
            else:
                fixed_capitals_and_punctuation.append(replacement)
                
            source = next(source_chars, "")
            replacement = next(correction_chars, "")
            
            if not replacement:
                break
                
    except StopIteration:
        pass
    
    for char in reversed(word):
        if char in string.punctuation:
            fixed_capitals_and_punctuation.insert(len(fixed_capitals_and_punctuation), char)
    
    result = ''.join(fixed_capitals_and_punctuation)
    
    return result




def display(cont):
    own = cont.owner
    
    if not own.get('ini'):
                
        own['original_text_ob'] = [ob for ob in own.children if ob.get("original_string")][0]
        own['original_text_ob']['Text'] = "loading dictionary...."
        
        own['new_text_ob'] = [ob for ob in own.children if ob.get("new_string")][0]
        own['new_text_ob']['Text'] = "PLEASE WAIT"        
                
        build_model(own)        
                
        own['ini'] = True
    
    split_string = own['key_log'].split()
    
    final_sentence = []    
    
    for word in split_string:
        trans = str.maketrans("", "", string.punctuation)
        correction_source = word.lower().translate(trans)
        corrected_word = correct(correction_source, own['NWORDS'])
        
        if corrected_word and not word.istitle():
            corrected_word = copy_nature(word, corrected_word)           
            
            final_sentence.append(corrected_word)
            
        else:
            final_sentence.append(word)   
    
    new_string = " ".join(final_sentence)    
    
    own['original_text_ob']['Text'] = own['key_log']    
    own['new_text_ob']['Text'] = new_string
    
##########################################################


def words(text): 
    return re.findall('[a-z]+', text.lower()) 


def train(features):
    return collections.Counter(features)


def edits1(word):
    alphabet = 'abcdefghijklmnopqrstuvwxyz' 
    
    splits  = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes  = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts  = [a + c + b    for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def known_edits2(word,NWORDS):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)


def known(words,NWORDS): 
    return set(w for w in words if w in NWORDS)


def correct(word,NWORDS):
    candidates = known([word],NWORDS) or known(edits1(word),NWORDS) or known_edits2(word,NWORDS) or [word]
    return max(candidates, key=NWORDS.get)    


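def build_model(own):

    source_name = bge.logic.expandPath("//big.txt")

    # Open in text mode. Calling str() on bytes read with 'rb' gives the
    # "b'...'" repr, where each newline becomes a literal backslash + n
    # and the stray "n" ends up glued to the next word.
    # The utf-8 encoding is an assumption about big.txt.
    with open(source_name, 'r', encoding='utf-8', errors='ignore') as source_file:
        source_text = source_file.read()

    NWORDS = train(words(source_text))

    own['NWORDS'] = NWORDS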



Thanks Agoose77, I think the original author of the script was trying to do one of those “maximum functionality in minimum number of lines” things which are good for grabbing headlines.
You’ll notice that some of the lines in his script are a little strange, with def and return on the same line…

Any way in which it can be improved while retaining speed of execution is great.

That looks great! You said you were an English teacher?

Yes, in South Korea.
More of a language instructor really. :stuck_out_tongue: I have to trick kids into learning English by making them play games all day.
They think they are just playing, but really they are learning by stealth.

I’d like to make a living one day by making educational games, so a spell checker could come in handy there, I’d look a right plum if I filled my game with spelling mistakes. :slight_smile:


@ Agoose77;
Thanks for the corrections you made to the spell checker. It still has problems with words like “it’s” or “don’t” but in all other ways, I’m really happy with the outcome. It should help me to quickly proofread my game dialogs before committing them to the game.
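
One likely cause of the apostrophe problem is that words() only matches runs of a-z, so "don't" is trained as "don" and "t", and display() strips punctuation before the lookup. A possible tweak, offered as a sketch rather than a tested fix for the .blend, is a pattern that keeps apostrophes inside a word. It only handles the straight ' character, not typographic apostrophes.

```python
import collections
import re


def words(text):
    """Like the original words(), but keeps apostrophes inside a word."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


# Made-up sample text for illustration.
sample = "It's fine, don't worry. 'Quoted' words lose their quotes."
tokens = words(sample)
counts = collections.Counter(tokens)
```

For the checker to suggest a missing apostrophe, the alphabet in edits1() would also need to include ', and display() would have to stop stripping it.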

It’s not perfect, but then neither am I. :slight_smile:

Could you please make it say "that word is not in my database" when it cannot correct the word?

This was mostly a hacky cut-and-paste job by me, I’m afraid, but AFAIK you can just run:

if word in NWORDS:
    word_in_database = True
else:
    word_in_database = False

Run this once the word is returned, to find out whether it found the word or not.

Where would I put that in the python code?

In the display function, after feeding the word into the correct() function. You need to run it on the corrected word, not the original word.

It would probably be located in your own custom function from where you send words to be checked and then display the results.
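
As a sketch of what that could look like: in the display function above, the check would sit right after the line that calls correct(). The suggestion_or_message() helper and the small stand-in dictionary are made up for illustration.

```python
NOT_FOUND_MESSAGE = "that word is not in my database"


def suggestion_or_message(corrected_word, NWORDS):
    """Hypothetical helper: call this on the value returned by correct()."""
    if corrected_word in NWORDS:
        return corrected_word
    return NOT_FOUND_MESSAGE


# Stand-in word counts (assumption); in the script this is own['NWORDS'].
NWORDS = {"spelling": 3, "checker": 2}

# correct() hands the input back unchanged when it finds no candidate,
# so a result that is not in NWORDS means nothing was close enough.
found = suggestion_or_message("spelling", NWORDS)
missing = suggestion_or_message("xqzvw", NWORDS)
```

The returned string can then be appended to final_sentence or written to a separate text object.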

I still don’t know where to put it.

Well, it really depends on how you want to use the resource.
Like most resources on here it requires a bit of python knowledge to find out where to plug it in to your project.

If you’re going to be using text in your games, I recommend finding out a bit about strings and text objects. There’s a lot you can do with them, but because they are immutable (they don’t change, though you can create an edited copy) it’s sometimes difficult to deal with them. It’s not easy, for example, to insert letters or words in a certain place. You have to learn about slicing and joining.
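
A small sketch of the slicing and joining idea. The two helper names are made up for this example; both return a new string and leave the original untouched.

```python
def insert_text(text, index, addition):
    """Return a new string with addition placed at character index."""
    return text[:index] + addition + text[index:]


def insert_word(sentence, position, word):
    """Return a new sentence with word inserted before the word at position."""
    parts = sentence.split()
    parts.insert(position, word)
    return " ".join(parts)


original = "a spell checker"
with_letters = insert_text(original, 7, "ing")
with_word = insert_word(original, 1, "simple")
```

Slicing works on character positions, while split() and join() work on whole words, which is usually the easier level to think at when editing dialog text.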

Here’s the current Python documentation about strings and text. Lots of useful functions in there that you can use directly with Blender.

But still, despite being such a simple element, I think strings have always been something I’ve found hard to get to grips with. Yesterday I had a hell of a time trying to add newlines (\n) to a string using input in Blender, but I finally found out the right way of doing it. It’s just that sometimes strings don’t work the way you expect them to.
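
One common source of this kind of trouble, offered as a guess and not necessarily the method referred to above: text typed by the user contains a backslash followed by the letter n as two separate characters, which is not the same as a real line break. A sketch that converts one into the other (expand_newlines() is a made-up name):

```python
def expand_newlines(typed_text):
    """Turn the two typed characters backslash + n into a real line break."""
    return typed_text.replace("\\n", "\n")


# A raw string keeps the backslash and the n as two separate characters,
# which is what arrives from typed input.
typed = r"first line\nsecond line"
expanded = expand_newlines(typed)
lines = expanded.split("\n")
```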