11 January, 2014
I wrote a class for some Python code I’m working on, and I figured there might be others out there who would find it interesting. The class lets you set up a typical dictionary data structure (where hashable items are used as keys to look up values stored in the structure), but will allow you to do partial and case-insensitive matches.
A bit more details, and the code, after the break
I included a bit of testing code at the bottom of the code. If you run this from the Python interpreter directly from the command line, it should execute the test code and spit out the results to give you an idea of how it works.
Yes, I know this kinda goes against the purpose of a Python dictionary. It only works for strings instead of all hashable objects; it will try to turn your keys into strings, actually. And, the search is currently linear if you don’t include the original key, so performance will be an issue in bigger structures. I’m sure Python purists will line up around the block to tell me this is not really Pythonic. That’s fine, it worked for what I needed it and thought it was unique enough to be worth sharing.
Where would you use this? In a text MUD, you could use this for lookups of names of other players, or for abbreviations of commands. So the command “te psy I think this is okay.” might match “te” to the command “tell” and “psy” to the name “Psychochild” in the game, assuming there are no other commands/names that begin with the same letters.
One way to extend this might be to have an most recently used (MRU) cache. This would decrease performance in smaller dictionaries, but might help if you have a large dictionary with a few more frequently used keys. Maybe store and re-order the keys in the additional lookup code to improve access if you have some keys that would be used more often in a short period of time, such as abbreviations for names between two people chatting.
Anyway, I’m releasing this under the MIT license, which basically translates to “do whatever you want with it, just keep the copyright statement attached and don’t blame me if anything goes wrong.” A note to me if you do use it would also be appreciated. :)
Feel free to ask questions in the comments below, or if you post up an improvement.
# PartialMatchDict - a dictionary that will allow partial key matches. # # Code copyright: Brian 'Psychochild' Green - http://psychochild.org/ # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # 3. The names of its contributors may not be used to endorse or promote # products derived from this software without specific prior written # permission. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER # OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, # PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR # PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF # LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # TODO: Cache limited MRU entries for increased performance in large structures. # - Include a dict that maps partial keys to full keys? Or store keys in own data structure and re-order them based on use. # - Limit size to number of elements? # - Need to invalidate cache if we add more items. # TODO: Create custom exception types to differentiate between keys not found and keys not unique. class PartialMatchDict(dict): """ A dictionary structure that will return a value if its key at least partially uniquely matches a passed-in key. Exact matches returned first, then checks for partial matches to the beginnings of keys while ignoring case. Example: >>> pd = PartialMatchDict() >>> pd["thelongstring"] = 1 >>> pd["the"] = 2 >>> pd["TheString"] = 3 >>> pd["thelongstr"] 1 >>> pd["the"] 2 >>> pd["thestring"] 3 >>> pd["th"] KeyError: "Partial key 'th' not unique" >>> pd["anotherstring"] KeyError: "Missing key 'anotherstring'" """ def __init__(self, items = None): super(PartialMatchDict, self).__init__() if items: self.update(items) # PartialSearch - Do the partial search. # param: partialKey = key to try to match. # returns: a tuple, 0th element is True if found a value, None if no possible match, False if no unique match was found; # 1st element indicates value if successful, or first possible value if not unique. def PartialSearch(self, partialKey): # if the item is in the dictionary then just return it if dict.__contains__(self, partialKey): return (True, dict.__getitem__(self, partialKey)) matchFound = (None, None) # Don't partial match an empty string, because it will match anything. if len(partialKey) == 0: # "" is not a unique key. return (False, "(empty string)") # See if any key starts with our partialKey. Use lower() to ignore case. lowerPartialKey = str(partialKey).lower() for key in self: if str(key).lower().startswith(lowerPartialKey): if not matchFound: matchFound = (True, dict.__getitem__(self, key)) else: # More than one potential match, so set our success indicator to False. matchFound = (False, matchFound) break return matchFound # Find out if we have an exact key in the set. def has_exact_key(self, item): return dict.__contains__(self, item) # Override dict method, used for "in" operator. def __contains__(self, item): # Compare to True to have None (not found at all) returns False. return (self.PartialSearch(item) == True) # Override dict method, used when looking up key using . def __getitem__(self, item): wasFound, theValue = self.PartialSearch(item) if not wasFound: errorMsg = "%s" if wasFound == None: # Couldn't find anything at all. errorMsg = "Missing key '%s'" else: # Key not unique errorMsg = "Partial key '%s' not unique" raise KeyError(errorMsg % (str(item))) else: return theValue # If we run the file from the command prompt, do some tests. if __name__ == '__main__': testdict = PartialMatchDict() print "** Assigning testdict['Primary'] = 1" testdict['Primary'] = 1 print "** Assigning testdict['Secondary'] = 2" testdict['Secondary'] = 2 print "** Assigning testdict['Tertiary'] = 3" testdict['Tertiary'] = 3 print "** Assigning testdict['thirtieth'] = 30" testdict['thirtieth'] = 30 # Make sure we find what we expect. print "testdict['Primary'] == ", testdict['Primary'] print "testdict['primary'] == ", testdict['primary'] print "testdict['secondary'] == ", testdict['secondary'] print "testdict['tertiary'] == ", testdict['tertiary'] print "testdict['tert'] == ", testdict['tert'] print "testdict['thirtieth'] == ", testdict['thirtieth'] print "testdict['thirt'] == ", testdict['thirt'] print "testdict['th'] == ", testdict['th'] print "** Assigning testdict['th'] = None" testdict['th'] = None print "testdict['th'] == ", testdict['th'] print "testdict['thirt'] == ", testdict['thirt'] # Expect some errors. try: print testdict['quart'] except KeyError: print "There is no possible match, so could not find testdict['quart']" try: print testdict['t'] except KeyError: print "Possible matches are not unique, so could not find testdict['t']"