wtorek, 7 lipca 2015

Fuzzy matching columns revisited.

Have you ever had to join two tables with spelling and typing errors, variants of the same name etc.? Try an UDF (user defined function in Excel) using NGrams and DiceCoefficient... my recommendation:
http://www.niftytools.de/2015/03/fuzzy-string-matching-excel/

Another option is to use an excelent vba macro which uses Levenstein and several other routines for fuzzy matching. http://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html

Moreover, it is often the case that we want to join tables on strings that exist within names in a column of another table. Then you may use the following formula (array formula, press Ctrl+Shift+Enter):

{=INDEX(kolumna_wartosci;MATCH(FALSE;ISERROR(SEARCH(kolumna_przeszukiwana;szukana_wartosc));0))}

Other options... search google for fuzzy lookup udf: np. stackoverflow: http://stackoverflow.com/questions/13291313/matching-similar-but-not-exact-text-strings-in-excel-vba-projects
--------------
In MS SQL you can try using SimMetrics library and compare various routines:   http://anastasiosyal.com/POST/2009/01/11/18.ASPX

And if we look for strings (in a column in TableB) in a column with long names  in TableA, we can try:

SELECT  TableA.descr as opis, max(TableB.Descr) AS major, b
   FROM TableB
   RIGHT JOIN TableA
   ON  TableA.Descr LIKE ("*"+TableB.Descr+"*")
   GROUP BY  TableA.descr,b;

Brak komentarzy:

Prześlij komentarz