Have you ever had to join two tables with spelling and typing errors, variants of the same name etc.?
Try an UDF (user defined function in Excel) using NGrams and DiceCoefficient... my recommendation:
http://www.niftytools.de/2015/03/fuzzy-string-matching-excel/
Another option is to use an excelent vba macro which uses Levenstein and several other routines for fuzzy matching. http://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html
Moreover, it is often the case that we want to join tables on strings that exist within names in a column of another table. Then you may use the following formula (array formula, press Ctrl+Shift+Enter):
{=INDEX(kolumna_wartosci;MATCH(FALSE;ISERROR(SEARCH(kolumna_przeszukiwana;szukana_wartosc));0))}
Other options... search google for fuzzy lookup udf: np. stackoverflow: http://stackoverflow.com/questions/13291313/matching-similar-but-not-exact-text-strings-in-excel-vba-projects
--------------
In MS SQL you can try using SimMetrics library and compare various routines: http://anastasiosyal.com/POST/2009/01/11/18.ASPX
And if we look for strings (in a column in TableB) in a column with long names in TableA, we can try:
SELECT TableA.descr as opis, max(TableB.Descr) AS major, b
FROM TableB
RIGHT JOIN TableA
ON TableA.Descr LIKE ("*"+TableB.Descr+"*")
GROUP BY TableA.descr,b;
Brak komentarzy:
Prześlij komentarz