poniedziałek, 18 kwietnia 2016

SQL with VBS when data too big for Excel

I will use when data is too big for Excel, to quickly filter, aggregate or leftjoin data. Excel is not needed. It may be a poor-man, MSWindows (>=7.0v, it may work with older v. and an older Access runtime) solution for data too big for Excel.

1. Install Microsoft Access Runtime, the newest is 2013 https://www.microsoft.com/nl-nl/download/details.aspx?id=39358

2. Put in the same foldar: your initial data in a text file (csv) and an empty output file (csv) with one row reflecting all column names that should be in a result of a query.

3. Prepare a Schema.ini file: describe the two csv files in the following format (an example):

[PhoneList.csv]
ColNameHeader=True
Format=Delimited(;)
DateTimeFormat=yyyy-mm-dd
MaxScanRows=25
CharacterSet=ANSI
Col1=Surname Char Width 10
Col2=Name Char Width 10
Col3=No Long Width 10


[exp.csv]
ColNameHeader=True
Format=Delimited(;)
Col1=Surname Char Width 10
Col2=Name Char Width 10
Col3=No Long Width 10

4. Prepare a vbs file: it should contain the following code (an example):

Dim db: db = CreateObject("Scripting.FileSystemObject").GetParentFolderName(WScript.ScriptFullName)

Dim cn: Set cn = CreateObject("ADODB.Connection")

cn.Open _
    "Provider=Microsoft.ACE.OLEDB.15.0;" & _
          "Data Source=" & db & ";" & _
          "Extended Properties=""text;HDR=YES;FMT=Delimited(;)"""


cn.Execute "INSERT INTO [exp.csv] SELECT [Surname],[Name],[No] FROM [PhoneList.csv] WHERE [No] > -20"

cn.Close

5. The environment should be 32 or 64 bit, not a mix. Unfortunately I have 64 bit system i and office 32 bit therefore I am using a workaround tostart a script in 32 bit mode:

c:\windows\syswow64\cscript.exe moj_skrypt.vbs

Result: I tried the script on a 1 mln records csv file and it took 2 s to return the filtered data in the output csv.

Note to myself and others: Here you will find an alternative solution with R and the test data 2,6 mln records - R and ff package. For advanced jobs there is a superb idea of using ff with dplyr (ffbase2). I need to test it. http://www.r-bloggers.com/if-you-are-into-large-data-and-work-a-lot-with-package-ff/

Brak komentarzy:

Prześlij komentarz