I have been working on this for a week, but I can't find an acceptable solution. I do have a working solution, but the comparison alone takes about half a day :-S
Precondition: Both CSV files have already been copy-pasted into the local workbook, so they are present and ready to work with. Each file has ~6000 rows and 4 columns:
column A: documentname/version
column B: subject1
column C: subject2
column D: boolean-artefact
Both CSV files have the same structure. Column A holds a document name and its latest version. Each row contains a combination of: documentname/version, subj1, subj2, boolean.
Example of CSV_old, with comments in column E describing the change in CSV_new:
Document/Version  Subj1   Subj2    BOOLEAN
DOC_1/Vers1       FUN     GERMANY  FALSE
DOC_2/Vers3       FUN     GERMANY  TRUE
DOC_2/Vers3       FUN     UK       TRUE     <- to be deleted in CSV_new
DOC_2/Vers3       FUN     FRANCE   TRUE
DOC_3/Vers7       ACTION  GERMANY  FALSE    <- Version Update in CSV_new
DOC_4/Vers4       MOVIE   UK       TRUE
DOC_6/Vers1       HELP    SPAIN    FALSE
DOC_7/Vers2       FUN     GERMANY  FALSE    <- boolean: true in CSV_new
DOC_8/Vers5       FUN     FRANCE   TRUE     <- Subj1: ACTION instead of FUN
CSV_new:
Document/Version  Subj1   Subj2    BOOLEAN
DOC_1/Vers1       FUN     GERMANY  FALSE
DOC_2/Vers3       FUN     GERMANY  TRUE
DOC_2/Vers3       FUN     UK       TRUE
DOC_2/Vers3       FUN     FRANCE   TRUE
DOC_3/Vers9       ACTION  GERMANY  FALSE    <- Version Updated
DOC_4/Vers4       MOVIE   UK       TRUE
DOC_5/Vers5       DANGER  UK       FALSE    <- new/added Row in CSV_new
DOC_6/Vers1       HELP    SPAIN    FALSE
DOC_7/Vers2       FUN     GERMANY  TRUE     <- boolean updated to true
DOC_8/Vers5       ACTION  FRANCE   TRUE     <- Subj1: ACTION instead of FUN
Aim: Compare two CSV files. Each file is an extract derived from a huge database. I would like to check an older CSV file (e.g. version 2.0, csv_old) against a newer one (e.g. version 4.1, csv_new).
This way I would like to see the differences between the two extracts of the database. There can be newly added lines as well as deleted lines.
So far I have code that works, but it takes far too much time. Below is a kind of pseudocode to give you an idea of my approach (it contains only one step of the comparison):
Dim wsOld As Worksheet, wsNew As Worksheet, docVers As String
Set wsOld = Sheets("_ws_oldCSV")
Set wsNew = Sheets(givenActiveWS)

For rowInOldCSV = 3 To wsOld.Cells(wsOld.Rows.Count, "A").End(xlUp).Row
    docVers = wsOld.Cells(rowInOldCSV, 1).Value
    'look for the exact document/version in column A of the new sheet
    Set findSameDocumentNumberInColumnA = wsNew.Columns(1).Find(docVers, LookIn:=xlValues, LookAt:=xlWhole)
    'look for the document name alone (everything up to and including the "/")
    Set findSameDocumentNumberInColumnA_withoutVers = wsNew.Columns(1).Find(Left(docVers, InStr(docVers, "/")), LookIn:=xlValues, LookAt:=xlPart)
    If Not findSameDocumentNumberInColumnA Is Nothing Then
        'document/version found!
        firstAddress = findSameDocumentNumberInColumnA.Address
        Do
            'if subj1+subj2 are same
            If (wsNew.Cells(findSameDocumentNumberInColumnA.Row, 2).Value = wsOld.Cells(rowInOldCSV, 2).Value) And _
               (wsNew.Cells(findSameDocumentNumberInColumnA.Row, 3).Value = wsOld.Cells(rowInOldCSV, 3).Value) Then '....and boolean-value the same
                'copy the old row (A:D) next to the matching new row (F:I);
                'Copy/PasteSpecial of the same range takes even longer
                wsNew.Cells(findSameDocumentNumberInColumnA.Row, 6).Resize(1, 4).Value = wsOld.Cells(rowInOldCSV, 1).Resize(1, 4).Value
                'leave loop
                Exit Do
            End If
            Set findSameDocumentNumberInColumnA = wsNew.Columns(1).FindNext(findSameDocumentNumberInColumnA)
            'VBA does not short-circuit "And", so test for Nothing before reading .Address
            If findSameDocumentNumberInColumnA Is Nothing Then Exit Do
        Loop While findSameDocumentNumberInColumnA.Address <> firstAddress
    Else
        'document/version not found
        If Not findSameDocumentNumberInColumnA_withoutVers Is Nothing Then
            'document found, looks like new version
            'mark it with yellow to show updated version
        Else
            'unknown document, means newly introduced since csv_old
            'copy it under last item in column F
        End If
    End If
Next rowInOldCSV
So much for my approach. I have seen two different ones: http://www.ms-office-forum.net/forum/showthread.php?t=279399 and "Excel VBA: Range to String Array in 1 step". Both seem to work quite well and very fast, but unfortunately I am not able to adapt them to my scenario.
I guess I have to put the column values into a string array before starting the comparison? I am out of ideas and have no clue how to get column values into string arrays. Sorry, ...
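For illustration, here is a minimal sketch of reading the rows into an array in one step. It relies on Range.Value returning a 1-based two-dimensional Variant array for a multi-cell range (a Variant array rather than a true String array). The function and sub names are hypothetical; the sheet name "_ws_oldCSV" and the first data row 3 are taken from the loop above.

```vba
Option Explicit

' Hypothetical helper: reads columns A:D of a sheet into a 2-D Variant array.
' Returns Empty when there is no data at or below firstRow.
Function ReadRowsToArray(ByVal sheetName As String, ByVal firstRow As Long) As Variant
    Dim ws As Worksheet
    Dim lastRow As Long

    Set ws = ThisWorkbook.Worksheets(sheetName)
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < firstRow Then Exit Function

    ' One read for the whole block instead of one read per cell
    ReadRowsToArray = ws.Range(ws.Cells(firstRow, 1), ws.Cells(lastRow, 4)).Value
End Function

Sub DemoReadRows()
    Dim data As Variant
    Dim r As Long

    data = ReadRowsToArray("_ws_oldCSV", 3)
    If IsEmpty(data) Then Exit Sub

    ' data(r, 1) = document/version, data(r, 2) = subj1,
    ' data(r, 3) = subj2, data(r, 4) = boolean
    For r = 1 To UBound(data, 1)
        Debug.Print data(r, 1), data(r, 2), data(r, 3), data(r, 4)
    Next r
End Sub
```

Looping over such an array in memory is usually much faster than calling Cells(...).Value or Find once per row, because each worksheet access carries overhead.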
Can you help me?
Result of comparison: it would be nice to write the result into CSV_new.
Doc/Vers     Subj1   Subj2    BOOLEAN  Doc      Subj1  Subj2  Boolean
DOC_1/Vers1  FUN     GERMANY  FALSE    -        -      -      -
DOC_2/Vers3  FUN     GERMANY  TRUE     -        -      -      -
DOC_2/Vers3  FUN     UK       TRUE     Deleted  -      -      -
DOC_2/Vers3  FUN     FRANCE   TRUE     -        -      -      -
DOC_3/Vers9  ACTION  GERMANY  FALSE    Updated  -      -      -
DOC_4/Vers4  MOVIE   UK       TRUE     -        -      -      -
DOC_5/Vers5  DANGER  UK       FALSE    New      -      -      -
DOC_6/Vers1  HELP    SPAIN    FALSE    -        -      -      -
DOC_7/Vers2  FUN     GERMANY  TRUE     -        -      -      X
DOC_8/Vers5  ACTION  FRANCE   TRUE     -        X      -      -
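One possible way to produce flags like these without a Find call per row is to index the old rows in a Scripting.Dictionary and then walk the new rows once. The sketch below is only an outline under several assumptions: it requires Windows (Scripting.Dictionary is created late-bound), both sheets hold at least one data row starting in row 3, and the names FlagNewRows, RowKey and DocName are hypothetical. It writes a single status into column F rather than the full four-column layout above, and it cannot decide on its own whether a row with a known document/version but unknown subjects is a changed row or an added one.

```vba
Option Explicit

' Hypothetical sketch: writes one status per row of the new sheet into column F.
Sub FlagNewRows(ByVal newSheetName As String)
    Dim wsOld As Worksheet, wsNew As Worksheet
    Dim oldData As Variant, newData As Variant, status() As Variant
    Dim oldRows As Object, oldDocVers As Object, oldDocs As Object, seen As Object
    Dim r As Long, key As String, k As Variant

    Set wsOld = ThisWorkbook.Worksheets("_ws_oldCSV")
    Set wsNew = ThisWorkbook.Worksheets(newSheetName)

    ' Read columns A:D (data from row 3) of both sheets into memory
    oldData = wsOld.Range("A3:D" & wsOld.Cells(wsOld.Rows.Count, "A").End(xlUp).Row).Value
    newData = wsNew.Range("A3:D" & wsNew.Cells(wsNew.Rows.Count, "A").End(xlUp).Row).Value

    Set oldRows = CreateObject("Scripting.Dictionary")    ' doc/vers|subj1|subj2 -> boolean
    Set oldDocVers = CreateObject("Scripting.Dictionary") ' document/version
    Set oldDocs = CreateObject("Scripting.Dictionary")    ' document name without version
    Set seen = CreateObject("Scripting.Dictionary")       ' old keys matched by a new row

    For r = 1 To UBound(oldData, 1)
        oldRows(RowKey(oldData, r)) = CStr(oldData(r, 4))
        oldDocVers(CStr(oldData(r, 1))) = True
        oldDocs(DocName(oldData(r, 1))) = True
    Next r

    ReDim status(1 To UBound(newData, 1), 1 To 1)
    For r = 1 To UBound(newData, 1)
        key = RowKey(newData, r)
        If oldRows.Exists(key) Then
            seen(key) = True
            If oldRows(key) = CStr(newData(r, 4)) Then
                status(r, 1) = "-"
            Else
                status(r, 1) = "Boolean changed"
            End If
        ElseIf oldDocVers.Exists(CStr(newData(r, 1))) Then
            status(r, 1) = "Subject changed or row added"
        ElseIf oldDocs.Exists(DocName(newData(r, 1))) Then
            status(r, 1) = "Updated"
        Else
            status(r, 1) = "New"
        End If
    Next r

    ' Write all statuses back in a single operation
    wsNew.Range("F3").Resize(UBound(status, 1), 1).Value = status

    ' Old rows without an exact match: deleted rows, but also the old side
    ' of updated or changed rows
    For Each k In oldRows.Keys
        If Not seen.Exists(k) Then Debug.Print "No exact match in new data: " & k
    Next k
End Sub

' Builds the lookup key from document/version, subj1 and subj2
Private Function RowKey(data As Variant, ByVal r As Long) As String
    RowKey = data(r, 1) & "|" & data(r, 2) & "|" & data(r, 3)
End Function

' Returns the part before the first "/", or the whole text if there is none
Private Function DocName(ByVal docVers As String) As String
    Dim p As Long
    p = InStr(docVers, "/")
    If p > 0 Then DocName = Left$(docVers, p - 1) Else DocName = docVers
End Function
```

With about 6000 rows per sheet this replaces thousands of Find calls with two passes over in-memory arrays, which is where most of the time saving would be expected to come from.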
Many, many thanks in advance for your effort!!!!! :o)


