1

I have two CSV files. First file may content different number of lines. Every line have ID. In this case - place_id. I want add column in this file from second.

"place_id";"osm_type";"osm_id";"place_rank";"boundingbox";"lat";"lon";"display_name";"class";"type";"importance";"icon";"postcode";"city";"town";"village";"hamlet";"allotments";"neighbourhood";"suburb";"city_district";"state_district";"building";"address100";"address26";"address27";"address29";"county";"state";"country";"country_code";"place";"population";"wikidata";"wikipedia";"name";"official_name"
"100073243";"way";"108738557";"19";"56.1330951,56.1377776,35.7857419,35.7966764";"56.1354281";"35.7903646";"Bolshoe Syrkovo, Volokolamskij gorodskoj okrug, Moskovskaya oblast, CFO, RF";"place";"hamlet";"0.45401456808503";"https://nominatim.openstreetmap.org/images/mapicons/poi_place_village.p.20.png";"";"";"";"";"Bolshoe Syrkovo";"";"";"";"";"";"";"";"";"";"";"Volokolamskij gorodskoj okrug";"Moskovskaya oblast";"RF";"ru";"hamlet";"19";"Q4092451";"ru:Bolshoe Syrkovo";"Bolshoe Syrkovo";""
"100073263";"way";"108729132";"19";"56.1542386,56.156816,36.3303962,36.3383278";"56.15552975";"36.3343542260811";"Kondratovo, Volokolamskij gorodskoj okrug, Moskovskaya oblast, CFO, RF";"place";"hamlet";"0.385";"https://nominatim.openstreetmap.org/images/mapicons/poi_place_village.p.20.png";"";"";"";"";"Kondratovo";"";"";"";"";"";"";"";"";"";"";"Volokolamskij gorodskoj okrug";"Moskovskaya oblast";"RF";"ru";"";"";"";"";"Kondratovo";""
"100073265";"way";"108738571";"19";"56.009293,56.0205996,36.2239313,36.2390323";"56.015194";"36.2290485";"Gryady, Volokolamskij gorodskoj okrug, Moskovskaya oblast, CFO, Rossiya";"place";"village";"0.36089190172262";"https://nominatim.openstreetmap.org/images/mapicons/poi_place_village.p.20.png";"";"";"";"Gryady";"";"";"";"";"";"";"";"";"";"";"";"Volokolamskij gorodskoj okrug";"Moskovskaya oblast";"Rossiya";"ru";"village";"841";"Q4151063";"ru:Gryady (Moskovskaya oblast)";"Gryady";""

And second file. This file content full base of geo coordinates. Every line have place_id column which matches with place_id line in first file. Second file - i want copy strings from geojson column and add to first file by place_id. This file is match bigger than first. (First about 5 Mb, Second about 50 Mb.)

"place_id";"osm_id";"geojson"
"100059669";"111492916";"{""type"":""Polygon"",""coordinates"":[[[37.6221208,56.0629951],[37.6227846,56.0617338],[37.6235702,56.0612884],[37.6241549,56.0610708],[37.625994,56.06052],[37.627407,56.0616613],[37.6250022,56.0628003],[37.624107,56.0632933],[37.6244298,56.06364],[37.6240209,56.0640423],[37.6238138,56.0639879],[37.6238869,56.0635391],[37.6236798,56.0634711],[37.6221208,56.0629951]]]}"
"100066930";"108048163";"{""type"":""Polygon"",""coordinates"":[[[37.488797,54.9187857],[37.489145,54.9178087],[37.4916813,54.9161087],[37.4915675,54.914397],[37.4923037,54.9141008],[37.4938964,54.9139008],[37.4946333,54.9135329],[37.4950753,54.9135998],[37.4958462,54.9135723],[37.4961204,54.9133221],[37.4963465,54.9127529],[37.4976451,54.912619],[37.49836,54.9121783],[37.4984597,54.9124224],[37.4989822,54.9128384],[37.4986734,54.9131341],[37.4984106,54.9135755],[37.4984278,54.9141491],[37.4988218,54.9145627],[37.5001752,54.9148064],[37.5005392,54.9147547],[37.5005076,54.9157027],[37.5005411,54.9169758],[37.5003203,54.9183989],[37.500086,54.9191066],[37.4999331,54.919399],[37.4992204,54.9195132],[37.4991362,54.9199856],[37.4977175,54.9199433],[37.497684,54.9204933],[37.4959374,54.9204279],[37.4937625,54.9202703],[37.493187,54.9202895],[37.4925111,54.9202126],[37.4917951,54.9202741],[37.4903496,54.9202356],[37.4899949,54.920301],[37.4891785,54.9207395],[37.488884,54.9204202],[37.4888506,54.9200203],[37.488797,54.9194703],[37.488797,54.9187857]]]}"
"100073243";"108738557";"{""type"":""Polygon"",""coordinates"":[[[35.7857419,56.1346341],[35.7870207,56.1330951],[35.7960034,56.1354737],[35.7964486,56.136383],[35.7966764,56.1371216],[35.796451,56.1375923],[35.7940459,56.1377776],[35.7872053,56.1362698],[35.7860927,56.135251],[35.7857419,56.1346341]]]}"
"100073263";"108729132";"{""type"":""Polygon"",""coordinates"":[[[36.3303962,56.1556187],[36.3327609,56.1549359],[36.3332297,56.1553915],[36.3371409,56.1542386],[36.3383278,56.1554408],[36.3356724,56.1561707],[36.3352194,56.1557934],[36.3314321,56.156816],[36.3303962,56.1556187]]]}"
"100073265";"108738571";"{""type"":""Polygon"",""coordinates"":[[[36.2239313,56.0144832],[36.2261932,56.011495],[36.2284626,56.0095073],[36.2321529,56.009293],[36.2331509,56.0117168],[36.2341666,56.0135926],[36.2390323,56.0144832],[36.2385065,56.0167726],[36.2357356,56.0167906],[36.2334461,56.0197924],[36.2263531,56.0205996],[36.2251121,56.020199],[36.2239313,56.0144832]]]}"
"100075231";"110068197";"{""type"":""Polygon"",""coordinates"":[[[38.2935489,54.7729509],[38.2939625,54.7719488],[38.2950008,54.7717047],[38.2966022,54.7717389],[38.2968603,54.7712015],[38.2960165,54.7691138],[38.2982481,54.7689281],[38.3005051,54.7687673],[38.3025611,54.7678635],[38.3045996,54.7650658],[38.305887,54.7649297],[38.3081401,54.7650906],[38.3085907,54.7656105],[38.3078585,54.7664648],[38.3092498,54.7671973],[38.3097709,54.7679502],[38.3082259,54.7681977],[38.3082688,54.7688538],[38.3074105,54.7691138],[38.3073891,54.7696708],[38.3081616,54.7712304],[38.3070458,54.7719978],[38.3052433,54.7713294],[38.3036984,54.7705991],[38.3024753,54.7723691],[38.2999862,54.7725548],[38.2993425,54.7731984],[38.2958449,54.7734459],[38.2942355,54.77404],[38.2935489,54.7729509]]]}"
"100083347";"108773218";"{""type"":""Polygon"",""coordinates"":[[[37.363052,55.2929074],[37.3641893,55.2923393],[37.3680087,55.2950118],[37.3709592,55.2961602],[37.3732015,55.2966977],[37.3755511,55.2974185],[37.3748001,55.2984019],[37.3730727,55.2976933],[37.3689743,55.2966549],[37.3660883,55.2951706],[37.363052,55.2929074]]]}"
"100088132";"108787848";"{""type"":""Polygon"",""coordinates"":[[[36.3930954,56.1869244],[36.3949447,56.1858475],[36.4025567,56.1881928],[36.4037609,56.1903944],[36.4019117,56.1907295],[36.3982131,56.1894851],[36.3930954,56.1869244]]]}"
"100088151";"108787862";"{""type"":""Polygon"",""coordinates"":[[[36.4786893,56.0795892],[36.4788543,56.0782741],[36.4790085,56.0775791],[36.4790382,56.0775181],[36.4791316,56.0774071],[36.4790562,56.0772801],[36.4790339,56.0770308],[36.4814648,56.0770996],[36.48379,56.0816509],[36.4817819,56.0817197],[36.478929,56.0802664],[36.4786893,56.0795892]]]}"

I guess this is not difficult for a knowledgeable programmer. I'm not like that)

I tried many codes. But not one did not work for me. Below I will list the codes that I have tried. I am not good at programming. And maybe I don’t understand the meaning of some codes.

Please help with my case.

###
Get-ChildItem -Filter .\comb\*.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\combinedcsvs.csv -NoTypeInformation -Append
###

###
$DevData = (Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8)[1..10]
$ProdData = (Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8)[1..10]
# throw one set into a hashtable
# we can use this as a lookup table for the other set
$ProdTable = @{}
foreach($line in $ProdData){
    $ProdTable[$line.place_id] = $line.ID
}
# Output the DevData with the appropriate ProdData value
$DevData | Select-Object @{Label='DevID';Expression={$_.ID}},@{Label='ProdID';Expression={$ProdTable[$_.place_id]}},place_id | Export-Csv .\new2.csv -NoTypeInformation -Delimiter ";" -Encoding:UTF8
###

###
$f1=(Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8 -header "place_id","osm_type","osm_id","place_rank","boundingbox","lat","lon","display_name","class","type","importance","icon","postcode","city","town","village","hamlet","allotments","neighbourhood","suburb","city_district","state_district","building","address100","address26","address27","address29","county","state","country","country_code","place","population","wikidata","wikipedia","name","official_name")[1..1]
$f1
$f2=(Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8 -header samname,"place_id","osm_id","geojson")[1..1]
$f1|
   %{
      $geojson=$_.geojson
      $m=$f2|?{$_.geojson -eq $geojson}
      $_.place_id=$m.place_id
    }
$f1
###


###
#Make an empty hash table for the first file
$File1Values = @{}
#Import the first file and save the rows in the hash table indexed on "place_id"
Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8 | ForEach-Object {
  $File1Values.Add($_.place_id, $_)
}
#Import the second file and make a custom object with properties from both files
Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8 | ForEach-Object {
  [PsCustomObject]@{
    ABC = $File1Values[$_.KeyColumn].ABC;
    DEF = $File1Values[$_.KeyColumn].DEF;
    UVW = $_.UVW;
    XYZ = $_.XYZ;
  }
} | Export-Csv -Path c:\OutFile.csv
###

###
$Poproperties = @(
'worker_name',
'requester_name',
@{E={$Lookup_Hash.($_.field_834)};L='field_834'},
@{E={$Lookup_Hash.($_.field_835)};L='field_835'},
@{E={$Lookup_Hash.($_.field_836)};L='field_836'},
@{E={$Lookup_Hash.($_.field_837};L='field_837'},
@{E={$Lookup_Hash.($_.field_838)};L='field_838'}
)
Import-Csv -Path C:\S_FilePath | Select-Object -Property $Poproperties
###


###
$Lookup_Hash = Import-Csv ".\pars_full_4_only_geo.csv" -Delimiter ";" -Encoding:UTF8 | ForEach-Object -Process { $_.place_id = $_.name }
$S_File = Import-Csv ".\pars_full_4_without_geo.csv" -Delimiter ";" -Encoding:UTF8 | Select-Object -Property *,@{E={$Lookup_Hash.($_.place_id)};L='place_id'} | Export-Csv ".\pars_full_5_combine_geo.csv" -NoTypeInformation -Delimiter ";" -Encoding:UTF8
###

2 Answers 2

1

This is a working example I created that shows one way in which this can be done

I created two csv files

file1.csv

"id";"score"
"1";"90"
"3";"100"

file2.csv

"id";"firstname";"lastname"
"1";"steve";"jobs"
"2";"bill";"gates"
"3";"santa";"claus"

Then my powershell script, test.ps1

$csv1=(import-csv file1.csv -Delimiter ";")
$csv2=(import-csv file2.csv -Delimiter ";")
$csv1 |
    ForEach-Object{
        $row = $_
        if($mtch = $csv2|?{$_.id -eq $row.id}){
                $out = [pscustomobject]@{ id =  $row.id; firstname = $mtch.firstname; lastname = $mtch.lastname; score = $row.score }
                $out
           }
     } | Export-Csv csv3.csv -NoTypeInformation

This is how I run my script (in the same directory as the csv files

powershell -ExecutionPolicy RemoteSigned .\test.ps1

And this is the results, csv3.csv

"id","firstname","lastname","score"
"1","steve","jobs","90"
"3","santa","claus","100"
Sign up to request clarification or add additional context in comments.

5 Comments

Thx! This perfectly work for small files! My files about 50 mb. and now it's already 40 minutes and the process has not ended ... I need to look for workarounds. Thx for reaction on my question!
Why not import into a database?
Also, csv1 can be made to be the bigger file, so that you are looping through the bigger file and checking against the smaller file.
About import to database - its good question) i think its will be decision for me if i dont find acceptable option. I know how to prepare needed for my tasks csv file in in windows environment. I dont work with bases. Will see... How i can make csv1 bigger file?
I added own decision which work pretty well for my tasks.) All is good) Thx!
1

add a code that suits for my tasks. I split scheme on 3 steps.

$FileWithOutGeom = Import-Csv ".\FileWithOutGeom.csv" -Delimiter ';' -Encoding UTF8

# step 1. getting all IDs from file without coordinates - sort by ID and select place_id column values. I use join with delimiter '|' to bring data in a suitable format for next step. (for where-obgect -match)
$ID = [string]::Join("|",( $FileWithOutGeom | sort place_id | Select-Object -ExpandProperty 'place_id'))

# step 2. take second file with all coordinates and select from them only those rows which ID have in first file and sort by ID too
$FileWithAllGeom = Import-Csv ".\FileWithAllGeom.csv" -Delimiter ';' -Encoding UTF8 | Where-Object -property place_id -Match $ID | sort place_id

# step 3. take first file without geom and add-member - new column name (geojson) and values for this column from step 2 with add increment for each-object
$FileWithOutGeom | ForEach-Object -Begin {$i = 0} {$_ | Add-Member -MemberType NoteProperty -Name 'geojson' -Value ($FileWithAllGeom)[$i++].geojson -PassThru 
} | Export-Csv ".\CombinedFile.csv" -NoTypeInformation -Delimiter ";" -Encoding:UTF8

At the exit i have file with 'geojson' column at the end of first file. Sorry for maybe terrible code. I was combined this code from pieces found in net. This scheme work pretty fast for my tasks. Files about 50 mb and another 20 mb - processed in less than a 10 sec.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.