
I have a small database that I created by importing a CSV document. After aggregating with my original CSV, a number of new fields appeared. How can I import the new document into the database so that new fields are added to existing documents, and documents that are unique by index are inserted?

The unique compound index on my collection (pymongo notation):

create_index([('date', 1), ('country', 1), ('province/state', 1)], unique=True)

A fragment of a document already in the database:

_id:601d00ccbf6246e8f0e37460
date:"2021-02-02"
province/state:"AK"
confirmed:52775
probable_cases:NaN
total_test_results:1511785
country:"US"
and many more fields

The document I need in the collection after importing the new CSV file (for example):

_id:601d00ccbf6246e8f0e37460
date:"2021-02-02"
province/state:"AK"
confirmed:52775
probable_cases:NaN
total_test_results:1511785
country:"US"
vacctination:1234
daily_vaccinations_per_million:NaN
and many more fields

I have not yet found how to do this; if someone knows, please point me in the right direction or give an example.

6 Comments

  • You could think about using the mongoimport command-line tool to import the CSV data and merge matching documents during the import. Commented Feb 5, 2021 at 9:15
  • You can call mongoimport from your Python code (if that works for your requirement). Commented Feb 5, 2021 at 12:33
  • @prasad_ Is this done with "subprocess"? Commented Feb 5, 2021 at 15:56
  • Yes. I haven't tried it with Python, but with Java it worked without any problems. Commented Feb 5, 2021 at 16:01
  • @prasad_ I'll have to try it tomorrow. Do you have a small example? Even one in Java would help, because otherwise I will most likely get confused again with parentheses or quotes. Commented Feb 5, 2021 at 17:39

1 Answer


Do you have a small example? Even one in Java would help.

Here is example code in Java (a user-defined method, doImport) that imports JSON data from a file into a MongoDB collection. The java.lang.ProcessBuilder class is used to create an operating-system process in which mongoimport runs.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

private static void doImport() 
        throws IOException, InterruptedException {

    final String [] cmd =  { "mongoimport.exe", "--db=testdb", "--collection=testcoll", "--file=N:\\files\\myFile.json" };

    ProcessBuilder pb = new ProcessBuilder(cmd);
    pb.redirectErrorStream(true);
    Process process = pb.start();
    
    try(BufferedReader in = new BufferedReader(
            new InputStreamReader(process.getInputStream()))) {
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        process.waitFor();
        System.out.println("Done.");
    }
}
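The comments above ask whether the same thing can be done from Python with subprocess. Here is a minimal sketch using only the standard library; it assumes mongoimport is on the PATH, and the database, collection, and file names are placeholders to adapt:

```python
import subprocess

def build_mongoimport_cmd(db, coll, path):
    """Assemble the mongoimport argument list (kept separate so it is easy to inspect)."""
    return [
        "mongoimport",
        f"--db={db}",
        f"--collection={coll}",
        f"--file={path}",
    ]

def do_import(db, coll, path):
    """Run mongoimport and echo its output, mirroring the Java example above."""
    result = subprocess.run(
        build_mongoimport_cmd(db, coll, path),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
    )
    print(result.stdout)
    print(result.stderr)
    return result.returncode

# Usage (placeholder names):
#   do_import("testdb", "testcoll", "myFile.json")
```

A non-zero return code from do_import signals that mongoimport failed; the captured output contains the same progress and error lines you would see in a terminal.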

6 Comments

Thank you very much for this example. I'll go try to rewrite it for Python; if it works out well, I'll post it here too.
Sorry I'm being so much trouble. I get a strange error, although it seems like I'm doing everything right. I am using this data-refresh command: mongoimport -c=update_test -d test_import --mode=upsert --upsertFields date_1_country_1_province/state_1 --type csv --file stage_5_3and4.csv --headerline, but in the console I get Failed: bulk write error: [{[{E11000 duplicate key error collection. The number of documents before applying the command was 121856, and after it was 175705, but there should be only 122485 of them, because there is not much new data. The fields on the old documents have not changed.
I suggest you try some test runs on a test collection with a few (5 to 10) sample documents and verify that those documents are updated as you want. Also, see the import mode "merge" in addition to the "upsert" you are trying.
@prasad_ Did as you said. It's easier with a small data set. Initially there were 5 documents (49 fields); I added 10 documents (61 fields). I was able to update the data and add the missing fields once I removed the index. But it is a little alarming: won't it break sometime later? mongoimport -c=update_test3 -d test_import --upsert --mode merge --upsertFields date,country,province/state --type csv --file file2.csv --headerline
@prasad_ I apply the same process to the large collection and everything breaks down: the number of documents increases, although it should not.
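If mongoimport's --mode=merge keeps misbehaving on the large collection, an alternative worth sketching is to upsert each CSV row from Python with pymongo, filtering on exactly the fields of the unique index: update_one with a {"$set": row} update and upsert=True updates a matching document (adding any new columns) or inserts the row when no match exists. This is only a sketch under the assumptions of the question; the field names come from the index above, and the database, collection, and file names are placeholders:

```python
import csv

# Fields of the unique compound index from the question.
KEY_FIELDS = ("date", "country", "province/state")

def make_upsert(row, key_fields=KEY_FIELDS):
    """Split a CSV row into a filter on the index fields and a $set update."""
    filt = {k: row[k] for k in key_fields}
    return filt, {"$set": row}

# With a live collection it would be applied roughly like this
# (requires pymongo; names are placeholders from the comments above):
#
#   from pymongo import MongoClient
#   coll = MongoClient()["test_import"]["update_test"]
#   with open("file2.csv", newline="") as f:
#       for row in csv.DictReader(f):
#           filt, update = make_upsert(row)
#           coll.update_one(filt, update, upsert=True)
```

Because the filter uses the same fields as the unique index, an upsert can never create a second document with the same key, so the document count grows only by the number of genuinely new (date, country, province/state) combinations.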
