
I have a small database that I created by importing a CSV document. After aggregating with my original CSV, a number of new fields appeared. How can I import the new document into the database so that new fields are added to existing documents, and documents that are unique by index are inserted?

The unique compound index on my collection (pymongo notation):

create_index([('date', 1), ('country', 1), ('province/state', 1)], unique=True)

A fragment of a document already in the database:

_id:601d00ccbf6246e8f0e37460
date:"2021-02-02"
province/state:"AK"
confirmed:52775
probable_cases:NaN
total_test_results:1511785
country:"US"
and many more fields

The document I need in the collection after importing the new CSV file (for example):

_id:601d00ccbf6246e8f0e37460
date:"2021-02-02"
province/state:"AK"
confirmed:52775
probable_cases:NaN
total_test_results:1511785
country:"US"
vacctination:1234
daily_vaccinations_per_million:NaN
and many more fields

I have not yet found how to do this; if someone knows, please point me in the right direction or give an example.

6 Comments

  • You could think about using the mongoimport command-line tool to import the CSV data and merge matching documents during the import. Commented Feb 5, 2021 at 9:15
  • You can call mongoimport from your Python code (if that works for your requirement). Commented Feb 5, 2021 at 12:33
  • @prasad_ Is this done with "subprocess"? Commented Feb 5, 2021 at 15:56
  • Yes. I haven't tried it with Python, but with Java it worked without any problems. Commented Feb 5, 2021 at 16:01
  • @prasad_ I'll have to try it tomorrow. Do you have a small example? Even one in Java would help, because otherwise I will most likely get confused again with parentheses or quotes. Commented Feb 5, 2021 at 17:39

1 Answer


Do you have a small example? Even one in Java would help.

Here is example code in Java (a user-defined method, doImport) that imports JSON data from a file into a MongoDB collection. The java.lang.ProcessBuilder class is used to create an operating-system process in which mongoimport runs.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

private static void doImport() 
        throws IOException, InterruptedException {

    final String [] cmd =  { "mongoimport.exe", "--db=testdb", "--collection=testcoll", "--file=N:\\files\\myFile.json" };

    ProcessBuilder pb = new ProcessBuilder(cmd);
    pb.redirectErrorStream(true);
    Process process = pb.start();
    
    try(BufferedReader in = new BufferedReader(
            new InputStreamReader(process.getInputStream()))) {
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        process.waitFor();
        System.out.println("Done.");
    }
}
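The comments above ask whether the same thing can be done from Python with subprocess. Here is a minimal sketch using only the standard library; it assumes mongoimport is on the PATH, and the database, collection, and file names are placeholders to adapt:

```python
import subprocess

def build_mongoimport_cmd(db, coll, path):
    """Assemble the mongoimport argument list (kept separate so it is easy to inspect)."""
    return [
        "mongoimport",
        f"--db={db}",
        f"--collection={coll}",
        f"--file={path}",
    ]

def do_import(db, coll, path):
    """Run mongoimport and echo its output, mirroring the Java example above."""
    result = subprocess.run(
        build_mongoimport_cmd(db, coll, path),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
    )
    print(result.stdout)
    print(result.stderr)
    return result.returncode

# Usage (placeholder names):
#   do_import("testdb", "testcoll", "myFile.json")
```

A non-zero return code from do_import signals that mongoimport failed; the captured output contains the same progress and error lines you would see in a terminal.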

6 Comments

Thank you very much for this example. I'll go try to rewrite it for Python; if it works out well, I'll post it here too.
Sorry I'm being so much trouble. I get a strange error, although it seems like I'm doing everything right. I am using this data-refresh command: mongoimport -c=update_test -d test_import --mode=upsert --upsertFields date_1_country_1_province/state_1 --type csv --file stage_5_3and4.csv --headerline, but in the console I get Failed: bulk write error: [{[{E11000 duplicate key error collection. The number of documents before applying the command was 121856, and after it was 175705, but there should be only 122485 of them, because there is not much new data. The fields on the old documents have not changed.
I suggest you try some test runs on a test collection with a few (5 to 10) sample documents and verify that those documents are updated as you want. Also, see the import mode "merge" in addition to the "upsert" you are trying.
@prasad_ Did as you said. It's easier with a small data set. Initially there were 5 documents (49 fields); I added 10 documents (61 fields). I was able to update the data and add the missing fields once I removed the index. But it is a little alarming: won't it break sometime later? mongoimport -c=update_test3 -d test_import --upsert --mode merge --upsertFields date,country,province/state --type csv --file file2.csv --headerline
@prasad_ I apply the same process to the large collection and everything breaks down: the number of documents increases, although it should not.
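If mongoimport's --mode=merge keeps misbehaving on the large collection, an alternative worth sketching is to upsert each CSV row from Python with pymongo, filtering on exactly the fields of the unique index: update_one with a {"$set": row} update and upsert=True updates a matching document (adding any new columns) or inserts the row when no match exists. This is only a sketch under the assumptions of the question; the field names come from the index above, and the database, collection, and file names are placeholders:

```python
import csv

# Fields of the unique compound index from the question.
KEY_FIELDS = ("date", "country", "province/state")

def make_upsert(row, key_fields=KEY_FIELDS):
    """Split a CSV row into a filter on the index fields and a $set update."""
    filt = {k: row[k] for k in key_fields}
    return filt, {"$set": row}

# With a live collection it would be applied roughly like this
# (requires pymongo; names are placeholders from the comments above):
#
#   from pymongo import MongoClient
#   coll = MongoClient()["test_import"]["update_test"]
#   with open("file2.csv", newline="") as f:
#       for row in csv.DictReader(f):
#           filt, update = make_upsert(row)
#           coll.update_one(filt, update, upsert=True)
```

Because the filter uses the same fields as the unique index, an upsert can never create a second document with the same key, so the document count grows only by the number of genuinely new (date, country, province/state) combinations.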
