1

Here's the situation.

I have four arrays like below that have the same length (per array) and matching "id" fields.

How can I merge elements using this matching "id" field?

array_1 = [
  {
    "id": "111",
    "field_1": "some string variables here",
    ...
  },
  {
    "id": "222",
    "field_1": "some string variables here",
    ...
  },
  ...
]

array_2 = [
  {
    "id": "111",
    "field_2": "other string variables here",
    ...
  },
  {
    "id": "222",
    "field_2": "other string variables here",
    ...
  },
  ...
]

...

Expected result:

result_array_after_merge = [
  {
    "id": "111",
    "field_1": "some string variables here",   <-- from array_1
    "field_2": "other string variables here",  <-- from array_2
    ...
  },
  {
    "id": "222",
    "field_1": "some string variables here",   <-- from array_1
    "field_2": "other string variables here",  <-- from array_2
    ...
  },
  ...
]

4 Answers

3

Use pandas!

For your data:

array_1 = [
  {
    "id": "111",
    "field_1": "some string variables here"
  },
  {
    "id": "222",
    "field_1": "some string variables here"
  }
]

array_2 = [
  {
    "id": "111",
    "field_2": "other string variables here"
  },
  {
    "id": "222",
    "field_2": "other string variables here"
  }
]

import pandas as pd

# Convert the arrays to DataFrames
df1 = pd.DataFrame(array_1)
df2 = pd.DataFrame(array_2)

# Merge them on the ids
df_merged = pd.merge(df1, df2, on='id', how='left')

# Export back to a JSON-friendly format
print(df_merged.to_dict('records'))

This gives the output:

[{'id': '111',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'},
 {'id': '222',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'}]

2 Comments

Is there a better way to merge multiple DataFrames at once, like pd.merge(df1, df2, df3...)?
You could merge them in a loop. E.g., start with df_merge = df1 and put the rest in df_list = [df2, df3, df4...]. Then for d in df_list: df_merge = pd.merge(df_merge, d, on='id', ...)
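
The loop described in the comment above can be sketched roughly as follows. The sample values and the third array (`array_3` with `field_3`) are made up for illustration, and `how="left"` is carried over from the answer as an assumption about the desired join.

```python
import pandas as pd

# Hypothetical sample data; array_3 / field_3 are invented for illustration.
array_1 = [{"id": "111", "field_1": "a1"}, {"id": "222", "field_1": "a2"}]
array_2 = [{"id": "111", "field_2": "b1"}, {"id": "222", "field_2": "b2"}]
array_3 = [{"id": "111", "field_3": "c1"}, {"id": "222", "field_3": "c2"}]

# The first frame is the starting point; the rest go into df_list.
df1, *df_list = [pd.DataFrame(a) for a in (array_1, array_2, array_3)]

df_merge = df1
for d in df_list:
    # Each pass joins one more frame onto the running result by "id".
    df_merge = pd.merge(df_merge, d, on="id", how="left")

merged = df_merge.to_dict("records")
print(merged)
```

With a left join, ids that are missing from a later array stay in the result with NaN in that array's fields.
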
1

Another scalable approach (it doesn't matter how many arrays you have), based on @yulGM's answer:

import pandas as pd

list_of_arrays = [array_1, array_2] # list all of your arrays
dfs = map(pd.DataFrame, list_of_arrays)
pd.concat(dfs, axis=1).loc[:, lambda x: ~x.columns.duplicated()].to_dict('records')

This only works if every array has the same number of elements in the same id order, because concat with axis=1 lines rows up by their position index, not by id.
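
If the order of the arrays cannot be guaranteed, one possible variation (a sketch, not part of the original answer) is to index each DataFrame by "id" before concatenating, so that rows are aligned by id. The sample data below is hypothetical, with `array_2` deliberately listed in a different order.

```python
import pandas as pd

# Hypothetical sample data; array_2 is deliberately in a different order.
array_1 = [{"id": "111", "field_1": "a1"}, {"id": "222", "field_1": "a2"}]
array_2 = [{"id": "222", "field_2": "b2"}, {"id": "111", "field_2": "b1"}]

list_of_arrays = [array_1, array_2]

# Index every frame by "id" so concat matches rows by id, not by position.
dfs = [pd.DataFrame(a).set_index("id") for a in list_of_arrays]

aligned = (
    pd.concat(dfs, axis=1)
    .rename_axis("id")   # name the index explicitly before restoring it
    .reset_index()       # turn the id index back into a regular column
    .to_dict("records")
)
print(aligned)
```

Because "id" is the index here, it is no longer duplicated as a column, so the `columns.duplicated()` filter is not needed in this variant.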

Alternatively, using merge:

from functools import reduce

dfs = map(pd.DataFrame, list_of_arrays)
reduce(lambda left, right: pd.merge(left, right, on='id', how='left'), dfs).to_dict('records')

Both of them result in:

[{'id': '111',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'},
 {'id': '222',
  'field_1': 'some string variables here',
  'field_2': 'other string variables here'}]

0

I would convert the 1st list to a dict where "id"s are the keys and the original list's elements are values. Then I would iterate the second list and update the values of the dict where the "id" matches. Then I would convert the dict back to a list by just taking the values:

array_1 = [
  {
    "id": "111",
    "field_1": "some string variables here",
  },
  {
    "id": "222",
    "field_1": "some string variables here",
  },
]

array_2 = [
  {
    "id": "111",
    "field_2": "other string variables here",
  },
  {
    "id": "222",
    "field_2": "other string variables here",
  },
]

dict_1 = {item["id"]: item for item in array_1}
for d in array_2:
    dict_1[d["id"]].update(d)

array_3 = list(dict_1.values())

from pprint import pprint
pprint(array_3)

Result:

[{'field_1': 'some string variables here',
  'field_2': 'other string variables here',
  'id': '111'},
 {'field_1': 'some string variables here',
  'field_2': 'other string variables here',
  'id': '222'}]
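
Since the question mentions four arrays, the same idea can be extended to any number of lists. Below is a minimal sketch; `merge_by_id` is a hypothetical helper name, and the sample values (including `array_3`) are invented. Unlike the snippet above, it copies fields into fresh dicts, so the input lists are left unchanged.

```python
from collections import defaultdict


def merge_by_id(*arrays, key="id"):
    """Merge dicts sharing the same key value across any number of lists."""
    merged = defaultdict(dict)
    for array in arrays:
        for item in array:
            # update() copies the fields into a new dict per id,
            # so the original input dicts are not modified.
            merged[item[key]].update(item)
    return list(merged.values())


# Hypothetical sample data for illustration.
array_1 = [{"id": "111", "field_1": "a1"}, {"id": "222", "field_1": "a2"}]
array_2 = [{"id": "111", "field_2": "b1"}, {"id": "222", "field_2": "b2"}]
array_3 = [{"id": "222", "field_3": "c2"}, {"id": "111", "field_3": "c1"}]

result = merge_by_id(array_1, array_2, array_3)
print(result)
```

The arrays do not need to be in the same order, and an id that appears in only some of the arrays simply ends up with fewer fields.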

0

You can create a third list, then for each id in the first list iterate over the second list; when you find the matching id, append an item combining the fields from both lists to the third list.
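
A rough sketch of that nested-loop approach, using made-up sample values:

```python
# Hypothetical sample data for illustration.
array_1 = [{"id": "111", "field_1": "a1"}, {"id": "222", "field_1": "a2"}]
array_2 = [{"id": "222", "field_2": "b2"}, {"id": "111", "field_2": "b1"}]

array_3 = []
for a in array_1:
    for b in array_2:
        if a["id"] == b["id"]:
            # Combine the fields of both items into a new dict.
            array_3.append({**a, **b})
            break  # ids are assumed unique per array, so stop at the first match

print(array_3)
```

This scans the second list once for every element of the first, so the work grows with the product of the two lengths.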

A better solution (I don't know if it's possible to change the data format) would be to create a dictionary with the id as the key and the fields as the value, like so:

{"111": ["some string variables here", "other string variables here"]}

That would improve performance dramatically, since looking up an id in a dictionary takes constant time on average instead of a scan through a list.
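
As an illustration only, the id-keyed format could be built from the existing arrays like this. The name `fields_by_id` and the sample values are assumptions, not something from the question.

```python
# Hypothetical sample data for illustration.
array_1 = [{"id": "111", "field_1": "a1"}, {"id": "222", "field_1": "a2"}]
array_2 = [{"id": "111", "field_2": "b1"}, {"id": "222", "field_2": "b2"}]

fields_by_id = {}
for array in (array_1, array_2):
    for item in array:
        # Collect every value except the id itself.
        values = [v for k, v in item.items() if k != "id"]
        fields_by_id.setdefault(item["id"], []).extend(values)

# Looking up all fields for one id is now a single dictionary access.
print(fields_by_id["111"])
```

Note that this format drops the field names, so it only fits cases where the position in the list is enough to tell the fields apart.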
