I am new to querying in Google BigQuery and am attempting to flatten an ARRAY field so that the array values are listed as a single result in a comma-separated list. In my query, "associations.associatedvids" is an array field in the deals table. This is really a two-step problem, as I also need to match the associatedvids with the corresponding first and last name fields in another table called contacts.

First, for the contact ids, when I do the following

Select
    CAST(property_hs_object_id.value AS String) AS deal_ID,
    associations.associatedvids AS associated_contacts_ID
From hubspot_data.deals

I get a result like this:

Row    deal_ID         associated_contacts_ID.value 
1      1814103617      3240001
                       3239951
...

but what I want is:

Row    deal_ID         associated_contacts_ID.value 
1      1814103617      3240001,3239951
...

I've tried different ways of unnesting the array, but cannot seem to get it right. For instance the following attempt returns the error "Scalar subquery produced more than one element".

Select
    CAST(property_hs_object_id.value AS String) AS deal_ID,
    (select associations.associatedvids from unnest(associations.associatedvids)) AS associated_contacts_ID
From hubspot_data.deals

Second, what I ultimately want is:

Row    deal_ID         associated_contact_names 
1      1814103617      John Doe,Jane Doe
...

The name fields are property_firstname.value and property_lastname.value, and associations.associatedvids (data type ARRAY<STRUCT>) = contacts.vids (data type INT64). I've tried the following, but since the data types are different I'm getting an error.

Select
    CAST(property_hs_object_id.value AS String) AS deal_ID,
    (
        select concat(property_firstname.value, " ", property_lastname.value)
        from hubspot_data.contacts
        where contacts.vid=associations.associatedvids
    ) AS contact_name
From hubspot_data.deals

Any guidance would be much appreciated!

EDIT: Here is my attempt at a minimal working example. I believe the field I'm trying to query is an ARRAY of STRUCTs, with the STRUCT element I want having data type INT64.

WITH deals AS (
    Select "012345" as deal_ID,
    [STRUCT(["abc"] as company_ID, [123,678,810] as contact_ID)]
    AS associations)
SELECT 
    deal_ID,
    contacts
FROM deals d
CROSS JOIN UNNEST(d.associations) as contacts

This gives me:

Row    deal_ID    contacts.company_ID    contacts.contact_ID    
1      012345     abc                    123
                                         678
                                         810

but what I want is

Row    deal_ID    contacts.contact_ID   
1      012345     123, 678, 810

And ultimately, I need to replace the contact_IDs with the contact first and last names that are in a different table (but fortunately not in an array).

  • simple and easy to accomplish. but in order to actually put answer as an answer - you need provide more info about your data - see How to create a Minimal, Reproducible Example Commented Aug 19, 2020 at 21:44
  • Hi Mikhail, below I've tried to create a smallest reproducible example but I'm not sure it exactly captures my problem as I don't fully understand the data structure of the field that I'm having trouble querying. I believe the field I'm trying to get to is an ARRAY of STRUCTs of data type INT64 (which is maybe why String_AGG doesn't work?) but I'm not entirely sure what that means. EDIT: it looks like I need to edit my original post to get the example in...see above please Commented Aug 21, 2020 at 19:53
  • the simplest way to show schema of your table is to actually locate your table in BQ Console > click on Schema and copy it - important parts to catch : Field name, Type, Mode Commented Aug 21, 2020 at 20:01

1 Answer

Below is for BigQuery Standard SQL

Based on limited info in your question - I guess you are missing STRING_AGG in the second query you presented in your question

It should be

SELECT
  CAST(property_hs_object_id.value AS String) AS deal_ID,
  (SELECT STRING_AGG(associations.associatedvids) FROM UNNEST(associations.associatedvids)) AS associated_contacts_ID
FROM hubspot_data.deals   
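
If associations.associatedvids is an ARRAY<STRUCT<value INT64>>, as the error message quoted in the comments suggests, STRING_AGG has to be applied to the unnested element's value field (cast to STRING) rather than to the array itself. The following is only a sketch under that assumption: the inline deals CTE is hypothetical sample data standing in for hubspot_data.deals, and the aliases v and pos are introduced here for illustration.

```sql
#standardSQL
-- Hypothetical sample row; assumes associatedvids is ARRAY<STRUCT<value INT64>>.
WITH deals AS (
  SELECT
    STRUCT(1814103617 AS value) AS property_hs_object_id,
    STRUCT(
      [STRUCT(3240001 AS value), STRUCT(3239951 AS value)] AS associatedvids
    ) AS associations
)
SELECT
  CAST(property_hs_object_id.value AS STRING) AS deal_ID,
  (
    -- Aggregate the unnested elements, not the array column itself.
    -- WITH OFFSET keeps the original array order in the output string.
    SELECT STRING_AGG(CAST(v.value AS STRING), ',' ORDER BY pos)
    FROM UNNEST(associations.associatedvids) AS v WITH OFFSET AS pos
  ) AS associated_contacts_ID
FROM deals
```

Because the subquery aggregates all elements into one string, it returns a single value per row, which also avoids the "Scalar subquery produced more than one element" error from the original attempt.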

Update: answer on updated question

#standardSQL
SELECT 
  deal_ID,
  ARRAY(
    SELECT AS STRUCT 
      company_ID, 
      ( SELECT STRING_AGG(CAST(id AS STRING), ', ') 
        FROM t.contact_ID id
      ) AS contact_ID 
    FROM d.associations t
  ) AS contacts
FROM deals d
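
For the follow-up goal of showing contact names instead of IDs, one possible approach is to unnest down to the individual IDs, join them to the contacts table, and aggregate back per deal. This is a sketch against the minimal example only: the contacts CTE, its column names (vid, firstname, lastname), and its rows are hypothetical, and the real tables would need the same kind of adaptation as above.

```sql
#standardSQL
WITH deals AS (
  SELECT "012345" AS deal_ID,
    [STRUCT(["abc"] AS company_ID, [123, 678, 810] AS contact_ID)] AS associations
),
-- Hypothetical contacts table; names and values are made up for illustration.
contacts AS (
  SELECT 123 AS vid, "John" AS firstname, "Doe" AS lastname UNION ALL
  SELECT 678, "Jane", "Doe" UNION ALL
  SELECT 810, "Sam", "Smith"
)
SELECT
  d.deal_ID,
  -- Collapse the joined names back into one comma-separated string per deal.
  STRING_AGG(CONCAT(c.firstname, " ", c.lastname), ", " ORDER BY c.vid) AS contact_names
FROM deals d
CROSS JOIN UNNEST(d.associations) AS a      -- one row per association struct
CROSS JOIN UNNEST(a.contact_ID) AS id       -- one row per contact id
JOIN contacts c ON c.vid = id               -- INT64 compared to INT64
GROUP BY d.deal_ID
```

Note that the inner join drops deals with no matching contacts; a LEFT JOIN in place of the inner join (and in place of the CROSS JOINs, for empty arrays) would keep them with a NULL name list.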

6 Comments

Hi, thanks for answering. I've tried the STRING_AGG function but it's not working since associations.associatedvids is the wrong data type for that function. For example, I get the following error when I try the above query. No matching signature for aggregate function STRING_AGG for argument types: ARRAY<STRUCT<value INT64>>
Thanks! Your answer helped me figure out a solution. My actual data wasn't exactly the same structure as the small example I created, but I was able to figure it out with your answer to the small example.
perfect. happy you got the idea and was able to apply to your real use case :o)
I've updated my question to add a little more information about how I'm trying to replace the contact_ID with the actual contact names. I'm getting stuck on how to make a join with the elements inside an array. I'd really appreciate if you could take a look at the update.
it is not good (or anyhow appreciated) practice here on SO, to update already answered question with new question(s). Rather you should post new question and we will be happy to help you there. Meantime, please rollback your update to its previous [answered] state