I have a single-column DB file that is rigidly formatted. The export has no header or field names, and I've imported it into a Google Sheet. I need help with how to approach reformatting the original column into multiple columns so I can use the filter tools in Google Sheets.

The following tables are in a sample shared sheet: here

The format of the single column DB is:

In column A1:A:

&++ BEGIN
[A Title A]    
Nonsense sentence    
~cat    
£££ or dog    
Nonsense sentence again    
~cat    
£££ or dog    
£££ to fish    
£££ this man    
Nonsense sentence 10    
~dog    
£££ by fish    
More nonsense sentences    
~cat again    
£££ or man    
£££ and fish    
[Title 2]    
Nonsense sentence 21    
~car    
£££ or bus    
£££ a train    
Nonsense sentence again2    
~boat and trailer    
£££ sea    
Nonsense sentence 40    
~lorry caravan fish    
£££ with a roof    
£££ the swimming pool    
£££ some rubbish    
&++ EOF (data)

The &++ BEGIN and &++ EOF... markers each appear only once, at the start and the end.

  • [..] denotes a title section

  • "~" is the field marker for the default string

  • "£££" is the field marker for all the options

The result I'm looking for is (in four columns C, D, E, F):

Nonsense sentence         ~cat                  £££ or dog                                                  [A Title A]
Nonsense sentence again   ~cat                  £££ or dog, £££ to fish, £££ this man                       [A Title A]
Nonsense sentence 10      ~dog                  £££ by fish                                                 [A Title A]
More nonsense sentences   ~cat again            £££ or man, £££ and fish                                    [A Title A]
Nonsense sentence 21      ~car                  £££ or bus, £££ a train                                     [Title 2]
Nonsense sentence again2  ~boat and trailer     £££ sea                                                     [Title 2]
Nonsense sentence 40      ~lorry caravan fish   £££ with a roof, £££ the swimming pool, £££ some rubbish    [Title 2]

The actual order of the columns the data ends up in is not important - but there are only four columns. The option column simply needs all the options concatenated into a single string with each separated by ", "

The required result format is simple to explain, and producing it manually is easy with just the basic knowledge of Google Sheets that I possess. Trying to encapsulate it all in one arrayformula in cell C1 has defeated me, though, and I don't want to manually format the 20,000+ rows of the original data!

Has anybody got experience of dealing with the variable number of rows between sections using a Google Sheets formula? I can probably resort to a script but would rather not if it can be avoided (I have to explain the process to others).

I've got some way towards a result using WRAPROWS(TOROW(A:A,3),4), but this fails at the first record where the variable data (the options) takes up more than one row.

3 Answers

1

To do that with a Google Sheets formula, use scan() and reduce(), like this:

=let(
  data, tocol(A1:A, 1),
  ordinals, scan(, data, lambda(a, c, ifs(
    isblank(c) + regexmatch(c, "^&"), iferror(ø),
    regexmatch(c, "^\w"), a + 1,
    true, a 
  ))),
  titles, scan(, data, lambda(a, c, ifs(
    isblank(c), iferror(ø),
    regexmatch(c, "^\["), c,
    true, a 
  ))),
  reduce(tocol(æ, 2), sequence(max(ordinals)), lambda(a, i, let(
    get_, lambda(a, regex, filter(a, ordinals = i, regexmatch(a, regex))),
    vstack(a, hstack(
      get_(data, "^\w"), get_(data, "^\~"), join(", ", get_(data, "^£££")),
      single(get_(titles, "^\["))
    ))
  )))
)

See let(), tocol(), scan(), regexmatch(), reduce(), sequence(), lambda(), filter(), vstack() and hstack().
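
For readers less familiar with scan(), the two passes in the formula can be read as running accumulations over the column. The following is a rough JavaScript analogue, not Sheets code: the `scan` helper, the shortened sample data, and the use of 0 as a stand-in for the blank value that iferror(ø) yields are all assumptions made for illustration.

```javascript
// Hypothetical helper: returns every intermediate accumulator value,
// similar in spirit to the Sheets scan() function.
function scan(initial, values, fn) {
  const out = [];
  let acc = initial;
  for (const v of values) {
    acc = fn(acc, v);
    out.push(acc);
  }
  return out;
}

// Shortened sample in the question's format.
const data = [
  '&++ BEGIN', '[A Title A]', 'Nonsense sentence', '~cat', '£££ or dog',
  '[Title 2]', 'Nonsense sentence 21', '~car', '£££ or bus', '£££ a train',
  '&++ EOF (data)',
];

// Pass 1: a record number per row. It goes up by one on every row that
// starts with a word character; marker rows (&++ ...) reset it to 0 here.
const ordinals = scan(0, data, (a, c) =>
  /^&/.test(c) ? 0 : /^\w/.test(c) ? a + 1 : a);

// Pass 2: the most recent [title] seen so far, carried forward row by row.
const titles = scan('', data, (a, c) => (/^\[/.test(c) ? c : a));

// Build one output row per record number, as the reduce() step does.
const rows = [];
for (let i = 1; i <= Math.max(...ordinals); i++) {
  // Keep the values whose row belongs to record i and matches the pattern.
  const pick = (values, regex) =>
    values.filter((v, k) => ordinals[k] === i && regex.test(v));
  rows.push([
    pick(data, /^\w/)[0],            // sentence
    pick(data, /^~/)[0],             // default string
    pick(data, /^£££/).join(', '),   // all options, joined
    pick(titles, /^\[/)[0],          // first title seen within the record
  ]);
}

console.log(rows);
```

Taking the first carried-forward title within each record is what keeps the last record of a section under its own title, even though the next title row still carries the old record number.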

4 Comments

This is great. Thx @doubleunary. There are two code constructs you've used that I don't yet understand. First, tocol(æ, 2): what mechanism provides the variable æ with a range? Second, single(get_(titles, "^\[")) in the last line: where is 'single' defined? I will vote this as the answer as it satisfies all the parameters I gave. Your time is much appreciated.
Did you see that the SO bot considered this was not a question about code? Your answer was code, and perfect. How stupid are the SO bots these days.
"what mechanism provides the variable æ with a range?" None. The æ deliberately produces an error so that we get a null row. "where is 'single' defined?" Nowhere. It's an undocumented Sheets function that sometimes comes in handy.
Gotta love those 'undocumented Sheets function's :-) Thx, all clear now.
0

Give the following formula a try:

=QUERY(LET(n,SCAN(" ",TOCOL(A:A,1),LAMBDA(a,x,IF(OR(ISNUMBER(SEARCH("Title",x)),ISNUMBER(SEARCH("EOF",x)))," ",IF(ISNUMBER(SEARCH("Nonsense",x)),x,a)))),
titles,SCAN(" ",TOCOL(A:A,1),LAMBDA(a,x,IF(ISNUMBER(SEARCH("Title",x)),x,a))),
REDUCE("",UNIQUE(n),LAMBDA(p,q,VSTACK(p,
HSTACK(q,INDEX(FILTER(TOCOL(A:A,1),n=q),2),TEXTJOIN(", ",1,QUERY(FILTER(TOCOL(A:A,1),n=q),"offset 2",0)),TEXTJOIN("",1,UNIQUE(FILTER(titles,n=q))))
)))),"offset 2",0)

1 Comment

Thx @Harun24hr. It will take me a while to work through your solution but could you suggest how I might meet the requirement that the title block is bounded by "[..]" rather than explicit text. I can see why you chose to use explicit text in the sample but the formula ignores the criteria: [..] denotes a title section, "~" is the field marker for the default string, and "£££" is the field marker for all the options. So the formula doesn't actually work with live data.
0

What you're asking isn't as trivial to do with a Google Sheets formula as it may seem, because the data format requires some pre-processing to avoid the need to maintain state as rows are processed.

On the other hand, maintaining state is easy in Apps Script. This custom function will exactly match the desired results you show:

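'use strict';

/**
 * Returns a tabulated copy of data.
 *
 * @customfunction
 * @param {A1:A} data The text strings to tabulate.
 * @return {String[][]} The tabulated data.
 */
function Tabulate(data) {
  const result = [];
  let sentence, string, options = [], title, prevTitle;
  // Flatten the range, drop blank cells, and coerce every value to a string.
  data.flat().filter(String).map(String).forEach(d => {
    if (d.match(/^~/)) string = d;          // default string
    if (d.match(/^£££/)) options.push(d);   // option
    // A new title starts; remember the old one for the record still being built.
    if (d.match(/^\[/)) { prevTitle = title; title = d; }
    // A new sentence (or the EOF marker) closes the record collected so far.
    if (d.match(/^(\w|&\+\+ EOF)/)) {
      result.push([sentence, string, options.join(', '), prevTitle || title]);
      sentence = d;
      options = [];
      prevTitle = '';
    }
  });
  // The first pushed row is empty (nothing had been collected yet), so drop it.
  return result.slice(1);
}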

Use the custom function in a formula like this:

=Tabulate(A1:A)

See Custom Functions in Google Sheets.
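
To try the state-keeping idea outside Sheets, here is a minimal stand-alone sketch that runs under Node.js. It is a hypothetical variant, not the function above: the name `tabulateRows`, the explicit `flush` helper, the trimming of each cell, and the choice to treat any unmarked line as a sentence are assumptions made for illustration. The sample mimics the two-dimensional array that Sheets passes for a range such as A1:A.

```javascript
// Hypothetical stand-alone variant for testing under Node.js.
function tabulateRows(data) {
  const result = [];
  let title = '';
  let record = null; // the record currently being built, if any

  // Emit the record collected so far, then start afresh.
  const flush = () => {
    if (record) {
      result.push([record.sentence, record.string,
                   record.options.join(', '), record.title]);
    }
    record = null;
  };

  // Flatten the range, trim each cell, and drop blank cells.
  data.flat().map(v => String(v).trim()).filter(Boolean).forEach(line => {
    if (line.startsWith('&++')) {
      flush();                              // BEGIN / EOF markers
    } else if (line.startsWith('[')) {
      flush();                              // a new title closes the record
      title = line;
    } else if (line.startsWith('~')) {
      if (record) record.string = line;     // default string
    } else if (line.startsWith('£££')) {
      if (record) record.options.push(line); // option
    } else {
      flush();                              // any other line is a sentence
      record = { sentence: line, string: '', options: [], title };
    }
  });
  flush();
  return result;
}

// Shortened sample, shaped like a one-column range.
const sample = [
  ['&++ BEGIN'], ['[A Title A]'], ['Nonsense sentence'], ['~cat'],
  ['£££ or dog'], ['Nonsense sentence again'], ['~cat'], ['£££ or dog'],
  ['£££ to fish'], ['[Title 2]'], ['Nonsense sentence 21'], ['~car'],
  ['£££ or bus'], [''], ['&++ EOF (data)'],
];

const table = tabulateRows(sample);
console.log(table);
```

Flushing on every boundary (marker, title, or new sentence) avoids both the prevTitle bookkeeping and the need to discard a leading empty row, at the cost of a slightly longer function.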
