I'm using Google's Speech-to-Text API to convert an audio file into text. It can identify speakers, which is really cool, but it formats the info in a way that I am having some trouble with. Here are their docs on separating out speakers.

My goal is to have a single string separating out lines by their speakers, something like this:

Speaker1: Hello Tom
Speaker2: Howdy
Speaker1: How was your weekend

If I send an audio file to get transcribed, I get back something like this:

const wordsObjects = [
  {
    startTime: { seconds: '1'},
    endTime: { seconds: '1'},
    word: 'Hello',
    speakerTag: 1
  },
  {
    startTime: { seconds: '2'},
    endTime: { seconds: '2'},
    word: 'Tom',
    speakerTag: 1
  },
]

Of course there's an object for every word; I've trimmed the list to save space. Anything Tom says in this example would have speakerTag: 2.

Here's the closest I've gotten so far:

  const unformattedTranscript = wordsObjects.map((currentWord, idx, arr) => {
    if (arr[idx + 1]) {
      if (currentWord.speakerTag === arr[idx + 1].speakerTag) {
        return [currentWord.word, arr[idx + 1].word];
      } else {
        return ["SPEAKER CHANGE"];
      }
    }
  });

  const formattedTranscript = unformattedTranscript.reduce(
    (acc, wordArr, idx, arr) => {
      if (arr[idx + 1]) {
        if (wordArr[wordArr.length - 1] === arr[idx + 1][0]) {
          wordArr.pop();
          acc.push(wordArr.concat(arr[idx + 1]));
        } else {
          acc.push(["\n"]);
        }
      }
      return acc;
    },
    []
  );

This solution does not work if a speaker says more than two words consecutively. I've managed to confuse myself thoroughly on this one, so I'd love to be nudged in the right direction.
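To see where it breaks, here is the question's own map and reduce run on a made-up four-word turn (the sample data is hypothetical, with timestamps omitted). Every word after the first shows up in two overlapping pairs, and the reduce only merges each pair with its immediate neighbor, so the turn comes out as two overlapping three-word arrays instead of one four-word array.

```javascript
// Made-up sample: one speaker says four words in a row
// (timestamps omitted for brevity).
const words = [
  { word: "How", speakerTag: 1 },
  { word: "was", speakerTag: 1 },
  { word: "your", speakerTag: 1 },
  { word: "weekend", speakerTag: 1 },
];

// Step 1 from the question: pair each word with the next one.
// The last element is undefined because the callback returns nothing for it.
const unformattedTranscript = words.map((currentWord, idx, arr) => {
  if (arr[idx + 1]) {
    if (currentWord.speakerTag === arr[idx + 1].speakerTag) {
      return [currentWord.word, arr[idx + 1].word];
    } else {
      return ["SPEAKER CHANGE"];
    }
  }
});
// At this point: [["How","was"], ["was","your"], ["your","weekend"], undefined]

// Step 2 from the question: merge each pair with the one after it.
const formattedTranscript = unformattedTranscript.reduce(
  (acc, wordArr, idx, arr) => {
    if (arr[idx + 1]) {
      if (wordArr[wordArr.length - 1] === arr[idx + 1][0]) {
        wordArr.pop();
        acc.push(wordArr.concat(arr[idx + 1]));
      } else {
        acc.push(["\n"]);
      }
    }
    return acc;
  },
  []
);
// Result: [["How","was","your"], ["was","your","weekend"]]
```

The merge only ever looks one pair ahead, so any run longer than three words produces duplicated words rather than a single line.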

Thanks in advance for any advice.

3 Answers

You could add a chunkWhile generator function. Chunk the items as long as the speaker tag is the same, then convert each chunk into a line.

function* chunkWhile(iterable, fn) {
  const iterator = iterable[Symbol.iterator]();
  let {done, value: valueA} = iterator.next();
  if (done) return;

  let chunk = Array.of(valueA);
  for (const valueB of iterator) {
    if (fn(valueA, valueB)) {
      chunk.push(valueB);
    } else {
      yield chunk;
      chunk = Array.of(valueB);
    }
    valueA = valueB;
  }
  yield chunk;
}

const wordsObjects = [
  { word: 'Hello'  , speakerTag: 1 },
  { word: 'Tom'    , speakerTag: 1 },
  { word: 'Howdy'  , speakerTag: 2 },
  { word: 'How'    , speakerTag: 1 },
  { word: 'was'    , speakerTag: 1 },
  { word: 'your'   , speakerTag: 1 },
  { word: 'weekend', speakerTag: 1 },
];

const chunkGenerator = chunkWhile(
  wordsObjects,
  (a, b) => a.speakerTag === b.speakerTag,
);

let string = "";
for (const chunk of chunkGenerator) {
  const speakerTag = chunk[0].speakerTag;
  const words      = chunk.map(({word}) => word).join(" ");

  string += `Speaker${speakerTag}: ${words}\n`;
}

console.log(string);

If you ever need to convert a generator to an array you can do Array.from(generator) or [...generator].
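Either conversion consumes the generator, so a second pass over the same generator object yields nothing. A small sketch, using a hypothetical speakerTags generator in place of chunkWhile:

```javascript
// Hypothetical stand-in for chunkWhile: a generator that yields
// speaker tags one at a time.
function* speakerTags() {
  yield 1;
  yield 2;
  yield 1;
}

const gen = speakerTags();
const first = Array.from(gen); // [1, 2, 1]
const second = [...gen];       // [] because gen is already exhausted

// Calling the generator function again gives a fresh generator.
const fresh = [...speakerTags()]; // [1, 2, 1]
```

So if the chunks are needed more than once, convert them to an array once and reuse it, or call chunkWhile again.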

I think you're overcomplicating things. You can simply iterate over the words array and track the current speaker tag. Whenever the speaker tag changes, start a new line; otherwise, append the current word to the current line. Here's an example:

const stringifyDialog = (words) => {
    let currSpeakerTag // number | undefined
    let lines = [] // Array<[number, string]>, where number is speaker tag and string is the line

    for (let {speakerTag, word} of words) {
        if (speakerTag !== currSpeakerTag) {
            currSpeakerTag = speakerTag
            lines.push([speakerTag, word])
        } else {
            lines[lines.length - 1][1] += ` ${word}`
        }
    }

    return lines.map(([speakerTag, line]) => `Speaker${speakerTag}: ${line}`).join('\n')
}

Given this input:

const wordsObjects =
[
  {
    word: 'Hello',
    speakerTag: 1
  },
  {
    word: 'Tom',
    speakerTag: 1
  },
  {
    word: 'Howdy',
    speakerTag: 2
  },
  {
    word: 'How',
    speakerTag: 1
  },
  {
    word: 'was',
    speakerTag: 1
  },
  {
    word: 'your',
    speakerTag: 1
  },
  {
    word: 'weekend',
    speakerTag: 1
  },
]

this will produce

"Speaker1: Hello Tom
Speaker2: Howdy
Speaker1: How was your weekend"

1 Comment

Thanks for clearing that up for me, and even writing out the code for a solution. It's a lot more straightforward than where my mind was taking me.

This is how I would do it using a reducer:

const formattedTranscript = wordsObjects.reduce((accumulator, currentValue) => {
  // check if same speaker (continue on the same line)
  if (accumulator.length > 0) {
    const lastItem = accumulator[accumulator.length - 1];
    if (lastItem.speakerTag === currentValue.speakerTag) {
      lastItem.text += " " + currentValue.word;
      return accumulator;
    }
  }

  // new line (new speaker)
  accumulator.push({
    speakerTag: currentValue.speakerTag,
    text: currentValue.word,
  });

  return accumulator;
}, []);
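The reducer produces an array of { speakerTag, text } objects rather than the single string the question asks for, so one more step is needed. A sketch of that step, where toTranscript is a hypothetical helper name and the sample array stands in for the reducer's output on the question's example:

```javascript
// Hypothetical helper: formats the reducer's { speakerTag, text } objects
// as "SpeakerN: ..." lines separated by newlines.
function toTranscript(lines) {
  return lines
    .map(({ speakerTag, text }) => `Speaker${speakerTag}: ${text}`)
    .join("\n");
}

// Sample shaped like the reducer's output for the question's example.
const formattedTranscript = [
  { speakerTag: 1, text: "Hello Tom" },
  { speakerTag: 2, text: "Howdy" },
  { speakerTag: 1, text: "How was your weekend" },
];

console.log(toTranscript(formattedTranscript));
// Speaker1: Hello Tom
// Speaker2: Howdy
// Speaker1: How was your weekend
```

Joining with "\n" puts newlines only between lines, so the string has no trailing newline, and an empty array gives an empty string.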
