
I have a file that contains basic examples of input and output:

[Database.txt]
Hello*==Hello. How are you?
How*are*you*==I am fine I guess.
Can you*die*==I can not die. I am software.

I will get an input string that does not have punctuation.

Example: "can you ever die in a million years"

I am trying to match the input against the pattern on the left of "==" on each line of the database and return the text on the right of "==" from the line whose pattern matched. So where input = "can you ever die in a million years", output = "I can not die. I am software."

I have to use native JavaScript. This is part of a personal project I have been working on, and I have been stuck on this step for 4 months. It is part of an independent natural speech engine that could download the file, read it into a variable, and use it as a reference. I have tried combinations of looping through the lines, splitting at "==", str.match(), and several other approaches. I will manage case insensitivity. Any help would be greatly appreciated.

  • Doesn't look hard. But can you show how your current code parses your data? Commented Apr 18, 2015 at 5:44
  • If you change your mind and allow bash for the job, let me know, it would be a piece of cake. Commented Apr 18, 2015 at 5:44
  • You are going to run out of gas very quickly trying to do this using regular expressions. Commented Apr 18, 2015 at 6:03
  • @torazaburo Unfortunately there is no native high-level parser in JavaScript. OP probably wants to get his system running before improving the parser, and regexes will get him there very quickly. Commented Apr 18, 2015 at 6:06
  • @Touffy Yes, get him there very quickly, and then he runs into a brick wall, or over a cliff, choose your metaphor. Anyway, there are plenty of parsers and NLP packages in JS. He would be well-advised to start off with one now. Commented Apr 18, 2015 at 6:10

1 Answer

You can split the file into an array of lines and turn each left side into a regexp.

Then you can run the input through a gauntlet of tests to find the match. The tricky part is that you need several separate tests rather than one super regexp. I used [].some() to stop after the first match is found. You can replace some with filter and collect the output to get multiple matches.

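var gauntlet = [],
 str = "[Database.txt]\n\
Hello*==Hello. How are you?\n\
How*are*you*==I am fine I guess.\n\
Can you*die*==I can not die. I am software.";

// Build one [RegExp, reply] pair per "pattern==reply" line.
str.split("\n").forEach(function (line) {
    var r = line.split("==");
    if (r.length < 2) return; // skip the "[Database.txt]" header and blank lines
    // Escape regex metacharacters other than *, then turn each * into a lazy wildcard.
    var src = r[0].replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, "[\\w\\W]*?");
    gauntlet.push([new RegExp(src, "i"), r[1]]);
});

// Return the reply of the first pattern that matches, or undefined if none does.
function lookup(inp) {
    var out;
    gauntlet.some(function (a) {
        if (a[0].test(inp)) { out = a[1]; return true; } // returning true stops some()
    });
    return out;
}

alert(lookup("can you die in a million years?"));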

fiddle: https://jsfiddle.net/joaze5u6/1/

I also wrote in a workaround for the way JS handles wildcards: [\w\W]*? matches any run of characters, newlines included, which is what .*? should probably do but doesn't in JS, because . does not match line terminators.
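
As a rough sketch of the some-to-filter swap mentioned above: the version below collects every matching reply instead of stopping at the first one. The helper names buildRules and lookupAll are made up for this illustration and are not part of the answer's fiddle. It also assumes each database line contains "==" exactly where the pattern ends.

```javascript
// Hypothetical helpers for illustration; not taken from the original answer.

// Turn "pattern==reply" lines into [RegExp, reply] pairs, skipping other lines.
function buildRules(text) {
  return text.split("\n")
    .filter(function (line) { return line.indexOf("==") !== -1; })
    .map(function (line) {
      var i = line.indexOf("==");
      var src = line.slice(0, i)
        .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape metacharacters except *
        .replace(/\*/g, "[\\w\\W]*?");         // * becomes a lazy wildcard
      return [new RegExp(src, "i"), line.slice(i + 2)];
    });
}

// Return the replies of all rules whose pattern matches the input.
function lookupAll(rules, inp) {
  return rules
    .filter(function (rule) { return rule[0].test(inp); })
    .map(function (rule) { return rule[1]; });
}

var rules = buildRules(
  "[Database.txt]\n" +
  "Hello*==Hello. How are you?\n" +
  "How*are*you*==I am fine I guess.\n" +
  "Can you*die*==I can not die. I am software."
);
var replies = lookupAll(rules, "hello how are you");
// replies holds the first two replies, in database order
```

With several candidates in hand, the caller can then pick one by whatever rule suits the engine, for example the longest pattern or a random choice.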

4 Comments

Why the + instead of *? In "Hello*" clearly you want to match just "Hello".
yeah, although it'd be best to use + between words and * in the end of the pattern…
This turned out perfect (after a few small modifications). Thank you. I tried using the Bourne Shell but without awk, sed, and grep it was impossible to do it within the required time limit.
I have discovered "2 to the nth" behavior. Each additional match attempt almost doubles the time it takes. I have a large database and have tried splitting it up, grouping the parts by personal pronoun so that the parts more likely to contain a match are searched first, but for the most part looping through them still takes the same amount of time. Catastrophic Backtracking?
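
One possible way around the slowdown described in the last comment is to drop regular expressions for the wildcard step. Patterns with several lazy wildcards can backtrack heavily when they fail to match. Since * is the only special character in the database, a pattern matches whenever its literal pieces occur in the input in order, and that can be checked with indexOf. The sketch below is an assumption about what would help, not something the thread confirms. The names wildcardMatch and lookupNoRegex are made up for the example.

```javascript
// Hypothetical helpers for illustration; not taken from the original answer.

// True if every literal piece of the pattern occurs in the input, in order.
// Once a piece is found, the scan never goes back before it.
function wildcardMatch(pattern, input) {
  var pieces = pattern.toLowerCase().split("*");
  var text = input.toLowerCase();
  var pos = 0;
  for (var i = 0; i < pieces.length; i++) {
    if (pieces[i] === "") continue;           // empty piece: leading, trailing or doubled *
    var found = text.indexOf(pieces[i], pos); // earliest occurrence at or after pos
    if (found === -1) return false;
    pos = found + pieces[i].length;
  }
  return true;
}

// Return the reply of the first "pattern==reply" line that matches, else undefined.
function lookupNoRegex(lines, input) {
  for (var i = 0; i < lines.length; i++) {
    var sep = lines[i].indexOf("==");
    if (sep === -1) continue;                 // skip headers and blank lines
    if (wildcardMatch(lines[i].slice(0, sep), input)) {
      return lines[i].slice(sep + 2);
    }
  }
  return undefined;
}

var db = [
  "[Database.txt]",
  "Hello*==Hello. How are you?",
  "How*are*you*==I am fine I guess.",
  "Can you*die*==I can not die. I am software."
];
var reply = lookupNoRegex(db, "can you ever die in a million years");
// reply is "I can not die. I am software."
```

Like the regexp version, this matches a pattern anywhere in the input and ignores case. Taking the earliest occurrence of each piece is enough, because if any in-order placement of the pieces exists, the earliest one does too.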
