
I have some code...

var userArray=userIn.match(/(?:[A-Z][a-z]*|\d+|[()])/g);

...that separates the user input of a chemical formula into its components.

For example, entering Cu(NO3)2N3 will yield

Cu , ( , N , O , 3 , ) , 2 , N , 3.

To find each element's percentage of the total weight, I need to count how many times each element occurs.

So in the example above,

Cu : 1 , 
N  : 5 , 
O : 6 

Any suggestions on how I should go about doing this?

  • Does the quantifier always come right after the element? Also, is nesting allowed? Are two digit numbers allowed? Commented Jun 28, 2013 at 22:50
  • This is much more than just counting occurrences. This is parsing and multiplying. Commented Jun 28, 2013 at 22:51
  • @Barmar Yes, this requires an actual parser - not a particularly hard one though. Tokens are letters, numbers (quantifiers) and brackets. I don't mind giving the OP a good answer on how to implement it but it's not very clear yet. Commented Jun 28, 2013 at 22:53
  • Yes, the quantifier will be right after the element, and two-digit numbers ARE allowed. So entering H12 will give H, 12. The only exception is parentheses, where the following number multiplies everything inside the parentheses. Commented Jun 28, 2013 at 22:57
  • @TGH The g modifier makes it return all occurrences in an array. Commented Jun 28, 2013 at 22:59

2 Answers

You need to build a parser

There is no simple way around that. You need nesting and memory, and a regular expression can't handle that very well (a real CS regular expression can't handle it at all).

First, you run the regexp you already have to get the list of tokens. This step is called tokenization.
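As a small sketch of that tokenization step, the regex from the question can be wrapped in a function. The name `tokenize` is hypothetical, and the `|| []` fallback is an added assumption to cover input with no matches, since `match()` returns `null` in that case.

```javascript
// Split a formula into element symbols, numbers, and parentheses.
// `tokenize` is an illustrative name, not part of the original question.
function tokenize(formula) {
  // With the g flag, match() returns every match, or null if there is none.
  return formula.match(/[A-Z][a-z]*|\d+|[()]/g) || [];
}

console.log(tokenize("Cu(NO3)2N3"));
// [ 'Cu', '(', 'N', 'O', '3', ')', '2', 'N', '3' ]
```

Note that `\d+` keeps a multi-digit number such as 12 together as one token.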

Now, you have to actually parse that.

I suggest the following approach. I will give you pseudocode because I think you will learn more by implementing it yourself. If you have any questions about it, let me know:

method chemistryExpression(tokens): # tokens is the result of your regex

  1. Create an empty map called map

  2. While there are tokens left and the next token is not ):

    2.1 If the next token is a letter (an element), consume it (remove it from the tokens); group = a map holding that element with occurrence 1

    2.2 If the next token is (, consume it: # Deal with nesting

      2.2.1 group = chemistryExpression(tokens) (note, tokens changed)

      2.2.2 Remove the extra ) you've just encountered

    2.3 num = consume the next token if it is a number and convert it to int, otherwise 1

    2.4 Multiply the occurrences of all elements in group by num and add them to the map

  3. Return the map
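A minimal JavaScript sketch of a parser along these lines follows. All function names (`tokenize`, `parseCount`, `parseGroup`, `countElements`) are hypothetical. One assumption worth flagging: a number multiplies only the element or parenthesized group immediately before it, which is what yields Cu: 1, N: 5, O: 6 for the formula in the question.

```javascript
// Sketch of a recursive descent parser for formulas such as "Cu(NO3)2N3".
// All function names are illustrative, not from the original post.
function tokenize(formula) {
  return formula.match(/[A-Z][a-z]*|\d+|[()]/g) || [];
}

// Consume an optional number at the front of tokens; default to 1.
function parseCount(tokens) {
  return /^\d+$/.test(tokens[0]) ? parseInt(tokens.shift(), 10) : 1;
}

// Parse tokens until a ")" or the end of input; return { element: count }.
function parseGroup(tokens) {
  var counts = {};
  while (tokens.length > 0 && tokens[0] !== ")") {
    var token = tokens.shift();
    if (token === "(") {
      var inner = parseGroup(tokens); // recurse into the nested group
      if (tokens.shift() !== ")") {
        throw new Error("Missing closing parenthesis");
      }
      var multiplier = parseCount(tokens);
      for (var element in inner) {
        counts[element] = (counts[element] || 0) + inner[element] * multiplier;
      }
    } else if (/^[A-Z]/.test(token)) {
      counts[token] = (counts[token] || 0) + parseCount(tokens);
    } else {
      throw new Error("Unexpected token: " + token);
    }
  }
  return counts;
}

function countElements(formula) {
  var tokens = tokenize(formula);
  var counts = parseGroup(tokens);
  if (tokens.length > 0) {
    throw new Error("Unmatched closing parenthesis");
  }
  return counts;
}

console.log(countElements("Cu(NO3)2N3")); // { Cu: 1, N: 5, O: 6 }
```

The sketch mutates the token array as it goes, so the recursive call and its caller share the same position in the input.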

Implementation suggestion

  • The map can just be an object.

    • To add to the map, check whether the key is already there: if it is not, set it to the new count; if it is, add the count to its existing value.

    • Multiplying can be done using a for...in loop.

  • This solution is recursive, which means it uses a function that calls itself (chemistryExpression in this case). This parser is a very basic example of a recursive descent parser and handles nesting well.

  • Good practice calls for two helper methods:

    • peek - what is the next token in the tokens; this is tokens[0]
    • next - grab the next token from tokens; this is tokens.shift()
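The helpers described in these suggestions might look like the sketch below. The names `addToMap` and `multiplyMap` are hypothetical, and passing `tokens` in as an argument is just one possible design.

```javascript
// Illustrative helpers; addToMap and multiplyMap are assumed names.
function peek(tokens) {
  return tokens[0]; // look at the next token without removing it
}

function next(tokens) {
  return tokens.shift(); // remove and return the next token
}

function addToMap(map, element, amount) {
  // Start from 0 when the key is missing, then add the amount.
  map[element] = (map[element] || 0) + amount;
}

function multiplyMap(map, factor) {
  for (var element in map) {
    if (Object.prototype.hasOwnProperty.call(map, element)) {
      map[element] *= factor;
    }
  }
}

var tokens = ["N", "O", "3"];
var map = {};
addToMap(map, next(tokens), 1); // map is { N: 1 }, tokens is ["O", "3"]
addToMap(map, peek(tokens), 3); // map is { N: 1, O: 3 }, tokens unchanged
multiplyMap(map, 2);            // map is { N: 2, O: 6 }
console.log(map);
```

Note that `shift()` removes the first array item, while `unshift()` does the opposite and inserts at the front.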

1 Comment

Thanks! I think I understand for the most part, so I'll get to work!

For each element symbol in userArray, check whether there is a next item and whether that item is a number. If so, add that number to the count for the current element type; otherwise add 1. You can use an object as a map to store a count for each distinct element type:

var map = {};
map[userArray[/* index of an element */]] = ...

EDIT: if a number can be longer than one digit, loop while the next token is a number, concatenate the digits into a string, and pass the result to parseInt().
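A rough sketch of this flat counting approach is shown below. The name `countFlat` is hypothetical. The sketch assumes the tokens come from the regex in the question, where `\d+` already yields a multi-digit number as a single token, so it reads only one token after each element. It ignores parentheses, so it only suits formulas without groups.

```javascript
// Flat counting sketch: correct only for formulas without parentheses.
// `countFlat` is an illustrative name, not from the original answer.
function countFlat(userArray) {
  var map = {};
  for (var i = 0; i < userArray.length; i++) {
    var token = userArray[i];
    if (!/^[A-Z]/.test(token)) continue; // skip numbers and parentheses
    var nextToken = userArray[i + 1];
    var count = /^\d+$/.test(nextToken) ? parseInt(nextToken, 10) : 1;
    map[token] = (map[token] || 0) + count;
  }
  return map;
}

var userArray = "C6H12O6".match(/[A-Z][a-z]*|\d+|[()]/g);
console.log(countFlat(userArray)); // { C: 6, H: 12, O: 6 }
```

For Cu(NO3)2N3 this sketch would return N: 4 and O: 3, because the 2 after the closing parenthesis is never applied to the group.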
