I am writing a parser for a custom mesh format for a fluid dynamics simulation library. The mesh file contains 3D points (vertices) for the simulation mesh, for example:
[points 4]
2.426492638414711e-07,-0.0454127835577514,0.737590325020352
-0.02408935296003224,-0.02309953378412839,0.7378945938955059
-1.6462459712364876e-07,-0.02312891146336533,0.7381839359073152
0.024084588772487963,-0.02310255971887,0.737895047277951
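The [points N] header is followed by N comma-separated coordinate lines, one per vertex. Conceptually, each parsed line maps to a plain 3-component point of doubles; the exact type in my parser doesn't matter for this question, but roughly:

struct Point
{
    double x;
    double y;
    double z;
};
// e.g. the first line above becomes
// Point{ 2.426492638414711e-07, -0.0454127835577514, 0.737590325020352 }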
My parser works okay, but it's awfully slow. I used a profiler to find the bottlenecks in my code, and I found that the following function alone accounts for 66% of the CPU time.
Token Scanner::ScanNumber()
{
    // We are here because this->unscannedChar contains the start of a number.
    // A number can have different forms: 23, 3.14159265359, 6.0221409e+23, .0001
    std::string ScannedNumber;

    // Add the current unscanned char
    ScannedNumber += this->unscannedChar;
    this->NextChar();

    // This might allow wrong decimal formats to be scanned, e.g. 2.3..4, 1e3e3, 1e6--6e-.
    // But since we depend on std::from_chars to convert the string representation to a real
    // number (sketched below, after the Token definition), and the mesh file is not written
    // by hand, such malformed numbers are very unlikely, so we will stick to this abomination for now.
    while (
        std::isdigit(static_cast<unsigned char>(this->unscannedChar)) || // cast avoids undefined behavior for negative char values
        this->unscannedChar == '.' ||
        this->unscannedChar == 'e' ||
        this->unscannedChar == '-' ||
        this->unscannedChar == '+'   // also accept '+' so exponents like 6.0221409e+23 are not cut short
    )
    {
        ScannedNumber += this->unscannedChar;
        this->NextChar();
    }

    Token token;
    token.type = TokenType::NUMBER;
    token.data = std::move(ScannedNumber);
    return token;
}
And this is the Token definition:
struct Token
{
    TokenType type;
    std::string data;
};
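As mentioned in the comments above, the token's data is later converted to a double with std::from_chars. My real call site is not shown here, but a simplified sketch of that step looks roughly like this:

#include <charconv>

double TokenToDouble(const Token& token)
{
    // Convert the scanned text to a double (requires C++17 <charconv>);
    // error handling of the from_chars result is omitted for brevity.
    double value = 0.0;
    std::from_chars(token.data.data(), token.data.data() + token.data.size(), value);
    return value;
}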
It's worth noting that NextChar() is not a concern at all according to the profiler (it handles around 2 million characters in 439 milliseconds), and strangely the comparison against the 'e' character is taking most of the function's time.
I would appreciate a review of ScanNumber() and any tips to make it faster, since it needs to handle meshes with millions of points.
