I'm trying to capture every group that matches the same pattern with a regex in C#. Here is a small example that I can't get to work.

I need to get all the strings between the single quotes (CodigoEmpresa, for example):

uses MainRecord, objErrorList, SysUtils, XMLMXMWebServiceReturn, objMainProcesso,
 objProcessoWS, objProcessaRelatorioQuickReport, QuickRpt, Forms,
 RBalanc, RBalancete, RBaCCMens, RBalaMensal, RBalaMensalCons,
 objcadcontabilidade, objContabilidadeValidacoes;

const
CODIGO_EMPRESA             = 'CodigoEmpresa';
ANO_MES                    = 'AnoMes';
RELATORIO_POR              = 'RelatorioPOR';
CONTA_INI                  = 'ContaIni';
CONTA_FIM                  = 'ContaFim';
GRAU_CONTA                 = 'GrauConta';
CCUSTOS_INI                = 'CCustosIni';
CCUSTOS_FIM                = 'CCustosFim';
GRAU_CCUSTOS               = 'GrauCCustos';
DETALHAR_CONSOLIDADO       = 'DetalharConsolidado';
DESCONSIDERAR_ENCERRAMENTO = 'DesconsiderarEncerramento';
QUEBRA_CCUSTO              = 'QuebraCCusto';
CONTAS_SEM_MOVIMENTO       = 'ContasSemMovimento';
CODIGO_ALTERNATIVO         = 'CodigoAlternativo';

const

ERROR_BALANCETE_MENSAL_0001 = 'BALANC0001';
ERROR_BALANCETE_MENSAL_0002 = 'BALANC0002'; //Empresa Inexistente
ERROR_BALANCETE_MENSAL_0003 = 'BALANC0003';
ERROR_BALANCETE_MENSAL_0004 = 'BALANC0004';
ERROR_BALANCETE_MENSAL_0005 = 'BALANC0005';
ERROR_BALANCETE_MENSAL_0006 = 'BALANC0006';
ERROR_BALANCETE_MENSAL_0007 = 'BALANC0007';
ERROR_BALANCETE_MENSAL_0008 = 'BALANC0008';

Here is what I've tried so far:

Match match = Regex.Match(delphiFileInText, @"const.+=\s*'(?<property>[\d\w]+)'", RegexOptions.IgnoreCase | RegexOptions.Singleline);

But all I get is the last match (BALANC0008).

I hope that's clear. Thanks for your help.

5 Comments

  • You need to be much clearer about what pattern you are hoping to match, and what captures you need. Commented Nov 12, 2013 at 18:51
  • That doesn't make much sense. Commented Nov 12, 2013 at 18:53
  • Show your code, be more specific about what you want, what you've tried, what's going wrong, etc. Commented Nov 12, 2013 at 18:58
  • You need the first .+ to be non-greedy. Use either CONST.+?(EST)ING+ or CONST(.+(EST)ING+)? I think. Commented Nov 12, 2013 at 19:14
  • I also doubt you are trying to match one or more Gs on the end. That's what the G+ is doing: allowing for TESTINGGGGGGG. Although since it's G+? (ungreedy) and at the end of the pattern, it is equivalent to just G anyway. Commented Nov 12, 2013 at 19:18

3 Answers

Simply replacing your expression with

'(?<property>[\d\w]+)'

will get all of them, as long as you enumerate the results with Regex.Matches; Regex.Match returns only the first match.
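As a minimal sketch of how that shorter pattern might be applied: the original attempt returned only BALANC0008 because, with RegexOptions.Singleline, the greedy .+ after const runs on to the last = in the file, and Regex.Match reports a single match anyway. The class name QuotedStrings and the method name ExtractAll below are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStrings
{
    // Hypothetical helper: returns the contents of every '...' literal
    // that consists only of word characters.
    public static List<string> ExtractAll(string delphiFileInText)
    {
        var results = new List<string>();
        // Regex.Matches enumerates every non-overlapping match;
        // Regex.Match would stop after the first one.
        foreach (Match m in Regex.Matches(delphiFileInText, @"'(?<property>[\d\w]+)'"))
        {
            results.Add(m.Groups["property"].Value);
        }
        return results;
    }
}
```

Calling QuotedStrings.ExtractAll(delphiFileInText) on the sample above should yield one entry per quoted literal, in file order.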

I suggest the following Regular expression:

'(?<property>(?:\\'|[^'])*)'

Which will capture all of the single quote delimited strings in the input. If you want to capture the constants as well, I'd recommend the following regular expression:

(?<const>\w+)\s*=\s*'(?<property>(?:\\'|[^'])*)'
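A rough sketch of how the second pattern could be used to pair each constant with its value, assuming the named groups are read through Match.Groups. ConstantReader and Read are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConstantReader
{
    // Hypothetical helper: maps each constant name to its quoted value.
    public static Dictionary<string, string> Read(string source)
    {
        var pattern = @"(?<const>\w+)\s*=\s*'(?<property>(?:\\'|[^'])*)'";
        var map = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(source, pattern))
        {
            // A later duplicate name overwrites an earlier one.
            map[m.Groups["const"].Value] = m.Groups["property"].Value;
        }
        return map;
    }
}
```

One caveat: the \\' branch treats a backslash as the escape character, whereas Delphi string literals conventionally escape a quote by doubling it (''), so a literal containing a doubled quote would be split by this pattern.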

1 Comment

I changed it a little bit. The final version is: [A-Z_]*\s*=\s*'(?<property>(?:\\'|[^'])*)'. This way I can get the strings according to the uppercase constants. Your answer was perfect, I've just adapted it to my specific case. Thanks everyone.

It seems that you don't really need regular expressions for this; you can just go through the file character by character and parse it that way. It will be much easier than trying to figure out the regular expression, and it won't be "write-only" (meaning that when you come back to the code later, you'll know exactly what it does). Here is a class that I put together for this (I haven't fully tested it, but I did run a quick test on a sample string and it works as advertised there):

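using System.Collections.Generic;
using System.Text;

public class Parser
{
    // Returns every substring found between a pair of single quotes.
    public List<string> Parse(string input)
    {
        // Kept local so that a call never inherits state from an
        // earlier input that ended inside an unterminated quote.
        bool inQuotes = false;
        List<string> output = new List<string>();
        StringBuilder temporaryString = new StringBuilder();
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '\'' && !inQuotes)
            {
                // Opening quote: start collecting characters.
                inQuotes = true;
            }
            else if (input[i] == '\'' && inQuotes)
            {
                // Closing quote: store the collected string and reset.
                output.Add(temporaryString.ToString());
                inQuotes = false;
                temporaryString.Clear();
            }
            else if (inQuotes)
            {
                temporaryString.Append(input[i]);
            }
        }
        return output;
    }
}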

This code goes through the input character by character; when it hits a single quote, it starts "saving" the string until it hits another single quote. It ignores all other characters and focuses only on the characters inside the single quotes. Best of all, you could adapt this to allow for nested quotes rather easily.
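One way that adaptation might look, as a sketch only: it assumes "nested quotes" means the Delphi convention of writing a quote inside a literal as two consecutive quotes (''). DelphiStringScanner and Scan are hypothetical names, and this is a standalone variant rather than the class above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DelphiStringScanner
{
    // Hypothetical variant of the character-by-character approach:
    // inside a literal, '' is read as one quote character.
    public static List<string> Scan(string input)
    {
        var output = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (!inQuotes)
            {
                if (c == '\'') inQuotes = true;      // opening quote
            }
            else if (c == '\'')
            {
                if (i + 1 < input.Length && input[i + 1] == '\'')
                {
                    current.Append('\'');            // doubled quote: keep one
                    i++;                             // and skip its twin
                }
                else
                {
                    output.Add(current.ToString());  // closing quote
                    current.Clear();
                    inQuotes = false;
                }
            }
            else
            {
                current.Append(c);
            }
        }
        return output;
    }
}
```

The one-character lookahead is the only change in logic; everything outside a literal is still ignored.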

1 Comment

"Regular Expressions are Hard" is a terrible excuse. They're a powerful tool with a lot of time and expertise behind them. They're usually highly optimized for what they're doing, and can be written in maintainable and unmaintainable ways just like any other implementation. They're just finite state machines, kind of like your answer here, which could be regexed as '[^']*', for the record. That seems a whole lot more maintainable than what you've written.
