
I have written a simple program for detecting comments.

import javax.swing.*;
import java.util.regex.*;
import java.awt.*;
import java.awt.event.*;
public class regex extends JFrame implements ActionListener
{
    JPanel center=new JPanel();
    JPanel title=new JPanel();
    JTextArea text=new JTextArea();
    JTextArea result=new JTextArea();
    JScrollPane sctext=new JScrollPane(text);
    JScrollPane scresult=new JScrollPane(result);
    JButton proc=new JButton("process");
    regex()
    {
        setSize(600,600);
        setLayout(new BorderLayout());
        add(title,BorderLayout.NORTH);
        title.setLayout(new GridLayout(1,2));
        title.add(new JLabel("code"));
        title.add(new JLabel("Regex"));
        add(center,BorderLayout.CENTER);
        center.setLayout(new GridLayout(1,2));
        center.add(sctext);
        center.add(scresult);
        add(proc,BorderLayout.SOUTH);
        proc.addActionListener(this);
        setVisible(true); // show() is deprecated
    }
    public void actionPerformed(ActionEvent e)
    {
        if(e.getSource()==proc)
        {
            try
            {
                result.setText("");
                Matcher m=Pattern.compile("(/\\*(.|[\\n]|(\\*+([^*/]|[\\r\\n])))*\\*+/)|(//.*)").matcher(text.getText());
                while(m.find())
                {
                    result.append(m.group()+"\n");
                }
            }
            catch(Exception x)
            {
                try
                {
                    java.io.File err = new java.io.File("error.txt"); // java.io is not imported above
                    java.io.PrintStream ps = new java.io.PrintStream(err);
                    x.printStackTrace(ps);
                    ps.close();
                }
                catch(Exception exx){}
            }
        }
    }
    public static void main(String[] args)
    {
        new regex();
    }
}

I don't know why my code can't detect long comments.
Here is a sample of text that contains a long comment:

/*
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 1999-2003 The Apache Software Foundation.  All rights 
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer. 
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution, if
 *    any, must include the following acknowlegement:  
 *       "This product includes software developed by the 
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowlegement may appear in the software itself,
 *    if and wherever such third-party acknowlegements normally appear.
 *
 * 4. The names "The Jakarta Project", "Tomcat", and "Apache Software
 *    Foundation" must not be used to endorse or promote products derived
 *    from this software without prior written permission. For written 
 *    permission, please contact [email protected].
 *
 * 5. Products derived from this software may not be called "Apache"
 *    nor may "Apache" appear in their names without prior written
 *    permission of the Apache Group.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation.  For more
 * information on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 *
 */ 

But it works when detecting short comments.

With the long comment, the program throws a lot of errors:

[Screenshot: code error]

  • Please make a minimal code example reproducing your problem. Most of the code you posted is irrelevant. Commented Jun 4, 2015 at 5:12
  • What are you trying to do in the first place? Not quite sure if I understand your objective. Commented Jun 4, 2015 at 5:14
  • @1010: I'm sorry, but that is my problem: I don't know what I should do. It only happens with long comments. Commented Jun 4, 2015 at 6:07
  • @Gosu: My problem is in Pattern.compile("(/\\*(.|[\\n]|(\\*+([^*/]|[\\r\\n])))*\\*+/)|(//.*)"). It can't detect long comments. Commented Jun 4, 2015 at 6:08
  • @newbie: Your GUI code is irrelevant. I suggest you edit your post, keeping the regular expression, the comment that fails and the error stack trace. Other users may get a hint how to solve similar problems. Commented Jun 4, 2015 at 6:10

1 Answer


It seems that the Java regular expression engine is recursion-based, so a match over a long input can exhaust the call stack (a StackOverflowError). That means the regular expression has to be rewritten to produce less backtracking. I cannot see, though, exactly which backtracking step produces this call stack.
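The failure can be reproduced without the GUI. The following is a minimal sketch, assuming a default JVM thread stack size; the class name LongCommentRepro, the helper names and the generated comment are illustrative only, and the length at which the stack overflows depends on the JVM settings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongCommentRepro {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
            "(/\\*(.|[\\n]|(\\*+([^*/]|[\\r\\n])))*\\*+/)|(//.*)");

    // Builds a block comment with the given number of body lines (illustrative helper).
    static String buildComment(int lines) {
        StringBuilder sb = new StringBuilder("/*\n");
        for (int i = 0; i < lines; i++) {
            sb.append(" * line ").append(i).append('\n');
        }
        return sb.append(" */").toString();
    }

    // Reports whether the pattern matched, did not match, or overflowed the stack.
    static String tryMatch(Pattern pattern, String input) {
        try {
            Matcher m = pattern.matcher(input);
            return m.find() ? "matched" : "no match";
        } catch (StackOverflowError e) {
            // StackOverflowError is an Error, so catch (Exception x) does not intercept it.
            return "StackOverflowError";
        }
    }

    public static void main(String[] args) {
        System.out.println("3 lines:     " + tryMatch(ORIGINAL, buildComment(3)));
        System.out.println("20000 lines: " + tryMatch(ORIGINAL, buildComment(20000)));
    }
}
```

With a typical default stack size the second call is expected to report a StackOverflowError, while the short comment matches normally.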

The following patterns work for larger comments:

  • Pattern.compile("(/\\*.*?\\*/)", Pattern.DOTALL) (matches only /* .. */)
  • Pattern.compile("(/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*)")
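As a quick check, the sketch below runs both patterns over a source string that contains a generated 20,000-line block comment. The class name CommentFinder, the helper methods and the sample source are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentFinder {
    // First proposal: block comments only, lazy match with DOTALL.
    static final Pattern BLOCK_ONLY =
            Pattern.compile("(/\\*.*?\\*/)", Pattern.DOTALL);

    // Second proposal: block comments (possessive content loop) or line comments.
    static final Pattern BLOCK_OR_LINE =
            Pattern.compile("(/\\*([^\\*]|(\\*(?!/))+)*+\\*+/)|(//.*)");

    // Collects every match of the pattern in the source text.
    static List<String> findComments(Pattern pattern, String source) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Builds a long block comment for testing (illustrative helper).
    static String longComment(int lines) {
        StringBuilder sb = new StringBuilder("/*\n");
        for (int i = 0; i < lines; i++) {
            sb.append(" * line ").append(i).append('\n');
        }
        return sb.append(" */").toString();
    }

    public static void main(String[] args) {
        String source = "int a; // first\n" + longComment(20000) + "\nint b; /* short */\n";
        // The first pattern ignores the line comment; the second finds all three.
        System.out.println("block only:    " + findComments(BLOCK_ONLY, source).size());
        System.out.println("block or line: " + findComments(BLOCK_OR_LINE, source).size());
    }
}
```

In the original program, only the Pattern.compile(...) call in actionPerformed would need to change.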

Explanation:

  • (.| ...)* usually produces backtracking because . matches (almost) every character, so the other alternatives usually overlap with what . already matches. The first step is therefore to eliminate the dot. In your case, replace it with [^\\*].
  • [^\\*]|[\\n] == [^\\*] so remove [\\n]
  • [^*/]|[\\r\\n] == [^*/] so remove [\\r\\n]
  • To prevent backtracking we use *+ after the content loop (a possessive quantifier). But this requires that the final * of the comment is not consumed by the content loop. So we insert a negative lookahead for / after the matched *, i.e. (\\*(?!/))+
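To illustrate the last point, here is a small sketch comparing the possessive loop with and without the negative lookahead. The pattern named NO_LOOKAHEAD is a hypothetical variant written only for this comparison; it is not one of the proposals above.

```java
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical variant: possessive loop without the lookahead.
    // The loop swallows the closing "*/" and never gives it back.
    static final Pattern NO_LOOKAHEAD =
            Pattern.compile("/\\*([^\\*]|\\*+)*+\\*+/");

    // Block-comment part of the proposed pattern: a run of * is consumed
    // only while the star is not followed by /, so the closing "*/" stays available.
    static final Pattern WITH_LOOKAHEAD =
            Pattern.compile("/\\*([^\\*]|(\\*(?!/))+)*+\\*+/");

    public static void main(String[] args) {
        String[] samples = { "/* a */", "/** doc **/", "/**/" };
        for (String s : samples) {
            System.out.println(s
                    + "  without lookahead: " + NO_LOOKAHEAD.matcher(s).find()
                    + ", with lookahead: " + WITH_LOOKAHEAD.matcher(s).matches());
        }
    }
}
```

Without the lookahead, the possessive loop consumes the whole input and the trailing \\*+/ has nothing left to match, so no comment is found at all.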

3 Comments

That's great, it seems to work for me. Thanks for your explanation. I'll try it on my other long comments.
The StackOverflowError (SOE) is due to the greedy/lazy repetition of a variable-length subpattern. A possessive quantifier is the way to go here, since it doesn't require stack space linear in the length of the input.
That's right - I'll remove the second solution (the correct one is equivalent to one of Alan Moore's).
