I am trying to match open source license types in the commented-out header at the beginning of most files. However, I am having difficulty with cases where the desired string (e.g. Lesser General Public License) spans two lines. See the license header below for an example:
/**
 * Copyright (c) Codice Foundation
* <p/>
* This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser
* General Public License as published by the Free Software Foundation, either version 3 of the
* License, or any later version.
* <p/>
* This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
* even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details. A copy of the GNU Lesser General Public License
* is distributed along with this program and can be found at
* <http://www.gnu.org/licenses/lgpl.html>.
*/
A regex lookbehind is not an option because of the unknown number of spaces in commented code, as well as the different comment characters used by different languages. Examples of my current regular expressions are included below:
# Raw strings (r'...') keep \s and \D from being read as string escape sequences.
self._cr_license_re['GNU'] = re.compile(r'\sGNU\D')
self._cr_license_re['MIT License'] = re.compile(r'MIT License|Licensed MIT|\sMIT\D')
self._cr_license_re['OpenSceneGraph Public License'] = re.compile(r'OpenSceneGraph Public License', re.IGNORECASE)
self._cr_license_re['Artistic License'] = re.compile(r'Artistic License', re.IGNORECASE)
self._cr_license_re['LGPL'] = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
self._cr_license_re['BSD'] = re.compile(r'\sBSD\D')
self._cr_license_re['Unspecified OS'] = re.compile(r'free of charge', re.IGNORECASE)
self._cr_license_re['GPL'] = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)
self._cr_license_re['Apache License'] = re.compile(r'Apache License', re.IGNORECASE)
self._cr_license_re['Creative Commons'] = re.compile(r'\sCC\D')
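One way to see the problem, and to sidestep it, is to strip the comment markers and collapse whitespace before running the existing patterns. This is a minimal sketch, assuming a hypothetical `normalize_header` helper and an assumed set of line-start comment markers (`/*`, `/**`, `*/`, `*`, `//`, `#`, `;`, `--`, `!`); it does not cover every language's comment syntax.

```python
import re

# Assumed comment markers stripped from the start of each line:
# /* or /**, */, *, //, #, ;, -- and !. Extend this for other languages.
_COMMENT_PREFIX = re.compile(r'^\s*(?:/\*+|\*+/|\*+|//+|#+|;+|--+|!+)?', re.MULTILINE)


def normalize_header(text):
    """Hypothetical helper: drop leading comment markers, then collapse
    every run of whitespace (including newlines) into a single space."""
    without_markers = _COMMENT_PREFIX.sub('', text)
    return re.sub(r'\s+', ' ', without_markers).strip()


# First lines of the header from the question.
HEADER = """/**
 * Copyright (c) Codice Foundation
 * <p/>
 * This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser
 * General Public License as published by the Free Software Foundation, either version 3 of the
 * License, or any later version.
 */"""

# Patterns copied from the question (as raw strings).
lgpl = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
gpl = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)

print(bool(lgpl.search(HEADER)), bool(gpl.search(HEADER)))  # False True

normalized = normalize_header(HEADER)
print(bool(lgpl.search(normalized)), bool(gpl.search(normalized)))  # True False
```

On the raw header the LGPL pattern misses the split phrase, and the GPL lookbehind is defeated because the six characters in front of ` General` are `ser\n *`, not `Lesser`, so the file is misclassified as GPL. After normalization both patterns behave as intended, and none of the existing patterns need to change.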
I welcome any suggestions on how to tackle this problem using regex in Python.
Replace the literal spaces in 'OpenSceneGraph Public License' (and in every other multi-word pattern) with \s+; that is all.
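The `\s+` substitution can be generated instead of written by hand. This sketch uses a hypothetical `phrase_pattern` helper. Note that `\s+` on its own only bridges a bare line break, so for raw comment text the separator must also skip the comment characters; the separator class `[\s*#/;!]+` below is an assumption covering a few common markers.

```python
import re


def phrase_pattern(phrase, sep=r'\s+', flags=re.IGNORECASE):
    """Hypothetical helper: compile `phrase` so that every literal space
    becomes `sep`, letting the phrase span line breaks."""
    words = (re.escape(word) for word in phrase.split())
    return re.compile(sep.join(words), flags)


# \s+ alone spans a bare line break (e.g. inside a Python docstring) ...
osg = phrase_pattern('OpenSceneGraph Public License')
print(bool(osg.search('under the OpenSceneGraph\n    Public License')))  # True
# ... but not a "\n * " continuation, because '*' is not whitespace.
print(bool(osg.search(' * OpenSceneGraph\n * Public License')))  # False

# Assumed separator that also skips common comment markers (*, #, /, ;, !).
COMMENT_SEP = r'[\s*#/;!]+'
osg_commented = phrase_pattern('OpenSceneGraph Public License', sep=COMMENT_SEP)
print(bool(osg_commented.search(' * OpenSceneGraph\n * Public License')))  # True
```

The GPL pattern is the exception: `(?<!Lesser)` only inspects the characters immediately before where the match starts, and Python's `re` requires lookbehinds to be fixed-width, so it cannot skip over a variable separator such as `\n * `. For that pattern, normalizing the header text first is the safer route.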