0

After understating log concept I'm working on web request logs and try to match some keywords inside of string of those logs are their method is GET and I also reviewed pythonic log parsing and related post 1 and 2.

let's assume that I have a portion of a log line that looks something like this:

"GET /pas/public/ping?pas-client=007 HTTP/1.1"

the full log is:

[10/Oct/2021:05:45:20 +0000] SsAzyfdtrfdV1GKU7+Q== user_Zwfikdlo5CgOT0Loq8g== "GET /pas/public/ping?pas-client=007 HTTP/1.1" 200 "-b" 53b 5ms "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" qFhyYRqbqtsM7tA== 50001 - -

So I tried to use a regex can match with some part of API or query path (means ) before ? mark or (URL parameters). I want to check the events in spark dataframe which has single column raw and if it was GET http method then match it with /ping?. if it was the case I create column name Type and label it ping in from of that event as below and if it was not I label it POST:

raw Type
[10/Oct/2021:05:45:20 +0000] SsAzyfdtrfdV1GKU7+Q== user_Zwfikdlo5CgOT0Loq8g== "GET /pas/public/ping?pas-client=007 HTTP/1.1" 200 "-b" 53b 5ms "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" qFhyYRqbqtsM7tA== 50001 - - GET_ping
# parsing df
sdf = (df  
    .withColumn('Type', F
        .when(F.isnull('raw'), '-')
        .when(F.regexp_extract('raw', '(?i)^/ping?$', 0) == F.col('raw'), 'ping') #To specify ping activities

So here my problem is similar to this post kind of multi condition regex problem. So how I can create right conditional regex which can match:

  • with a word start with " following by space and it has 3 letters G E T
  • with a word start with / and ended up with ? and it has 4 letters p i n g

1 Answer 1

0

Your regex seems a bit off. This is my version. You should at least escape the backslashes \ and the question mark ? after ping.

(df
    .withColumn('type', F
        .when(F.isnull('raw'), '-')
        .when(F.regexp_extract('raw', '(GET).*(\/ping)\?', 0).isNotNull(), 'GET_ping')
        .otherwise('POST')
    )
    .select('type')
    .show()
)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.