0

I am trying to do a group-by and count in Python: count the lines of a log file per hour. The counts do not seem to group for some reason.

Using Python 2.7.

#!/usr/bin/env python
counts = {}
logfile = open("/tmp/test.out", "r")

for line in logfile:
    if line.startswith("20") in line:
        seq = line.strip()
        substr = seq[0:13]
        if substr not in counts:
            counts[substr] = 0
            counts[substr] += 1
            for substr, count in counts.items():
                print(count,substr)

I would like output like the sample below, with one count per hour:

 6 2019-06-17T00
 13 2019-06-17T01
  9 2019-06-17T02
  7 2019-06-17T03
  6 2019-06-17T04
6
  • 1
    Can you post a sample of the file's contents and the output you're getting for it? Commented Jun 21, 2019 at 11:14
  • The file has many unrelated lines. I am picking up only lines like these: 2019-06-19T09:56:04.378+0000: [Times: user=153.84 sys=1.15, real=18.13 secs] 2019-06-19T09:59:46.370+0000: [Times: user=154.93 sys=1.24, real=18.65 secs] 2019-06-19T10:00:05.074+0000: [Times: user=155.21 sys=1.39, real=20.03 secs] Commented Jun 21, 2019 at 11:18
  • I am interested in only the hour and the count of occurrences. Thanks. Commented Jun 21, 2019 at 11:19
  • I am getting the output below, and it is not grouped: ('2019-06-16T10', 1) ('2019-06-15T19', 1) ('2019-06-16T13', 1) ('2019-06-16T12', 1) Commented Jun 21, 2019 at 11:20
  • The problem is the IF line (if substr not in counts:) Commented Jun 21, 2019 at 11:23

2 Answers

2

The line that increments counts[substr] is indented one block too far:

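for line in logfile:
    # startswith() already returns a bool; the extra "in line" raises a TypeError
    if line.startswith("20"):
        seq = line.strip()
        substr = seq[0:13]
        if substr not in counts:
            counts[substr] = 0
        # Un-indented below
        counts[substr] += 1

# Print output only after loop completes
for substr, count in counts.items():
    # Format as one string so Python 2.7 does not print a tuple
    print("%s %s" % (count, substr))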

Before, the increment ran only when the substring was not yet in the counts dictionary, so each key was incremented once and every count stayed at 1.
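To make the difference concrete, here is a small sketch that runs both indentation variants on the three sample lines from the asker's comment. The function names count_misindented and count_fixed are made up for this illustration, and the "in line" part of the condition is dropped so that the snippet runs.

```python
# Sample lines copied from the asker's comment on the question.
SAMPLE = [
    "2019-06-19T09:56:04.378+0000: [Times: user=153.84 sys=1.15, real=18.13 secs]",
    "2019-06-19T09:59:46.370+0000: [Times: user=154.93 sys=1.24, real=18.65 secs]",
    "2019-06-19T10:00:05.074+0000: [Times: user=155.21 sys=1.39, real=20.03 secs]",
]

def count_misindented(lines):
    """Hypothetical helper: the increment sits inside the 'not in' branch."""
    counts = {}
    for line in lines:
        if line.startswith("20"):
            substr = line.strip()[0:13]
            if substr not in counts:
                counts[substr] = 0
                counts[substr] += 1  # runs only the first time a key is seen
    return counts

def count_fixed(lines):
    """Hypothetical helper: the increment runs for every matching line."""
    counts = {}
    for line in lines:
        if line.startswith("20"):
            substr = line.strip()[0:13]
            if substr not in counts:
                counts[substr] = 0
            counts[substr] += 1  # un-indented: counts every occurrence
    return counts

print(count_misindented(SAMPLE))
print(count_fixed(SAMPLE))
```

With the misindented version every hour ends up with a count of 1, which matches the ungrouped output described in the question. The fixed version reports 2 for 2019-06-19T09 and 1 for 2019-06-19T10.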

7 Comments

@MadPhysicist I agree unless they want to see the progress on each iteration, so I left it there.
Thanks Sam, that works, but the output seems to be looping continuously: ('2019-06-17T05', 1) ('2019-06-17T03', 7) ('2019-06-17T07', 1) ('2019-06-17T06', 3) ('2019-06-16T02', 1) ('2019-06-16T00', 1) ('2019-06-17T02', 10)
The desired output does not indicate that they want to see it at every iteration
I am looking for something like a unique count. On Unix I do this to achieve it: awk '{print substr($1,1,13)}' | sort | uniq -c
@user345270 Does this answer work properly now? Is there something that is not working still?
0
counts = {}
logfile = open("/tmp/test.out", "r")

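for line in logfile:
    # startswith() already returns a bool; the extra "in line" raises a TypeError
    if line.startswith("20"):
        seq = line.strip()
        substr = seq[0:13]
        if substr not in counts:
            counts[substr] = 0
        counts[substr] += 1
for substr, count in counts.items():
    # Format as one string so Python 2.7 does not print a tuple
    print("%s %s" % (count, substr))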

I think this would work.
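As an alternative that sidesteps the indentation question entirely, the same grouping could be sketched with collections.Counter from the standard library (available in Python 2.7 and 3). This is not what either answer uses. The helper name count_by_hour and the sample lines are invented for illustration, and sorting the keys is an assumption based on the desired output in the question.

```python
from collections import Counter

def count_by_hour(lines):
    """Hypothetical helper: count lines per 'YYYY-MM-DDTHH' prefix.

    Lines that do not start with "20" are skipped, as in the question.
    """
    return Counter(line.strip()[:13] for line in lines if line.startswith("20"))

# Invented sample data; in the question the lines come from /tmp/test.out.
sample = [
    "2019-06-17T00:01:02.000+0000: first",
    "some unrelated line",
    "2019-06-17T01:10:00.000+0000: second",
    "2019-06-17T00:59:59.000+0000: third",
]

# Sort by hour so the report reads like the output of `sort | uniq -c`.
for hour, count in sorted(count_by_hour(sample).items()):
    print("%4d %s" % (count, hour))
```

To use it on the real log, pass the open file object instead of the list, for example count_by_hour(open("/tmp/test.out")).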

7 Comments

Why should that work? An explanation is more valuable than the solution, especially when your solution is basically a copy of the existing answer.
Basically, once the whole file we opened has been iterated, we want to print the number of times we encountered each string that starts with "20".
What your solution does is print the counts on every iteration, so if there are 50 lines in the file, your loop runs 50 times and prints each time. If that is what you want, then why are you counting the strings?
@MadPhysicist You updated your solution after seeing my solution, so basically you copied my solution.
@user345270 If this solution worked for you, then you may mark it as the right answer.