I am new to Spark and find it confusing. I have gone through the Spark documentation for the Java API but couldn't figure out how to solve my problem. I have to process a log file with Spark in Java and have very little time left. The log file below contains device records (device id, description, IP address, status), each spanning multiple lines. It also contains other log information that I am not interested in. How can I extract the device records from this huge log file? Any help is much appreciated.
Input log data:
!
!
!
device AGHK75
description "Optical Line Terminal"
ip address 1.11.111.12/10
status "FAILED"
!
device AGHK78
description "Optical Line Terminal"
ip address 1.11.111.12/10
status "ACTIVE"
!
!
context local
!
no ip domain-lookup
!
interface IPA1_A2P_1_OAM
description To_A2P_1_OAM
ip address 1.11.111.12/10
propagate qos from ip class-map ip-to-pd
!
interface IPA1_OAM_loopback loopback
description SE1200_IPA-1_OAM_loopback
ip address 1.11.111.12/10
ip source-address telnet snmp ssh radius tacacs+ syslog dhcp-server tftp ftp icmp-dest-unreachable icmp-time-exceed netop flow-ip
What I have done so far (Java code):
JavaRDD<String> logData = sc.textFile("logFile").cache();
List<String> deviceRDD = logData.filter(new Function<String, Boolean>() {
    // Remembers whether the previous line belonged to a device record
    Boolean check = false;

    public Boolean call(String s) {
        if (s.contains("device") || (check && (s.contains("description") || s.contains("ip address")))) {
            check = true;
        } else if (check && s.contains("status")) {
            check = false;
            return true;
        } else {
            check = false;
        }
        return check;
    }
}).collect();
Current output:
device AGHK75
description "Optical Line Terminal"
ip address 1.11.111.12/10
status "FAILED"
device AGHK78
description "Optical Line Terminal"
ip address 1.11.111.12/10
status "ACTIVE"
Expected output:
AGHK75,"Optical Line Terminal",1.11.111.12/10,"FAILED"
AGHK78,"Optical Line Terminal",1.11.111.12/10,"ACTIVE"
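To make the expected transformation concrete, here is a minimal sketch of the line-to-record step in plain Java, without Spark. It assumes a record always starts with a `device` line and ends with its `status` line, and that any other line in between cancels the record. The class name `DeviceLogParser` and the method `parse` are hypothetical names chosen for illustration. In Spark, this logic would need to see all lines of a record together (for example inside `mapPartitions`), and a record that straddles a partition boundary would still be missed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: joins the four lines of each device record into one CSV line.
class DeviceLogParser {

    // Returns the text that follows the given keyword prefix on a line.
    private static String valueAfter(String line, String keyword) {
        return line.substring(keyword.length()).trim();
    }

    // Scans the lines in order and emits one CSV line per complete device record.
    public static List<String> parse(List<String> lines) {
        List<String> out = new ArrayList<>();
        String id = null;          // non-null only while inside a device record
        String description = null;
        String ip = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("device ")) {
                // A new record starts; forget anything left from a previous one.
                id = valueAfter(line, "device ");
                description = null;
                ip = null;
            } else if (id == null) {
                // Outside a device record: skip "interface" blocks, "!" lines, etc.
                continue;
            } else if (line.startsWith("description ")) {
                description = valueAfter(line, "description ");
            } else if (line.startsWith("ip address ")) {
                ip = valueAfter(line, "ip address ");
            } else if (line.startsWith("status ")) {
                // The status line closes the record; emit it only if it is complete.
                if (description != null && ip != null) {
                    out.add(id + "," + description + "," + ip + ","
                            + valueAfter(line, "status "));
                }
                id = null;
            } else {
                // Any other line (assumption) means this was not a full record.
                id = null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "!",
                "device AGHK75",
                "description \"Optical Line Terminal\"",
                "ip address 1.11.111.12/10",
                "status \"FAILED\"",
                "!",
                "interface IPA1_A2P_1_OAM",
                "description To_A2P_1_OAM",
                "ip address 1.11.111.12/10");
        for (String csv : parse(sample)) {
            System.out.println(csv);
        }
    }
}
```

Unlike the `contains` checks in my filter, the sketch uses `startsWith`, so the `description` and `ip address` lines under `interface` are ignored because no `device` line precedes them.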