threat hunting | zeek | pcap | tshark | http user agents | RITA

2. finding cumulative talk time with zeek + datamash

cat conn.log | zeek-cut id.orig_h id.resp_h duration | sort | grep -v -e '^$' | grep -v '-' | datamash -g 1,2 sum 3 | sort -k 3 -rn | head

the 1st sort command puts lines with the same source ip and the same destination ip next to each other, one line after another. datamash -g only groups lines that are adjacent, so the input has to be sorted first.

grep -v -e '^$' --> (removing blank lines) grep -v means: select the lines that do NOT match the given pattern. the pattern is given with the -e flag, and grep treats it as a regular expression. ^ means start of line and $ means end of line, so '^$' matches a line with nothing between its start and its end, i.e. an empty line. this is the signature we use to match blank lines.

so -v -e '^$' selects everything except the blank lines. why are we removing blank lines? because datamash doesn't understand them. datamash expects a numeric value in the column it is summing.

grep -v '-' --> (removing dash -) select the lines that do not contain the dash - character. when zeek sees two systems connect to each other but no data got passed, it leaves the duration unset and writes a dash - instead of a number. a dash is not a numeric value, so datamash chokes on it too. that is why we also remove those lines.

datamash -g 1,2 sum 3 --> group the lines by the 1st and 2nd column (source and destination ip) and sum the 3rd column (duration) for each group. say we have 10 lines with the same source and destination ip: datamash prints them as a single entry with the 10 durations added together. remember the 1st sort command? it already listed the same source ip and destination ip one line after another for you.

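the last sort command takes the 3rd column (-k 3) and sorts it numerically (-n) in reverse order (-r), so the largest cumulative duration comes first. head then shows the first 10 lines.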
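
to see the filtering, grouping and summing steps without a real conn.log, here is a minimal sketch on a few made-up rows. assumptions: the input is tab-separated in the same column order as the zeek-cut output (source, destination, duration), the function name sum_talk_time and the sample addresses are hypothetical, and awk stands in for datamash. it only illustrates the idea, it is not a replacement for the pipeline above.

```shell
# sum_talk_time: hypothetical helper, an awk stand-in for `datamash -g 1,2 sum 3`.
# reads tab-separated "src dst duration" lines on stdin.
sum_talk_time() {
  grep -v -e '^$' |   # drop empty lines
  grep -v -e '-' |    # drop rows whose duration is unset ("-")
  awk -F'\t' '{ total[$1 "\t" $2] += $3 }
       END { for (pair in total) printf "%s\t%s\n", pair, total[pair] }' |
  sort -k 3 -rn       # largest cumulative duration first
}

# made-up sample rows (documentation addresses, not from a real capture):
# two connections for the first pair, one blank line, one unset duration
printf '192.0.2.10\t198.51.100.5\t1.5\n\n192.0.2.10\t198.51.100.5\t2.5\n192.0.2.20\t198.51.100.9\t-\n192.0.2.30\t198.51.100.7\t10\n' | sum_talk_time
```

awk keeps the running totals in an array, so unlike datamash -g it does not need the input sorted first. the pair with the unset duration disappears from the result, and the two 192.0.2.10 rows collapse into one line with their durations added.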


1. find the top 10 longest connections (top 10 talkers):

zeek stores duration in conn.log

capinfos -aeu <pcap file>      (-a start time, -e end time, -u duration of the capture)

cat conn.log | zeek-cut id.orig_h id.resp_h duration | sort -k 3 -rn | head 


3. say we found the top 2 ips that are the top talkers. now what? now do the following (put the suspicious ip in place of <ip>):

cat dns.log | zeek-cut query answers | grep <ip> | sort | uniq -c

cat conn.log | zeek-cut id.orig_h id.resp_h service | grep <ip> | sort | uniq -c

cat ssl.log | zeek-cut id.resp_h server_name subject version | grep <ip> | sort | uniq -c


you can view zeek logs using the less command (-S stops long lines from wrapping):

less -S conn.log

use the right and left arrow keys to scroll sideways. press q to get out of less and go back to the command prompt.


Session size analysis with zeek:

cat conn*.log | zeek-cut id.orig_h id.resp_h orig_bytes | grep <ip> | sort | uniq -c | sort -rn | head

the 1st sort means: when the same source ip, destination ip and byte count show up again and again, list all of those lines together, one line after another.

uniq -c means: if you see 1000 identical lines (same source, same destination, same byte count), don't print 1000 lines. print the line once and put the count in front of it.

1000 <source ip> <destination ip> 546

this means 1000 times it found this source ip communicating with this destination ip, and each time 546 bytes of data were sent from source to destination.
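
a small sketch of the sort | uniq -c | sort -rn counting step on made-up data, so the output format is easier to picture. assumptions: tab-separated "source destination orig_bytes" rows as zeek-cut would print them. the function name count_sessions and the addresses are hypothetical.

```shell
# count_sessions: hypothetical helper; counts identical "src dst orig_bytes" lines.
count_sessions() {
  sort |      # put identical lines next to each other
  uniq -c |   # collapse each run of identical lines into "count line"
  sort -rn    # highest count first
}

# made-up sample: three identical 546-byte sessions and one 1200-byte session
printf '192.0.2.10\t198.51.100.5\t546\n192.0.2.20\t198.51.100.9\t1200\n192.0.2.10\t198.51.100.5\t546\n192.0.2.10\t198.51.100.5\t546\n' | count_sessions
```

the line for 192.0.2.10 comes out first with a count of 3 in front of it. a high count with the exact same byte size every time is the pattern to look for.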


protocol decode zeek:

cat conn.log | zeek-cut id.orig_h id.resp_h id.resp_p proto service orig_bytes resp_bytes | column -t | head

sample output (only the id.resp_p, proto, service, orig_bytes and resp_bytes columns are shown):

5353    udp    dns     213    0
67      udp    dhcp    316    300


counting FQDNs per domain using tshark:

tshark -r thunt-lab.pcapng -T fields -e <field> | sort | uniq | rev | cut -d '.' -f 1-2 | rev | sort | uniq -c | sort -rn | head -10
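
the rev | cut -d '.' -f 1-2 | rev part is the trick that turns each FQDN into its last two labels. a minimal sketch on made-up names, assuming one name per line on stdin. the function name base_domain and the example names are hypothetical.

```shell
# base_domain: hypothetical helper; keeps the last two dot-separated labels of each name.
base_domain() {
  rev |                  # reverse each line: www.example.com -> moc.elpmaxe.www
  cut -d '.' -f 1-2 |    # keep the first two fields: moc.elpmaxe
  rev                    # reverse back: example.com
}

# made-up sample names: unique FQDNs first, then count them per base domain
printf 'www.example.com\nmail.example.com\nwww.example.com\nexample.org\n' |
  sort | uniq | base_domain | sort | uniq -c | sort -rn
```

this counts 2 unique FQDNs under example.com and 1 under example.org. keeping exactly two labels is a simplification: names under suffixes like co.uk get grouped by the suffix itself.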


tshark -r thunt-lab.pcapng -T fields -e <field> udp.port==53 | head -10



look for unique http user agents:

cat http*.log | zeek-cut user_agent | sort | uniq -c | sort 

cat http*.log | zeek-cut id.orig_h user_agent | sort | uniq | grep <user agent>

tshark -r sample.pcap -T fields -e http.user_agent tcp.dstport==80 | sort | uniq -c | sort -n | head -10

number of unique target ips where a specific user agent was used:

cat http.log | zeek-cut id.orig_h id.resp_h user_agent | grep <ip> | sort | uniq | cut -f 3 | sort | uniq -c | sort -rn
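
a sketch of what the sort | uniq | cut -f 3 | sort | uniq -c chain does, on made-up rows. assumptions: tab-separated "source destination user_agent" rows that have already been filtered down to one source ip. the function name targets_per_agent, the addresses and the user agent strings are hypothetical.

```shell
# targets_per_agent: hypothetical helper; input is tab-separated "src dst user_agent".
targets_per_agent() {
  sort | uniq |   # keep each (src, dst, user_agent) combination once
  cut -f 3 |      # keep only the user agent column
  sort | uniq -c |   # count how many combinations each user agent had
  sort -rn        # most widely used user agent first
}

# made-up sample: curl/8.0 reaches two destinations (one of them twice),
# Mozilla/5.0 reaches one destination
printf '192.0.2.10\t198.51.100.5\tcurl/8.0\n192.0.2.10\t198.51.100.5\tcurl/8.0\n192.0.2.10\t198.51.100.9\tcurl/8.0\n192.0.2.10\t198.51.100.5\tMozilla/5.0\n' | targets_per_agent
```

the repeated curl/8.0 request to the same destination is only counted once, so the result is 2 for curl/8.0 and 1 for Mozilla/5.0. if the input contains more than one source ip, the count is unique source and destination pairs per user agent rather than unique targets.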


ngrep example: find a specific keyword in the packets (payload analysis with ngrep)

ngrep -q -I odd.pcap Admin | head -15

-q   quiet mode, don't print a # sign for every packet that doesn't match

-I   read from a pcap file

we want to match the keyword Admin in the packets.

say we found an ip pair and we are suspicious of it. analyze the traffic between the two with ngrep:

ngrep -q -I trace1.pcap host <ip1> and host <ip2> | less




rita | less         (list all of rita's commands.)

rita list            (list all of the known databases. the pcap lab files need to be imported into rita first.)

rita import <zeek log files>      (it will import zeek log files.) 

rita | grep beacons 

rita show-beacons lab1 -H | less -S   (lab1 is the dataset that you imported earlier)

investigate anything with a score of 0.8 or higher.

rita show-beacons-fqdn lab1

rita show-beacons-proxy lab1


what data are we sending? look for single-minded requests (the same uri requested again and again):

cat http.log | zeek-cut id.orig_h id.resp_h uri | grep <ip> | sort | uniq -c | sort -rn



