Aggressive Google Scraping and Information Gathering with 'snitch'

Posted in #scraping, 8 years ago


Snitch is an information gathering tool that automates the collection of general and sensitive information about a specified domain. Using built-in dork categories, it gathers information about the target that a search engine has already indexed. That makes it a useful first step in a pentest: it lets you effectively spider interesting aspects of a domain or site without sending a single request to the target's servers.
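To make the idea of a "dork category" concrete, here is a minimal sketch of how such a category could be turned into search queries using Google's `site:` and `ext:` operators (`ext:` is an alias of `filetype:`). The extension list and the `build_ext_dorks` helper are hypothetical, chosen to resemble the files in the sample output; snitch's actual built-in dorks are not shown in this post and may differ.

```python
# Hypothetical "sensitive extensions" category. These are assumptions
# modelled on the sample output, not snitch's real dork list.
SENSITIVE_EXTENSIONS = ["log", "sql", "bak", "old", "ini", "conf", "cfg"]


def build_ext_dorks(target: str, extensions: list[str]) -> list[str]:
    """Return one search query per extension, restricted to `target`.

    `target` can be a full domain (example.com) or a bare TLD (gov),
    because the site: operator accepts either form.
    """
    return [f"site:{target} ext:{ext}" for ext in extensions]


if __name__ == "__main__":
    for query in build_ext_dorks("gov", SENSITIVE_EXTENSIONS):
        print(query)
```

Each query only asks the search engine what it has already indexed, which is why this style of reconnaissance never touches the target's own servers.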
Example usage and output:

h2hth@root:~/snitch$ python snitch.py
[ASCII-art "snitch" banner, version ~0.3]

Usage: snitch.py [options]

Options:
  -h, --help            show this help message and exit
  -U [url], --url=[url]
                        domain(s) or domain extension(s) separated by comma*
  -D [type], --dork=[type]
                        dork type(s) separated by comma*
  -C [dork], --custom=[dork]
                        custom dork*
  -O [file], --output=[file]
                        output file
  -S [ip:port], --socks=[ip:port]
                        socks5 proxy
  -I [seconds], --interval=[seconds]
                        interval between requests, 2s by default
  -P [pages], --pages=[pages]
                        pages to retrieve, 10 by default
  -v                    turn on verbosity

Dork types:
  info                  Information leak & Potential web bugs
  ext                   Sensitive extensions
  docs                  Documents & Messages
  files                 Files & Directories
  soft                  Web software
  all                   All
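The help text shows that `-U` and `-D` both take comma-separated lists, that `all` selects every dork type, and that `-I` spaces requests out (2 seconds by default). One plausible way to expand those options into work items is sketched below. `plan_jobs`, `run_jobs` and the `handler` callback are hypothetical names, not snitch internals; the handler stands in for whatever actually issues the search, and no network request is made here.

```python
import time
from itertools import product
from typing import Callable, Iterable

# Dork type names as listed in snitch's help output ("all" handled separately).
DORK_TYPES = ["info", "ext", "docs", "files", "soft"]


def split_csv(value: str) -> list[str]:
    """Split a comma-separated option value, trimming spaces and dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def plan_jobs(urls: str, dorks: str) -> list[tuple[str, str]]:
    """Expand -U and -D style values into (target, dork type) pairs."""
    targets = split_csv(urls)
    types = split_csv(dorks)
    if "all" in types:
        types = list(DORK_TYPES)
    unknown = [t for t in types if t not in DORK_TYPES]
    if unknown:
        raise ValueError(f"unknown dork type(s): {', '.join(unknown)}")
    # Every target is paired with every requested dork type.
    return list(product(targets, types))


def run_jobs(jobs: Iterable[tuple[str, str]],
             handler: Callable[[str, str], None],
             interval: float = 2.0) -> None:
    """Call handler for each job, sleeping `interval` seconds between calls."""
    for i, (target, dork_type) in enumerate(jobs):
        if i > 0:
            time.sleep(interval)  # throttling, in the spirit of -I
        handler(target, dork_type)


if __name__ == "__main__":
    jobs = plan_jobs("gov,edu", "ext,docs")
    run_jobs(jobs, lambda t, d: print(f"[+] Target: {t} ({d})"), interval=0)
```

Pacing the queries matters in practice: firing them back to back is the quickest way to get a search engine to start rate-limiting or challenging you, which is presumably why the tool ships with a non-zero default interval and SOCKS5 proxy support.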

devil@hell:~/snitch$ python snitch.py -D ext -U gov -P15
[!] Pages limit set to 15
[+] Target: gov

[+] Looking for sensitive extensions

http://www.seismic.ca.gov/pub/CSSC_1998-01_COG.pdf.OLD
http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/CoreSet_2010/formatdb.log
http://www.uspto.gov/web/patents/pdx/permitting_access.pdf_2010may17.bak
https://software.sandia.gov/trac/canary/attachment/ticket/3917/Pike_Hach%26SCAN_Oracle.edsx_convert.log
http://www.dss.virginia.gov/tst.log
http://appliedresearch.cancer.gov/nhanes_pam/create.pam_perday.log
https://igscb.jpl.nasa.gov/igscb/station/log/abmf_20150428.log
http://sun.ars-grin.gov:8080/dbf.sql
http://cci.lbl.gov/~phzwart/Betty_data/latest_data/acorn/14_molrep.log
http://appliedresearch.cancer.gov/nhanes_pam/create.pam_perminute.log
https://software.sandia.gov/trac/dakota/attachment/ticket/4166/hopperConf.log
https://igscb.jpl.nasa.gov/igscb/station/mgexlog/nya2_20130905.log
http://www.swrcb.ca.gov/losangeles/board_decisions/adopted_orders/index.shtml.old
http://web.epa.ohio.gov/phpMyAdmin.2.11.5/scripts/create_tables_mysql_4_1_2+.sql
https://trac.mcs.anl.gov/projects/mpich2/attachment/ticket/83/config.log
https://tcga-data.nci.nih.gov/docs/index.html.bak
http://spec.jpl.nasa.gov/ftp/pub/catalog/c098001.log
http://www.glerl.noaa.gov/metdata/2check_all.log
http://www.maine.gov/dep/ftp/MAIRIS/5.2.3_Installation/mairis_5_2_3_seq_mgmt.sql
http://ft.ornl.gov/eavl/regression/configure.log
http://airsar.jpl.nasa.gov/airdata/PRECISION_LOG/hd1883.log
http://www.uspto.gov/main/homepagenews/pprwrk_rdctn_act.htm_2009sep29a.bak
http://eula.mindspark.com/cookies/
http://www.antd.nist.gov/pubs/Sriram_BGP_IEEE_JSAC.pdf.old
http://www-esh.fnal.gov/pls/default/itna.log
http://web.epa.ohio.gov/phpMyAdmin.2.11.5/scripts/upgrade_tables_mysql_4_1_2+.sql
http://www.modot.mo.gov/newsandinfo/documents/_baks/Whathappenstoyourbenefitswhenyouterminatestateemployment.pdf.0001.c487.bak
http://maine.gov/REVENUE/netfile/WS_FTP.LOG
http://mls.jpl.nasa.gov/lay/UARS_MLS.LOG
http://airsar.jpl.nasa.gov/airdata/PRECISION_LOG/hd1469.log
http://www.iowa.gov/boee/handbook.pdf.old
http://yuri.lbl.gov/ontologies/obo-all/uberon_prerelease/uberon_prerelease.obo_xml.OLD
https://igscb.jpl.nasa.gov/igscb/station/general/blank.log
http://yuri.lbl.gov/ontologies/obo-all/disease_ontology/disease_ontology.owl2.OLD
https://www.health.ny.gov/health_care/medicaid/nyserrcd.ini
http://www.thruway.ny.gov/business/contractors/expedite/bid.ini
http://www.wpc.ncep.noaa.gov/html/ecmwf0012loop500_ak.cfg
https://fermilinux.fnal.gov/documentation/security/krb5.conf
http://spartatools.dnsops.gov/wiki/index.php/Dnsval.conf
http://w3.pppl.gov/~hammett/comp/MSWindows/teraterm/TERATERM.INI
http://usgcb.nist.gov/usgcb/content/configuration/workstation-ks.cfg
https://ics-web.sns.ornl.gov/kasemir/CSS/Training/DLS/Config/settings.ini
http://cmip-pcmdi.llnl.gov/cmip5/docs/esg.ini
http://spartatools.dnsops.gov/wiki/index.php/Dnssec-tools.conf
http://www.usatlas.bnl.gov/~caballer/files/cvmfs/etc/httpd/welcome.conf
https://security.fnal.gov/krb5.conf
http://collaborate2.nws.noaa.gov/canned_data/data_files/pqact.conf
http://archives1.dags.hawaii.gov/gsdl/collect/vitalsta/etc/oai.cfg
http://lambda.gsfc.nasa.gov/data/suborbital/BICEP2/B2_3yr_camb_planck_withB_params_20140314.ini

[+] Done!
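A result list like the one above is easier to triage when grouped by file type. The following sketch is not part of snitch; `group_by_extension` is a hypothetical helper that buckets URLs by the extension at the end of their path, using only the standard library and a few URLs copied from the output above.

```python
from collections import defaultdict
from posixpath import splitext
from urllib.parse import urlsplit


def group_by_extension(urls: list[str]) -> dict[str, list[str]]:
    """Bucket result URLs by the lowercased extension of their path.

    Only the final extension counts, so "report.pdf.OLD" lands under "old".
    URLs whose path has no extension (e.g. a trailing-slash directory)
    land under the empty-string key.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path  # ignores host, port, query and fragment
        ext = splitext(path)[1].lower().lstrip(".")
        groups[ext].append(url)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "http://www.seismic.ca.gov/pub/CSSC_1998-01_COG.pdf.OLD",
        "http://maine.gov/REVENUE/netfile/WS_FTP.LOG",
        "http://sun.ars-grin.gov:8080/dbf.sql",
        "http://eula.mindspark.com/cookies/",
    ]
    for ext, hits in sorted(group_by_extension(sample).items()):
        print(ext or "(none)", len(hits))
```

Grouping also surfaces off-target hits: the directory-style mindspark.com result has no extension and ends up in its own bucket, separate from the .gov files.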

Snitch can identify general information leaks, potentially sensitive file extensions, documents and messages, files and directories, and web software. Another useful tool from the community!