Commit 1897a772 authored by Russ Fish's avatar Russ Fish

Improved probe undoing and setup/teardown logic. Added some documentation.

parent dc52a1ea
#
# EMULAB-COPYRIGHT
# Copyright (c) 2007 University of Utah and the Flux Group.
# All rights reserved.
#
sec-check/README-FIRST.txt - Sec-check documentation outline.
. Goals
- Purpose: Locate and plug all SQL injection holes in the Emulab web pages.
Guide plugging them, then repeat the scan to find any new ones we introduce.
- Useful as a test harness, even if not probing.
- Method: Combine white-box and black-box testing, with much automation.
. Background (See sec-check/README-background.txt)
- SQL Injection vulnerabilities: Ref "The OWASP Top Ten Project"...
- Automated vulnerability scan tools, search and conclusions...
. Sec-check concepts. (See sec-check/README-concepts.txt)
- Overview of sec-check tool
. This is an SQL injection vulnerability scanner, built on top of an
automated test framework. It could be factored into generic and
Emulab-specific portions without much trouble.
. Drives the HTML server in an inner Emulab-in-Emulab experiment via wget,
using forms page URL's with input field values. Most forms-input values
are automatically mined from HTML results of spidering the web interface.
. This is "web scraping", not "screen scraping"...
. Implemented as an Emulab GNUmakefile.in for flexible control flow...
- Several stages of operation are supported, each with analysis and
summary...
. src_forms:
Grep the sources for <form and make up a list of php form files.
. activate:
Sets up the newly swapped-in ElabInElab site in the makefile...
. spider:
Recursively wget a copy of the ElabInElab site and extract a <forms list.
. forms_coverage:
Compare the two lists to find uncovered (unlinked) forms.
. input_coverage:
Extract <input fields from spidered forms.
. normal:
Create, run, and categorize "normal operations" test cases.
. probe:
Create and run probes to test the checking code of all input fields.
. Details of running and incremental development (See README-howto.txt)
- General
. Directories
. Inner Emulab-in-Emulab experiment
- High-level targets
. all: src_forms spider forms_coverage input_coverage normal probe
. msgs: src_msg site_msg forms_msg input_msg analyze probes_msg
- Stages of operation (makefile targets)
. src_forms: src_list src_msg
. activate: activate.wget $(activate_tasks) analyze_activate
. spider: clear_wget_dirs do_spider site_list site_msg
. forms_coverage: files_missing forms_msg
. input_coverage: input_list input_msg
. normal: gen_all run_all analyze
. probe: gen_probes probe_all probes_msg
#
# EMULAB-COPYRIGHT
# Copyright (c) 2007 University of Utah and the Flux Group.
# All rights reserved.
#
sec-check/README-background.txt
See README-FIRST.txt for a top-level overall outline.
- Background on SQL Injection vulnerabilities: Ref "The OWASP Top Ten Project"
http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project
. "The OWASP Top Ten represents a broad consensus about what the most
critical web application security flaws are."
. The first flaw on the list (many others are consequences of this one.)
"A1 Unvalidated Input -
Information from web requests is not validated before being used by a
web application. Attackers can use these flaws to attack backend
components through a web application."
http://www.owasp.org/index.php/Unvalidated_Input
. One of the consequences:
"A6 Injection Flaws -
Web applications pass parameters when they access external systems
or the local operating system. If an attacker can embed malicious
commands in these parameters, the external system may execute those
commands on behalf of the web application."
http://www.owasp.org/index.php/Injection_Flaws
. More details:
- The OWASP Guide Project
http://www.owasp.org/index.php/Category:OWASP_Guide_Project
- Guide Table of Contents
http://www.owasp.org/index.php/Guide_Table_of_Contents
. Data Validation
http://www.owasp.org/index.php/Data_Validation
- Data Validation Strategies
http://www.owasp.org/index.php/Data_Validation#Data_Validation_Strategies
- Prevent parameter tampering
http://www.owasp.org/index.php/Data_Validation#Prevent_parameter_tampering
- Hidden fields
http://www.owasp.org/index.php/Data_Validation#Hidden_fields
. Interpreter Injection
http://www.owasp.org/index.php/Interpreter_Injection
- SQL Injection
http://www.owasp.org/index.php/Interpreter_Injection#SQL_Injection
- Automated vulnerability scan tools, search and conclusions
. In July 2006, I surveyed 29 available free and commercial tools,
categorized as site mappers, scanners, http hacking tools, proxies,
exploits, and testing tools. 9 of them were worth a second look.
Many were Windows-only, or manual tools for "penetration testing" to find
a single unchecked query hole and then attack a database through it.
Some had automation specific to Microsoft SQL server, which is easy
because it reports SQL error messages through query results that can leak
onto HTML pages. These can easily be used to locate injection holes and
spill the internal details of the database schema; then the DB data is
siphoned out and/or mischief is done through the hole.
None of the tools targeted MySQL, but I verified that MySQL is still
vulnerable to SQL injection attack from any unchecked or unescaped inputs
to GET or POST forms. Trivially, just include an unmatched single-quote
in any input string that goes into a dynamically-built SQL query that has
argument strings delimited by single-quotes.
This is of course only useful if you have another way to know the
database schema. Such as an open-source distribution of our software,
which would also allow finding input checking holes by inspecting the PHP
code. Hence the goal to locate, plug, and verify *all* such holes before
open-source distribution.
Some site mappers are combined with plugins to generate "blind" SQL
injection probes against the input fields of forms. They might be
effective against a suicidal site with very little sanity checking on
inputs.
Our PHP code checks "almost all" inputs serially, so the first input that
is rejected short-circuits checking the rest of the inputs and generating
queries. But a clever penetrator could generate reasonable inputs and
get past that to find a hole and exploit it. We need an automated way to
provide some assurance that there are no holes to find.
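The trivial unmatched-quote break described above can be sketched in a few lines of shell (illustrative only; the table and field names are made up):

```shell
# The attacker-controlled input ends the quoted string literal early,
# turning the rest of the input into live SQL (names are hypothetical).
input="x' OR '1'='1"
query="SELECT uid FROM users WHERE name='${input}'"
echo "$query"
# The WHERE clause is now  name='x' OR '1'='1'  -- true for every row.
```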
. I selected several of the tools to try:
- Screaming Cobra - Mapper with form vulnerability "techniques". (Perl)
I tried it in the insulation of an Emulab-in-Emulab, with various
combinations of arguments. Not surprisingly, its blind thrashing
didn't penetrate any of the public Emulab pages.
But one could go much further with an Emulab login uid and password,
after adding necessary login session cookie logic. And further still
if admin mode were breached...
- Spike Proxy, an HTTP Hacking / Fuzzing proxy
(Open Source, Python, Windows/Linux) w/ SQL injection.
This is a good example of what a manual "penetration tester" (black or
white hat) would use to attack a web site.
I used it to record browsing sessions while manually operating the
Emulab web site. It also allows editing inputs and replaying attacks
but I didn't try that.
It was useful to determine exactly what POST arguments were passed and
their values. This solved a chicken-and-egg bootstrapping problem: I
had to script "activation" of the DB (creating "one of everything")
before I could spider the web site and enumerate the full set of forms
arguments.
- WebInject - Free automated test tool for web apps and services.
Perl/XML, GPL'ed.
This sounded perfect at first, particularly since it presents a GUI of
the test results. But to do that, it has to be provided with a
complete set of success and failure match strings, expressed along with
the URL's to probe in an XML file.
Unfortunately, it's made for monolithic runs, where everything is in a
single session. That's not useful for incremental development,
particularly since the web pages are not retained to help
understand what happened.
Maybe this will be useful at some point though, so I kept it under
sec-check with some stubs in the makefile for generating the XML.
Meanwhile, I found that the (new version of the) venerable "wget"
command has the necessary options to retain login cookies, set
traversal limits for recursive spidering, and convert pages so they are
browsable from local directories. Far more convenient.
I wound up implementing logic similar to WebInject in a much simpler
way (but without a GUI.)
#
# EMULAB-COPYRIGHT
# Copyright (c) 2007 University of Utah and the Flux Group.
# All rights reserved.
#
sec-check/README-concepts.txt - Design and methods employed in sec-check.
For more details of running and incremental development, see README-howto.txt .
See README-FIRST.txt for a top-level outline.
- Overview of the sec-check tool
. This is an SQL injection vulnerability scanner, built on top of an
automated test framework. It could be factored into generic and
Emulab-specific portions without much difficulty.
. Drives the HTML server in an inner Emulab-in-Emulab experiment via wget,
using forms page URL's with input field values. Most forms-input values
are automatically mined from HTML results of spidering the web interface.
. This is "web scraping", not "screen scraping"
http://en.wikipedia.org/wiki/Web_scraping
Web scraping differs from screen scraping in the sense that a website
is really not a visual screen, but a live HTML/JavaScript-based
content, with a graphics interface in front of it. Therefore, web
scraping does not involve working at the visual interface as screen
scraping, but rather working on the underlying object structure
(Document Object Model) of the HTML and JavaScript.
. Implemented as an Emulab GNUmakefile.in for flexible control flow.
- Some actions use gawk scripts to filter the results at each stage,
generating inputs and/or scripts for the next stage.
- Several stages of operation are supported, each with analysis and summary,
corresponding to the top-level sections of the GNUmakefile.in .
For more details of running and incremental development, see README-howto.txt .
"gmake all" to do everything after activation.
"gmake msgs" to see all of the summaries.
----------------
. src_forms:
Grep the sources for <form and make up a list of php form files.
Here's an example of the src_msg output:
** Sources: 107 separate forms are on 89 code pages. **
** (See src_forms.list and src_files.list
** in ../../../testbed/www/sec-check/results .) **
----------------
. activate:
Sets up the newly swapped-in ElabInElab site in the makefile to create
"one of everything" (or sometimes two in different states), thus turning
on as many forms as we can for spidering.
** Activation analysis: success 12, failure 0, problem 0, UNKNOWN 0 **
** (See analyze_activate.txt in ../../../../testbed/www/sec-check/results .) **
. Cookie logic for logout/login/admin actions is also in the makefile.
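As a rough sketch, the login step amounts to a cookie-saving wget POST like the one below (the page name and field names are assumptions; the real cookie logic is in the makefile). The command is echoed rather than run, since it needs a live inner Elab:

```shell
# Hypothetical login POST; --save-cookies/--keep-session-cookies retain the
# session cookie for later authenticated requests.
BOSS="https://myboss.vulnelab.testbed.emulab.net"
LOGIN_CMD="wget --no-check-certificate --keep-session-cookies --save-cookies cookies.txt --post-data 'uid=myuser&password=mypass' $BOSS/login.php3"
echo "$LOGIN_CMD"
```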
----------------
. spider:
Recursively wget a copy of the ElabInElab site and extract a <forms list.
** Spider: 1773 ( 3 + 1770 ) forms instances are in 55 ( 3 + 55 ) web pages. **
** (See *_{forms,files}.list in ../../../testbed/www/sec-check/results .) **
- Actually, spider it twice, once not logged in for the public view,
and again, logged in and with administrative privileges, for the
private view.
- Don't follow page links that change the login/admin state here.
- Also reject other links to pages which don't have any input fields,
and don't ask for confirmation before taking actions. These must be
tested specially.
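A sketch of the recursive spider invocation, echoed rather than run (the flag choices here are assumptions; the real invocation and its traversal limits live in the makefile):

```shell
# Standard wget spidering options: recurse to a bounded depth, rewrite links
# so the copy browses locally, reuse the saved login cookies, and skip pages
# (like logout) that would change the session state.
SITE="https://myboss.vulnelab.testbed.emulab.net"
SPIDER_CMD="wget --recursive --level=5 --convert-links --no-check-certificate --load-cookies cookies.txt --reject '*logout*' $SITE/"
echo "$SPIDER_CMD"
```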
----------------
. forms_coverage:
Compare the two lists to find uncovered (unlinked) forms.
** Forms: 34 out of 89 forms files are not covered. **
** (See ../../../testbed/www/sec-check/results/files_missing.list .) **
- Generally, unlinked forms are a symptom of an object type (or state)
that is not yet activated. Iterate on the activation logic.
----------------
. input_coverage:
Extract <input fields from spidered forms.
** Inputs: 9965 input fields, 343 unique, 123 over-ridden. **
** (See site_inputs.list and input_names.list
** in ../../../testbed/www/sec-check/results,
** and input_values.list in ../../../testbed/www/sec-check .) **
- form-input.gawk is applied to the spidered public and admin .html
files to extract forms/input lists.
- That process is generic, but there are a few little Emulab special
cases in the makefile where they are combined into a single list.
Special cases (hacks) are marked with XXX to make them easy to find.
- Start making an input values over-ride dictionary to point the pages
at the activation objects, using common input field names.
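A self-contained sketch of the field-name extraction (portable grep/sed here; the real work is done by form-input.gawk, which also tracks which form each field belongs to):

```shell
# Make a tiny stand-in for a spidered page, then pull out the input names.
cat > demo_page.html <<'EOF'
<form action=newproject.php3 method=post>
<input type=text name=pid value="">
<input type=submit name=submit value="Submit">
</form>
EOF
FIELDS=$(grep -o 'name=[A-Za-z_]*' demo_page.html | sed 's/^name=//' | sort -u)
echo "$FIELDS"
```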
----------------
. normal:
Create, run, and categorize "normal operations" test cases.
** Run analysis: success 47, failure 6, problem 2, UNKNOWN 0 **
** (See analyze_output.txt in ../../../testbed/www/sec-check/results .) **
- The forms/inputs list is combined with the input value over-ride
dictionary using forms-to-urls.gawk, producing a list of forms page
URL's with GET and/or POST arguments.
- The url list is separated into setup, teardown, and show (other)
sections using the sep-urls.gawk script.
The {setup,teardown}_forms.list control files specify sequences of
PHP pages in the order that their operations must be performed,
e.g. creating a new project before making new experiments in the
project.
- A subtlety is that the activation objects are used by the "show"
script, whereas the setup and teardown scripts leave those alone and
suffix the ephemeral Emulab objects they create and delete with a
"3". There are many XXX special cases in sep-urls.gawk .
- The separated url lists are transformed into scripts containing wget
commands (generated by the urls-to-wget.gawk script) and run.
- Iterate until everything works, categorizing UNKNOWN results with
{success,failure,problem}.txt pattern lines until everything is
known. "Problems" are a small subset of failures, showing errors in
the sequencing of operations, or broken page logic due to the testing
environment, rather than input errors detected by the page logic.
- Additional commands, prefixed with a "!" character, are added to the
{setup,teardown}_forms.list files, which start to look more like
scripts.
Arbitrary commands can be ssh'ed to $MYBOSS and $MYOPS.
There's a special "sql" pseudo-command that can be used for select
queries to fetch values from the Emulab DB into shell variables, or
update queries to set DB state. (See urls-to-wget.gawk for details.)
It's also useful to surround sections with conditional logic to check
that necessary objects are in place to avoid a lot of unnecessary
collateral damage from page failures.
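The separation step can be pictured as a classifier over page names, roughly like the shell sketch below (page names are examples; the real ordering and special cases live in sep-urls.gawk and the {setup,teardown}_forms.list control files):

```shell
# Toy version of the setup/teardown/show split.
classify() {
    case "$1" in
        newproject.php3|newuser.php3)       echo setup ;;
        deleteproject.php3|deleteuser.php3) echo teardown ;;
        *)                                  echo show ;;
    esac
}
classify newproject.php3   # -> setup
classify showstuff.php3    # -> show
```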
----------------
. probe:
Create and run probes to test the checking code of all input fields.
** Probe analysis: 408 probes out of 408 executed: 166 showed success,
** 242 failure (or probes caught), 38 dups, 0 UNKNOWN.
** Probes to 53 pages gave 13 hits: 13 backslashed, 0 UNCAUGHT in 0 pages.
** (See probe-labels.list and uncaught-files.list
** in ../../../testbed/www/sec-check/results .) **
- SQL injection probes are special strings substituted for individual
GET and POST arguments by the forms-to-urls.gawk script. They start
with an unmatched single-quote and are labeled with their form page
and input field name, for example:
query_type='**{kb-search.php3:query_type}**
- One page will be probed in the generated wget scripts as many times
as it has input fields (up to 30 in one case.) After the probes to a
page, one "normal" wget line is generated to perform the page
function and create the necessary conditions for going on to probe
the next page.
- A "probe catcher" is put into the underlying PHP common query
function to look in constructed SQL query strings for the probe
string prefix and throw an error if it's seen, with or without a
backslash escaping the single-quote. (This "hit" error message is
included in the failure.txt file, so probe hits are also failures.)
Obviously, the goal is to probe everything, and let no probe go
uncaught or unescaped.
- Sometimes, it's necessary to wait for a "backgrounded" page action to
complete before going on in the script. There's a "waitexp" helper
script for the common case of waiting for an Emulab experiment to be
in a particular state, "active" by default.
- Many probe strings will be ignored or escaped by the page logic,
causing the page to perform its function (such as creating or
deleting a user, project, or experiment.) There may be some strange
text included (or not, if only the presence of the argument is
considered by the page logic.)
- The failure.txt file is used to determine whether the page performed
its function and if so to undo it. "Undo" command lines are added
after PHP page files in the {setup,teardown}_forms.list files,
prefixed with a "-" character. There's an undo_probes.pl script with
common logic for a variety of Emulab object types.
- Plug all of the holes by adding or fixing input validation logic.
. Re-run probes to check.
. Re-do it periodically, as the system evolves.
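The probe-catcher check can be sketched in shell as a substring test on the constructed query (the real check lives in the PHP common query function; the probe marker format is the one shown above):

```shell
# A query that still contains the probe marker '**{ is a hit, whether or not
# the page logic backslash-escaped the leading single quote (the marker
# substring survives either way).
caught() {
    case "$1" in
        *"'**{"*) echo hit ;;
        *)        echo clean ;;
    esac
}
probe="'**{kb-search.php3:query_type}**"
caught "SELECT * FROM kb WHERE type='$probe'"   # -> hit
caught "SELECT * FROM kb WHERE type='safe'"     # -> clean
```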
sec-check/README-howto.txt - Documentation outline.
- Overview
. Purpose: Locate and plug all SQL injection holes in the Emulab web pages.
- Guide plugging them and find any new ones we introduce.
. Method: Combine white-box and black-box testing, with automation.
- Background
. Ref "The OWASP Top Ten Project"
http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project
- "The OWASP Top Ten represents a broad consensus about what the most
critical web application security flaws are."
- The first flaw on the list (many others are consequences of this one.)
"A1 Unvalidated Input -
Information from web requests is not validated before being used by a
web application. Attackers can use these flaws to attack backend
components through a web application."
http://www.owasp.org/index.php/Unvalidated_Input
- One of the consequences:
"A6 Injection Flaws -
Web applications pass parameters when they access external systems
or the local operating system. If an attacker can embed malicious
commands in these parameters, the external system may execute those
commands on behalf of the web application."
http://www.owasp.org/index.php/Injection_Flaws
- More details:
. The OWASP Guide Project
http://www.owasp.org/index.php/Category:OWASP_Guide_Project
. Guide Table of Contents
http://www.owasp.org/index.php/Guide_Table_of_Contents
- Data Validation
http://www.owasp.org/index.php/Data_Validation
. Data Validation Strategies
http://www.owasp.org/index.php/Data_Validation#Data_Validation_Strategies
. Prevent parameter tampering
http://www.owasp.org/index.php/Data_Validation#Prevent_parameter_tampering
. Hidden fields
http://www.owasp.org/index.php/Data_Validation#Hidden_fields
- Interpreter Injection
http://www.owasp.org/index.php/Interpreter_Injection
. SQL Injection
http://www.owasp.org/index.php/Interpreter_Injection#SQL_Injection
- Forms coverage
. Grep the sources for <form and make up a list of php form files.
gmake src_forms
Creates: src_forms.list, src_files.list
- 105 separate forms are on 95 php code pages (plus 7 "extras" on Boss.)
gmake src_msg
. Spider a copy of the EinE site with wget and extract its forms list.
Have to edit the EinE experiment details into the makefile.
It's better to change your password in the EinE than put it in the makefile.
See GNUmakefile.in for details.
gmake login
gmake spider
gmake site_forms
Creates: admin.wget subdir, site_forms.list, site_files.list
- 40 "base" forms are visible once logged in as user, 47 with admin on.
gmake site_msg
. Compare the two lists to find uncovered (unlinked) forms.
gmake forms_coverage
Creates: files_missing.list
gmake forms_msg
. Create a script to activate the EinE site to turn on all forms.
- Look in the sources to find where the missing links should be.
- Connect to the EinE site from a browser through Spike Proxy.
- Interactively create DB state that will elicit the uncovered forms.
. Projects/users awaiting approval,
. Experiments swapped in with active nodes, and so on.
- Capture a list of URL's along with Get or Post inputs for automation.
- Add steps to the activate: list in the GNUmakefile.in .
. Re-spider and compare until everything is covered (no more missing forms.)
gmake spider
gmake forms_msg
- Input fields coverage
. Grep spidered forms for <input definitions and devise acceptable values.
gmake input_coverage
Creates: site_inputs.list, input_names.list
You make: input_values.list
At first, copy input_names.list to input_values.list,
then edit default values onto the lines for auto-form-fill-in.
Values with a leading "!" over-ride an action= arg in the form page URL.
After the first time, you can merge new ones into input_values.list .
Lines with no value are ignored and may be flushed if you want.
- 1631 <input lines in admin-base, 511 unique, with 156 unique field names.
gmake input_msg
- But only 78 of the unique field names are text fields.
- "normal operation" test cases
. Convert the list to test cases submitting input field values.
gmake gen_normal
Creates: site_normal.urls, normal_cases.xml
. Test until "normal" input tests work properly in all forms.
gmake run_normal
Creates: normal_output.xml
- Probe the checking code of all input fields for SQL injection holes
. Generate test cases with SQL injection probes in individual fields.
Probe strings include form and field names that caused the hole.
. Successfully caught cases should produce "invalid input" warnings.
. Potential penetrations will log DBQuery errors with the form/field names.
- Plug all of the holes by adding or fixing input validation logic.
. Re-run probes to check.
. Re-do it periodically, as the system evolves.
# -*- mode: text; indent-tabs-mode: nil -*-
#
# EMULAB-COPYRIGHT
# Copyright (c) 2007 University of Utah and the Flux Group.
# All rights reserved.
#
sec-check/README-howto.txt - Details of running and incremental development.
See README-concepts.txt for a description of the design and methods employed.
See README-FIRST.txt for a top-level outline.
- General
. Directories
- Sec-check is run via configure/gmake in an Emulab obj-devel/www/sec-check
directory. Below, it is assumed you're cd'ed there.
- Control files come from testbed/www/sec-check in the user's source tree.
- Generated intermediate files and output are checked into source subdir
testbed/www/sec-check/results, so they can be compared over time with CVS
as things change.
- These variables, relative to obj-devel/www/sec-check, are used below:
set tws=../../../testbed/www/sec-check
set twsr=$tws/results
. Inner Emulab-in-Emulab experiment
You need an inner Elab swapped in to do almost everything below. This is
documented in https://www.emulab.net/tutorial/elabinelab.php3 .
There is a simple ElabInElab experiment ns file in $tws/vulnElab.ns .
Change EinE_proj and EinE_exp in the $tws/GNUmakefile.in to match the
Emulab project and experiment names you create.
- High-level targets
. all: src_forms spider forms_coverage input_coverage normal probe
Do everything after activation. Ya gotta activate first!
gmake all |& tee all.log
. msgs: src_msg site_msg forms_msg input_msg analyze probes_msg
Show all of the summaries.
gmake msgs
- I log a copy of the results to CVS once in a while:
gmake msgs | egrep -v gmake | tee $twsr/msgs.log
- Stages of operation (makefile targets)
----------------
. src_forms: src_list src_msg
Grep the sources for <form and make up a list of php form files.
gmake src_forms
gmake src_msg
- Output goes in $twsr/src_{forms,files}.list, bare filenames and raw
<form grep lines respectively.
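The grep itself reduces to something like this self-contained sketch (the demo directory stands in for the real testbed/www source tree):

```shell
# Find php files that contain a <form tag, one filename per line.
mkdir -p demo_www
printf '<html><form action=foo.php3></form></html>\n' > demo_www/foo.php3
printf '<html>no form here</html>\n' > demo_www/bar.php3
grep -rl '<form' demo_www | sort
```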
----------------
. activate: activate.wget $(activate_tasks) analyze_activate
Sets up the newly swapped-in ElabInElab site in the makefile to create
"one of everything" (or sometimes two in different states), thus turning
on as many forms as we can for spidering.
- Don't forget to first log in to the inner Elab in a web browser and
change your password in Edit Profile in the inner Elab to match the
string given in the GNUmakefile.in as $(pswd).
The initial password is your outer Elab password, imported into the
inner Elab. There's no reason to put your real Elab password into the
makefile! Don't do it!
For me, the URL of the inner Elab apache httpd server on "myboss" is
https://myboss.vulnelab.testbed.emulab.net .
Check that it's workin