NGINX Lua Scraping Protection
If you have followed the guide to configuring NGINX with Lua, you can now use BotGuard for Application's scraping protection module.
Prerequisites
- NGINX installed with LuaJIT support.
- A backend service protected by NGINX.
Getting Started
Your account manager can provide a software package which includes the following files:
- README.md: contains this information and a link to this documentation
- example.block.conf.nginx: an example NGINX config file demonstrating blocking/redirecting with NGINX and Lua
- example.scraping.conf.nginx: an example NGINX config file demonstrating scraping protection with NGINX and Lua
- lua-plugins/injector.lua: injects a script into the <head> of an HTML document
- lua-plugins/mitigation.lua: requests an ACTION from the mitigation API
- lua-plugins/mitigation/: contains the plugin modules
- lua-plugins/scraping_check.lua: contains the code to check whether the scraping protection has passed
- lua-plugins/scraping_guard.lua: contains the code to protect an endpoint
- lua-plugins/xss.lua: a helper to protect redirects from XSS attacks
- lua-plugins/tests/: contains the unit tests
- lua-plugins/interstitial.html: an example interstitial page
Considerations
Before integrating the scraping solution, it is worth considering the following:
- The scraping solution uses domain-specific cookies. As a result, any page that is on the same domain and protected by scraping will be protected by the cookie. If the page needs to load assets from a different domain, then that domain cannot also be protected by scraping, as it would require its own cookie to serve the content. If content is loaded from a separate domain, then either an IP-protected content server or a subdomain is recommended.
- Because NGINX offers so many possible configurations, it is recommended to start with the Docker Sandbox demo and configure a local setup that matches your production setup as closely as possible. Once you have a setup running locally, use that NGINX.conf and those Docker containers in your cloud environment. Only once this is working as expected should you move to a different configuration or environment. Ask your account manager for the DEMO sandbox.
- Providing your account manager with an accurate production environment topology of your infrastructure, as well as your application-level design, will help them design a solution for you. Demonstration applications that closely represent the real setup will allow them to foresee any issues that might occur.
Configuration
Server block configuration:
$custom_scraping_fields (required: false, type: string, default: NONE, example: '{"field":"value"}')
Used to configure custom fields to control scraping.

$session_secret (required: false, type: string, default: NONE, example: 'correcthorsebatterystaple')
The key that will be used to encrypt the cookies. This defaults to a random string; however, if you have multiple NGINX instances you will want to set it to the same value on all instances.

$scraping_interstitial_url (required: true, type: string, default: NONE, example: '/interstitial')
The route to which a protected page redirects. You will need to create the location block to serve your custom interstitial page.

$session_name (required: false, type: string, default: session, example: scraping-session)
The session name stored in the browser for the cookie. This defaults to "session", so it is recommended to set it to something unique.

$scraping_cookie_ttl (required: false, type: number, default: 15, example: 10)
Specifies how regularly the interstitial page should be shown to a user if they have navigated away from the site. It is recommended that this is set to 10-15 seconds.

$scraping_referer_parameter (required: false, type: string, default: next, example: referer)
On successfully passing the scraping checks, this is the query parameter that HUMAN will look for to find out where to redirect the user to.

$session_cookie_lifetime (required: false, type: number, default: 3600, example: 63072000)
Length of time (in seconds) that the cookie will be valid for. This should be set to a high value (e.g. 2 years).

$scraping_protection (required: false, type: number, default: nil, example: 1)
When set to 1, this field defines pages that should be protected against scraping. When set to 2, it defines the interstitial page.

$scraping_refresher (required: true, type: string, default: nil, example: /nginx/scraping/refresh)
This field informs scraping protection of the URL to poll to renew the user's cookie. This is a required field.

Protecting an endpoint
To protect an endpoint from scraping, add the following to any NGINX location block that you wish to be protected:

access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
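For example, a minimal protected location block might look like the following sketch (the /account route, backend address, and /etc/lua-plugins path are illustrative assumptions; adjust them to your deployment). The required server-block variables described in this guide still need to be set in the enclosing server block.

```nginx
location /account {
    # mark this route as scraping-protected
    set $scraping_protection 1;
    # run the scraping guard before the request reaches the backend
    access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
    proxy_pass http://localhost:8080;
}
```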
Set up a location endpoint that displays the interstitial page. This page will be shown while the protection is running its checks. See the example interstitial.html file provided.
- Please note that it's important to protect this endpoint from XSS attacks. BotGuard offers protection against this that is straightforward to implement. Add the following to the location block for the interstitial page:

access_by_lua_file /etc/lua-plugins/xss.lua;
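Taken together, the interstitial location block combines the XSS guard, the script injector, and the $scraping_protection value of 2; a sketch based on the full examples later in this guide:

```nginx
location ^~ /interstitial {
    default_type text/html;
    set $detection_tag_spa "1";
    # 2 marks this location as the interstitial page
    set $scraping_protection 2;
    header_filter_by_lua_block {
        ngx.header.content_length = nil;
    }
    # inject the BotGuard script into the interstitial response
    body_filter_by_lua_file /etc/lua-plugins/injector.lua;
    # protect the redirect from XSS attacks
    access_by_lua_file /etc/lua-plugins/xss.lua;
    return 200 '<html><body><h1>Please wait while we check some things....</h1></body></html>';
}
```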
To prevent users from repeatedly seeing the interstitial page as they browse around the site add a session refresh endpoint that the BotGuard tag will use to check the status of the current user in the background:
location /refresh { set $detection_tag_mo "2"; content_by_lua_file /etc/lua-plugins/scraping_check.lua; }
- Note: it's important that the mo value for this endpoint is 2, so that BotGuard is regularly requested and can update the status of the user. In this example we have set it explicitly for demonstration, although this is the default.
Please note, it doesn't make sense to both block/redirect an endpoint and apply scraping protection to it. Choose one or the other depending on your needs.
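In other words, a blocked/redirected route and a scraping-protected route should live in separate location blocks. A sketch (the routes, backend port, and the mitigation.lua wiring here are assumptions; see example.block.conf.nginx in the package for the exact blocking setup):

```nginx
# block/redirect decisions only on this route
location ^~ /api {
    access_by_lua_file /etc/lua-plugins/mitigation.lua;
    proxy_pass http://localhost:8080;
}

# scraping protection only on this route
location ^~ /catalogue {
    set $scraping_protection 1;
    access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
    proxy_pass http://localhost:8080;
}
```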
Examples
All examples and conf files will need the following set:
set $mitigation_api_key "API_KEY"; # the api key provided to you by your account manager
set $mitigation_api_et "12"; # the event type (scraping protection)
set $detection_tag_ci "CUSTOMER_ID"; # your customer ID
set $detection_tag_dt "DETECTION_TAG_ID"; # your tag ID
set $detection_tag_si "SITE_ID"; # a site identifier, specified by the customer
#scraping management
set $session_secret "$SCRAPING_SESSION_SECRET";
set $scraping_interstitial_url "/interstitial";
set $scraping_refresher "/refresh";
set $session_name "x-reload-session"; #some discreet name for the scraping session
set $scraping_referer_parameter "next"; #customise the parameter that will be used as the query parameter (defaults to next)
set $session_cookie_lifetime 63072000;
set $scraping_cookie_ttl 15;
Catch All
The following is the most basic example. It will send all non-GET requests to the mitigation API and inject the script tag on all responses that contain a </head> and/or <body> tag. This assumes that you have unzipped the release to /etc/lua-plugins.
worker_processes auto;
pcre_jit on;
events {
worker_connections 1024;
}
http {
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
include mime.types;
default_type application/octet-stream;
gzip on;
access_log /dev/stdout;
lua_package_path "/etc/lua-plugins/?.lua;;";
more_clear_headers Server;
server_tokens off;
server {
listen 3000;
server_name some.example.com localhost;
resolver 8.8.8.8;
client_header_buffer_size 8k;
large_client_header_buffers 8 64k;
error_log /dev/stdout debug;
# required variables
set $mitigation_api_key "API_KEY";
set $detection_tag_ci "CUSTOMER_ID";
set $detection_tag_dt "DETECTION_TAG_ID";
set $mitigation_api_et "12";
set $detection_tag_si "SITE_ID";
set $detection_tag_host "sub.example.com";
set $detection_tag_path "/ag/CUSTOMER_ID/clear.js";
set $detection_tag_spa "0";
set $detection_tag_mo "2";
#scraping management
set $session_secret "$SCRAPING_SESSION_SECRET";
set $scraping_interstitial_url "/interstitial";
set $scraping_refresher "/refresh";
set $session_name "x-reload-session"; #some discreet name for the scraping session
set $scraping_referer_parameter "next"; #customise the parameter that will be used as the query parameter (defaults to next)
set $session_cookie_lifetime 63072000;
set $scraping_cookie_ttl 15;
location ~* \.(?:ico|css|js|gif|jpe?g|png|woff2|woff|ttf)$ {
root /usr/share/nginx/html;
index index.html index.htm;
}
location ^~ /refresh {
set $detection_tag_mo "2";
content_by_lua_file /etc/lua-plugins/scraping_check.lua;
}
location ^~ /interstitial {
default_type text/html;
header_filter_by_lua_block {
ngx.header.content_length = nil;
}
set $detection_tag_spa "1";
set $scraping_protection 2;
body_filter_by_lua_file /etc/lua-plugins/injector.lua;
access_by_lua_file /etc/lua-plugins/xss.lua;
return 200 '<html><body><h1>Please wait while we check some things....</h1></body></html>';
}
location ^~ / {
default_type text/html;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
header_filter_by_lua_block {
ngx.header.content_length = nil;
}
body_filter_by_lua_file /etc/lua-plugins/injector.lua;
lua_need_request_body on;
set $scraping_protection 1;
# protect all endpoints from scraping
access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
proxy_pass http://localhost:$BACKEND_PORT;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
}
Route Management
The following example is very similar to the one above; however, it introduces some key differences, which are listed below and commented within the example for easier reading.
1. Variables that are shared between endpoints are part of the server block, not the location block.
2. An NGINX location block to handle signup attempts, which will be redirected to if a signup is blocked by the mitigation API.
3. The /signup route is a vanilla HTML/CSS website and redirects a blocked user to a /catch endpoint; however, it informs the client that the redirect is a 200. This code is configured to deceive the client rather than inform them.
4. Different routes define different configurations for /login vs /signup.
5. The /login route is an SPA and defines a response code and body to respond with when a request is blocked.
6. The interstitial page is served at /interstitial.
7. All endpoints are protected by scraping.
worker_processes auto;
pcre_jit on;
events {
worker_connections 1024;
}
http {
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
include mime.types;
default_type application/octet-stream;
gzip on;
access_log /dev/stdout;
lua_package_path "/etc/lua-plugins/?.lua;;";
more_clear_headers Server;
server_tokens off;
server {
listen 3000;
server_name some.example.com localhost;
resolver 8.8.8.8;
underscores_in_headers on; #required for signal headers
# buffers for headers and body
client_header_buffer_size 512k;
large_client_header_buffers 8 512k;
client_max_body_size 100M;
proxy_busy_buffers_size 512k;
proxy_buffers 4 512k;
proxy_buffer_size 256k;
location ~* \.(?:ico|css|js|gif|jpe?g|png|woff2|woff|ttf)$ {
root /usr/share/nginx/html;
index index.html index.htm;
}
# 1. Variables that are shared between endpoints are part of the server, and not location block
set $mitigation_api_key "API_KEY";
set $mitigation_api_et "12";
set $detection_tag_ci "CUSTOMER_ID";
set $detection_tag_dt "DETECTION_TAG_ID";
set $detection_tag_host "sub.example.com";
set $detection_tag_path "/ag/CUSTOMER_ID/clear.js";
#scraping management
set $session_secret "$SCRAPING_SESSION_SECRET";
set $scraping_interstitial_url "/interstitial";
set $scraping_refresher "/refresh";
set $session_name "x-reload-session"; #some discreet name for the scraping session
set $scraping_referer_parameter "next"; #customise the parameter that will be used as the query parameter (defaults to next)
set $session_cookie_lifetime 63072000;
set $scraping_cookie_ttl 15;
location ^~ /refresh {
set $detection_tag_mo "2";
content_by_lua_file /etc/lua-plugins/scraping_check.lua;
}
location ^~ /interstitial {
default_type text/html;
header_filter_by_lua_block {
ngx.header.content_length = nil;
}
set $detection_tag_spa "1";
set $scraping_protection 2;
body_filter_by_lua_file /etc/lua-plugins/injector.lua;
access_by_lua_file /etc/lua-plugins/xss.lua;
return 200 '<html><body><h1>Please wait while we check some things....</h1></body></html>';
}
# 2. An NGINX location block to handle signup attempts that will be redirected to if a signup is blocked by the mitigation API
location ^~ /signup {
default_type text/html;
# required variables
set $detection_tag_spa "0";
set $detection_tag_mo "2";
set $detection_tag_si "SITE_ID";
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
header_filter_by_lua_block {
ngx.header.content_length = nil;
}
body_filter_by_lua_file /etc/lua-plugins/injector.lua;
lua_need_request_body on;
set $scraping_protection 1;
# protect endpoint from scraping
access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
proxy_pass http://localhost:$BACKEND_PORT;
}
# 4. Different routes to define different configuration for /login vs /signup
location ^~ /login {
default_type text/html;
# required variables
set $detection_tag_spa "1";
set $detection_tag_mo "2";
set $detection_tag_si "SITE_ID";
# 5. The /login route is an SPA and defines a response code and body to respond with when a request is blocked in the case of an SPA
set $block_spa_response_code "200";
set $block_spa_response_body '{"success":"you are now logged in"}';
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
header_filter_by_lua_block {
ngx.header.content_length = nil;
}
body_filter_by_lua_file /etc/lua-plugins/injector.lua;
lua_need_request_body on;
set $scraping_protection 1;
# protect endpoint from scraping
access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
proxy_pass http://localhost:$BACKEND_PORT;
}
location ^~ / {
header_filter_by_lua_block {
ngx.header.content_length = nil;
}
body_filter_by_lua_file /etc/lua-plugins/injector.lua;
set $scraping_protection 1;
# protect all endpoints from scraping
access_by_lua_file /etc/lua-plugins/scraping_guard.lua;
proxy_pass http://localhost:$BACKEND_PORT;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
}
Final Remarks
The interstitial page can be any static HTML page, or any other web-servable content that can run JavaScript. An example is supplied as part of the Lua zip package, but it could be as simple as the following:
<html>
<head>
<link rel="stylesheet" href="/public/checking.css">
<style>
section {
text-align: center;
background-color: #CCBCBC;
margin: 0 auto;
width: 80%;
padding: 1.5em;
}
#message {
padding: 1em;
text-align: center;
}
.loader {
margin: 0 auto;
border: 16px solid #1C1D21;
border-radius: 50%;
border-top: 16px solid #F1E3E4;
width: 120px;
height: 120px;
-webkit-animation: spin 2s linear infinite; /* Safari */
animation: spin 2s linear infinite;
}
/* Safari */
@-webkit-keyframes spin {
0% { -webkit-transform: rotate(0deg); }
100% { -webkit-transform: rotate(360deg); }
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
</style>
</head>
<body>
<section>
<div class="loader"></div>
<div id="message"></div>
</section>
</body>
</html>
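The full examples above return the interstitial markup inline for brevity. To serve a static page like this one instead, the interstitial location block can point at the file; a sketch, assuming the page is saved as interstitial.html under /usr/share/nginx/html:

```nginx
location ^~ /interstitial {
    default_type text/html;
    set $detection_tag_spa "1";
    set $scraping_protection 2;
    header_filter_by_lua_block {
        ngx.header.content_length = nil;
    }
    # inject the BotGuard script into the served page
    body_filter_by_lua_file /etc/lua-plugins/injector.lua;
    access_by_lua_file /etc/lua-plugins/xss.lua;
    root /usr/share/nginx/html;
    try_files /interstitial.html =404;
}
```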