Prowser is a "programmatic browser" library for Java.
The Prowser library enables applications to easily surf the web via an API that simulates the way a human user would operate a tabbed GUI browser.
Prowser is intended for use in applications that need to automate the retrieval of web content (including form-driven pages). Its main goal is to provide a simpler, more intuitive alternative to complex web client libraries such as Jakarta's HttpClient.
Applications that use the Prowser library can retrieve web pages (including encrypted HTTPS requests) simply by specifying URLs and, when necessary, form data. In lieu of actually being able to see the resulting pages, applications obtain their contents (text or binary) via simple method calls. In addition, the Prowser library handles all cookie processing and all normal server-side redirections automatically, and it even performs HTTP Basic and Digest Authentication seamlessly.
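Because redirects are handled automatically, an application never has to follow a Location header itself. To make that concrete, here is a minimal sketch of what redirect following involves. It is not Prowser's code: instead of real HTTP traffic it uses a hypothetical in-memory map from each URI to the Location value a server would send back, plus a hypothetical cap of five redirects to stop loops. The example.com URLs are placeholders.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class RedirectFollower {
    /** Hypothetical cap so that a redirect loop cannot run forever. */
    static final int MAX_REDIRECTS = 5;

    /**
     * Follows redirects using a stand-in "server": a map from each URI to the
     * Location value it would return (no entry means a normal page).
     */
    static URI follow(URI start, Map<URI, String> redirects) {
        URI current = start;
        for (int followed = 0; followed <= MAX_REDIRECTS; followed++) {
            String location = redirects.get(current);
            if (location == null) {
                return current; // no redirect: this is the final page
            }
            current = current.resolve(location); // Location may be relative
        }
        throw new IllegalStateException("Too many redirects starting at " + start);
    }

    public static void main(String[] args) {
        Map<URI, String> redirects = new HashMap<URI, String>();
        redirects.put(URI.create("http://example.com/old"), "/new");
        redirects.put(URI.create("http://example.com/new"), "https://example.com/final");
        // prints: https://example.com/final
        System.out.println(follow(URI.create("http://example.com/old"), redirects));
    }
}
```

Resolving each Location against the current URI is what lets relative targets such as "/new" work alongside absolute ones.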
Prowser's API is designed to mirror the actions that a human user takes when operating a tabbed GUI browser. As such, the library provides methods that perform common browser operations like going back a page, going forward a page, refreshing the current page, returning to the home page, stopping the current page request, etc. In addition, Prowser provides several configurable browser parameters. (See the features list below.)
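Back and Forward behave like two stacks of previously visited pages. The sketch below models that behavior with a hypothetical TabHistory class; the class and its method names are illustrative only and are not Prowser's API or internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a tab's Back/Forward history (not Prowser's code). */
public class TabHistory {
    private final Deque<String> back = new ArrayDeque<String>();
    private final Deque<String> forward = new ArrayDeque<String>();
    private String current;

    /** Visiting a new page pushes the current one onto Back and clears Forward. */
    void visit(String url) {
        if (current != null) {
            back.push(current);
        }
        current = url;
        forward.clear();
    }

    /** Moves to the previous page; stays put if there is nothing to go back to. */
    String goBack() {
        if (!back.isEmpty()) {
            forward.push(current);
            current = back.pop();
        }
        return current;
    }

    /** Moves to the next page; stays put if there is nothing to go forward to. */
    String goForward() {
        if (!forward.isEmpty()) {
            back.push(current);
            current = forward.pop();
        }
        return current;
    }

    String currentPage() {
        return current;
    }

    public static void main(String[] args) {
        TabHistory history = new TabHistory();
        history.visit("http://java.net/");
        history.visit("http://sun.com/");
        System.out.println(history.goBack());    // http://java.net/
        System.out.println(history.goForward()); // http://sun.com/
    }
}
```

Clearing the Forward stack on every new visit mirrors what GUI browsers do after you navigate somewhere new from an earlier page.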
Prowser's features include:

- GET and POST methods.
- … (query string [GET] or request body [POST]).

Prowser is designed to give applications a facility for automated web surfing via an API that simulates the way a human user would operate a tabbed GUI browser. That is, the library's classes and methods are designed to mirror the concepts, interfaces, and actions associated with the use of mainstream web browsers.
To better understand how Prowser mimics a browser, examine the following tables that describe some of the library's classes and methods:
Classes of the Prowser Library

| Class | Description |
|---|---|
| Prowser | Serves as the creator and owner of one or more Tab instances whose job it is to perform web page requests. You can think of a Prowser object as a running instance of a web browser application, while the Tab objects that it owns simulate the tabbed windows in which pages are actually retrieved. |
| Tab | Represents a "tab window" that belongs to a particular Prowser instance; Tab objects actually do the work of executing web page requests and processing the responses. Multiple Tab instances can be owned by a single Prowser in the same way that a tabbed GUI web browser can have more than one tab window open at any one time. |
| Request | A collection of properties that specify a web page request to be made by a Tab instance. When a Request is executed by a Tab, a Response object is generated. |
| Response | The data resulting from a web page request made by a Tab instance. A Response object is generated when a Tab executes a Request. |
Selected Methods of the Prowser Library

| Method | Description |
|---|---|
| new Prowser() | Simulates starting a web browser. |
| createTab() | Simulates the browser action File -> New Tab. |
| go("http://sun.com/") | Simulates entering a URL into a browser's address bar and clicking the Go button (or pressing the Enter key). |
| getPageSource() | Simulates a browser's View -> Page Source action. (This method is used in lieu of actually being able to "see" the retrieved page.) |
| getPageBytes() | Simulates downloading a binary file (e.g., PDF, JPG, EXE, MP3). |
| refresh() | Simulates clicking a browser's Refresh button. |
| stop(10000) | Simulates clicking a browser's Stop button after 10 seconds (10,000 ms) if the page has not finished loading yet. |
| goBack() | Simulates clicking a browser's Back button to retrieve the previous page. |
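The stop(10000) entry describes a timeout: if the page is still loading after the given number of milliseconds, the request is abandoned. A minimal sketch of that idea with java.util.concurrent follows. It is an assumption about how such a timeout can be built, not Prowser's implementation, and slowPage is a hypothetical stand-in for a network request.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StopAfterTimeout {
    /**
     * Runs a page load on a worker thread and "clicks Stop" if it has not
     * finished within timeoutMillis. Returns the page, or null if stopped.
     */
    static String loadWithStop(Callable<String> pageLoad, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(pageLoad);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the unfinished load, like pressing Stop
            return null;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return null;
        } catch (ExecutionException e) {
            throw new RuntimeException("Page load failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    /** Hypothetical stand-in for a request that takes delayMillis to complete. */
    static Callable<String> slowPage(final long delayMillis, final String body) {
        return new Callable<String>() {
            public String call() throws InterruptedException {
                Thread.sleep(delayMillis);
                return body;
            }
        };
    }

    public static void main(String[] args) {
        // Finishes well within the timeout, so the page body is printed.
        System.out.println(loadWithStop(slowPage(50, "<html>fast</html>"), 1000));
        // Still "loading" when the timeout fires, so it is stopped and null is printed.
        System.out.println(loadWithStop(slowPage(5000, "<html>slow</html>"), 100));
    }
}
```

Anonymous classes are used instead of lambdas so the sketch stays close to the Java 5 baseline that Prowser targets.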
Use of the Prowser library requires Java 5.0 (also known as 1.5.0) or higher.
Installation of the Prowser library (for compiling or running) simply involves placing the prowser-x.x.x.jar file in your classpath, where x.x.x is the version number of the Prowser distribution.
For more information, see the distribution's README file.
An object of the Prowser class is used to create and maintain one or more Tab objects. A Tab object acts upon a Request object in order to retrieve a web page. The request results in a Response object that contains the page contents (and other information related to the transaction).
For example, to print out the page source for http://java.net/, you could do this:
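```java
Prowser prowser = new Prowser();                    // start the "browser"
Tab tab = prowser.createTab();                      // open a new tab
Request request = new Request("http://java.net/");  // describe the page to fetch
Response response = tab.go(request);                // execute the request
String html = response.getPageSource();             // "View -> Page Source"
System.out.println(html);
```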
Or, you could forgo most of the intermediate variables and reduce the above code to this:
```java
Tab tab = new Prowser().createTab();
System.out.println(tab.go("http://java.net/").getPageSource());
```
Note: If you later need the Prowser object instantiated in the above code snippet (for example, to create more Tabs that share the same cookie set), you could retrieve it with the following statement:

```java
Prowser prowser = tab.getProwser();
```
To download a file, you use the Response.getPageBytes() method instead of Response.getPageSource(). For example (assuming that a Tab object named tab already exists), you could obtain a PDF file with the following code:
```java
Request request = new Request("http://java.sun.com/xml/webservices.pdf");
byte[] fileContents = tab.go(request).getPageBytes();
```
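A common next step is to write the downloaded bytes to disk. The sketch below assumes you want to check that the bytes really are a PDF (PDF files normally begin with the ASCII characters %PDF) before saving, since a server may send back an HTML error page instead. The class and method names are hypothetical, the byte array stands in for the result of getPageBytes(), and java.nio.file requires Java 7 or later, which is newer than Prowser's own minimum.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveDownload {
    /** True if the bytes start with the "%PDF" signature. */
    static boolean looksLikePdf(byte[] contents) {
        return contents != null && contents.length >= 4
                && contents[0] == '%' && contents[1] == 'P'
                && contents[2] == 'D' && contents[3] == 'F';
    }

    /** Writes the bytes to target only if they look like a PDF; returns whether it did. */
    static boolean saveIfPdf(byte[] contents, Path target) throws IOException {
        if (!looksLikePdf(contents)) {
            return false; // e.g., an HTML error page instead of the expected file
        }
        Files.write(target, contents);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for tab.go(request).getPageBytes(); no network access here.
        byte[] fileContents = "%PDF-1.4 (stand-in bytes)".getBytes(StandardCharsets.US_ASCII);
        Path target = Files.createTempFile("webservices", ".pdf");
        System.out.println(saveIfPdf(fileContents, target)); // true
    }
}
```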
To submit a web form and retrieve the resulting page (e.g., logging into your bank account), you can use the pertinent methods to configure a Request object with the proper URL, HTTP method (e.g., POST), and form field values. Alternatively, you can specify the request's configuration in a request file, which can be set up once and reused without having to configure the request programmatically. See the documentation for the corresponding Request constructor for more information.
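For a POST request, form field values end up in the request body as an application/x-www-form-urlencoded string (for a GET request, the same string follows a ? in the URL). The sketch below shows only that encoding step, using the JDK's URLEncoder; it is not part of Prowser's API, and the field names and values are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormEncoding {
    /** Encodes form fields as an application/x-www-form-urlencoded string. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&'); // name=value pairs are separated by '&'
            }
            body.append(utf8(field.getKey())).append('=').append(utf8(field.getValue()));
        }
        return body.toString();
    }

    private static String utf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8"); // spaces become '+', '&' becomes %26
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported");
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>(); // keeps field order
        fields.put("username", "jdoe");          // hypothetical field names and values
        fields.put("password", "s3cret&more");
        System.out.println(encode(fields)); // username=jdoe&password=s3cret%26more
    }
}
```

Escaping matters here: without it, the & inside the password would be read by the server as the start of a new field.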
To request a page that requires Basic or Digest Authentication, simply include the username and password as part of the URI, like this:
http://username:password@www.site-requiring-auth.com/
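For Basic Authentication, credentials like these end up in an Authorization header carrying the Base64-encoded username:password pair. The sketch below derives that header from the URI using only the JDK. It illustrates the mechanism rather than Prowser's code, assumes UTF-8 for the credential bytes (RFC 7617 does not fix a default charset), and needs Java 8 for java.util.Base64. Digest Authentication is a challenge-response scheme and is not shown.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthFromUri {
    /**
     * Builds an HTTP Basic "Authorization" header value from the
     * username:password part of a URI, or returns null if there is none.
     */
    static String basicAuthHeader(String uriString) {
        String userInfo = URI.create(uriString).getUserInfo(); // decoded "username:password"
        if (userInfo == null) {
            return null;
        }
        byte[] credentials = userInfo.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    public static void main(String[] args) {
        // prints: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
        System.out.println(basicAuthHeader("http://username:password@www.site-requiring-auth.com/"));
    }
}
```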