Friday, 4 August 2017

PhantomJS and Jsoup with Spring Boot



You'll need to have PhantomJS installed locally and on the PATH. You can accomplish this by following my Install Instructions for Mac.
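If you want to confirm from Java that the binary is visible before starting the driver, a check along the following lines should work. This is only a minimal sketch: the class name PhantomJsPathCheck and the helper findOnPath are made up for illustration, and it assumes a Mac or Linux setup where the executable is simply called phantomjs.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of Selenium or GhostDriver.
class PhantomJsPathCheck {

    // Returns the first executable file with the given name found in the
    // directories of a PATH-style string, or empty if there is none.
    public static Optional<Path> findOnPath(String executableName, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            Path candidate = Paths.get(dir, executableName);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: the binary is named "phantomjs" (no .exe suffix).
        Optional<Path> phantom = findOnPath("phantomjs", System.getenv("PATH"));
        System.out.println(phantom
                .map(p -> "phantomjs found at " + p)
                .orElse("phantomjs is not on the PATH"));
    }
}
```

Running the main method prints either the location that was found or a message saying the binary is missing, which is quicker to read than a driver startup failure.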

Gradle Dependencies

In your Spring Boot project, add the following Gradle dependencies to the build.gradle file.

repositories {
  mavenCentral()
  maven { url 'https://jitpack.io' }
  ...
}

dependencies {
  ...
  compile('org.jsoup:jsoup:1.8.3')
  compile('com.github.jarlakxen:embedphantomjs:3.0')
  compile('com.github.detro:ghostdriver:2.1.0')
  ...
}


Code

The following code loads the page in PhantomJS with JavaScript enabled and hands the resulting HTML to Jsoup. The key line is Jsoup.parseBodyFragment(sourceHtml), which returns the parsed page.

// Imports
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

// Inside the code somewhere...
DesiredCapabilities caps = new DesiredCapabilities();
caps.setJavascriptEnabled(true);
WebDriver driver = new PhantomJSDriver(caps);
driver.get(urlString);
// In case you need to debug the sourceHtml.
String sourceHtml = driver.getPageSource();
// Use Jsoup to parse the HTML.
Element document = Jsoup.parseBodyFragment(sourceHtml);
// quit() ends the whole session, not just the current window.
driver.quit();

Logging Message

When the code is triggered, the following log messages will appear in the Spring Boot console.
The "executable:" line shows the path where phantomjs is installed.

INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : executable: /<Local Install Path>/phantomjs-2.1.1-macosx/bin/phantomjs
INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : port: 1112
INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : arguments: [--webdriver=1112, --webdriver-logfile=/<Working Directory>/phantomjsdriver.log]
INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : environment: {}
[INFO] GhostDriver - Main - running on port 1112
[INFO] Session [90a9fb50-7179-11e7-bf8f-09865228037e] - page.settings - {"XSSAuditingEnabled":false,"javascriptCanCloseWindows":true,"javascriptCanOpenWindows":true,"javascriptEnabled":true,"loadImages":true,"localToRemoteUrlAccessEnabled":false,"userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1","webSecurityEnabled":true}
[INFO] Session [90a9fb50-7179-11e7-bf8f-09865228037e] - page.customHeaders:  - {}
[INFO] Session [90a9fb50-7179-11e7-bf8f-09865228037e] - Session.negotiatedCapabilities - {"browserName":"phantomjs","version":"2.1.1","driverName":"ghostdriver","driverVersion":"1.2.0","platform":"mac-unknown-64bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}
[INFO] SessionManagerReqHand - _postNewSessionCommand - New Session Created: 90a9fb50-7179-11e7-bf8f-09865228037e
INFO 1929 --- [ null to remote] o.o.selenium.remote.ProtocolHandshake    : Detected dialect: OSS
