I had previously read about JCrawler from a link on Erik's Linkblog, and thought I'd give it a go!
- Downloaded JCrawler from http://jcrawler.sourceforge.net/
- Downloaded Eclipse 3.1M4 from ftp://ftp.mirrorservice.org/sites/download.eclipse.org/
- Started up Eclipse and created a new Java project called jcrawler, using src as the source folder
- Unzipped JCrawler and copied the contents into the jcrawler project
- Tried to run the build.xml in the jcrawler project, but got a "cannot find compiler" error
- Added tools.jar to the Eclipse Ant classpath: Window, Preferences, Ant, Runtime, Ant Home Entries, then add lib/tools.jar from your JDK directory. This fixes the Eclipse/Ant "cannot find javac" error, which happens because a plain JRE doesn't ship the compiler.
- Right-clicked the jcrawler project, chose Properties, and added the JARs from the dist/lib folder to the project build path
- Modified crawlerConfig.xml in the conf folder: added our test website's URL as the URL to start crawling from, changed the url-patterns permission to true, and changed the URL pattern to ^ourtestwebsite.co.uk$, so that it would only crawl URLs on our website and not follow any external links
- Added the launcher buttons to the toolbar: right-click the toolbar, Customize Perspective, Commands, and tick Launcher
- Selected jcrawler.jar, chose Run... from the Run dropdown, Java Application, New, and set com.jcrawler.Main as the main class
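For reference, the crawlerConfig.xml changes above look roughly like this. This is a sketch from memory: the element names are illustrative and may not match JCrawler's real schema, so check the sample config shipped in the conf folder.

```xml
<!-- conf/crawlerConfig.xml: illustrative sketch, not the exact JCrawler schema -->
<crawler-config>
    <!-- Seed URL: crawling starts from our test website -->
    <urls>
        <url>http://ourtestwebsite.co.uk/</url>
    </urls>
    <!-- Permission set to true: only follow URLs matching the pattern,
         so external links are ignored -->
    <url-patterns allow="true">
        <url-pattern>^ourtestwebsite.co.uk$</url-pattern>
    </url-patterns>
</crawler-config>
```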
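One thing worth double-checking in that pattern: the dots in ^ourtestwebsite.co.uk$ are unescaped, so in a regex they match any character, not just a literal dot. A quick standalone check (hypothetical class name, plain java.util.regex rather than JCrawler code) shows what the pattern actually accepts:

```java
import java.util.regex.Pattern;

public class UrlPatternCheck {
    public static void main(String[] args) {
        // The pattern as written in crawlerConfig.xml, dots unescaped
        Pattern p = Pattern.compile("^ourtestwebsite.co.uk$");

        System.out.println(p.matcher("ourtestwebsite.co.uk").matches());  // true: the intended host
        System.out.println(p.matcher("ourtestwebsiteXcoYuk").matches());  // true: dots act as wildcards
        System.out.println(p.matcher("www.example.com").matches());       // false: external host rejected
    }
}
```

Escaping the dots (^ourtestwebsite\.co\.uk$) would pin it to the literal hostname, although the looser pattern is still enough to keep the crawler off genuinely external domains.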
JCrawler very quickly caused the test website to lock up, as required. Brilliant.
Now I can try to reproduce this in the dev environment on my local PC and see if I can pinpoint the problem!