I had previously read about JCrawler from a link on Erik's Linkblog, and thought I'd give that a go!
- Download JCrawler from http://jcrawler.sourceforge.net/
- Download Eclipse 3.1M4 from ftp://ftp.mirrorservice.org/sites/download.eclipse.org/
- Started up Eclipse and created a new Java project called jcrawler, using src as the source folder
- Unzipped JCrawler and copied the contents into the jcrawler project
- Tried to run the build.xml in the jcrawler project, but got a "cannot find compiler" message
- Added tools.jar to the Eclipse Ant classpath: Window > Preferences > Ant > Runtime > Ant Home Entries, then add lib/tools.jar from your JDK directory. This should now prevent the Eclipse/Ant "cannot find javac" error.
- Right-clicked the jcrawler project, selected Properties, and added the JARs from the dist/lib folder to the project build path
- Modified crawlerConfig.xml in the conf folder: added our test website's URL as the URL to start crawling from, set the url-patterns permission to true, and changed the URL pattern to ^ourtestwebsite.co.uk$, so that it would only crawl URLs on our website and not follow any external links
- Added the launcher buttons to the toolbar: right-click the toolbar, Customize Perspective, Commands, and tick Launcher
- Selected the jcrawler.jar, chose Run... from the Run dropdown, picked Java Application, clicked New, and added com.jcrawler.Main as the main class
- Run
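A quick aside on the "cannot find javac" step: Ant's javac task needs the JDK compiler classes, which live in tools.jar and aren't on a JRE-only classpath, which is why adding tools.jar to Ant Home Entries fixes it. I haven't copied this from JCrawler's actual build.xml, so the target and directory names below are only illustrative, but the failing piece will be a javac call roughly like this:

```xml
<!-- Illustrative Ant target; <javac> is what fails when tools.jar
     is missing from the Ant classpath -->
<target name="compile">
    <mkdir dir="build/classes"/>
    <javac srcdir="src" destdir="build/classes">
        <classpath>
            <fileset dir="dist/lib" includes="*.jar"/>
        </classpath>
    </javac>
</target>
```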
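For reference, the crawlerConfig.xml changes amount to something roughly like the sketch below. I'm writing the element names from memory rather than from the shipped schema, so treat them as approximate and check the copy in the conf folder for the real layout:

```xml
<!-- Sketch of the relevant crawlerConfig.xml settings; element names are
     approximate, see the file shipped in conf for the actual schema -->
<crawler-config>
    <!-- The URL to start crawling from: our test website -->
    <urls>
        <url>http://www.ourtestwebsite.co.uk/</url>
    </urls>
    <!-- Permission set to true: only follow URLs matching the pattern,
         so the crawler stays on our site and ignores external links -->
    <url-patterns allow="true">
        <url-pattern>^ourtestwebsite.co.uk$</url-pattern>
    </url-patterns>
</crawler-config>
```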
JCrawler very quickly caused the test website to lock up, as required. Brilliant.
Now I can try and reproduce this in the dev environment on my local PC and see if I can pinpoint the problem!