Bioscanners Project Main - Archive2007
OboScanners for GeneOntology Files
September 14, 2007 (dg)
Java-Memory Observations
September 13, 2007 (dg)
A problem with Java-based applications is the huge amount of memory required to run the scanner. Different settings of the maximum memory allocation pool were tested using the command-line option -Xmx.
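
The effect of a given -Xmx setting can be checked from inside the JVM. The following is a minimal sketch, not part of the project code: the class name HeapReport and its two helper methods are assumptions made for illustration. It only uses the standard Runtime API to report the maximum heap the JVM will try to use and the heap currently in use.

```java
// HeapReport is a hypothetical helper, not part of the Bioscanners code.
final class HeapReport {
    /** Maximum heap in megabytes, as reported by the running JVM. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently in use (allocated minus free) in megabytes. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMb() + " MB");
        System.out.println("used heap: " + usedHeapMb() + " MB");
    }
}
```

Running it as "java -Xmx64m HeapReport" and again as "java -Xmx800m HeapReport" should show the reported maximum following the option. The exact figure may differ slightly from the requested value depending on the JVM.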
OboScanner
September 13, 2007 (dg)
Comparing 64-bit and 32-bit scanners generated with the tply lexer for Pascal (Free Pascal) and with the Jflex lexer for Java. The Java programs require about 800 MB of memory, whereas the Pascal programs require just 1 MB. However, the Java programs were faster on the complete Gene Ontology OBO file (about 250000 lines).
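
To give an idea of the input such a scanner handles: an OBO file consists of stanzas introduced by a bracketed header such as [Term], followed by tag: value lines. The sketch below is a hand-written illustration only, assuming a hypothetical class OboTermCounter; the scanners compared above were generated with tply and Jflex, not written this way. It reads the text line by line and counts the [Term] stanzas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// OboTermCounter is a hypothetical illustration, not the project's scanner.
final class OboTermCounter {
    /** Counts [Term] stanza headers, reading the source one line at a time. */
    static int countTerms(Reader source) {
        int terms = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().equals("[Term]")) {
                    terms++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Small made-up sample in OBO layout: two [Term] stanzas, one [Typedef].
        String sample = "format-version: 1.2\n\n"
                + "[Term]\nid: GO:0000001\nname: mitochondrion inheritance\n\n"
                + "[Term]\nid: GO:0000002\nname: mitochondrial genome maintenance\n\n"
                + "[Typedef]\nid: part_of\nname: part of\n";
        System.out.println(countTerms(new StringReader(sample))); // prints 2
    }
}
```

Because the text is consumed line by line, this sketch does not need to hold the whole file in memory; whether that matches the behaviour of the generated scanners is not stated in the entry above.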
WC-Comparisons
September 12, 2007 (dg)
Again, the same set of BLAST files was used to test a word-counting scanner. The Flex and Re2c based scanners again performed best.
BlastParsers vs BlastScanners
June 26, 2007 (dg)
We recently compared our newly generated BLAST scanners with currently available BLAST parsers from the BioJava project [1], the BioPerl project [2] and with the Zerg BLAST parser [3]. Those parsers were compared with our scanners, created either with C-based scanner generators like Re2c [4] and Flex [5] or with the Java-based scanner generator Jflex [6]. Whereas the parsers mentioned above require source code editing for parsing and analysing BLAST files, our scanners emit SQL code. The BLAST results can afterwards be analysed with a high-level language (SQL). Please note that the BioJava scanner does not work with current BLAST versions. The BLAST files were about 1 (small), 14 (medium) and 140 (large) MB in size.
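
The idea of emitting SQL instead of building objects can be sketched as follows. This is not the project's scanner: the class BlastSqlEmitter, the table name blast_hits and its column names are hypothetical, and the input is assumed to be tabular BLAST output with 12 tab-separated fields (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) rather than the default report format.

```java
// BlastSqlEmitter, the table blast_hits and its columns are hypothetical.
// Input is assumed to be tabular BLAST output with 12 tab-separated fields.
final class BlastSqlEmitter {
    /** Doubles single quotes so the value is safe inside an SQL string literal. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Returns the trimmed field if it parses as a number, otherwise null. */
    static String numeric(String field) {
        try {
            Double.parseDouble(field.trim());
            return field.trim();
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Turns one tabular BLAST line into an INSERT statement, or null if unusable. */
    static String toInsert(String line) {
        String[] f = line.split("\t");
        if (line.startsWith("#") || f.length < 12) {
            return null; // comment line or incomplete record
        }
        String identity = numeric(f[2]);
        String evalue = numeric(f[10]);
        String bitScore = numeric(f[11]);
        if (identity == null || evalue == null || bitScore == null) {
            return null; // numeric fields did not parse
        }
        return "INSERT INTO blast_hits (query_id, subject_id, identity, evalue, bit_score) VALUES ("
                + quote(f[0]) + ", " + quote(f[1]) + ", "
                + identity + ", " + evalue + ", " + bitScore + ");";
    }

    public static void main(String[] args) {
        String line = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-50\t370";
        System.out.println(toInsert(line));
    }
}
```

Once such statements are loaded into a database, filtering hits (for example by e-value) becomes an SQL query rather than a change to the parser source, which is the point made in the entry above.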
Comparison of several scanner generators for a simple BLAST scanner
April 10, 2007 (dg)
Sample: BLAST file with 1 to 10,000 result items
The Re2c based scanner is the fastest, but its setup and coding are more complicated than for the other scanners. Flex-based scanners are 2-3 times slower than Re2c based scanners, regardless of whether a Tcl interpreter is embedded for better string handling (flex-tcl). Jflex code (java) executed with the Sun Java HotSpot virtual machine (1.5), Jflex code compiled to machine code (java-gcj) and Plex (sbs-plex = Pascal lex) based scanners are between about 5 and 10 times slower than Re2c based scanners. Interpreted Java code, executed either with the Sun interpreter (java-ip = “java -Xint”) or with the GNU interpreter (java-gij), is about 50 times slower than Re2c code. The Tcl based scanner is about 1000 times slower than the Re2c based one. The Perl scanner is line-based and therefore not able to do complicated scanning with more than two states or patterns on the same line.
Initial Setup of the Bioscanners and Bioparsers Webpage
March 13, 2007 (dg)
Project Aim
Write parsers for biological data based on scanner generators like Flex (C), Re2c (C), Jflex (Java) and Ifickle (Tcl). These scanner generators provide easier maintenance and development, and higher speed, than hand-written scanners.
Page last modified on May 26, 2009, at 12:07 PM. Using Modified Blue Zinfandel Wordpress Theme created by Brian. Adjusted for by Dr. Detlef Groth, www.dgroth.de