This is a quick synopsis of the steps needed to initialize a GBrowse database from a GenBank record. For the purposes of illustration, we will use the RefSeq record for M. bovis, accession NC_002945.
Download the GenBank record and convert it into GFF format. The bp_genbank2gff.pl script, which is part of BioPerl (scripts/Bio-DB-GFF/genbank2gff.pl), does both in a single command:
bp_genbank2gff.pl -stdout -accession NC_002945 > mbovis.gff
This will download the record for M. bovis (RefSeq NC_002945) and save it to the file mbovis.gff.
If you already have the GenBank record available as a file named NC_002945.gb, you can convert it like this:
bp_genbank2gff.pl -stdout -file NC_002945.gb > mbovis.gff
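If you convert records often, the two invocations above can be wrapped in a small shell function that uses a local GenBank file when one exists and otherwise downloads the record. This is a sketch, not part of BioPerl or GBrowse. The function name genbank_to_gff and the GENBANK2GFF override variable are hypothetical, and only the -stdout, -file and -accession options shown above are used.

```shell
# Hypothetical helper (not part of BioPerl or GBrowse): convert a RefSeq
# accession to GFF, preferring a local GenBank file named <accession>.gb
# and falling back to downloading the record.
# GENBANK2GFF may be set to use a different converter command.
genbank_to_gff() {
    acc=$1
    out=$2
    conv=${GENBANK2GFF:-bp_genbank2gff.pl}
    if [ -f "$acc.gb" ]; then
        # A local copy exists: convert it without fetching anything
        "$conv" -stdout -file "$acc.gb" > "$out"
    else
        # No local copy: have the script download the record by accession
        "$conv" -stdout -accession "$acc" > "$out"
    fi
}
```

Usage: genbank_to_gff NC_002945 mbovis.gff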
The newly converted file uses GFF3 format, which combines feature data with sequence (DNA) data in a single file. This means that you do not need a separate FASTA file for the sequence.
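To confirm that the converted file really carries both kinds of data, you can count its feature rows and its sequence rows. The sketch below assumes that tab-delimited feature lines come first and that the DNA follows as FASTA records introduced by lines beginning with ">" (as after a ##FASTA directive). The exact layout written by your version of bp_genbank2gff.pl may differ, and the function name gff_summary is hypothetical.

```shell
# Hypothetical helper: summarize a GFF file that may embed FASTA sequence.
# Feature rows   = non-comment lines with at least 9 tab-separated columns,
#                  seen before the first FASTA header.
# Sequence rows  = non-empty, non-header lines after the first ">" line.
gff_summary() {
    awk -F '\t' '
        /^>/     { in_fasta = 1; headers++; next }
        in_fasta { if (length($0) > 0) seqlines++; next }
        /^#/     { next }
        NF >= 9  { features++ }
        END { printf "features=%d fasta_records=%d sequence_lines=%d\n", features, headers, seqlines }
    ' "$1"
}
```

Usage: gff_summary mbovis.gff. A fasta_records count of zero suggests the sequence is not embedded in the layout this sketch assumes.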
Copy this file into your in-memory GFF databases directory, as described in the tutorial. We will assume this directory is /usr/local/apache/htdocs/gbrowse/databases.
mkdir /usr/local/apache/htdocs/gbrowse/databases/mbovis
chmod o+rwx /usr/local/apache/htdocs/gbrowse/databases/mbovis
cp mbovis.gff /usr/local/apache/htdocs/gbrowse/databases/mbovis
Use the configuration file 08.genbank.conf as your starting template. This is located in contrib/conf_files:
cp contrib/conf_files/08.genbank.conf /usr/local/apache/conf/gbrowse.conf/mb.conf
You will need to change the [GENERAL] section to use the in-memory adaptor and to point to the location of the M. bovis GFF file:
[GENERAL]
description   = Mycobacterium Bovis In-Memory
db_adaptor    = Bio::DB::GFF
db_args       = -adaptor memory
                -dir /usr/local/apache/htdocs/gbrowse/databases/mbovis
You might also want to change the "examples" tag to list the accession number for the whole genome, along with a few choice gene names and search terms:
examples = NC_002945 Mb1800 galT glucose
That's all there is to it. However, because this is a large chunk of DNA (more than 4 Mbp), the in-memory adaptor consumes a considerable amount of memory, and performance will be sluggish unless you have a fast machine with plenty of RAM. You may therefore prefer to serve it from a MySQL, PostgreSQL or Oracle database. The following instructions describe how to do this.
We will assume that you are using a MySQL database.
Create the database using mysqladmin:
mysqladmin create mbovis
As described in the GBrowse tutorial, give yourself write permission for the database and give the web server user (e.g. "nobody") select permission.
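The two grants can be expressed as standard MySQL GRANT statements. The function below only prints them, so you can review the SQL before piping it into the mysql client. It is a sketch assuming both accounts already exist and connect from localhost (recent MySQL versions do not create users implicitly in GRANT); the function name grant_sql is hypothetical.

```shell
# Hypothetical helper: print MySQL GRANT statements giving the loading
# user full rights on the database and the web-server user read-only
# (SELECT) access. Assumes both accounts exist and connect from localhost.
grant_sql() {
    db=$1 loader=$2 web=$3
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost';\n" "$db" "$loader"
    printf "GRANT SELECT ON %s.* TO '%s'@'localhost';\n" "$db" "$web"
}
```

Usage: grant_sql mbovis "$USER" nobody | mysql -u root -p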
The bp_genbank2gff.pl script can download the accession, convert it into GFF and load the database directly in one smooth step:
bp_genbank2gff.pl -create -dsn mbovis -accession NC_002945
If you prefer, you can do this in two steps: first create the GFF file as described for the in-memory adaptor, then load it with BioPerl's bp_bulk_load_gff.pl or bp_fast_load_gff.pl.
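The two-step route can be sketched as a single function that writes the GFF file and then bulk-loads it. The --create and --database options are bp_bulk_load_gff.pl options as we understand them; check the script's --help output for your BioPerl version before relying on them. The function name two_step_load and the GENBANK2GFF and BULK_LOAD override variables are hypothetical.

```shell
# Hypothetical sketch of the two-step route.
# Step 1 writes <accession>.gff; step 2 bulk-loads it into the database,
# (re)creating the tables first. Override variables are illustrative only.
two_step_load() {
    acc=$1
    db=$2
    "${GENBANK2GFF:-bp_genbank2gff.pl}" -stdout -accession "$acc" > "$acc.gff" &&
        "${BULK_LOAD:-bp_bulk_load_gff.pl}" --create --database "$db" "$acc.gff"
}
```

Usage: two_step_load NC_002945 mbovis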
If you are using a PostgreSQL or Oracle database, you must pass the appropriate adaptor to bp_genbank2gff.pl (dbi::pg for PostgreSQL, dbi::oracle for Oracle). For example, for Oracle:
bp_genbank2gff.pl -create -dsn mbovis -adaptor dbi::oracle -accession NC_002945
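The choice of adaptor can be folded into a small helper that prints the one-step load command for a given database type, so the same script works for MySQL, PostgreSQL or Oracle. This is a dry-run sketch: the function name load_command is hypothetical, MySQL is assumed to be the default (no -adaptor flag, as in the MySQL example above), and passing a bare database name as -dsn for PostgreSQL is an assumption you should verify against the Bio::DB::GFF adaptor documentation.

```shell
# Hypothetical helper: print (not run) the one-step load command.
# mysql needs no -adaptor flag; pg and oracle map to the Bio::DB::GFF
# dbi::pg and dbi::oracle adaptors.
load_command() {
    db=$1 acc=$2 type=${3:-mysql}
    case $type in
        mysql)  adaptor= ;;
        pg)     adaptor="-adaptor dbi::pg" ;;
        oracle) adaptor="-adaptor dbi::oracle" ;;
        *)      echo "unknown database type: $type" >&2; return 1 ;;
    esac
    # $adaptor is left unquoted so it expands to zero or two words
    echo bp_genbank2gff.pl -create -dsn "$db" $adaptor -accession "$acc"
}
```

Usage: review the output of load_command mbovis NC_002945 pg, then run it with eval "$(load_command mbovis NC_002945 pg)".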
Use the configuration file 08.genbank.conf as your starting template. This is located in contrib/conf_files:
cp contrib/conf_files/08.genbank.conf /usr/local/apache/conf/gbrowse.conf/mb.conf
You will need to change the [GENERAL] section to use the appropriate database adaptor:
[GENERAL]
description   = Mycobacterium Bovis Database
db_adaptor    = Bio::DB::GFF
db_args       = -adaptor dbi::mysql
                -dsn dbi:mysql:database=mbovis;host=localhost
                -user nobody
                -passwd ""
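For PostgreSQL, only the adaptor name and the DBI data source change. The function below prints a [GENERAL] stanza for either database, using the standard DBD::mysql and DBD::Pg DSN forms and the same "nobody" user and empty password as the MySQL example. It is a sketch: the function name db_general_stanza is hypothetical, and you should check the PostgreSQL settings against the Bio::DB::GFF dbi::pg adaptor documentation.

```shell
# Hypothetical helper: print a [GENERAL] stanza for a MySQL or PostgreSQL
# database. Defaults mirror the MySQL example (user "nobody", empty password).
db_general_stanza() {
    db=$1 type=${2:-mysql} user=${3:-nobody}
    case $type in
        mysql) dsn="dbi:mysql:database=$db;host=localhost" ;;
        pg)    dsn="dbi:Pg:dbname=$db;host=localhost" ;;
        *)     echo "unsupported database type: $type" >&2; return 1 ;;
    esac
    # Unquoted heredoc delimiter so $type, $dsn and $user are expanded
    cat <<EOF
[GENERAL]
description   = Mycobacterium Bovis Database
db_adaptor    = Bio::DB::GFF
db_args       = -adaptor dbi::$type
                -dsn $dsn
                -user $user
                -passwd ""
EOF
}
```

Usage: db_general_stanza mbovis pg, then paste the output over the [GENERAL] settings in mb.conf.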
You might also want to change the "examples" tag to list the accession number for the whole genome, along with a few choice gene names and search terms:
examples = NC_002945 Mb1800 galT glucose
That should be it!
You can load as many accessions into the database as you like. Each one will appear as a "chromosome" named after the accession number of the entry.
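When loading several accessions into one database, only the first load should pass -create, because -create (re)initializes the tables and, as we understand it, would discard entries loaded earlier. A minimal sketch of such a loop follows; the function name load_accessions and the GENBANK2GFF override variable are hypothetical. For PostgreSQL or Oracle, add the appropriate -adaptor option to the command inside the loop.

```shell
# Hypothetical sketch: load several accessions into one database.
# Only the first call passes -create so later loads append rather than
# reinitialize the tables.
load_accessions() {
    db=$1
    shift
    create=-create
    for acc in "$@"; do
        # $create is unquoted so it disappears once cleared
        "${GENBANK2GFF:-bp_genbank2gff.pl}" $create -dsn "$db" -accession "$acc" || return 1
        create=
    done
}
```

Usage: load_accessions mbovis NC_002945 followed by any further accession numbers you want to add.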