GenBank, maintained by the National Center for Biotechnology Information, is the world-wide genetic sequence repository.
This is a huge undertaking, but the structure of the database (the flat file format) has been flawed from the start. Recording every flaw I have found would be a mammoth task, so I will note only a few here.
LOCUS Hs17_109281308691 bp DNA PRI 07-FEB-2002
DEFINITION Homo sapiens chromosome 17 working draft sequence segment.
ACCESSION NT_010771
VERSION NT_010771.8 GI:18587249

The length has fused with the LOCUS name. Rule: always put a space between every piece of data. That way, if a field overflows, the space still buffers it from its neighbor. This one cost me about an hour to track down. Other cases of fused numbers:
NT_010799:LOCUS Hs17_109563280135 bp DNA PRI 07-FEB-2002
NT_025892:LOCUS Hs14_2604824020381 bp DNA PRI 07-FEB-2002
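
To illustrate why the fused field is hard to spot, here is a minimal Python sketch of a whitespace-based LOCUS parser that flags such lines. The function name parse_locus and the name_width parameter are hypothetical, not part of any GenBank tool. The recovery step assumes the locus name is exactly 10 characters wide, which is only a guess based on the Hs17_NNNNN pattern in the examples above and may not hold for other records.

```python
def parse_locus(line, name_width=10):
    """Return (name, length, fused) for a GenBank LOCUS line.

    name_width is an ASSUMPTION (10 characters, guessed from the
    Hs17_NNNNN pattern in the examples); it is used only to split a
    fused name+length token.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % line)

    # Well-formed case: LOCUS <name> <length> bp ...
    if len(tokens) >= 4 and tokens[2].isdigit() and tokens[3] == "bp":
        return tokens[1], int(tokens[2]), False

    # Fused case: LOCUS <name+length> bp ...
    # The length token is missing, so "bp" shows up one position early.
    if tokens[2] == "bp":
        name = tokens[1][:name_width]
        digits = tokens[1][name_width:]
        if digits.isdigit():
            return name, int(digits), True

    raise ValueError("cannot interpret LOCUS line: %r" % line)


if __name__ == "__main__":
    fused_lines = [
        "LOCUS Hs17_109281308691 bp DNA PRI 07-FEB-2002",
        "LOCUS Hs17_109563280135 bp DNA PRI 07-FEB-2002",
        "LOCUS Hs14_2604824020381 bp DNA PRI 07-FEB-2002",
    ]
    for text in fused_lines:
        name, length, fused = parse_locus(text)
        print(name, length, "FUSED" if fused else "ok")
```

A parser that simply takes the second token as the name and the third as the length would read "bp" as the length on these lines. Checking where "bp" falls is one cheap way to catch the problem early.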
Schneider Lab
origin: 2002 Feb 10
updated: 2002 Feb 10