I set up the release of version 1.0 of bafprp the other day. It has been a fun project that I hope is useful to others who, like me, needed an improvement over bafview without going to the big software companies. For today's post I want to list some of the helpful tips I picked up while writing the program.
DSN-Less ODBC Connection String
I found it really annoying that in order to connect through ODBC you had to set up a data source. On Windows this process involves going to the Control Panel, Administrative Tools, and Data Sources. On Linux, this involves setting up the ODBC driver, setting up unixODBC to recognize the driver, and finally setting up the server information and data source in both your driver and unixODBC.
I spent some time looking online and found a nice and easy way to connect without an external data source on Windows. Basically it involves supplying all the information in one string like so:
std::string dsn = "DRIVER=sql server;DATABASE=" + _database + ";SERVER=" + _server + ";Uid=" + _user + ";Pwd=" + _password + ";";
You must also use SQLDriverConnect instead of the usual SQLConnect function, since the former accepts a full connection string instead of just a DSN name and login info.
On Linux it's still a little annoying. You must still define your driver in unixODBC for starters, but you do not need to set up a DSN or give server type information. Also, support for DSN-less connections depends on the driver you choose. Since my primary use for bafprp was to connect to an MS SQL database, I used FreeTDS, which as of a recent update does have DSN-less capability. If you are using something else, please consult its documentation about this as well.
In FreeTDS the basic connection string goes like this:
std::string dsn = "DRIVER=FreeTDS;SERVER=" + _server + ";Uid=" + _user + ";Pwd=" + _password + ";DATABASE=" + _database + ";TDS_VERSION=8.0;Port=1433;";
TDS_VERSION and Port are FreeTDS-specific settings, but the basic idea remains the same. Also note that different versions of MS SQL require a different TDS_VERSION. If you are using FreeTDS it's important to know the correct version for your server.
As a final side note about connection strings, make sure to include the terminating semicolon. If you find your string unable to connect no matter what you do, this is probably the problem.
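One way to avoid forgetting the terminating semicolon is to build the string from keyword/value pairs instead of concatenating by hand. The following is only a sketch: the `Attributes` alias and `build_connection_string` are hypothetical names, not part of bafprp or ODBC, and the helper does no quoting or escaping of values.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: an ordered list of keyword/value pairs.
using Attributes = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: joins the pairs into "KEY=value;KEY=value;".
// Every pair, including the last one, is followed by a semicolon.
std::string build_connection_string(const Attributes& attributes) {
    std::string result;
    for (const auto& attribute : attributes) {
        assert(!attribute.first.empty() && "every attribute needs a keyword");
        result += attribute.first;
        result += '=';
        result += attribute.second;
        result += ';';
    }
    return result;
}
```

Calling it with the pairs DRIVER=FreeTDS, SERVER, Uid, Pwd, DATABASE, TDS_VERSION and Port produces the same kind of string as the hand-built versions above, ready to pass to SQLDriverConnect.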
File Output
Sometimes when your program terminates unexpectedly, text being written to a file can be lost because it is still sitting in a buffer. If you are using fprintf or any of the other stdio functions for logging you will probably come across this problem. The solution I settled on to guarantee that the text gets written is to use std::fstream and call the flush() method when you are done writing (stdio has an equivalent in fflush()). Flush makes sure the buffered text is handed to the operating system before returning, so it will be a bit slower, but for something as important as logging the cost is worth it.
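A minimal sketch of the idea, assuming the log is a std::ofstream that the caller has already opened. `log_line` is a hypothetical helper name, not something from bafprp.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: writes one log line and flushes before returning.
// Returns false if the stream is in a failed state afterwards.
bool log_line(std::ofstream& log, const std::string& message) {
    assert(log.is_open() && "open the log file before logging");
    log << message << '\n';
    log.flush();  // push the stream's buffer out to the operating system
    return log.good();
}
```

Writing std::endl instead of '\n' has the same effect, since std::endl inserts a newline and then flushes. Note that flush() hands the data to the operating system; it is not a guarantee that the bytes have reached the physical disk.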
Duplicate Removal
I remember reading about a similar situation in Programming Pearls. It involved making a hash of your data and comparing collisions, I believe. The situation I was in was that I had thousands of records, any of which could be a byte-for-byte duplicate of any other record in the original binary file. As Programming Pearls states, comparing each record against every other record is a joke. Hashing is definitely a better solution, but you need not create a complex hash table for something like this. I ended up pulling in a crc32 routine to calculate the CRC of the original bytes in each record. After the record parsing was completed I sorted the array of records by their CRC value. It was then a very easy procedure to remove any duplicates, since they would be sitting right next to one another with the same CRC.
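The approach can be sketched roughly as follows. The `Record` struct and the function names are hypothetical stand-ins rather than bafprp's actual types, and the CRC routine is a plain bitwise CRC-32 rather than whichever implementation I pulled in. One deliberate difference from what I described: the sketch also compares the raw bytes, so that two different records that happen to share a CRC are not treated as duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Bitwise CRC-32 using the reflected polynomial 0xEDB88320.
std::uint32_t crc32_bytes(const std::vector<std::uint8_t>& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::uint8_t byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical record: the raw bytes plus their checksum.
struct Record {
    std::vector<std::uint8_t> bytes;
    std::uint32_t crc;
};

Record make_record(std::vector<std::uint8_t> bytes) {
    Record record;
    record.crc = crc32_bytes(bytes);
    record.bytes = std::move(bytes);
    return record;
}

// Orders by CRC first; ties are broken on the bytes so that true
// duplicates always end up next to each other.
bool crc_then_bytes(const Record& a, const Record& b) {
    return a.crc != b.crc ? a.crc < b.crc : a.bytes < b.bytes;
}

void sort_records(std::vector<Record>& records) {
    std::sort(records.begin(), records.end(), crc_then_bytes);
}

// Walks a sorted list and keeps the first record of each run of
// neighbours whose CRC and bytes both match.
std::vector<Record> drop_adjacent_duplicates(const std::vector<Record>& sorted) {
    assert(std::is_sorted(sorted.begin(), sorted.end(), crc_then_bytes));
    std::vector<Record> kept;
    for (const Record& record : sorted) {
        if (kept.empty() || kept.back().crc != record.crc ||
            kept.back().bytes != record.bytes) {
            kept.push_back(record);
        }
    }
    return kept;
}
```

The byte comparison only runs when two neighbours already share a CRC, so it costs very little compared to the sort itself.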
One thing to note however: std::unique in <algorithm> seems like a wonderful function, but I could not get it to work for the life of me. My understanding at the time was that it would sort the array and place any duplicates at the end, returning an iterator pointing at the start of the duplicates. In fact it does not sort anything: it only collapses runs of adjacent equal elements, returns an iterator to the new logical end, and leaves whatever lies past that iterator in an unspecified state. You are then expected to call the container's erase() (not std::remove) on that tail. I managed to get the list sorted and std::unique did identify the correct number of duplicates (the number of elements after the returned iterator matched the number of duplicates I later removed by hand), but it did not place the real duplicates at the end of the array. I ended up removing valid records that were unlucky enough to have a high CRC and thus were at the end of the array.
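For reference, the usual way to combine std::unique with a container is to erase from the returned iterator to the end. This is a sketch with a hypothetical `CrcRecord` type that carries only a CRC and an id, not the code I was fighting with at the time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: only the checksum matters for this sketch.
struct CrcRecord {
    std::uint32_t crc;
    int id;  // stands in for the parsed payload
};

bool lower_crc(const CrcRecord& a, const CrcRecord& b) { return a.crc < b.crc; }
bool same_crc(const CrcRecord& a, const CrcRecord& b) { return a.crc == b.crc; }

// Requires `records` to be sorted by CRC already, because std::unique
// only looks at adjacent elements.
void erase_adjacent_duplicates(std::vector<CrcRecord>& records) {
    assert(std::is_sorted(records.begin(), records.end(), lower_crc));
    // std::unique moves the kept elements to the front and returns the
    // new logical end. The elements past it are leftovers in an
    // unspecified state, not a tidy list of the duplicates.
    auto new_end = std::unique(records.begin(), records.end(), same_crc);
    // The vector's own erase() is what actually shrinks the container.
    records.erase(new_end, records.end());
}
```

Used as std::sort(records.begin(), records.end(), lower_crc) followed by erase_adjacent_duplicates(records), this keeps one record per CRC value, including the ones with the highest CRC.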
So in the end I went through the entire list and removed neighbors with the same crc, which worked quite nicely.
Static Factories
I do not believe I have covered this concept here before, so I will do a brief summary. This subject requires a much more detailed post, but here is the short version. If you are familiar with abstract factories you might know it's a bit of a pain to add a new object. If you are using some kind of enumeration you need to add the id to that list, and add the correct new-object code to the create method of your factory. Eventually you end up with enumerations of 100+ elements and a very scary switch statement. Fear not however, there is a better way!
Imagine a system where all you need to do to add a new object to your factory is compile a .cpp file. Thanks to static factories this is not just a dream. The trick relies on a very natural side effect of static objects: they are constructed when the program starts, before main runs. The basic idea is simple. You have a main 'maker' class with a static registry variable that stores the names of, and pointers to, other maker classes. When you want an object to be built through this factory you create a simple maker class for your object with a method called make, which is declared in the parent maker as pure virtual. The child maker defines a static instance of itself, and thus when the program starts it is created.
When the child gets created it calls the parent constructor with the name, or some other form of identification, of the object it creates. The parent maker then adds the information to its static database. When the programmer needs an instance of that object they simply call the parent's make method, which looks at the database, pulls up the correct child maker, and has the child make the object.
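Here is a minimal sketch of that arrangement. All of the names (`Field`, `FieldMaker`, `DateField`, `DateFieldMaker`) are made up for illustration, and the child's virtual method is called `create` here only to keep it visually distinct from the parent's static `make`. The registry lives inside a function as a local static, which is one common way to make sure it exists before any child maker tries to register itself.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical product interface.
struct Field {
    virtual ~Field() = default;
    virtual std::string describe() const = 0;
};

class FieldMaker {
public:
    // Looks up the child maker registered under `name` and lets it build.
    // Returns a null pointer when nothing is registered under that name.
    static std::unique_ptr<Field> make(const std::string& name) {
        auto it = registry().find(name);
        if (it == registry().end()) return nullptr;
        return it->second->create();
    }

protected:
    // Each child passes its name up; the parent records it in the registry.
    explicit FieldMaker(const std::string& name) {
        bool inserted = registry().emplace(name, this).second;
        assert(inserted && "two makers registered under the same name");
        (void)inserted;  // silences the unused warning when NDEBUG is set
    }
    virtual ~FieldMaker() = default;

private:
    virtual std::unique_ptr<Field> create() const = 0;

    // Constructed on first use, so it is ready no matter which
    // translation unit's static maker happens to be initialized first.
    static std::map<std::string, FieldMaker*>& registry() {
        static std::map<std::string, FieldMaker*> instance;
        return instance;
    }
};

// Everything below could live in its own .cpp file.
struct DateField : Field {
    std::string describe() const override { return "date"; }
};

class DateFieldMaker : public FieldMaker {
public:
    DateFieldMaker() : FieldMaker("date") {}

private:
    std::unique_ptr<Field> create() const override {
        return std::unique_ptr<Field>(new DateField());
    }
    static const DateFieldMaker instance_;  // registers itself at startup
};
const DateFieldMaker DateFieldMaker::instance_;
```

With this in place, FieldMaker::make("date") hands back a DateField without the caller or the parent ever naming the DateField type. One caveat worth knowing: if the child makers are built into a static library, the linker may leave out an object file that nothing else references, in which case that maker never registers.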
This technique is quite powerful if used correctly. It is absolutely necessary for data-driven applications in my opinion, and very handy when working with any kind of file data. Using this method you can separate file structure from logic in a very effective and pretty design.