Friday, October 9, 2009

Bash script for test data generation

Here is the premise of this blog entry. For a project I am working on, I needed to find out how large a meta data database gets with full index data in it. I am after worst-case numbers so I don't unexpectedly hit a wall when someone pushes the boundaries of the system.

The first test I ran was very simple: generate a single 5GB file containing random ascii. I went this route because I wanted to ensure that the index engine I am using couldn't reduce the data set. It does vector based indexing and common word elimination, so completely random data gives the vector index nothing it can use to shrink the database. That reduction is awesome for space saving and does an amazing job in the real world, but I needed to defeat it for my test in case someone introduced data to my system that would push the limits.
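One rough way to sanity-check that a test file really lacks repeated words is to compare its total token count with its unique token count. This is only a sketch: the function name token_stats is made up for illustration, it splits on whitespace, and it says nothing about how the actual index engine tokenizes its input.

```shell
#!/bin/sh
# token_stats FILE: print "total unique" whitespace-separated token counts.
# Hypothetical helper; the real index engine's tokenizer is not known here.
token_stats() {
    # Put one token per line, then count the non-empty lines.
    total=$(tr -s '[:space:]' '\n' < "$1" | grep -c .)
    # Same token stream, but with duplicates removed before counting.
    unique=$(tr -s '[:space:]' '\n' < "$1" | LC_ALL=C sort -u | grep -c .)
    echo "$total $unique"
}

# Example: a small file with heavily repeated words.
sample=$(mktemp)
printf 'the cat and the dog and the bird\n' > "$sample"
token_stats "$sample"
rm -f "$sample"
```

For ordinary text the second number tends to be much smaller than the first. For random data the two should be nearly equal, which is the property this test data is meant to have.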

My simple approach to creating this test file was to use the command below. BTW I am using Ubuntu 9.10, so before this command would work I needed to install binutils (which provides strings) using "sudo apt-get install binutils".

cat /dev/urandom |strings > file.txt

I then opened a second terminal window to monitor the file size manually till it got to the size that I needed it to be. Simple enough...

I ran the data into the system, looked at my results after the test and thought to myself "This can't be right". I then realized where my test had failed. Quite simply, every file also generates a row in the meta data database, and a single 5GB file had produced only a single row. That is not very real world, or worst case.

I then started working on the problem with a more controlled approach, since I realized I was going to have to generate a good deal more data in smaller batches. The first step was to figure out how to generate an ascii document of a specific size, since my earlier approach gave absolutely no control over the size of the file other than through manual intervention. Anyone who has worked with more than a couple of files understands that watching each one by hand would be impossible at the number of files I was going to need.

So the next thing I came up with was to use dd to create a file of the proper size directing the output of /dev/urandom into a tmp file then using that tmp file to collect my data.

My script started to look like this. I needed to make the script sleep for a period of time because I was having issues getting enough data into the tmp file before dd tried to collect it.

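# Fill tmp with printable strings from /dev/urandom in the background.
cat /dev/urandom | strings >tmp &
sleep .25
dd if=tmp of=testdata/n.txt bs=32k count=1
# $! is the PID of the background pipeline's last command (strings);
# cat exits on its own once the pipe it writes to is closed.
kill $!
rm tmp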

This gave me a single 32k file full of random ascii inside a directory named testdata, which I had previously created to keep this mess organized and easy to clean up while testing the script. You will notice the .txt extension on my file name. That was for my specific testing purposes, as at this point I identify document types by extension when deciding how to handle them, which is a completely different subject.
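The sleep and kill steps are only needed because tmp is filled by a background process. As a hedged alternative, and not what the script in this post does, the random stream can be piped straight into head -c, which exits after writing exactly the requested number of bytes and so ends the pipeline by itself. The sketch below swaps strings for tr -dc to keep only printable characters, and the function name make_random_file is made up for illustration.

```shell
#!/bin/sh
# make_random_file PATH SIZE_KB: write SIZE_KB kilobytes of random printable
# ASCII to PATH. Hypothetical helper, not the approach used in the post.
make_random_file() {
    # tr -dc deletes every byte that is NOT printable ASCII or a newline;
    # LC_ALL=C makes tr treat its input as raw bytes.
    # head -c stops after the requested byte count, which ends the pipeline.
    LC_ALL=C tr -dc '[:print:]\n' < /dev/urandom | head -c "$(($2 * 1024))" > "$1"
}

mkdir -p testdata
make_random_file testdata/n.txt 32
wc -c < testdata/n.txt
```

With a size of 32 the file should come out at 32768 bytes every time, with no tmp file to clean up and no dependence on how long the generator has been running.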

After getting this all working it was a simple matter of putting some controls in place to generate the number of files I needed in the size I needed, and making the output of the script easier to deal with. I made both values command line arguments. I ultimately ended up with a script that looked like this.
#!/bin/sh
# The purpose of this script is to generate a set of files of a specific file size
# filled with random ascii data, so as to create a dataset for testing against
# deduplication and tsv indexing algorithms, specifically creating a worst case
# scenario as it pertains to the size of meta data databases.

# To use this script from the command line you can input the variables for
# the number of files you want generated and the size of those files in kb.

# Your command should look like "mkfiles.sh filecount filesize" or
# "mkfiles.sh 163840 32". This command would generate 163,840
# 32k files or 5GB of data.

# If the files are being created but are smaller than the intended file size you may
# have to increase the sleep timer, because not enough data is being generated into
# the tmp file before dd attempts to extract data from it.

i=0
fs=$2
bs="${fs}k"

mkdir -p testdata
until [ "$i" -eq "$1" ]
do
    # Fill tmp with printable strings from /dev/urandom in the background.
    cat /dev/urandom | strings > tmp &
    sleep .25
    echo "Making file #$i of size $bs"
    dd if=tmp of="testdata/$i.txt" bs="$bs" count=1 >/dev/null 2>&1
    # $! is the PID of the last background command (strings here);
    # cat exits on its own once the pipe it writes to is closed.
    kill $! 2>/dev/null
    rm tmp
    i=$((i + 1))
done

I do owe a friend of mine some kudos for helping with my clean up here, as there was a bunch of noise being propagated all over the screen that made the output of the script look like everything was blowing up. Plus he helped me make this script more usable for general purposes, instead of just for my narrowly focused problem.
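The 163,840 figure in the script's comments comes from dividing the target data set size by the file size. Here is a small sketch of that arithmetic, assuming binary units (1GB = 1024 x 1024 KB) and a made-up function name, file_count:

```shell
#!/bin/sh
# file_count TOTAL_GB SIZE_KB: number of SIZE_KB files needed to reach TOTAL_GB.
# Hypothetical helper; assumes 1 GB = 1024 * 1024 KB and rounds down.
file_count() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

file_count 5 32   # prints 163840
```

The result can be fed straight into the script, for example mkfiles.sh "$(file_count 5 32)" 32.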


Friday, July 31, 2009

Avett Brothers - Four Thieves Gone

More often than not you will hear me ranting about the effect Minor Threat had on rock n roll, or how Fugazi's opening track on "13 Songs" is one of the best opening tracks of all time. I will ramble endlessly about how Refused's record "The Shape of Punk to Come" along with At the Drive-In's "Relationship of Command" shaped the majority of up and coming popular music through the better part of the new millennium.

I am also well aware of the fact that I am getting older and pretty closed minded when it comes to new music, and getting out of the hole I have dug for myself is often impossible. It's not a purposeful thing by any stretch of the imagination, just tied to the fact that I am burnt out on music. I spent the better part of 15 years in bands, playing shows, employed as a sound engineer. I have seen more rock concerts in my 20s than most high school graduating classes will see collectively.

Which brings me to something new, at least to me, that has caught my ear recently. I must preface this by stating that I was exposed to the Avett Brothers about a year ago when I went to a show with someone to see them. I had never really been introduced to them before this. It was a really fun show and the band's energy was infectious. Sadly, every time I heard them on the radio or someone would play me a track here or there, it just didn't have any real impact for me. I recently realized I broke my own rule for determining if I like a band. I didn't listen to the whole record, I didn't let the artist express themselves how they chose. I let someone give me a preview, I let a DJ attempt to direct my attention. The problem with all of these things is that I don't like singles, and one hit wonders are no more than three minute and thirty second pieces of fun to remind us of much simpler times in our lives.

Which brings us to "Four Thieves Gone", an amazingly diverse record, well written, perfectly executed and one of the better arrangements I have heard in a long time. The opening track, "Talk on Indolence", is an infectious, charming, high energy song that commands you to sing along. It has all of the trashy rough edges that let you know this is something real produced by real people.

Each song paints its own beautiful picture, but as you listen to the record you realize that each song lends itself to the next, paving the way for the next brilliant moment in sonic expression. The entry into "Pretty Girl from Feltre" is an abrupt change from the song before it, yet it doesn't feel awkward or contrived. The song paints an amazing soundscape somewhere between depressed and reminiscent.

One of the things I really enjoy about the Avett Brothers is that someone in this band had to have grown up listening to punk rock or real hardcore music. "Colorshow" takes the sing along parts and the gang vocals from "Talk on Indolence" and steps them up to the point where I feel like I am listening to a cross between Elton John, Bob Dylan, Tom Petty and Suicidal Tendencies. They fuse all of these different styles together without sounding like some bad mesh between multiple genres. I know we all remember the combination of Rap and Metal......

The record beginning to end is arranged to be listened to as a record, something lost in most contemporary music. Some of the highlight tracks for me are "Colorshow", "Pretty Girl from Feltre", "The Lowering" and "Gimmeakiss". When I try and listen to these songs away from the rest of the record they lose something for me, so I highly suggest spinning it from front to back before you make your judgement.

Tuesday, July 28, 2009

Pedestrians on Bike Paths

I was riding my bike down the Warner/Shepard bike path yesterday, leaving my home around 8PM figuring I could get a quick 12/15 miles in before dark to work off the great burger I just enjoyed at the Lowertown Bulldog.

As always, there were some pedestrians to watch out for as I rode down the path, kind of a common thing along that path. It's never been terrible, except during the RNC last year, and this ride was no exception: a small slowdown right as I got on the path and then home free.

It was a beautiful night for a ride: about the right temperature, with a 5/7 mile an hour wind that I was pushing against and a little gusting here and there. I was pushing pretty hard, keeping about an 18/19 mph average on the path, which has never been an issue as there are for the most part two paths, one for pedestrians and one for cyclists. These paths are clearly marked, and you can always tell the people who have spent any amount of time on the trail as they tend to stay on the correct path, unlike the Phalen path, which is a free-for-all mess most of the time. That is a major reason I like this trail better than most.

As I was riding along the path I came upon a younger couple, maybe early 20s, walking along at a pace that could only be described as dragging their feet. They were walking on the wrong path when I came across them but got out of my way without issue, and I kept on riding.

As I reached the end of the first leg of my ride I decided it would be nice to ride back with the wind on my back instead of getting off on the trail on the other side of the river, which at dusk can be a great deal darker than one is comfortable riding at higher speeds through and I would not have the wind to help me improve my speed. So I turned around and went straight back the way I came instead of turning and heading south across the bridge a little way back down the path.

As I was riding, pushing even harder than I had against the wind and now keeping a 20+ mile per hour pace, I started coming up on the two people who were dragging themselves down the path again. Both were still walking in the bike path, so I let out a good yell when I knew I was in earshot, and the couple turned back and took notice of me. As I drew closer they both stepped off to the left hand side of the path to let me by. I briefly made eye contact with the female of the duo, then the unexpected happened. She raised her arms to her chest and shoved her male counterpart directly in front of me.

I reacted as quickly as I could to the situation and made a quick course correction, aiming to shoot the now growing gap between the couple. This could have been a workable situation, but the woman, who had already proven herself to be lacking in the ability to make sound decisions, made another poor one. Feeling that I was now coming closer to her than made her comfortable, she decided to step forward towards her stumbling male counterpart, directly into my newly adjusted path of travel. Again I corrected my course as quickly as I could to attempt to go around the outside of her.

In the short amount of time that passed after she shoved what can only be described as a 250lb plus giant of a man into my path, I managed to miss him, and I managed to miss this devil of a woman in a bright red oversized hoodie with some silly lanyard slung around the back of her neck. That left only one option: clip the big rock I couldn't see in the grass. This caused me to lose control of my bike and wipe out. Luckily, in all of this I had managed to grab both of my brake levers, scrubbing off some of the excess speed I had been traveling at, but that still didn't save me from taking a quick spill to the ground, scuffing my bike on the pavement and rubbing a little grass and dirt onto my clothes and myself.

I quickly picked myself up off the ground and removed one of my ear buds to try and get a bearing on the situation, only to discover that the couple who all evening couldn't walk faster than a tortoise had suddenly discovered the ability to turn around and run, while holding a middle finger in the air facing back at me and yelling obscenities.

The surprises I found after picking up my bike were the need for new bar tape, my pedal being scuffed up, my rear wheel being out of true, a big hole in the tire itself and of course a flat tube to go with it.

After getting myself sorted out and wiping the dirt off of my bike and myself, I started walking down the path toward the now almost completely set sun. I managed to come away without much of an issue myself: a little bit of a sore knee and a bit of a pinch in my hip. Yes, I am getting old. I proceeded to walk the 4/5 miles the rest of the way home, this time with my bike riding me. I made it home, and after a couple Advil and pulling my bike apart I decided it would be a good time to go to sleep. No worse for the wear, knowing I now have another excuse to spend more money on my already expensive bike addiction.