All posts by admin

Thoughts on Net Neutrality

As I’ve said in here, this is not the “long version”.  Nor is it the “short version”.  I could write a book on the subject, but it would bore you to tears.  Anyway, read on if you’re interested in this subject.

I’m about as much of an expert on this subject as anybody. Let me say this: net neutrality is a good idea. I don’t know anybody else with as much knowledge and experience on the subject as I have who feels differently.

The implementation (giving the FCC more power) is subpar, but probably better than the status quo. This is a far more complicated issue than either side is making it and it cannot be reduced to a simple soundbite. I get left-wing emails telling me about “rich corporations having an internet fast lane with the rest of us being in the slow lane” – they literally don’t even know what they’re talking about. At the same time the right doesn’t either – I see “government take over of the internet” all the time. Newsflash: the government already has more control than you can imagine. This doesn’t change too much.

Let me explain this as simply as I can. I use Comcast for internet service and pay $50/month. I also have Netflix for $9/month. Comcast went to Netflix and said “nice little internet service you have there, how’s about paying us to ‘protect’ it for you, make sure your customers who use our service don’t have any slowdowns or anything?” They then lowered the bandwidth available to Netflix – something they can trivially do in software – causing Netflix customers to have hiccups when watching movies. Netflix quickly caved and paid Comcast extra money and their service suddenly got better.

Here’s the problem: Netflix doesn’t print money. Every dollar they get is from a customer like me. That means that when they pay Comcast it’s really me who’s paying Comcast. But – I’m *already* paying the bastards. So now I have to pay them twice. Or three or four or five times – after all Netflix is just one company.

Why does Comcast do this? Because they’re not a disinterested third party here. They have their own video streaming service that they would prefer for me to use. They’re in an interesting position in that their competition relies on them to deliver their product.

Comcast isn’t the only one who did this, by the way. Verizon did, too. That we know of.

The basic concept of net neutrality is to say to Comcast “you cannot treat Netflix traffic any differently than you treat any other traffic – including traffic coming from your own competing services”. This applies to phone, video, etc.

Where net neutrality falls short is in the Verizon case. Verizon famously had a dispute with Level 3 over this, the details of which you can find online. The short version is that Netflix traffic was coming through Level 3 to get to Verizon’s network and customers. Verizon’s peering point – the point at which the two networks are connected – was badly congested, to the point of dropping traffic, because of the large volume of Netflix traffic. The cost to upgrade the equipment was on the order of a few thousand dollars – pocket change in this world. The network interface had unused capacity that could have been trivially exploited had Verizon cared. But they instead left it congested and then blamed Level 3. This particular problem is as big as the other issue but not easily solved by government or free markets. There’s no good solution.

Anyway, that’s not even the long answer to all this. The takeaway is that we want to be in a world where Comcast, Verizon, AT&T, Time Warner, and other ISPs don’t get into a shakedown racket with Netflix and other internet companies in order to make sure their traffic isn’t limited. Unfortunately the market wasn’t able to properly handle this situation (Comcast is the only high-speed internet available for me, for instance) so the government stepped in. That’s not good, but it likely beats the alternative.

As for Forbes being “pro-business” – that’s good. Business is where we all get our jobs ultimately. Net neutrality is extremely pro-business.

The Forbes article is also confused. They say “the company that links your computer/tablet/smartphone to the internet should not be able to discriminate among users and providers in the level of connectivity service provided. That is, we should all be able to send and receive the same number of bits of data per second.” Wrong. ISPs like Comcast offer different service levels at different price points. This is unrelated to net neutrality. Literally – totally different ideas.

To go on: “He may think it is not, but it completely blocks certain business models and stops any possible innovation that might emerge if given the option of seeking differential access to bandwidth.” Wrong. The only business model that it blocks is the classic protection racket. It’s telling that although the internet has been around for over 20 years in its present form (accessible to everybody and having the “world wide web” as a foundation for most people’s usage) Forbes can’t name even a single business or business model that would benefit from being able to buy higher bandwidth from an ISP on the back end (again, this is unrelated to your bandwidth at your connection point).

“If an ISP blocks Netflix because of the bandwidth it requires, consumers who want Netflix will take their business elsewhere.” Yeah, if only. I can either get Comcast, Comcast, or Comcast for my internet service. Oh, AT&T offers some laughably slow internet connection here, too, but the top end of the bandwidth that they offer isn’t enough for the work that my wife and I do. We cannot take our business elsewhere without moving, and it’s unlikely that anybody is going to run the fiber here to get a faster connection for me. If it was cost effective (on the order of a couple thousand dollars) I’d do it myself.

“The fact that most people cannot afford some of those models does not mean they should be removed from sale. Similarly, the fact that some businesses or consumers may choose to pay for better access to the Internet is not a bad thing. Some people pay more to fly first class, but they do not interfere with my travel in coach.” Again, the author literally doesn’t understand the issue. It’s not about the service Comcast offers to me – the customer. It’s about them shaking down Netflix *who isn’t their customer*.

Let me draw an analogy that actually works. Remember when AT&T was broken up? Suddenly there were other long-distance providers, such as MCI and WorldCom. Now, I used Ameritech in the early 90s with MCI for long distance. I was a customer of both Ameritech – who provided my local phone service – and MCI – who provided long distance. That means that my calls went through Ameritech to get to MCI. But Ameritech also sold long distance service. They competed with MCI.

Now, imagine that you worked at Ameritech and you wanted to get more long distance customers. After all, it’s a very lucrative market. What to do? How about this – go find the physical wires that connect to MCI and alter them so that the phone calls going over to MCI sound like crap. Then call MCI and say “Hey, wow, your phone calls sound like crap. You know, for a little payment I bet we could find out where the problem is and make those calls sound good again.”

Do you think Ameritech should have been able to do that to MCI? If not, you’re actually supporting net neutrality.

Elevators and Other Musings

I had an interesting day yesterday.  In between all the regular work I  read three different things that brought me to the same conclusion from three different routes.

First, there was an article on Slashdot titled “Engineers Develop ‘Ultrarope’ For World’s Highest Elevator” (here).  The gist of the article was that steel cables could only scale up so far in elevator usage and so as taller buildings were built newer materials for the cables had to be developed.  That led to people reasonably questioning whether the proper course of action is to continue attempting to scale the 100 year old architecture of elevators or to start from scratch with a new method of traction for elevators.  One of the highly rated comments derided the “armchair engineers” in the comment section and tried to make the case that obviously if there were a better way to design elevator systems the engineers who’ve worked on them for so long would have come up with it by now.

I later read something about MakerBot – probably the first popular 3D printer on the market – and some of the continued development that they’re doing.

And last night I was reading in “Choose Yourself!” by James Altucher, specifically the chapter titled “It doesn’t take a lot to make $1 Billion”.  The chapter is specifically about Sara Blakely of Spanx fame.  The story is easy to find but the one line summary is that she saw a need and an easy way to fill that need, did the homework and built a company with no debt and no outside investment that’s made her a billionaire in just a few years.  She sold copiers before that and had nothing to do with the hosiery industry.

What I see around me repeatedly is that real innovation often – if not usually – if not always – comes from outside the inner circle of experts in a given field.  I can give so many examples that you would be reading for days (or just giving up at some point) but let’s consider Spanx.  It looks to me like Spanx is a cross between panty hose and a corset. Both have been around a lot longer than me, so why didn’t Hanes or whoever come up with this idea?

How about vacuums?  I have two Neatos that clean my house a couple of times each week.  Before them were the Roombas.  Before Roombas were maids.  Or people who cleaned their own homes.  Why didn’t Hoover come up with a robot vacuum?  I’ll answer that – it’s because Hoover makes a certain kind of vacuum cleaner and they do it well.  And they continually improve their designs.  But it’s rare that a company makes the kind of leap to not only improve on the current design but to say “is there another way to do this?”

Elon Musk is another example.  Let’s look at his innovations.  First, electric cars.  Impressively, the mainline auto industry has been turning out a few models here and there.  But they don’t “get it” in the way that Musk gets it.  He’s building a battery factory because that’s the only way to get enough batteries to supply all the cars he wants to build.  And, let’s face it, the cars he’s building are just more exciting than a Nissan Leaf.  The broader point is, though, that if the companies that make cars based on petrochemical burning engines were thinking about electric cars as seriously as Musk is, one of them would have already started building a battery factory.

He’s also into rocketry with SpaceX.  Not just rocketry – they’re working on making rockets that take off and land vertically.  That’s a new kind of crazy.  Right now they’re working on making a rocket land on a floating, moving platform in the ocean.  Their first test failed – and they actually expected that – but they are using that failure to learn and improve.  How many companies do you know that actually don’t mind anticipating failure?

Musk is also making a bunch of noise about the “hyperloop”, a mode of transportation that involves a vacuum tube to move stuff (and people perhaps) really fast.

In any of these industries, Musk is an “armchair engineer”.  He didn’t work at Ford, GM, Honda, or any other big car maker before he started his own company.  He didn’t work at Boeing or Lockheed.  He doesn’t own a railroad.

What I’m getting at here isn’t that this is an anomaly – rather, the reason he succeeds in these industries is because he’s not burdened by the weight of knowing how things are supposed to be done, meaning he’s free to write a new book on how to do things now.

Even looking at Roomba and Neato is instructive.  Roombas were cool when they came out in 2002.  They move until they hit something, at which point they change direction and move somewhere else.  They have sensors so that they don’t go down the stairs and include “barriers” so you can easily keep them in one area to clean.

The problem is that the Roomba uses a method kindly called “statistical cleaning”.  What that means is that it moves randomly but the algorithms for “random” are tweaked to make it likely to cover every square inch of a room given time.  It also has a dirt sensor so it knows if it needs to do extra work in a certain area.  I saw one review that said when the Roomba cleans your house it looks like you’ve had a crazy maid vacuum it with tracks going in all directions.

iRobot, the makers of Roomba, have been working on their algorithms since the start and have made them better.  And better.  At some point competitors came along, particularly Mint, which used a “beacon” and actually tried to map rooms out and actually clean them.  Mint was for solid floors, though, and did well for what it did.  iRobot bought them to gain access to the technology they were using.

Then along came Neato in 2010.  The Neato platform is a huge step up from Roomba because the robots don’t randomly move around and don’t require an external “beacon” or any other hardware to guide them.  Instead, they built a really cheap laser rangefinder on a rotating platform into the top of the vacuum.  In simple terms the Neato maps out the room and methodically cleans it.  It maps as it cleans and keeps track of what still needs to be cleaned.  When it’s finished it simply returns to its base and lets you know that its dirt bin needs emptied.  There’s no better comparison of the two methods that I could find than this video.

What’s next in robot vacuums?  I don’t know, but I do know this: it’s unlikely that either iRobot or Neato Robotics will be the company to go there.  They will most likely continue to improve their respective platforms for years to come, though.

All this to say that I believe the architecture of the elevators in the extremely tall buildings being proposed and built needs to be envisioned by someone with no baggage and no preconceived notions of how an elevator is supposed to be built or how it’s supposed to work.  I don’t think stronger and lighter cables are the answer simply because I don’t think cables are the answer, at least not for traction.  Surely there’s a way to make an elevator move up and down quickly in such a way that it doesn’t need cables at all – rack and pinion, linear motors – I don’t know but I know something is possible.

Getting rid of the cables also makes something else really strange possible – more than one car in a single shaft.  Yeah, weird.  But rethink the elevator.  Imagine that instead of the two-button interface I had the elevator app on my phone that I could use to tell the elevator system what floor I was on and to where I wanted to go.  Then, it could let me know which elevator I needed to get on and it could take care of scheduling the cars to get everybody where they needed to go in the most efficient manner.  (The CEO might also want to give himself priority.)

Anyway, I think the elevator industry is in dire need of a shake up, and I think it’ll only come from outside.

Customers

We work with the following music catalogs and libraries (in no particular order):

Crucial Music http://www.crucialmusic.com/

Black Toast Music http://blacktoastmusic.com/

S3 Music + Sound http://www.s3mx.com/

Downtown Music Services and dms.FM http://downtownmusicservices.com/ and http://dms.fm/

Anacrusis http://anacrusissongs.com/

Hella Good Records http://www.hellagoodrecords.com/

Zero Point Productions http://www.zeropointproductions.com/

We’re also behind other various web sites, such as http://www.whohasesp.com/.

 

Russian doll caching for collections in Rails 4

A lot has been written about Russian Doll Caching in Rails 4 but surprisingly little about caching of collections.   One of the basic tenets is that you should use “touch: true” on your “belongs_to” relationships in order to update the “updated_at” time on the parent record and thus invalidate its cached copies.

An issue then arises when you have a relationship that isn’t a strict parent/child relationship.  As an example, let’s imagine this particular relationship structure:


class Item < ActiveRecord::Base
  has_many :project_items, inverse_of: :item
end
class Project < ActiveRecord::Base
  has_many :project_items, -> { order("project_items.position") }, inverse_of: :project
  has_many :items, through: :project_items
end
class ProjectItem < ActiveRecord::Base
  belongs_to :project, inverse_of: :project_items, touch: true
  belongs_to :item, inverse_of: :project_items
end

Items and Projects are then in a many-to-many relationship with a join table providing support for ordering of items within a project. Now, let’s imagine a simple view:

 <% cache @project do -%>
   <h1><%= @project.title %></h1>
   <% @project.project_items.each do |project_item| -%>
     <% cache project_item do -%>
       <p><%= project_item.item.description %></p>
     <% end -%>
   <% end -%>
 <% end -%>

The problem here is that if an item is updated it won’t change the updated_at date on the project_items that are associated with it, nor do we want it to.   But we need to invalidate both the main Project cache and the individual Item cache.  Simply changing the cache key on the second item to “project_item.item” doesn’t fix this as the Project still won’t be updated.

This is then a two-fold problem:

  1. What is the proper cache key for the outer @project cache?
  2. What is the proper cache key for the inner item cache?

It’s tempting on the @project to do something like this:

<% cache [@project,@project.items] do %>

And that mostly works.  The problem is that the cache key will simply grow as items are added.  What might not be a problem with two or three items gets out of whack with 50 or 100.  I’ve seen cache keys that are a block of 10+ lines at 80 characters wide.  That’s inefficient.

We can come up with something simpler:

<% cache [@project,@project.items.maximum(:updated_at)] do %>

That works.  If someone removes a ProjectItem, the project should be touched, and if an item is updated then the max updated_at should be changed.  I like this even better:

<% cache [@project,@project.items.count,@project.items.maximum(:updated_at)] do %>

Now, the question is how best to do this.  That’s a little ugly.

I can actually change this in the relationship:

class Project < ActiveRecord::Base
  has_many :project_items, -> { order("project_items.position") }, inverse_of: :project
  has_many :items, through: :project_items do
    def cache_key
      [count(:updated_at),maximum(:updated_at)].map(&:to_i).join('-')
    end
  end
end

Now, our cache line is a little simpler:

<% cache [@project,@project.items] do %>

With that, it’ll get @project.items.cache_key and the cache will be invalidated if any item is updated.  The bonus is that the cache key is made up of only a few items and is much more manageable.  It’s also much more readable to humans, both in code and in the cache itself.
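To make the shape of that key concrete, here is a minimal plain-Ruby sketch with no Rails dependency.  `Item` is a hypothetical Struct standing in for the ActiveRecord model and `collection_cache_key` is an illustrative name, not a Rails API.  It only mirrors the count-plus-maximum idea from the association extension, computed in memory instead of in SQL.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only updated_at matters here.
Item = Struct.new(:id, :updated_at)

# Builds a short "count-max_timestamp" key for a collection, mirroring the
# count/maximum(:updated_at) extension. Returns "0-0" for an empty list.
def collection_cache_key(items)
  newest = items.map(&:updated_at).compact.max
  [items.size, newest.to_i].join('-')
end

items = [
  Item.new(1, Time.utc(2014, 1, 1)),
  Item.new(2, Time.utc(2014, 3, 5))
]
key_before = collection_cache_key(items)

# Updating any item moves the maximum timestamp, so the key changes.
items.first.updated_at = Time.utc(2014, 6, 1)
key_after = collection_cache_key(items)
```

The key stays the same length no matter how many items are in the collection, which is the point of the exercise.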

The inner cache is then simply:

<% cache [project_item,project_item.item] do %>

That way any update to either the project_item or the item will invalidate the cache.  I found a gem that should add the cache_key for associations but it seems to not work with Rails 4.  It would be useful for someone to update it as this functionality is even better when the code doesn’t have to be specified each time.

The argument for being non-DRY is that some cache key schemes might be lighter-weight and work for some places.  The example here is another table that contains viewing logs for projects.  The table is basically write-only, never updated.  So I can just look at the record count or maximum id on the joined table to determine the cache key.  Etc.

Unicursal Mazes

I’ve been working on my JavaScript maze generator again.  You can grab it from GitHub here:

https://github.com/mdchaney/jsmaze

I added a couple of new algorithms to it for generating mazes: Prim’s algorithm and the “bacterial” algorithm.  Neither one is good for generating mazes, honestly, and my drunk-walk algorithm remains the best way.  Still looking at a possibility of adding another algorithm similar to the recursive backtracker that allows one to determine from where the “backtracking” will continue.

I became interested a few weeks ago in the concept of the unicursal maze, also known as a labyrinth.  This isn’t really a maze; it’s a space-filling curve that visits all areas of the space exactly one time with no branches.

[Image: screenshot, 2014-03-05]

Interestingly, I cannot find anything online about generating them except one source.  That gentleman’s approach to generating them is to first generate a standard maze using any algorithm, then close the exit, and finally solve the maze leaving the “solution” in place as a set of walls.  That creates a unicursal maze, but a very specific kind where the entrance and exit are side-by-side.  Plus, it works because the solution will cut the rectangular cells into more rectangles as long as it goes to the middle of each cell.  Using another base such as hexagons or snub-square tiling will cause the new maze to have differently shaped cells.  It’s not a general algorithm.

So what is a general algorithm?  I don’t have a good answer, yet.  I was able to make a simple one from a recursive backtracker.  It’s actually remarkably simple.  First, I keep track of the number of cells that remain unfilled.  Then, I do a standard recursive backtracking algorithm with one modification.  If it runs into a dead end, it returns “false”.  If it gets to the end cell and there are still unfilled cells, it returns false.  After moving to a square if it returns “false” then it closes the wall back up and tries the next.  If all moves for a cell return false then we simply return false.

But there is one way to make it return “true”.  That is if it gets to the last cell and there are no remaining unfilled cells.  In that case it returns true.  And when that one returns true it simply unwinds the stack all the way back to the beginning and is done.

Ultimately it will try a lot of paths.  The number of tries that it requires grows exponentially with the size of the maze.  So a 7×5 works in a couple of seconds, and a 10×7 just doesn’t.  At least not in the 4 or 5 minutes that I’ve waited.

So that particular algorithm, for lack of a better term, sucks.  But it works and I think it’s a start.  One thing that I know is that we can avoid some dead ends and such by looking ahead a bit and trying to not move in such a way as to create a dead-end tunnel.  Thinking of a standard rectangular maze that would mean that if we took a right turn we would have to move at least two squares before turning right again.  Coming within one space of an outside wall would be a problem unless it’s the last run to the end cell.  Generalizing the algorithm to work on any graph is the key, and it’s a tough one.  I’ve added some comments to the maze generator, will be doing more as I get more ideas.

Maze.maze_styles.unicursal = function() {

   var end_cell = this.end_cell();
   var pieces_left = this.cells.length-1;

   function recursive_maze(cell,entry_wall,depth) {
      cell.depth=depth;
      cell.entry_wall=entry_wall;

      // Check if this is the end (yes, I'm aware of the
      // optimization).
      // Alternately, we could just say "if pieces_left is 
      // 0 then this cell is the end", maybe with an edge
      // check.
      if (cell == end_cell) {
         if (pieces_left == 0) {
            return true;
         } else {
            return false;
         }  
      }  

      // Now, go through the surrounding cells and recurse
      for (var k=0 ; k<cell.perm.length ; k++) {
         var wall_num = cell.perm[k];
         var neighbor = cell.walls[wall_num].neighbor(cell);
         if (neighbor && !neighbor.visited()) {
            pieces_left--;
            cell.walls[wall_num].open();
            var winner = recursive_maze(neighbor,cell.walls[wall_num],depth+1);
            if (winner) return true;
            pieces_left++;
            cell.walls[wall_num].close();
         }
      }

      return false;
   }
   var success = recursive_maze(this.start_cell(),null,0);
   if (!success) {
      throw "Cannot create unicursal maze with these parameters."
   }
}
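The “look ahead” idea mentioned earlier can be sketched on a plain rectangular grid.  What follows is a standalone Ruby sketch, not part of jsmaze: it does not use the generator’s cell or wall objects, and the names `neighbors`, `stranded?`, `extend_path`, and `unicursal_path` are all hypothetical.  The pruning rule is one simple possibility: give up on a branch as soon as some unvisited cell can no longer be entered and left again.

```ruby
# Cells are [x, y] pairs on a width x height grid.
def neighbors(cell, width, height)
  x, y = cell
  [[x + 1, y], [x - 1, y], [x, y + 1], [x, y - 1]].select do |nx, ny|
    nx.between?(0, width - 1) && ny.between?(0, height - 1)
  end
end

# True if some unvisited cell can no longer be passed through. Every unvisited
# cell needs two usable neighbors (one way in, one way out); the goal cell
# needs only one. A usable neighbor is the path's head or an unvisited cell.
def stranded?(head, goal, visited, width, height)
  (0...width).to_a.product((0...height).to_a).any? do |cell|
    next false if visited[cell]
    usable = neighbors(cell, width, height).count { |n| n == head || !visited[n] }
    usable < (cell == goal ? 1 : 2)
  end
end

# Recursive backtracker: succeeds only if the goal is reached with every
# cell on the path.
def extend_path(path, visited, goal, width, height)
  cell = path.last
  return path.length == width * height if cell == goal
  return false if stranded?(cell, goal, visited, width, height)

  neighbors(cell, width, height).shuffle.each do |nxt|
    next if visited[nxt]
    visited[nxt] = true
    path << nxt
    return true if extend_path(path, visited, goal, width, height)
    path.pop               # undo the move and try the next neighbor
    visited.delete(nxt)
  end
  false
end

# Returns the list of cells from the top-left to the bottom-right corner,
# or nil when no single path covers the whole grid.
def unicursal_path(width, height)
  start = [0, 0]
  goal = [width - 1, height - 1]
  path = [start]
  extend_path(path, { start => true }, goal, width, height) ? path : nil
end
```

Calling `unicursal_path(5, 4)` returns a path through all 20 cells, while `unicursal_path(2, 2)` returns nil because no corner-to-corner path can cover that grid.  The check scans the whole grid on every step, so it trades per-step work for fewer wasted branches; it prunes only the most obvious dead ends.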

Pro Tools update causes headaches

The latest Pro Tools update has caused two separate problems:

  1. I’ve had some customers with AIF files created with Pro Tools that have empty ANNO chunks.  Unfortunately, the Finder on the Mac ignores the ID3 chunk if there’s an ANNO chunk, so the metadata doesn’t show up unless the ANNO chunk is removed.
  2. Not sure why but some AIF files that previously worked fine with Pro Tools are now causing it to complain.  I had a customer’s customer complain that the AIF surely had a virus in it.  Needless to say it didn’t, but I did have to go through the entire catalog and strip out all extra chunks from the AIF files, leaving only COMM, SSND, and ID3.  I’ll post the ruby code for that later if there’s interest.  I don’t know which chunk was causing problems.
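For anyone curious what stripping chunks involves, here is a minimal Ruby sketch.  It is not the code I ran against the catalog; the method names are hypothetical, and it assumes the standard AIFF layout: a 12-byte “FORM”/size/“AIFF” header followed by chunks, each with a 4-byte ID, a 4-byte big-endian size, and a body padded to an even length.  Note that the ID3 chunk ID is “ID3 ” with a trailing space.

```ruby
# Chunk IDs to keep; the ID3 chunk ID is "ID3 " with a trailing space.
KEEP_CHUNKS = ['COMM', 'SSND', 'ID3 '].freeze

# Yields [id, body] for each chunk after the 12-byte FORM/AIFF header.
def each_aiff_chunk(data)
  form, _size, type = data.unpack('a4Na4')
  raise ArgumentError, 'not an AIFF file' unless form == 'FORM' && type == 'AIFF'

  offset = 12
  while offset + 8 <= data.bytesize
    id, size = data.byteslice(offset, 8).unpack('a4N')
    yield id, (data.byteslice(offset + 8, size) || ''.b)
    offset += 8 + size + (size.odd? ? 1 : 0) # chunks are padded to even length
  end
end

# Returns a new AIFF byte string containing only the chunks listed in `keep`.
def strip_aiff_chunks(data, keep = KEEP_CHUNKS)
  kept = ''.b
  each_aiff_chunk(data.b) do |id, body|
    next unless keep.include?(id)
    kept << [id, body.bytesize].pack('a4N') << body
    kept << "\0".b if body.bytesize.odd?
  end
  # The FORM size counts the "AIFF" type tag plus all remaining chunks.
  ['FORM', kept.bytesize + 4, 'AIFF'].pack('a4Na4') + kept
end
```

Usage would be along the lines of `File.binwrite(out_path, strip_aiff_chunks(File.binread(in_path)))`.  The sketch rejects AIFF-C files (type “AIFC”) and does no validation of the chunk contents, so try it on copies first.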

Collecting and Managing Data, Part 2

Spreadsheets are a wonderful way to collect data.  There are many advantages and not a few disadvantages.  For starting out the disadvantages are usually minimal.

Advantages:

  1. Almost everybody has a spreadsheet, and anyone can download a free one from OpenOffice
  2. Rows and columns are an intuitive way to view and think about data and closely mimic rows and columns in a database table
  3. You can see quite a bit of data at a time
  4. It’s easy to copy/paste if there is a lot of similar data

Disadvantages:

  1. Rows and columns only allow two-dimensional data, and most data is more complex
  2. Only one user can update it at a time, no concurrent access
  3. Even if you can get concurrent access, the spreadsheet model doesn’t lend itself to working concurrently
  4. Most spreadsheets have a row limit which effectively limits the size of your data

For many or most pieces of data, though, a spreadsheet is the best way to organize your data.  As your business grows, those disadvantages will take over and you will grow beyond the spreadsheet.  Even if you don’t, this information is useful.  But if you do, that data that you’ve collected will have to be moved somewhere else, and the act of moving it is going to be difficult unless you’ve created your dataset properly from the beginning.  Properly formatted data can be imported into a true database by a competent programmer.  Improperly formatted data will have to be retyped by hand, introducing errors and taking far more time.

I’m going to start with two rules that you need to keep in mind regardless of where you store your data, but it’s even more important with a spreadsheet.

Rule #1: Always use a serial number for your records

Rule #2: Always format data consistently

If you get nothing else out of this, those two rules will help you immensely.

So let’s talk about these rules.  Rule #1: You always need to use a serial number for your main records.  This means that if you have a list of customers, each customer should have a number.  It should be printed on any documentation created for that customer (e.g. an invoice) so that the customer may be referenced easily.  Of course, you also need to keep track of the address, phone number, etc., and perhaps those pieces of data could be used to find a customer.  But the customer number is the unique key, or “primary key” as we like to call it.  If you have a customer number you can find that unique customer among all others.

Most serial numbers don’t start at “1”.  Why?  Mostly for psychological reasons – businesses look small if you’re customer #10.  But there’s another good reason and that is consistency.  Let’s say your business will likely never have more than 87,000 customers.  You can then start your customer number at 12100.  As long as you don’t get more than 87,900 customers your customer numbers will always be exactly 5 digits.

That gives you an advantage in many ways.  Formatting your documents is a little easier as you know the size of the customer number.  If someone calls and offers a customer number you have a simple validation since it must be 5 digits.  And the starting number makes it difficult for someone to determine the size of your customer base.  Note that you can start with anything that’s 5 digits as long as you won’t exhaust your set of possible 5-digit numbers.

Of course, if your business is bigger simply scale that up.  Start with 121200 if you need a 6-digit customer number.  Or start with 115 if you anticipate having only a few hundred customers.  Whatever your business size you can easily scale this up or down to fit your needs.

This scheme obviously works with other data besides customer lists.  Order numbers, invoice numbers – anything that will get a serial number – can be assigned in this way.
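If a programmer later automates this, the rule might look something like the following Ruby sketch.  The method names are hypothetical, and the starting value of 12100 and the 5-digit width are simply the example numbers used here.

```ruby
# Returns the next customer number, starting the sequence at `first`
# when there are no customers yet. Refuses to grow past `digits` digits.
def next_customer_number(existing, first: 12_100, digits: 5)
  candidate = existing.empty? ? first : existing.max + 1
  raise RangeError, "out of #{digits}-digit numbers" if candidate.to_s.length > digits
  candidate
end

# A customer number is valid only if it is exactly `digits` digits long
# and does not start with a zero.
def valid_customer_number?(text, digits: 5)
  text.to_s.match?(/\A[1-9]\d{#{digits - 1}}\z/)
end
```

With these, `next_customer_number([])` gives 12100, and `valid_customer_number?("121")` is false because the number is too short.  That is the “simple validation” you get for free from a fixed width.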

Rule #2: Always format your data consistently.  This is also simple when you understand the basis of it.  Let’s consider our customer data again.  Let’s say we have a simple database of customer number, name, and phone number.  We need to determine up front how we’re going to format each piece of data, and then do it consistently on each record.

Imagine this scenario:

Customer #   Name             Phone
121          Jim Beam         (812)555-1212
122          Jack Daniel      615-555-1212
123          Dickel, George   +1 323 555-1212

This is normal data, unfortunately.  There are two problems here:

1. There are three different formats for phone numbers.  Not a huge deal because it’s rare that we need to get the component pieces of a phone number (i.e. the area code) and people/phones can generally deal with about anything nowadays.  Still, consistency is good because we can better anticipate how it will print on a page.  If your customer base is in US/Canada, I highly recommend simply using the middle format.

2. Look at those names.  Specifically, “George Dickel” is stored as “Dickel, George”.  Names are notoriously difficult to parse because of all of the possible suffixes that might be added to them after a comma.  I’m going to talk more about names specifically in a later part.

The point here is to pick some format and be consistent.  We can argue about what format is best – and will at some point – but you’re better off with a consistent inferior format than a dozen inconsistent better formats.
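As an illustration of what “pick a format and stick to it” means in practice, here is a small Ruby sketch that rewrites the three phone formats from the sample table into the middle one.  `normalize_phone` is a hypothetical helper, and it assumes 10-digit US/Canada numbers, optionally prefixed with the country code 1.

```ruby
# Converts a US/Canada phone number to NNN-NNN-NNNN.
# Returns nil if the input does not contain exactly 10 digits
# (after dropping an optional leading country code of 1).
def normalize_phone(raw)
  digits = raw.gsub(/\D/, '')
  digits = digits[1..-1] if digits.length == 11 && digits.start_with?('1')
  return nil unless digits.length == 10

  "#{digits[0, 3]}-#{digits[3, 3]}-#{digits[6, 4]}"
end
```

Returning nil for anything that doesn’t fit, rather than guessing, leaves the odd cases for a human to look at.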

I have a piece of mail that very clearly demonstrates the problems with manipulating data that’s not well-formatted.  Years and years ago I worked at a university and my title was “Database Programmer”.  I had filled out a card or something somewhere that had my name as “Darrin Chaney”.  And so I got mail to “Darrin Chaney” and the second line would be my title “Database Programmer”.

At some point this data was sold and massaged and manipulated and screwed up.  I received a letter addressed to “Darrin C Dbase Prog”.  Inside the envelope there was a personalized letter that started out “Dear Mr. Prog”.

As a small business owner – or large business owner for that matter – you do not want to be the schmuck who sends a piece of mail to dear old Mr. Prog.  That’s amateur hour stuff.  The piece of mail came from a Fortune 500 company with whom I had done business under my actual name – all the worse.  Keeping your data format consistent will go a long way toward making sure that your customers are addressed as you would want to be addressed.  You don’t want your mail to be posted to my wall of shame.

 Beyond the Two Rules

When storing data in a spreadsheet, you have to consider that at some point you’ll most likely have a computer program manipulating that data in various ways.  Spreadsheets are generally not something that programs directly read (Microsoft’s popular Excel format has been mostly reverse-engineered but Microsoft has no interest in truly making the format public).  Instead, we usually export the spreadsheet to a text-based format known as “CSV”, or “comma-separated values”.  The above table ends up looking like this:

Customer #,Name,Phone Number
121,Jim Beam,(812)555-1212
122,Jack Daniel,615-555-1212
123,"Dickel, George",+1 323 555-1212

You shouldn’t worry about the mechanics of CSV, such as the double quotes in the last line.  What you *should* worry about is the fact that any “extra” formatting in your spreadsheet is lost when you export to CSV.  That means that if you have bold, underlined, or italicized print, that formatting is lost.  If you have formatted numbers, dates, times, etc. in a certain way, that formatting will likely be lost.  And if you’ve changed the colors of cells, rows, or columns, or the text colors, those will be lost too.  The only formatting that CSV will maintain is your spaces and any multi-line text that you might have in the sheet.
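To see what a program actually gets out of that export, here is a small sketch using Python’s standard csv module.  The CSV text is the sample from above, and the function name is made up for illustration.  Notice that the quoted name comes back as a single field and that every value, the customer number included, arrives as plain text.

```python
import csv
import io

# The sample export from above, held in a string so the sketch is self-contained.
CSV_TEXT = '''Customer #,Name,Phone Number
121,Jim Beam,(812)555-1212
122,Jack Daniel,615-555-1212
123,"Dickel, George",+1 323 555-1212
'''

def read_customers(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_customers(CSV_TEXT)
for row in rows:
    # The csv module takes care of the double quotes around "Dickel, George".
    print(row["Customer #"], "|", row["Name"], "|", row["Phone Number"])
```

Nothing about fonts, colors, or number formats survives the trip.  All the program sees is the text in each cell.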

So, this brings us to a minor rule which we’ll call Rule #3:

Rule #3: All data must be in text in the spreadsheet.

In other words, consider that you want to know whether your customers are “local” or “non-local”.  You could make the background of non-local customers yellow, for instance, so that they’re easily viewed.  But when you export to CSV that information is lost.  The only way to handle it properly is to add another column that keeps track of that data.
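Here is a rough sketch of Rule #3 in Python, assuming a hypothetical “Local” column holding “yes” or “no”.  The column name and the values assigned to each customer are invented for the example.  Because the information is stored as text, it survives the export and a program can filter on it.

```python
import csv
import io

# Hypothetical data: the "Local" column replaces the yellow background.
customers = [
    {"Customer #": "121", "Name": "Jim Beam", "Local": "yes"},
    {"Customer #": "122", "Name": "Jack Daniel", "Local": "yes"},
    {"Customer #": "123", "Name": "Dickel, George", "Local": "no"},
]

def to_csv(rows):
    """Write the rows out as CSV text, header line first."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["Customer #", "Name", "Local"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

exported = to_csv(customers)
print(exported)

# Read the export back and pick out the non-local customers.
non_local = [
    row["Name"]
    for row in csv.DictReader(io.StringIO(exported))
    if row["Local"] == "no"
]
print(non_local)
```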

In our next installment we’ll consider how to format specific data types.

Collecting and Managing Data, Part 1

The reason I’m writing this is quite simple: I want to save you a lot of time, money, headaches, and frustration.  If you are in business – any kind of business – you are collecting data.  At a minimum you are selling a product to someone.  That means that you have at least two datasets: products that you sell and people to whom you sell.  Additionally, you have monetary transactions that you have to track for tax purposes if nothing else.  Now we have three datasets.  Every single business on the planet has these three datasets.

Unless your product is created from dirt (e.g. you run a coal mine) you also have a supplier or list of suppliers from whom you buy.  At one end, you have a store where you buy from a warehouse and sell directly to consumers.  At the other end, you buy, say, cloth from cloth makers, create clothing, and sell the clothing.  In other words, you add value to the product.

Many small business owners keep these datasets in their heads.  Literally, they know their suppliers, they don’t need to know their customers (people walk in the door) and a cash register takes care of the minimal data required to track monetary exchanges.  Problems arise as the business grows too big for this model or when the key people who have the information get sick, die, attempt to delegate responsibilities, etc.

Small businesses often grow, and as they grow data collection and management become more and more important.  Often, the methods used to collect small amounts of data don’t scale well.  Spreadsheets are a good way to manage some data sets, but they’re limited to being accessed by one user at a time.  Some online spreadsheets, such as Google Docs, allow simultaneous access by multiple users, but the spreadsheet format doesn’t lend itself to such usage.

In the next part, I’m going to discuss general data collection techniques for small businesses and focus on spreadsheets as a good way to get started.

Collecting and Managing Data, Overview

I’m going to start a new series on collecting and managing data.  Ultimately this is going to be geared specifically toward those who have copyright data (music publishers, catalogs, libraries, etc.) but I’m going to start out with general information that will be applicable to anybody who is starting out managing data or has a data set that needs to be brought under control.

I’m not going to give a complete outline here as I haven’t started writing it and I’m not even sure of the path that I’ll ultimately take as I pull this out of my head and write it down.  I will cover general data collection and maintenance, using spreadsheets, using a desktop database like FileMaker or Access, and then specifically handling copyright data and other data associated with a collection of music or songs.

When I’m finished (if I ever finish – this may end up being a lifetime of work) there should be enough information here to write a short book.  I will number each part sequentially so you’ll want to read them in order.