Simon Online

2015-01-02

Shaving the Blog Yak

I’ve been on wordpress.com for 2 years now. It does a pretty fair job of hosting my blog and it does abstract away a lot of the garbage you usually have to deal with when it comes to hosting your own blog. However it costs about $100 a year, which is a pretty fair amount of money when I have Azure hosting burning a hole in my pocket. There are also a few things that really bug me about wordpress.com.

The first is that I can’t control the plugins or other features to the degree that I like. I really would like to change the way that code is displayed in the blog but it is pretty much fixed and I can only do it because I farm out displaying code to github.

Second I’m always super nervous about anything written in PHP. You can shoot yourself in the foot using any programming language but PHP is like gluing the business end of a rail gun to your big toe. One of the most interesting things I’ve read recently about PHP was this blog post which suggests that something like 78% of PHP installs are insecure. Hilarious.

The third and most major thing is that all the content on wordpress.com is locked into wordpress.com. Sure there is an export feature but frankly it sucks and you’re really in trouble if you want to move quickly to something else. For instance exporting in a Ghost compatible format requires installing a ghost export plugin. This can’t be done because wordpress.com doesn’t allow installing your own plugins.

So, having a few days off over Christmas, I decided it was time to make the move. I was also a bit prompted by the Ghost From Source series of posts from Dave Wesst who showed how easy it was to get Ghost running on Azure.

As I expected this whole process was jolly involved. The first thing I wanted was to get an export in a format that Ghost could read. This meant getting the Ghost export plugin working on wordpress.com. This is, of course, impossible. There is no ability to install your own plugins on wordpress.com.

So I kicked up a new Ubuntu Trusty VM on my OSX box using vagrant. I used Trusty because I had an image for it downloaded already. The Vagrantfile looked like

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network "forwarded_port", guest: 80, host: 8000
end

Once I was sshed in I just followed the steps for setting up wordpress at https://help.ubuntu.com/community/WordPress.

I dropped the plugins in for WordPress import and for Ghost export. I then exported the WordPress file which is, of course, a giant blob of XML.

The export seems to contain all sorts of data including links to images and comments. Ghost doesn’t support either of those things very well. That’s fine we’ll get to those.

Once I had the wordpress backup installed into my local wordpress I could then use the Ghost export which dumped a giant JSON file. Ah, JSON, already this is looking better than wordpress.

I downloaded the latest Ghost and went about getting it set up locally.

git clone git@github.com:TryGhost/Ghost.git
cd Ghost
npm install
grunt init
npm start

This got Ghost up and running locally at http://localhost:2368 . I jumped into the blog at http://localhost:2368/ghost and set up the parameters for my blog. I next went to the labs section of the admin section and imported the JSON dump.

#Issues

I was amazed that it was all more or less working. Things missing were

  1. Routes incorrect - old ones had dates in them and the new ones don’t
  2. Github gists not working
  3. Some crazy character encoding issues

They all need to be dealt with.

##Routes

The problem is that the old URLs were like http://blog.simontimms.com/2014/08/19/experiments-with-azure-event-hubs/ while the new ones would be http://blog.simontimms.com/experiments-with-azure-event-hubs/. I found a router folder which pointed me to frontend.js. The contents looked very much like an expressjs router, and in fact that is what it was. Unfortunately the router in express is pretty lightweight and doesn’t do parameter extraction. I had to rewrite the url inside the controller itself. Check out the commit. It isn’t pretty but it works.

Later on I was poking around in the settings table of the ghost blog and found that there is a setting called permalink that dictates how the permalinks are generated. I changed mine from /:slug to /:year/:month/:day/:slug and that got things working better without source hackery. This feature is, like most of ghost, a black hole of documentation.
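To make the difference between the two permalink patterns concrete, here is a rough Python sketch of how a pattern such as /:year/:month/:day/:slug expands for a single post. This is not Ghost's code; the function name build_permalink and the trailing-slash handling are assumptions made for illustration.

```python
from datetime import date


def build_permalink(pattern, slug, published):
    """Expand a Ghost-style permalink pattern for one post (illustrative only)."""
    # Each :placeholder is replaced with the matching, zero-padded value.
    values = {
        ":year": f"{published.year:04d}",
        ":month": f"{published.month:02d}",
        ":day": f"{published.day:02d}",
        ":slug": slug,
    }
    path = pattern
    for placeholder, value in values.items():
        path = path.replace(placeholder, value)
    # The post URLs quoted above end with a trailing slash.
    return path if path.endswith("/") else path + "/"


old_style = build_permalink(
    "/:year/:month/:day/:slug",
    "experiments-with-azure-event-hubs",
    date(2014, 8, 19),
)
new_style = build_permalink(
    "/:slug", "experiments-with-azure-event-hubs", date(2014, 8, 19)
)
```

With the dated pattern the path lines up with the old WordPress URLs, which is why changing the setting was enough to keep existing links alive.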

##Gists Not Working

Because there was no good syntax highlighting in WordPress I used embedded gists for code. To do that all you did was put in a github url and it would replace it with the gist.

There are a number of ways to fix this. The first is to simply update the content of the post, the second is to catch the content as it is passed out to the view and change it, the third is to change the templating. It is a tough call but with proper syntax highlighting in ghost I don’t really have any need for continuing support for gists. I decided to fix it in the json dump and just reimport everything.

I thought I would paste in the regex I used because it is, frankly, hilarious.

%s/https:\\\/\\\/gist.github.com\\\/\d\+/<script src='&.js'><\\\/script>/g
%s/https:\\\/\\\/gist.github.com\\\/stimms\\\/\w\+/<script src='&.js'><\\\/script>/g
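For anyone who would rather not do this in vim, the same substitution can be sketched in Python. This is an approximate equivalent, not what I actually ran: it folds the two commands into one pattern, and the function name embed_gists is made up for the example. It assumes, as the vim version does, that slashes in the JSON dump are escaped as \/.

```python
import re

# Slashes are escaped as \/ in the JSON dump, so the pattern matches a
# literal backslash before each slash, just like the vim commands.
GIST_URL = re.compile(r"https:\\/\\/gist\.github\.com\\/(?:stimms\\/\w+|\d+)")


def embed_gists(json_text):
    """Wrap every bare gist URL in a script tag that loads the gist embed."""
    def wrap(match):
        # match.group(0) plays the role of & in the vim replacement.
        return "<script src='" + match.group(0) + ".js'><\\/script>"

    return GIST_URL.sub(wrap, json_text)
```

Using a function for the replacement sidesteps the question of how backslashes in a replacement string get interpreted, which is most of what makes the vim version so hard to read.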

##Crazy Character Encoding

Some characters look to have come across as the wrong format. Some UTF jazz, I imagine. I kind of ignored it for now and I’ll run some SQL when I have a few minutes to kill.

#Databases

By default ghost uses an sqlite database on the file system. This is a big problem for hosting on Azure websites. The local file system is not persistent and may disappear at a moment’s notice. Ghost is built on top of the Knex data access layer and this data layer works just fine against MySQL as well as Postgresql. There is a free version of MySQL from ClearDB which is of very limited size and very limited number of connections. This is actually fine for me because I don’t have a whole lot of people reading my blog (tell your friends to read this blog).

So I decided to use MySQL and so began a yak shaving to rival the best. The first adventure was to figure out how to set the connection string. Of course, this isn’t documented anywhere, but this is the configuration file I came up with:

database: {
  client: 'mysql',
  connection: {
    host: 'us-cdbr-azure-west-a.cloudapp.net',
    user: 'someuser',
    password: 'somepassword',
    database: 'simononline',
    charset: 'utf8',
    debug: false
  }
},

I logged into the ghost instance I had on azure websites and found that it worked fine. However, when I attempted to import the dump into MySQL I was met with an error. As it turns out the free MySQL has a limit of 4 connections and the import seems to use more than that. I feel like it is probably a bug in Ghost that it opens up so many connections but probably not one that really needs a lot of attention.

And so began the shaving.

I jumped back over to my vagrant Ubuntu instance and found it already had MySQL installed as a dependency for Wordpress. I didn’t, however, know the password for it. So I had to figure out how to reset the password for MySQL. I followed the instructions at https://dev.mysql.com/doc/refman/5.0/en/resetting-permissions.html

That got the password changed but I couldn’t log into it from the host system still. Turns out you have to grant access specifically from that host.

SET PASSWORD FOR 'root'@'10.0.2.2' = PASSWORD('fish');

Finally I got into the database and could point my local Ghost install at it. That generated the database and I could import into it without worrying too much about the number of connections opened.

Then I ran mysqldump to export the content of the database into a file. Finally I sucked this into the azure instance of MySQL.

#Comments

Ghost has no comments but you can use Disqus to handle that for you. I’m pretty much okay with that so I followed the instructions here:

https://help.disqus.com/customer/portal/articles/466255-exporting-comments-from-wordpress-to-disqus

and got all my comments back up and running.

#Conclusion
So everything is moved over and I’m exhausted. There are still a couple of things to fix

  1. No SSL on full domain only the xxx.azurewebsites.net one
  2. Images still hosted on wordpress domain
  3. No backups of mysql yet

I can tell you this jazz cost me way more time than $100 but it is a victory and I really needed a victory.

2014-12-01

Getting Hammered Because of DNSimple?

Poor DNSimple is, as of writing, undergoing a massive denial of service attack. I have a number of domains with them and, up until now, I’ve been very happy with them. Now it isn’t fair of me to blame them for my misfortune as I should have put in place a redundant DNS server. I’ve never seen a DNS system go belly up in this fashion before. I also keep the TTL on my DNS records pretty low to mitigate any failures on my hosting provider. This means that when the DNS system fails people’s caches are emptied very quickly.

DNS has been up and down all day but it is so bad now that something had to be done. Obviously I need to have some redundancy anyway so I set up an account on easyDNS. I chose them because their logo contains a lightning bolt which is yellow and yellow rhymes with mellow and that reminds me of my co-worker, Lyndsay, who is super calm about everything and never gets upset. It probably doesn’t matter much which DNS provider you use so long as it isn’t Big Bob’s Discount DNS.

I set up a new account in there and put in the same DNS information I’d managed to retrieve from DNSimple during one of its brief periods of being up. I had the information written down too so either way it wouldn’t be too serious to recreate it. It does suggest, however, that there is something else you need to back up.

In EasyDNS I set up a new domain

Screen Shot 2014-12-01 at 10.11.29 PM

In the DNS section I set up the same records as I had in my DNSimple account.

Screen Shot 2014-12-01 at 10.14.02 PM

Finally I jumped over to my registrar and entered two of the EasyDNS servers as the DNS servers for my domain and left two DNSimple servers. This is not the ideal way of setting up multiple DNS servers. However from what I can tell DNSimple doesn’t support zone transfers or secondary DNS so the round robin approach is as good as I’m going to get.

Screen Shot 2014-12-01 at 10.34.34 PM

With the new records in place and the registrar changed over everything started working much better. So now I have redundant DNS servers for about $20/year. Great deal.

2014-11-23

Books

socialData

2013 - Use a variety of JavaScript technologies to build visualizations of data taken from social media sites. Includes sections on LinkedIn, Facebook and StackOverflow.

mastering

2014 - Covers all the original GoF patterns as well as functional patterns, MV* patterns and messaging patterns.

2014-11-16

I Have a Short Memory or So I Wrote Another Book

Last year I wrote a book then I blogged about writing a book. I concluded that post with

I guess watch this spot to see if I end up writing another book. -Simon Timms

Well, my friendly watchers, it has happened. This time I wrote a book about JavaScript patterns. I feel like a bit of a fraud because I don’t really believe that I know much about either patterns or JavaScript. I did manage to find some way to fill over 250 pages with content. I also feel like this book is much better than the last book, if only because it is physically heavier.

book

I agreed to write it for a couple of reasons

  1. I had forgotten how much time it takes to write a book. Seriously it takes forever. Even when you’re done you’re never done. There are countless revisions and reviews and goodness knows what else to eat up your time. Maybe a review only takes 10 minutes a chapter but 12 chapters later and you’ve used up another 2 hours. I don’t like to do the math around dollars an hour for writing but it is low, like below minimum wage low. I met Charlie Russel at the Microsoft MVP Summit earlier this year and had a good chat with him about making a living writing books. He has been in the game for many years and has written more books than I have pairs of socks (even if you relax your constraints and let any two socks be a pair regardless of matching). He told me of a golden age when it was possible to make a good living writing books. Those days are gone and we’re all in a mad dash to the bottom, which is free, and free isn’t maintainable. That’s a topic for another day.

  2. I liked the topic. Patterns are good things to know. I would never recommend that you go out of your way to implement patterns but having a knowledge of them will help you solve common problems in a sensible way. It is said there are only so many basic plots for a story and I think patterns are like that too. You start writing a story and while it is unique you’ll find that one of the archetypal plots emerges. You can also never go wrong talking about JavaScript.

  3. I figured this book would get more exposure than the last one. My last book was pretty niche. I can’t imagine there are more than 2 or 3 dozen people in the world who would be interested in visualizing social media data to the degree they would buy a book on it. This one, however, should have a much broader reach. I’ve been working on getting my name out there in the hopes that the next time I’m looking for work it is easier.

If you happen to be one of the people interested in JavaScript and how to write it building on the patterns we’ve spent 20 years discovering then go on, buy the book.

This time, however, I’m serious! I’m not writing any more books through a traditional publisher. I’ve already turned down an offer to write what would have been a really interesting one. For my next foray I’m going to publish through LeanPub. They are a nice and funny group of folks whose hands off approach allows for much more creativity around pricing and even production of the book. I’m also done with writing books by myself; I need some companionship on the journey.

There will be more books, just watch this space!

2014-11-14

ASP.net 5 Configuration

On twitter yesterday I had a good conversation with Matt Honeycutt about configuration in ASP.net 5. It started with

The more I think about it: yeah, putting app-specific connection strings and things in evn variables is a TERRIBLE idea. #AspNetvNext

Today I’m seeing a lot more questions about how configuration works in ASP.net vNext 5 (sorry, still getting used to that).

Hail Hydra! How do I avoid collision with env variables bw apps on the same server? #AspNetvNext

Isn’t it high time to rename #AJAX to #AJAJ? : Everything gone JSON now! Even the configuration files in #AspNetvNext?

It sounds like there is some clarification needed about how configuration works in ASP.net 5.

The first thing is that configuration in ASP.net 5 is completely pluggable. You no longer need to rely on the convoluted Web.config file for all your configuration. All the configuration code is found in the Configuration repository on github. You should start by looking at the Configuration.cs file; this is the container class that holds the configuration for your application. It is basically a box full of strings. How we get things into that box is the interesting part.

In the standard template for a new ASP.net 5 project you’ll find a class called Startup.cs. Within that class is the configuration code

In the default configuration we’re reading from a json based configuration file and then overriding it with variables taken from the environment. So if you were developing and wanted to enable an option called SendMailToTestServer then you could simply define that in your environment and it would override the value from the json file.
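The override behaviour is easier to see in a few lines of code. The sketch below is Python rather than C#, and it is only an analogy for the idea of chained sources, not the ASP.net 5 API. The helper name merge_sources, the SiteName key and the stand-in environment dictionary are all invented for the example.

```python
import json


def merge_sources(*sources):
    """Merge configuration sources in order; later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


# First source: what would normally be read from the json configuration file.
file_settings = json.loads(
    '{"SendMailToTestServer": "false", "SiteName": "Simon Online"}'
)

# Second source: a stand-in for the process environment. A real application
# could pass dict(os.environ) here instead.
fake_environment = {"SendMailToTestServer": "true"}

config = merge_sources(file_settings, fake_environment)
```

Because the environment is added last it wins for SendMailToTestServer, while SiteName still comes from the file.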

Looking again in the Configuration repository we see that there are a number of other configuration sources such as

  • Ini files
  • Xml files
  • In memory

The interface you need to implement to create your own source is simple and if you just extend BaseConfigurationSource that should get you most of the way there. So if you want to keep your configuration in Zookeeper then all you would need to do is implement your own source that could talk to Zookeeper. Certain configuration providers also allow changes in the configuration to be committed back to them.

The next point of confusion I’m seeing is related to how environmental variables work. For the most part .net developers think of environmental variables as being like PATH: you set it once and it is globally set for all processes on that machine. For those people from a more Linuxy/UNIXy background we have a whole different interpretation.

Environment variables are simply pieces of information that are inherited by child processes. So when you go set your PATH variables by right clicking on My Computer in Windows (it is still called that, right?) you’re setting a default set of environmental variables that are inherited by all launched processes. You can set them in other ways, though.

Try this: open up two instances of powershell. In the first one type

$env:asp="rocks"
echo $env:asp

You should see “rocks” echoed back. This sets an environmental variable and then echoes it out. Now let’s see if the other instance of powershell has been polluted by this variable. Type

echo $env:asp

Nothing comes back! This is because the environments, after launch, are separate for each process. Now back to the first window and type

start powershell

This should get you a third powershell window. In this one type

echo $env:asp

Ah, now you can see “rocks” echoed again. That’s because this new powershell inherited its environment from the parent process.

Environments are NOT global. So you should have no issue running as many instances of ASP.net 5 on your computer as you like without fear of cross polluting them so long as you don’t set your variables globally.
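The same experiment can be run from a script if you don't feel like juggling three powershell windows. This is a small Python sketch of the idea, assuming no variable called asp is already set on your machine; read_in_child is a helper written for this example.

```python
import os
import subprocess
import sys


def read_in_child(name, env=None):
    """Start a child Python process and return the value it sees for `name`."""
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,  # None means the child inherits this process's environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# A variable placed in a private copy of the environment reaches only the
# child that is handed that copy.
private_env = dict(os.environ, asp="rocks")
seen_with_private_env = read_in_child("asp", env=private_env)

# A sibling started from the unchanged parent environment does not see it.
seen_by_sibling = read_in_child("asp")

# Once the parent sets the variable for itself, new children inherit it.
os.environ["asp"] = "rocks"
seen_after_parent_set = read_in_child("asp")
```

Handing each application its own environment at launch, like private_env here, is what lets several sites on one server use the same variable names without colliding.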

Why even bother with environmental variables? Because it is a common language that is spoken by every operating system (maybe not OpenVMS but let’s be realists here). It is also already supported by Azure. If you set up a configuration variable in an Azure WebSite then it is set in the environment. That’s how you can easily configure a node application or anything else. Finally it helps eliminate that thing where you accidentally alter and check in a configuration file with settings specifically for your computer and break the rest of your team. Instead of altering the default configuration file you could just set up an environment variable or you could set up a private settings file.

Where AddPrivateJsonFile extends the json configuration source and swallows missing file exceptions allowing your code to work flawlessly on production.
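The "swallow the missing file" part is the whole trick, and it looks roughly like this in Python. This is an analogy only, not the AddPrivateJsonFile extension itself, and load_optional_json is a name made up for the sketch.

```python
import json


def load_optional_json(path):
    """Return the settings in `path`, or an empty dict when the file does not exist."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        # A missing private file is expected on production, so it simply
        # contributes no overrides.
        return {}
```

Only the missing-file case is swallowed here; a private file that exists but contains broken json still raises, which is probably what you want on a developer machine.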

In a non-cloud production environment I would still tend to use a file based configuration system instead of environmental variables.

The new configuration system is extensible and powerful. It allows for chaining sources and solves a lot of problems in a more elegant fashion than the old XML based transforms. I love it.

2014-11-11

Is ASP.net 5 too much?

I’ve been pretty busy as of late on a number of projects and so I’ve not been paying as much attention as I’d like to the development of ASP.net vNext, or as it is now called, ASP.net 5. If you haven’t been watching the development I can tell you it is a very impressive operation. I watched two days worth of presentations on it at the MVP Summit and pretty much every part of ASP.net 5 is brand new.

The project has adopted a lot of ideas from the OWIN project to specify a more general interface to serving web pages built in .net technologies. They’ve also pulled in a huge number of ideas from the node community. Build tools such as grunt and gulp have been integrated into Visual Studio 2015. At the same time the need for Visual Studio has been deprecated. Coupled with the open sourcing of the .net framework developing .net applications on OSX or Linux is perfectly possible.

I don’t think it is any secret that the vision of people like Scott Hanselman is that ASP.net will be a small 8 or 10 meg download that fits in with the culture being taught at coding schools. Frankly this is needed because those schools put a lot of stress on platforms like Ruby, Python or node. They’re pumping out developers at an alarming rate. Dropping the expense of Visual Studio makes the teaching of .net a whole lot more realistic.

ASP.net 5 is moving the platform away from proprietary technologies to open source tools and technologies. If you thought it was revolutionary when jQuery was included in Visual Studio out of the box you ain’t seen nothing yet.

The thought around the summit was that with node mired in the whole Node Forward controversy there was a great opportunity for a platform with real enterprise support like ASP.net to gain big market share.

Basically ASP.net 5 is ASP.net with everything done right. Roslyn is great, the project file structure is clean and clear and even packaging, the bane of our existence, is vastly improved.

But are we moving too fast?

For the average ASP.net developer we’re introducing at least

  • node
  • npm
  • grunt
  • bower
  • sass/less
  • json project files
  • dependency injection as a first class citizen
  • different directory structure
  • fragmented .net framework

That’s a lot of newish stuff to learn. If you’re a polyglot developer then you’re probably already familiar with many of these things through working in other languages. The average, monolingual, developer is going to have a lot of trouble with this.

Folks I’ve talked to at Microsoft have likened this change to the migration from classic ASP to ASP.net and from WebForms to MVC. I think it is a bigger change than either of those. With each of these transitions there were really only one or two things to learn. Classic ASP to ASP.net brought a new language on the server (C# or VB.net) and the integration of WebForms. Honestly, though, you could still pretty much write ASP classic in ASP.net without too much change. MVC was a bit of a transition too but you could still write using Response and all the other things with which you had built up comfort in WebForms.

ASP.net 5 is a whole lot of moving parts built on a raft of technologies. To use a Hanselman term it is a lot of lego bricks. A lot of lego can either make a great model or it can make a mess.

I really feel like we’re heading for a mess in most cases.

ASP.net 5 is great for the expert developers but we’re not all expert developers. In fact the vast majority of developers are just average.

So what can be done to bring the power of ASP.net 5 to the masses and still save them from the mess?

  1. Tooling. I’ve seen some sneak peeks at where the tooling is going and the team is great. The WebEssentials team is hard at work fleshing out helper tools for integration into Visual Studio.

  2. Training. I run a .net group in Calgary and I can tell you that I’m already planning hackathons on ASP.net 5 for the summer of 2015. It sure would be great if Microsoft could throw me a couple hundred bucks to buy pizza and such. We provide a lot of training and discussion opportunity and Microsoft does fund us but this is a once in a decade sort of thing.

  3. Document everything, like crazy. There is limited budget inside Microsoft to do technical writing. You can see this in the general decline in the quality of documentation as of late. Everybody is on a budget but good documentation was really what made .net accessible in the first place. Documentation isn’t just a cost center; it drives adoption of your technology. Do it.

  4. Target the node people. If ASP.net can successfully pull developers from node projects onto existing ASP.net teams then they’ll bring with them all sorts of knowledge about npm and other chunks of the tool chain. Having just one person on the team with that experience will be a boon.

The success of ASP.net 5 is dependent on how quickly average developers can be brought up to speed. In a world where a discussion of dependency injection gets blank stares I’m, frankly, worried. Many of the developers with whom I talk are pigeon holed into a single language or technology. They will need to become polyglots. It is going to be a heck of a ride. Now, if you’ll excuse me, I have to go learn about something called “gulp”.

2014-09-15

Git prompt on OSX

I have a bunch of work to do using git on OSX over the next few months and I figured it was about time I changed my prompt to be git aware. I’m really used to having this on Windows thanks to the excellent posh git. If you haven’t used it, in the prompt you get the branch you’re on, the number of files added, modified and deleted as well as a color hint about the state of your branch as compared with the upstream (ahead, in sync, behind).

Screen Shot 2014-09-11 at 10.27.43 PM

It is wonderful. I wanted it on OSX. There are actually quite a few tutorials that will get you 90% of the way there. I read one by Mike O’Brein but I had some issues with it. For some reason the brew installation on my machine didn’t include git-prompt. It is possible that nobody’s does... clearly a conspiracy. Anyway I found a copy over at the git repository on github. I put it into my home directory and sourced it in my .profile.

if [ -f $(brew --prefix)/etc/bash_completion ]; then
  . $(brew --prefix)/etc/bash_completion
fi
source ~/.git-prompt
PS1="\[\033[32m\]\u@\h \[\033[33m\]\w\$(__git_ps1 \" (\[\033[36m\]%s\[\033[33m\])\") \n\$\[\033[0m\] "

This got me some of the way there. I had the branch I was on and it was coloured for the relationship to upstream but it was lacking any information on added, removed and modified files.

Screen Shot 2014-09-11 at 10.52.10 PM

So I cracked open the .git-prompt and got to work. I’ll say that it has been a good 5 years since I’ve done any serious bash scripting and it is way worse than I remember. I would actually have got this done in powershell in half the time and with half the confusion of bash. It doesn’t help that, for some reason, people who write scripts feel the need to use single letter variables. Come on, people, it isn’t a competition about brevity.

I started by creating 3 new variables

local modified="$(git status | grep 'modified:' | wc -l | cut -f 8 -d ' ')"
local deleted="$(git status | grep 'deleted:' | wc -l | cut -f 8 -d ' ')"
local added="$(git ls-files --others --exclude-standard | wc -l | cut -f 8 -d ' ')"

The first two make use of git status. I had a quick twitter chat with Adam Dymitruk who suggested not using git status as it was slow. I did some bench marking and tried a few other commands and indeed found that it was about twice as expensive to use git status as to use git diff-files. I ended up replacing these variables with the less readable

local modified="$(git diff-files|cut -d ' ' -f 5|cut -f 1|grep M|wc -l| cut -f 8 -d ' ')"
local deleted="$(git diff-files|cut -d ' ' -f 5|cut -f 1|grep D|wc -l | cut -f 8 -d ' ')"
local added="$(git ls-files --others --exclude-standard | wc -l | cut -f 8 -d ' ')"

Chaining commands is fun!
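If the cut and grep chain is hard to follow, here is what it is pulling out of the git diff-files output, written as a Python sketch. It is only an illustration of the parsing, not something the prompt uses, and count_changes is a name invented for the example. It assumes the default raw output format, where the status letter is the fifth space-separated field and is followed by a tab and the path.

```python
from collections import Counter


def count_changes(diff_files_output):
    """Count modified and deleted files in `git diff-files` raw output."""
    counts = Counter()
    for line in diff_files_output.splitlines():
        if not line.startswith(":"):
            continue
        # Raw format: ":<old mode> <new mode> <old sha> <new sha> <status>\t<path>"
        metadata = line.split("\t", 1)[0]
        status = metadata.split(" ")[4]
        counts[status[0]] += 1
    return {"modified": counts["M"], "deleted": counts["D"]}


# Made-up sample output with shortened hashes, for illustration only.
sample = (
    ":100644 100644 abc123 0000000 M\tREADME.md\n"
    ":100644 000000 def456 0000000 D\told.txt\n"
    ":100644 100644 aaa111 0000000 M\tsrc/app.js\n"
)
summary = count_changes(sample)
```

Reading the status from a fixed field, rather than grepping for the letter M or D, avoids relying on which other characters happen to appear on the line.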

Once I had those variables in place I changed the gitstring in .git-prompt to read

local gitstring="$c$b${f:+$z$f}$r$p [+$added ~$modified -$deleted]"

See how pleasant and out of place those 3 new variables are?

I also took the liberty of changing the prompt in the .profile to eliminate the new line

PS1="\[\033[32m\]\u@\h \[\033[33m\]\w\$(__git_ps1 \" (\[\033[36m\]%s\[\033[33m\])\") \$\[\033[0m\] "

My prompt ended up looking like

Screen Shot 2014-09-12 at 6.59.55 AM

Beautiful. Wish I’d done this far sooner.

2014-09-12

So DNS...

Turns out DNS is kind of important and for some reason mine decided to leave. I went ahead and moved over to using DNSimple instead of my somewhat questionable registrar (only 1186 days until that comes up for renewal). So sorry the blog has been offline; I actually didn’t even notice it.

2014-08-19

Experiments with Azure Event Hubs

A couple of weeks ago Microsoft released Azure Event Hubs. These are another variation on service bus that go on to join queues and topics. Event Hubs are Microsoft’s solution to ingesting a large number of messages from Internet of Things or from mobile devices or really from anything where you have a lot of devices that produce a lot of messages. They are perfect for sources like sensors that report data every couple of seconds.

There is always a scalability story with Azure services. For instance with table storage there is a partition key; there is a limit to how much data you can read at once from a single partition but you can add many partitions. Thus when you’re designing a solution using table storage you want to avoid having one partition which is particularly hot and instead spread the data out over many partitions. With Event Hubs the scalability mechanism is again partitions.

When sending messages to an Event Hub you can pick one of n partitions to handle the message. The number of partitions is set at creation time and values seem to be in the 8-32 range but it is possible to go up to 1024. I’m not sure what real world metric the partition count maps to. At first I was thinking that you might map a partition to a device but with a maximum around 1024 this is clearly not the approach Microsoft had in mind. I could very easily have more than 1024 devices. I understand that you can have more than 1024 partitions but that is a contact support sort of operation. The messages within a partition are delivered to your consumers in order of receipt.
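Since there can be far more devices than partitions, many devices have to share each partition. One common way to do that is to hash a device id down to a partition number. The Python sketch below shows that idea only; it is an assumption about how you might pick a partition yourself, not a description of what Event Hubs does internally, and pick_partition is a hypothetical helper.

```python
import hashlib


def pick_partition(partition_key, partition_count):
    """Map a partition key (for example a device id) to one of `partition_count` partitions."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # A stable hash means the same key always lands on the same partition,
    # which keeps one device's messages in order relative to each other.
    digest = hashlib.sha1(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count
```

Python's built-in hash() is avoided on purpose because it is randomised per process for strings, so two senders could disagree about where a device belongs.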

Event Hubs

In order delivery sounds mildly nifty but it is actually a huge technical accomplishment. In a distributed system doing anything in order is super difficult. Their cheat is that there is only a single consumer for each partition. I should, perhaps, say that there is at most one consumer per partition. Each consumer can handle several partitions. However you can have multiple consumer groups. Each consumer group gets its own copy of the message. So say you were processing alerts from a door open sensor and you want to send text messages when a door is opened and you want to log all open events in a log then you could have two consumers in two groups. Realistically you could probably handle both of these things in a single consumer but let’s play along with keeping our microservices very micro.

A magnetic open closed sensor - this one is an Insteon sensor

Messages sent to the event hub are actually kept around for at least 24 hours and can be configured up to 7 days. The consumers can request messages from any place in the stream history. This means that if you need to replay an event stream because of some failure you’re set. This is very handy should you have a failure that wipes out some in memory cache (not that you should take that as a hint that the architecture I’m using leverages in memory storage).
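Replaying from a point in the stream is simple to picture with a toy model. The Python class below is an in-memory stand-in for one partition, written for this post; it is not the Event Hubs client, it uses list positions as offsets, and it ignores the retention window entirely.

```python
class PartitionLog:
    """In-memory stand-in for one partition's retained message stream."""

    def __init__(self):
        self._messages = []

    def append(self, body):
        """Store a message and return its offset (its position in the stream)."""
        self._messages.append(body)
        return len(self._messages) - 1

    def read_from(self, offset):
        """Return every retained message at or after `offset`, in order."""
        return list(self._messages[offset:])


log = PartitionLog()
for event in ["door opened", "door closed", "door opened"]:
    log.append(event)

# A consumer that last checkpointed offset 1 and then lost its in-memory
# state can rebuild by reading again from that offset.
replayed = log.read_from(1)
```

The consumer only has to remember one number per partition to recover, which is what makes rebuilding a wiped cache practical.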

Until now everything in this article has been discoverable from the rather sparse Event Hub documentation. I had a bunch more questions about the provided EventProcessorHost that needed answering. EventProcessorHost is the provided tool for consuming messages. You can consume messages using your own connectors or via EventHubReceiver but EventProcessorHost provides some help for dealing with which node is responsible for which partitions. So I did some experiments.

What’s the deal with needing blob storage?

It looks like the EventProcessorHost writes out timestamps and partition information to the blob storage account. Using this information it can tell if a processing node has disappeared requiring it to spread the lost responsibility over more nodes. I’m not sure what happens in event of a network partition. It is a bit involved to test. The blob storage is checked every 10 seconds so you could have messages going unprocessed for as long as 20 seconds.

Opening up the blob storage there is a blob for each consumer group * each partition. So for my example with only the $Default group and 16 partitions there were 16 blobs. Each one contained some variation of

{"PartitionId":"10","Owner":"host1","Token":"87f0fe0a-28df-4424-b135-073c3d007912","Epoch":3,"Offset":"400"}

Is processing on a single partition single-threaded?

Yes, it appears to be. This is great, I was worried I’d have to lock each partition so that I didn’t have more than one message being consumed at a time. If that were the case it would sort of invalidate all the work done to ensure in order delivery.

Is processing multiple messages on different partitions on a single consumer multi-threaded?

Yes, you can make use of threads and multiple processors by having one consumer handle several partitions.

If you register a new consumer group does it have access to messages published before it existed?

I have no idea. In theory it should but I haven’t been able to figure out how to create a non-default consumer group. Or, more accurately, I haven’t been able to figure out how to get any messages for the non-default consumer group. I’ve asked around but nothing so far. I’ll update this if I hear back.

2014-08-13

Rolling Averages in Redis

I’m working on a system that consumes a bunch of readings from a sensor and I thought how nice it would be if I could get a rolling average into Redis. As I’m going to be consuming quite a few of these pieces of data I’d rather not fetch the current value from redis, add to it and send it back. I was thinking about just storing the aggregate value and a counter in a hash set in Redis and then dividing one by the other when I needed the true value. You can set multiple hash values at the same time with HMSET and you can increment a value using a float inside a hash using HINCRBYFLOAT but there is no way to combine the two. I was complaining that there is no HMINCRBYFLOAT command in Redis on twitter when Itamar Haber suggested writing my own in Lua.

I did not know it but apparently you can write your own functions that plug into Redis and can become commands. Nifty! I managed to dig up a quick Lua tutorial that listed the syntax and I got to work. Instead of a HMINCRBYFLOAT function I thought I could just shift the entire rolling average into Redis.

This script gets the current value of the field as well as a counter of the number of records that have been entered into this field. By convention I’m calling this counter key.count. I increment the counter and use it to weight the old and new values.
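The weighting is easier to see outside of Lua. Here is a Python sketch of the same arithmetic with a plain dictionary standing in for the Redis hash. It is not the script itself; the add_reading name and the choice to start an empty field at 0 are assumptions made for the example.

```python
def add_reading(store, key, value):
    """Fold `value` into the running average stored under `key` and return the new average."""
    count_key = key + ".count"
    count = store.get(count_key, 0) + 1
    previous = store.get(key, 0.0)
    # The old average stands for (count - 1) readings and the new value for one.
    store[key] = (previous * (count - 1) + value) / count
    store[count_key] = count
    return store[key]


readings = {}
averages = [add_reading(readings, "temperature", v) for v in (10.0, 20.0, 30.0)]
```

Doing this inside Redis means the read, the arithmetic and the write happen in one round trip, which is the whole point of pushing it into a script.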

I’m using the excellent StackExchange.Redis client so to test out my function I created a test that looked like

The test passed perfectly even against the Redis hosted on Azure. This script saves me a database trip for every value I take in from the sensors. On an average day this could reduce the number of requests to Redis by a couple of million.

The script is a bit large to transmit to the server each time. However if you take the SHA1 digest of the script and pass that in instead then Redis will use the cached version of the script that matches the given SHA1. You can calculate the SHA1 like so
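For reference, the digest Redis expects is the 40 character hex SHA1 of the script text. A minimal Python version of that calculation, with script_sha1 as a name chosen for this sketch and assuming the script is sent as UTF-8, looks like this:

```python
import hashlib


def script_sha1(script_text):
    """Return the 40-character hex SHA1 digest that identifies a script to Redis."""
    return hashlib.sha1(script_text.encode("utf-8")).hexdigest()
```

The digest only helps once the server has seen the script, either through an earlier EVAL or a SCRIPT LOAD; after that EVALSHA with the digest is enough.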

The full test looks like