Sunday 23 February 2014

We've been quite busy the last few weeks

Christmas time is always quite a different time for all of us in Treppendorf. Hotline, logistics and the local store are working almost 24 hours a day, and even we, the web developers, change our mode from "productivity" to "lean back, watch the stats and don't fucking touch anything". We survived the whole Christmas sale pretty well without any outage and then went back to work, which is actually the reason why none of us found the time to write anything here.

There are a few upcoming projects that will keep us quite busy this year, first and foremost our new search engine (including a redesign of our search result page) and our new mobile page. We have also grown our team, and our new developer just finished his first big project, the new customer centre (which is going live on Monday).

And that's all? Well, for the next few weeks, yes. Other upcoming projects are the pages that were left out of last year's redesign, like the category page, the basket and single elements on the product detail page. And we still have some secret projects to work on which are quite exciting (and will bring some great benefits to our customers... we'll talk about that later).

Oh wait, I almost forgot the most important task we're on right now: finding things we want in our new office building, which is being built as I write. German building laws prohibit a slide (not explicitly; it's something about fire safety), so we accepted the challenge of finding other weird and funny things to put inside. If you have any great ideas, just let us know. And if you're interested, our social media guys post updates on our Facebook page regularly.


So long!


Sunday 12 January 2014

Drawing Graphs with MySQL

Drawing what with what???!?

Yes, you read that correctly. Let's suppose we have the following table layout in MySQL:


+---------------+----------+------+-----+---------+----------------+
| Field         | Type     | Null | Key | Default | Extra          |
+---------------+----------+------+-----+---------+----------------+
| id            | int(11)  | NO   | PRI | NULL    | auto_increment |
| creationTime  | datetime | YES  |     | NULL    |                |
| executionTime | int(11)  | YES  |     | NULL    |                |
+---------------+----------+------+-----+---------+----------------+

It's a very basic logging table that simply stores generic events, their creation time and the time each event took to run in milliseconds. Let's query that table:
SELECT creationTime, executionTime FROM myLogTable

+---------------------+---------------+
| creationTime        | executionTime |
+---------------------+---------------+
| 2013-11-23 13:27:00 |            10 |
| 2013-11-23 13:27:08 |            11 |
| 2013-11-23 13:27:11 |             9 |
| 2013-11-23 13:27:18 |            13 |
| 2013-11-23 13:27:20 |            13 |
| 2013-11-23 13:27:21 |            25 |
| 2013-11-23 13:27:30 |             9 |
| 2013-11-23 13:27:35 |            30 |
| 2013-11-23 13:27:37 |            15 |
| 2013-11-23 13:27:45 |             8 |
+---------------------+---------------+
Obviously this data is a little hard to read. Wouldn't it be nice to have some kind of visualisation?
Let's try this query:
SELECT creationTime, executionTime, REPEAT('|', executionTime) AS graph FROM myLogTable

+---------------------+---------------+--------------------------------+
| creationTime        | executionTime | graph                          |
+---------------------+---------------+--------------------------------+
| 2013-11-23 13:27:00 |            10 | ||||||||||                     |
| 2013-11-23 13:27:08 |            11 | |||||||||||                    |
| 2013-11-23 13:27:11 |             9 | |||||||||                      |
| 2013-11-23 13:27:18 |            13 | |||||||||||||                  |
| 2013-11-23 13:27:20 |            13 | |||||||||||||                  |
| 2013-11-23 13:27:21 |            25 | |||||||||||||||||||||||||      |
| 2013-11-23 13:27:30 |             9 | |||||||||                      |
| 2013-11-23 13:27:35 |            30 | |||||||||||||||||||||||||||||| |
| 2013-11-23 13:27:37 |            15 | |||||||||||||||                |
| 2013-11-23 13:27:45 |             8 | ||||||||                       |
+---------------------+---------------+--------------------------------+
Now we have a graphical representation of the values that shows us anomalies at first glance.
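
By the way, if the execution times span a wider range, the bars quickly grow too long for the terminal. Here is a minimal sketch of the same idea with scaling (the factor of 10 is an arbitrary choice of ours), using nothing but MySQL's built-in ROUND function:

SELECT creationTime, executionTime, REPEAT('|', ROUND(executionTime / 10)) AS graph FROM myLogTable

Each bar then stands for roughly ten milliseconds, so even a value of 250 still fits into 25 characters.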

Thursday 2 January 2014

Where true or false is not enough!

At some point we found the following piece of code candy in our source code:


function str2bool($str) {
    if ($str === false or $str === "0" or $str === 0 or $str === ""
            or $str === "false" or $str === "FALSE"
            or $str === "falsch" or $str === "FALSCH") {
        return false;
    }
    return true;
}

This piece of code is unused; someone simply forgot to delete it. But it looks like it was used in one of our externally connected subsystems to translate values like FALSCH, falsch and FALSE to a boolean false. ("falsch" is German for "false".)
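
Just to illustrate what the helper does, a few hypothetical calls:

var_dump(str2bool("FALSCH")); // bool(false)
var_dump(str2bool("0"));      // bool(false)
var_dump(str2bool("wahr"));   // bool(true), anything not in the list is true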

We're not sure if there were ever values like "FALSCH" in our code - but it's a little bit scary for us ;)

Thursday 12 December 2013

Looking into the high resolution timer of node.js

The Node.js core contains a high resolution timer. It is a method of the global process object called hrtime.
When the method is called, an array is returned:
$ node
> process.hrtime();
[ 31013, 815378921 ]
Well, that does not look too useful.
The official documentation states:
It is relative to an arbitrary time in the past. It is not related to the time of day and therefore not subject to clock drift. The primary use is for measuring performance between intervals.
So measuring the time between two events is the actual purpose. The first event is recorded via a call to process.hrtime():
var start = process.hrtime();
Then, when a certain operation has finished (we use setTimeout to simulate this), process.hrtime is called again with the start marker as a parameter:
setTimeout(function() {
  var elapsed = process.hrtime(start);
  console.log(elapsed);
}, 1000);
Now the output looks like this:
[ 1, 14933877 ]
The first element in the array is the elapsed time in seconds, the second element is the additional time in nanoseconds.
So to get an actual useful value out of this array we have to do a little calculation:
var timeInMilliseconds = elapsed[0] * 1e3 + elapsed[1] / 1e6;
Now the result is something like 1014.933877 milliseconds.
Success!

The complete code example:
var start = process.hrtime();

setTimeout(function() {
    var elapsed = process.hrtime(start);
    var timeInMilliseconds = elapsed[0] * 1e3 + elapsed[1] / 1e6;

    console.log(timeInMilliseconds);
}, 1000);
If you don't want to do this calculation over and over again, there is a high resolution timer module on npm: https://npmjs.org/package/hirestime

Install via:
npm install hirestime

Invocation:
var hirestime = require('hirestime');

//startpoint of the time measurement
var getElapsed = hirestime();

setTimeout(function() {
    //returns the elapsed milliseconds
    console.log(getElapsed());
}, 1000);
Optionally, a time unit can be passed:
var hirestime = require('hirestime');

//startpoint of the time measurement
var getElapsed = hirestime();

setTimeout(function() {
    //returns the elapsed seconds
    console.log(getElapsed(hirestime.S));
}, 1000);
Possible time units are:

  • hirestime.S: the elapsed time in seconds
  • hirestime.NS: the elapsed time in nanoseconds
  • hirestime.MS: the elapsed time in milliseconds

The time unit defaults to milliseconds.

Wednesday 27 November 2013

Sorting Data in a MySQL query before grouping

In some cases you need to sort your data before grouping it in a single SQL query. Normally this is impossible in a single SELECT statement, because the result is first grouped and then sorted.

Given a database table which contains images in different resolutions for products, like this:

+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| article_id | int(11)      | YES  |     | NULL    |                |
| width      | int(11)      | YES  |     | NULL    |                |
| height     | int(11)      | YES  |     | NULL    |                |
| filename   | varchar(255) | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+

Imagine you want to select only the biggest image for a couple of article ids. You could try something like this:

SELECT article_id, width, height, filename
FROM article_images
WHERE article_id IN(23, 42)
GROUP by article_id
ORDER by width*height DESC

But you'll only get the first entry for each article_id, not the one with the biggest resolution: the grouping is applied before the sorting.

So let's try another approach:

SELECT article_id, width, height, filename
FROM (SELECT article_id, width, height, filename
      FROM article_images
      WHERE article_id IN(23, 42)
      ORDER by width*height DESC
) as ghoti
GROUP by article_id

This query first selects all images in all resolutions for the given article_ids and sorts them by resolution in descending order, copying the result into a temporary table named ghoti. The query around this subselect then groups the data by article_id. Because the rows were sorted by resolution beforehand, we get the biggest image for each article_id.

ghoti is our internal name for that kind of SELECT, because we sometimes need it to transfer less data to our PHP scripts and crunch the data on the database server instead. What ghoti means is described in this Wikipedia article.
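
A word of warning: this trick relies on MySQL returning the first row it encounters for each group, which is undocumented behaviour and not covered by the SQL standard (with the ONLY_FULL_GROUP_BY mode enabled, the query is rejected outright). A more portable sketch of the same idea is a self-join that keeps only those rows for which no bigger image exists:

SELECT a.article_id, a.width, a.height, a.filename
FROM article_images a
LEFT JOIN article_images b
       ON b.article_id = a.article_id
      AND b.width * b.height > a.width * a.height
WHERE a.article_id IN(23, 42)
  AND b.id IS NULL

The LEFT JOIN looks for a bigger image b for every image a; where none is found, b.id is NULL and a is the biggest one. (If two images of an article share the same biggest resolution, this variant returns both of them.)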

Friday 22 November 2013

Splunk Data Analytics

We've been using Splunk> Enterprise for about 3 months now and our conclusion is: it's one of the best decisions for our data analytics and processing we could have made.

Our previous process for logging and analysing data was to store the data in a custom MySQL table created for that specific logging purpose, read it with some PHP scripts and pass it to the Google charting library on a custom-built page.

Every new analysis took us some hours to implement, which reduced our willingness to log anything to nearly zero.

With Splunk> the logging just went from "too complicated, won't implement" to "what could we log next?"

We've crafted a logging class which can be used in our current store as easily as it gets:

Log::info('fun', 'woot', array(
    'monkey' => $amountOfMonkeys
));


This results in a key-value log event, as specified in the Splunk> logging best practice guide, like this:

2013-11-22T10:22:18+00:00 mod=fun evt=woot monkey=13
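
For the curious, here is a minimal sketch of what such a class could look like; the real implementation is more elaborate, and writing via error_log is just our assumption for the example:

class Log {
    // builds a key-value log line in the format shown above
    public static function info($module, $event, array $fields = array()) {
        $line = date('c') . ' mod=' . $module . ' evt=' . $event;
        foreach ($fields as $key => $value) {
            $line .= ' ' . $key . '=' . $value;
        }
        error_log($line); // assumption: replace with whatever sink Splunk> reads from
    }
}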

Now it's really easy to do some analytics in Splunk with a few simple search queries.


The big advantage is that we can specify our log format ourselves and don't have to rely on the log format of third-party tools. But even with custom log events you can extract data with the built-in field extractions using regex and start analysing your data within minutes.

Thanks Splunk!

Thursday 21 November 2013

The definition of rock 'n roll: when your ads get rejected by youporn.

We wanted to put some suggestive ads on several porn sites, but somehow YouPorn rejected our "GILP - guitars I'd like to play" ad. Probably too hardcore for them or something.