toStr revisited

Programming

After my initial toStr C++ exploration, I recently found myself reading about Boost’s lexical_cast, which does something similar, albeit more generally and with more verbosity. lexical_cast will not only convert nearly anything to a string, it will also do its best to convert anything to anything, via a string in the middle.

Upon finding that, I was considering rewriting toStr to use lexical_cast (the only reason to hang onto toStr at all would have been for brevity in my code), but then I somehow stumbled upon Herb Sutter’s article The String Formatters of Manor Farm, which talks about the performance of various int to string methods. As it is, int to string is what I use my toStr function for most of the time (followed by doubles probably), so this is the performance I’m most interested in.

From Herb’s article, I learned that lexical_cast is extremely slow. stringstream, which is what the current implementation of toStr uses, is also extremely slow (but not as bad as lexical_cast). On the other hand, snprintf is very fast. I verified this with some of my own tests on Solaris/SunStudio and Linux/GCC. Rather than write up my own performance tests, allow me to refer you to this write-up, as well as back to Herb Sutter’s article that I referenced above.

snprintf requires explicit format strings, but this isn’t an issue because I can specialize the template for certain types, like ints. And, if I know the type, then I also know the maximum length of the string that can be generated. For instance, a 32 bit int needs at most 12 characters (the ‘-’ if negative, 10 digits for the main number, and one character for the trailing \0), while 22 characters is enough for any 64 bit int (20 digits for the largest unsigned value, one for the ‘-’, and one for the trailing \0). I could also figure out the maximum size for floats and doubles, but I haven’t yet done so.
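The buffer sizes are easy to double check. The snippet below is only a sanity check, written in Python 3 rather than C++ for brevity, and buffer_size is a made-up helper name. It counts the characters in the longest value of each width and adds one for the trailing \0.

```python
def buffer_size(bits, signed=True):
    """Characters needed to print any integer of this width, plus the trailing NUL."""
    if signed:
        longest = str(-(2 ** (bits - 1)))   # the most negative value is the longest
    else:
        longest = str(2 ** bits - 1)        # the largest unsigned value
    return len(longest) + 1                 # +1 for the trailing \0


print(buffer_size(32))                # 12
print(buffer_size(64))                # 21
print(buffer_size(64, signed=False))  # 21
```

Both 64 bit cases come out at 21, so a 22 character buffer has a byte to spare.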

So I will now modify toStr to keep the existing generic version while adding a specialization for ints that is about 17 times faster.


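#include <cstdio>   // snprintf
#include <sstream>
#include <string>

// Generic version: works for any T that supports operator<<.
template <class T>
static inline std::string toStr(T v)
{
  std::stringstream s;
  s << v;
  return s.str();
}

// Specialization for int: snprintf avoids the stringstream overhead.
template <>
inline std::string toStr<int>(int v)
{
  //max 64 value: 18446744073709551615
  char tmp[22]; //derived from the max 64bit value + 1 for '-' and 1 for \0
  snprintf(tmp, sizeof(tmp), "%i", v);
  return std::string(tmp);
}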


PostgreSQL connection pooling for mod_php

Programming, System Administration

In a quest for better performance with postgres, I’ve been looking for connection pooling tools. There are a few quirks that I tend to require be met. First, it must run on Solaris. This isn’t so much a quirk, since the server runs Solaris and is SPARC hardware, and I’m not going to install a second server in colo just to accommodate software that doesn’t work on Solaris/SPARC. Additionally, I refuse to install GCC, so it must build with Sun Studio, which is much more GCC compatible than it used to be, but still isn’t GCC. Also, I want it to be reasonably simple to install and set up. I am willing to consider prebuilt packages from sunfreeware. If I get desperate enough, maybe even blastwave. Unfortunately, none of the top choices appear to be on sunfreeware.

The top choices appear to be:

  • pgpool: This is the classic choice. Building and installing are easy, but setup is very arcane.

  • pgbouncer: This looks like it should be simple to install and set up, but the configure script refuses to find my libevent install.

  • SQLRelay: Unlike the others, it works for many databases, including sqlite. However, it requires the rudiments library from the same author, and this library won’t build because the autoconf stuff doesn’t understand anything but GCC.

So, I haven’t broken down to checking out blastwave yet, but so far none of the normal choices are working out for PostgreSQL connection pooling.

Then, I made a small breakthrough when I found that PHP has pg_pconnect. pg_pconnect does some background bookkeeping to keep connections open after you call pg_close, and returns the same connection if the arguments are the same. Practically, this means that if you use a PHP system that keeps persistent PHP interpreters (say, mod_php in Apache, which is what I use for PHP), then you have effectively gotten connection pooling for PHP only.

This is a big help already, but I still need a solution that helps out with Python.

Yes, I am working on a little web development on vacation.


Databases for simple web development

Programming, System Administration

I have long been a fan of PostgreSQL over MySQL, believing that PostgreSQL is more feature complete and generally as fast or faster, with the obvious caveats about being used appropriately, of course, and not to mention without any real comparative testing. Everybody gets to have an untested opinion, right?

I did end up doing some performance testing though. What I learned is that both are reasonably fast at simple queries. Great. However, opening a new connection to MySQL is much faster than opening a new connection to PostgreSQL. Once the connection is open, both seem equally fast for very simple tests.

Why this matters, though, is that simple web development in many languages with the most common tools doesn’t do connection pooling. If you want to just whip up an example PHP program using mod_php, then every page load will result in a new connection. The same goes for mod_python or mod_wsgi (as well as frameworks sitting on top of those plugins). Using each of these common tools with PostgreSQL results in a slow web site. This was driven home when I upgraded from a single 550 MHz UltraSPARC II to a dual 1.1 GHz UltraSPARC III, and certain web apps I’ve been tinkering with, which use PostgreSQL for the database, are still slow.

Certainly there are ways around this. Using a database connection pooling tool, for starters, would certainly cure the problem. Choosing something that keeps your script running (or at least your database connections open) would also help. You could even write your application as a standalone program that keeps the database connections open and talks to the web server via JSON-RPC or XML-RPC. But to quickly whip something out, MySQL may be simpler.
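For what it is worth, the core of a pooling tool is small. Here is a minimal sketch in Python 3, assuming a long-running process; ConnectionPool is a made-up class, and sqlite3 again stands in for a PostgreSQL driver. Connections are opened once up front, borrowed for a request, and handed back instead of being closed.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())   # pay the connection cost once, at startup

    @contextmanager
    def connection(self):
        conn = self._idle.get()         # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)        # hand it back instead of closing it


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
with pool.connection() as conn:
    print(conn.execute("SELECT 1 + 1").fetchone()[0])  # 2
```

A real pool would also need to deal with broken connections and with drivers that do not allow a connection to move between threads, which this sketch ignores.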

Of course, for some applications SQLite could be a contender. Certainly it is very fast, and very simple to use. For a scalable web site though, it is probably out of the question. There is a reason that Django defaults to using SQLite first though. And there are also those less traditional database servers like CouchDB or memcachedb, which seem to generally have very fast connection times.

This is a bit disappointing though. AOLServer used to offer connection pooling built into the web server. Of course, I certainly don’t want to use TCL as my development language, but still that would be nice to have.

Meanwhile, can anyone suggest a good Solaris and PostgreSQL connection pooling library?


toStr – A small C++ utility function.

Programming

This can be used as string("bob") + toStr(5) without declaring the type, presuming that the type can correctly be inferred by the compiler. Obviously, T must support operator<<.

template <class T>
static inline std::string toStr(T v)
{
  std::stringstream s;
  s << v;
  return s.str();
}

RESTful message queuing in Python

Programming

Alright.  Pass 1 is done.  Here is a link to it.  The server is in Python.  Clients in PHP and Python are provided.  It follows this design document.  On a quad Opteron, it gets about 600 short messages a second.  It isn’t threaded.  Next step on this project is to redo this in Erlang.  And then maybe C++ for the heck of it.

This uses WSGI, specifically the WSGI reference server, so in theory it shouldn’t be hard to adapt to other WSGI servers, like mod_wsgi.  However, beware of thread safety problems.  Also beware that WSGI servers that use multiple processes will require some sort of external data store instead of process local memory.
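As a rough illustration of the shape of such a server (this is not the actual code behind the link), here is a minimal WSGI application in Python 3 using the standard wsgiref module. The names app, queues, and serve are placeholders, and only queue creation via PUT is handled. The module-level dict is exactly the kind of process local memory that stops working once a server runs several processes.

```python
from wsgiref.simple_server import make_server

queues = {}  # process-local state: not shared between processes, lost on restart


def app(environ, start_response):
    """Handle PUT /queueName; everything else gets a 404."""
    method = environ["REQUEST_METHOD"]
    name = environ.get("PATH_INFO", "").strip("/")
    if method == "PUT" and name:
        if name in queues:
            status = "409 Conflict"
        else:
            queues[name] = []
            status = "201 Created"
    else:
        status = "404 Not Found"
    body = status.encode("ascii")
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    """Run the app on the single-threaded reference server until interrupted."""
    make_server("localhost", port, app).serve_forever()
```

Calling serve() starts the reference server; because app is a plain WSGI callable, the same function could in principle be handed to another WSGI server instead.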


A new message queuing system

Programming

First, why a new one?  Because I haven’t found any that do what I need and look simple and well supported.  Besides, it seems like a reasonable learning experience.

The initial summary of what I need is a lightweight method for PHP (in the form of scripts running in mod_php) to send messages to a back end program written in Python. However, in the future other languages may be used, so general cross language compatibility is important. That means either bindings exist for every conceivable language, or bindings are trivial to write.  Also, the solution must run on Solaris.

Comparison:

At work I use Apache QPID, which is an AMQP implementation. I can’t find any PHP client for AMQP though. I was able to find discussion that suggested that the AMQP protocol is too heavyweight for PHP being run in mod_php.

Looking at other solutions, I don’t want to run anything that requires Java for the server. That rules out Apache ActiveMQ and other JMS systems. I also believe that XMPP is too heavyweight to parse.  I also found some systems written in Perl, Ruby, and PHP, but they looked rather slapdash, and I don’t particularly want to use those languages.  The initial requirement for supporting PHP is only because I’m working on a PHP web app that I forked.  I do not want to add any new PHP code bases that I need to maintain.  Besides, once I start looking at fringe choices, it gets to be a lot easier to justify writing my own, particularly if I am going to use it as a learning project to get more familiar with, say, Erlang.

Summary:

It is to be a RESTful design.  I will be using JSON for the payloads, but I haven’t decided yet if it makes sense to force this, or if it makes sense to allow all ASCII data.  Queues will be single read, meaning that if multiple end points need to get the same message, then there will need to be a separate queue for each end point.  Initially there will be no security model or persistence.  Commands will be standard HTTP verbs.  If possible I will try to make response codes valid HTTP response codes.

Goals:

Run on Solaris

Support Python, PHP, and Javascript as clients.

Limitations:

Initially this will not support persistence.

Also, this will not support any security.

Commands:

PUT queueName

Create a new queue. The exact responses still need to be decided.

Responses:

201 created, entity required, probably just a confirmation message

409 already existed.

POST queueName

The body of the post will be the contents of the message.

403 queueName wasn’t found.

201 created, and perhaps the entity will be an id for the message.

GET

Gets that do not match the following patterns will be answered with a 404.

GET msg/queueName

Get the next message from the queue queueName.  Here I need some way to return a message ID in addition to the message body.  It may make sense for the response to be JSON: {"id": integer, "content": <valid JSON here>}

If the response is as proposed, then the contents of the POST must be valid JSON as well.

403 queueName wasn’t found.

200, the message

GET queues

Get a list of the created queues.

200

DELETE queueName/integer

Delete a message identified with integer from queue queueName.

204 deleted, no entity required

403, queueName or integer not found.

DELETE queueName

Delete a queue and all the messages in it.

204 Deleted, no entity in response

403 queueName wasn’t found
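To make the command list concrete, here is a small in-memory model of it in Python 3. It is a sketch of the design as written, not the real server: QueueStore and its method names are invented, each method returns a (status, entity) pair instead of speaking HTTP, and three details are my assumptions because the design leaves them open: the PUT confirmation entity, a GET that leaves the message in place until it is DELETEd, and a 204 for an empty queue.

```python
import json


class QueueStore:
    """In-memory model of the commands; every method returns (status, entity)."""

    def __init__(self):
        self._queues = {}    # queueName -> list of {"id": ..., "content": ...}
        self._next_id = 1

    def put_queue(self, name):                 # PUT queueName
        if name in self._queues:
            return 409, None
        self._queues[name] = []
        return 201, json.dumps({"created": name})   # assumed confirmation entity

    def post_message(self, name, body):        # POST queueName
        if name not in self._queues:
            return 403, None
        # json.loads raises ValueError if the body is not valid JSON
        message = {"id": self._next_id, "content": json.loads(body)}
        self._next_id += 1
        self._queues[name].append(message)
        return 201, json.dumps({"id": message["id"]})

    def get_message(self, name):               # GET msg/queueName
        if name not in self._queues:
            return 403, None
        if not self._queues[name]:
            return 204, None                   # assumption: empty queue
        return 200, json.dumps(self._queues[name][0])

    def get_queues(self):                      # GET queues
        return 200, json.dumps(sorted(self._queues))

    def delete_message(self, name, msg_id):    # DELETE queueName/integer
        messages = self._queues.get(name)
        if messages is None:
            return 403, None
        for index, message in enumerate(messages):
            if message["id"] == msg_id:
                del messages[index]
                return 204, None
        return 403, None

    def delete_queue(self, name):              # DELETE queueName
        if name not in self._queues:
            return 403, None
        del self._queues[name]
        return 204, None


store = QueueStore()
store.put_queue("jobs")
store.post_message("jobs", '{"task": "resize"}')
status, entity = store.get_message("jobs")
print(status, entity)   # 200 {"id": 1, "content": {"task": "resize"}}
store.delete_message("jobs", 1)
```

A client under this model reads the head of the queue, does its work, and then deletes the message by id, which is one way to read the single read requirement.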


Thread Worker Pooling in Python

Programming

The worker pool pattern is a fairly common tool for writing multi-threaded programs.  You divide your work up into chunks of some size and you submit them to a work queue.  Then there is a pool of threads that watch that queue for tasks to execute, and when a task is complete, they add the job to the finished queue.

Here is the file.

Thanks to the Global Interpreter Lock, threads are of somewhat limited usefulness in Python.  I foresee myself mostly using this for network limited tasks, like downloading a large quantity of RSS feeds.  My idea is that tasks put into the system shouldn’t modify global state, so if I actually needed this for computational tasks, it may be feasible to build it on forks instead, or perhaps the 2.6 multiprocessing system.  However, I still use a lot of systems with only Python 2.3 installed, so I’m not likely to want to write 2.6 specific code anytime soon.

Many of the thread pool systems I have seen have you specify a single function for the pool, then you just enqueue the inputs.  Mine is different in that each item in the queue can be a different function.  I haven’t actually used it this way though, so it is possible that the extra flexibility is generally wasted.

Python’s lambda seems rather limited.  It is limited to a single expression.  I suppose that this is what Lisp and Scheme do as well, but their expressions offer things like progn.  My first idea was that the task to execute would be a function with no arguments.  I was picturing using a lambda to wrap up whatever I wanted to do.

Now, I still offer that via addTask and assume it internally, but I also offer addTaskArgs, and it takes a function reference, and either an argument list (as a list) or a named argument list (as a dict) and wraps it in a lambda to enqueue.
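For readers who do not want to open the file, here is a compact sketch of the pattern in Python 3 (the real file targets much older Pythons, so this is a re-imagining, not a copy). addTask and addTaskArgs are the names from above; WorkerPool, join, the exact argument order, and the None sentinel used for shutdown are my own choices for the sketch.

```python
import queue
import threading


class WorkerPool:
    def __init__(self, num_threads):
        self.tasks = queue.Queue()      # work waiting to be done
        self.finished = queue.Queue()   # results of completed tasks
        self.threads = [threading.Thread(target=self._work)
                        for _ in range(num_threads)]
        for thread in self.threads:
            thread.daemon = True
            thread.start()

    def _work(self):
        while True:
            task = self.tasks.get()
            if task is None:            # sentinel: this worker should stop
                break
            self.finished.put(task())   # no error handling: an exception kills the worker

    def addTask(self, task):
        """Enqueue a function that takes no arguments."""
        self.tasks.put(task)

    def addTaskArgs(self, func, args=(), kwargs=None):
        """Wrap func and its arguments in a lambda, then enqueue that."""
        kwargs = kwargs or {}
        self.addTask(lambda: func(*args, **kwargs))

    def join(self):
        """Let the workers drain the queue, then stop them."""
        for _ in self.threads:
            self.tasks.put(None)
        for thread in self.threads:
            thread.join()


pool = WorkerPool(4)
pool.addTask(lambda: 2 + 2)                     # a different function per task
pool.addTaskArgs(pow, [2, 10])                  # positional arguments as a list
pool.addTaskArgs(int, ["ff"], {"base": 16})     # named arguments as a dict
pool.join()
results = sorted(pool.finished.get() for _ in range(3))
print(results)  # [4, 255, 1024]
```

Results arrive in the finished queue in whatever order the threads complete them, which is why the example sorts before printing.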

I now find that my knowledge about how to unit test threaded code is rather limited, and the included unit tests are extremely thin.


Seeking

Programming

I am now looking for a new job and am no longer with Sigma Electronics.

My first preference would be a position writing software for post production or visual effects at either a software company or a post production or visual effects company.

Other than that, I am also interested in positions or contract work developing embedded systems, graphics applications, or web applications.

I just thought I would throw this out in case anyone can point me towards any leads.

Thank you.


About ReferURL

Programming, System Administration

ReferURL.net is a link shortening service I created. You paste in a long URL (say to an eBay auction or newspaper article) and it gives you a short URL to use (http://referurl.net/123). You also have the option of picking an alias for a referurl, something like http://referurl.net/r/xxx. Also, a common usage pattern is a bookmarklet that you drag from the page to the toolbar. Whenever you click on the bookmarklet, it runs a bit of javascript code that submits the page you are currently looking at to ReferURL.

URL shortening services are great for emailing URLs to friends. Recently they are even more important for posting URLs on twitter (with only 140 characters, every character saved counts).

A service that does similar things called TinyURL.com has been around for a long time. Personally, I do not like tinyurl.com. I think it is ugly. There is another reason I remember disliking them, but it is possible that I have two services confused, so I won’t mention it. They also didn’t offer aliases when I wrote ReferURL.

I used to use another service, but it broke repeatedly, and when it had several months of downtime I decided to write my own. That service also didn’t support aliases.

At this point when I look around the new services that are similar, I see three things that may be better than ReferURL.

  1. Some services are prettier (of course, extra graphics means slower load times).
  2. Some services put the new shorter url into your clipboard buffer so that you don’t have to copy it yourself. I would love to add this, but as far as I can tell it is implemented with Flash, which I don’t own.
  3. With twitter, every character counts. There are now some services with names much shorter than referurl.net. TinyURL.com is one character shorter. Bit.ly is six characters shorter. If anyone has a good idea for a name that is shorter than referurl.net, I would love to steal it. In my own twitter usage, I haven’t had trouble with the length of ReferURL yet though.

Anyway, those are my comments on the creation of ReferURL.net. For the time being, I plan to keep looking for ways to improve it and will keep working on it.

Also, I will be releasing the code for people who want to run/write their own service in the future. I had previously released some code, but now that it is several months old, I took it down until I had time to clean the current code for re-release. If someone were to email me asking about that, it would probably get me to do it sooner.  It is a Python project built on mod_python and PostgreSQL.


FreeType2 Usage Notes

Programming

The motivation for this page is that the FreeType2 tutorial doesn’t
give you complete working code. Lots of people complain about the lack
of a very short, simple working example. I too felt that was lacking, so I
posted my first simple program.

See: text2text

This program should compile on Linux, Irix, or Solaris. Minor Makefile
adjustments may be needed if files don’t appear in the same locations
for you as they do for me. Last tested on Irix w/ MIPSpro, RHEL3, Ubuntu
Warty Warthog, and Solaris9 w/ SunStudio. Solaris with GCC will require
you to adjust your system so cc invokes gcc.

Anyway, the program demonstrates simple usage of freetype2.

Below is the main piece. It opens the library, loads the font file,
sets the text size, then renders out the characters. As written, it
just assumes the canvas will be large enough for the text.

If you extrapolate as it is written, you would have to render the
text twice if you want to create a canvas that is sure to hold the
string (say because you want to render to a picture). Also, the code
below offers no method for dealing with ligatures.

error = FT_Init_FreeType( &library );
if ( error )
{
  printf("an error occurred\n");
  exit(-1);
} 

error = FT_New_Face( library, fileName, 0, &face );
if ( error == FT_Err_Unknown_File_Format )
{
  //... the font file could be opened and read, but it appears
  //... that its font format is unsupported
  printf("Face error\n");
  exit(-1);
}
else if ( error )
{
  //... another error code means that the font file could not
  //... be opened or read, or simply that it is broken...
  printf("other Face error\n");
  exit(-1);
}

error = FT_Set_Char_Size( face, /* handle to face object */
0, /* char_width in 1/64th of points */
48*64, /* char_height in 1/64th of points */
40, /* horizontal device resolution */
18 ); /* vertical device resolution */

slot = face->glyph; /* a small shortcut */

pen_x = 0;
pen_y = 0;
for ( n = 0; n < num_chars; n++ )
{
  unsigned char * buf;

  /* load glyph image into the slot (erase previous one) */
  error = FT_Load_Char( face, text[n], FT_LOAD_DEFAULT );
  if ( error ) continue; /* ignore errors */

  /* convert to an anti-aliased bitmap */
  //FT_RENDER_MODE_MONO
  error = FT_Render_Glyph( face->glyph,  FT_RENDER_MODE_NORMAL );
  if ( error ) continue; /* ignore errors */

  /* now, draw to our target surface */
  buf = slot->bitmap.buffer;

  for(j=0; j<slot->bitmap.rows; j++)
  {
    for(i=0; i<slot->bitmap.width; i++)
    {
      /* rows are bitmap.pitch bytes apart, which may be more than bitmap.width */
      int index = (j*slot->bitmap.pitch)+(i);
      unsigned char c;
      //printf("index: %d offset %d\n", index/8, index%8);
      c=buf[index];

      pen_y = height -3-slot->bitmap_top;

       if (c>0 && c<128) output[(j+pen_y)*width+(i+pen_x)] = '.';
       if (c>127) output[(j+pen_y)*width+(i+pen_x)] = 'x';
    }
  }

  /* increment pen position */
  pen_x += slot->bitmap.width+2;
}

The big change I’m working on is making it use an array of FT_Glyph
structures to be able to hold the string of glyphs. In
pseudo code, that would look something like this:

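/* requires: #include FT_GLYPH_H */
FT_Glyph * glyphs;
int width=0;
//setup array.
glyphs = malloc(num_chars * sizeof(FT_Glyph));

for (i = 0; i < num_chars; i++)
{
  glyphs[i] = NULL;
  error = FT_Load_Char( face, text[i], FT_LOAD_DEFAULT );
  if ( error ) continue;
  error = FT_Get_Glyph( face->glyph, &glyphs[i] );
  if ( error ) continue;
  /* FT_Render_Glyph only works on a glyph slot; a stand-alone FT_Glyph
     is converted with FT_Glyph_To_Bitmap (last argument 1 = replace
     the original glyph with the bitmap version) */
  error = FT_Glyph_To_Bitmap( &glyphs[i], FT_RENDER_MODE_NORMAL, 0, 1 );
  if ( error ) continue;
  width += ((FT_BitmapGlyph)glyphs[i])->bitmap.width + SPACE_BETWEEN_CHARS;
}

//create canvas

for (i = 0; i < num_chars; i++)
{
  //do something with each rendered glyph's bitmap to copy to destination
  //(skip entries that are NULL or failed to convert)

  FT_Done_Glyph(glyphs[i]);
}

free(glyphs);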

Keep in mind that is incomplete code. It hasn’t really been written
up and tested yet.

And I still don’t know how to deal with ligatures, kerning, and all
sorts of other features. However, if I implement the pseudo code above,
I expect I’ll have everything I need for now.

 
