marzo 2011 – Planeta UNPLUG

I’ve lately spent some time developing a webapp for mobile devices to interact with some of the data published by the Gijón City Council. More specifically, data about local public land transportation schedule and live arrivals. The way they are presenting that information for mobile devices at the moment is very very heavy and slow, so I thought it may be useful to do something simpler for personal usage.

Basically, it is a simple web service that intensively caches data (to avoid stressing the data origin with many requests) and a fancy AJAX-powered frontend with some CSS with mobile browsers in mind (works flawlessly on Android’s browser and Mobile Safari). Additionally, if you add it as a bookmark to your iPhone’s home screen it behaves like a native application (you know, splash screen, custom icon, taskbar and so on).

I’m now working on client-side caching using HTML5 caching for offline usage. This way the application will boot way faster. It’s almost done, but it still needs some debugging.

I don’t intend to make it public for now. However, if you find it useful feel free to drop me a line. Beta testers are always welcome (but unfortunately won’t be rewarded).

This is how it looks like at the moment. The source will be released soon.

Update (23:26): Android screenshots provided by Javier Pozueco. Thanks buddy!

Hace unas semanas hemos comprado directamente el fabricante (Asia) un nuevo modelo de ZeroClient o también conocido como MultiPoint/MultiSeat: MWS8840 Afortunadamente el chipset es el mismo que el anterior MWS300 y funciona correctamente. Es más, con gran sorpresa veo que los vídeos de Youtube funcionan a pantalla completa . El dispositivo tiene una entrada USB (que se conecta al servidor de terminales) una entrada de alimentación (5V - 3A), entrada de micro, salida de audio y 4 puertos frontales USB 2.0 (en dos de ellos se conecta el teclado y el ratón). Además este modelo trae una peana para colgarlo en la parte trasera de un monitor TFT o ponerlo en vertical. Aquí podeis ver unas cuantas fotos. El tamaño de la caja es de aproximadamente 10cm x 10cm En breve lanzaremos desde Thinetic una completa y económica solución comercial con estos aparatos tan prometedores, y todo ello con Software Libre .

Google search box completion

*lol*

Nowadays it’s very usual to find websites offering hints while you’re typing on a search box. Google is a pretty good example of it. But, how could it be implemented?

This feature could be implemented either in the client side or in the server side. If the word list is big (usually it is), it’s recommended to keep the lookup logic in the server side to save some bytes while transferring the page to the client and also to save some computing power using server-side caches (cool when you plan to serve many requests).

Either way, there should be a data structure somewhere containing the word list and an algorithm to do the lookup. The simplest approach may be to use a list to store the words and issue something like this when you want to get a list of hints for a given prefix:

filter(lambda x: x.startsWith(prefix), word_list)

That’s Python’s filter, but it works the same way the well-known Haskell’s first-order function filter does. It builds a new list with the elements of the original list (word_list) that match the predicate (the lambda function).

Although the results can (and should) be cached, the very first lookup (or when the cache expires) would be very inefficient because the entire list must be traversed and that operation will take linear time. Not bad, but when the size of the problem gets bigger (i.e. more and more words in the database) the lookup process may be too slow, especially whether you’re serving several users at the same time. If the list was sorted, the execution time could be improved a little bit by writing a more sophisticated algorithm, but let’s keep it that way for now.

Fortunately, there are better and faster ways to face the problem. If you don’t want to write code (usually the best choice) you may use some high-performance indexing engine such as Apache Lucene. But if you prefer the ‘do-it-yourself’ way (for learning purposes), a search tree (more specifically, a trie or a prefix tree) is a good approach.

I’ve poorly benchmarked both alternatives (the list and the tree) and as expected the tree is pretty quicker generating hints. What I did was to feed both data structures with the content of an American English word list holding ~640k words (debian package wamerican-insane).

So, assuming four is a reasonable minimum prefix length, I measured the time it would take to get a list of words prefixed by hous (yes, just one, remember I said this was a poor benchmark? ;). Unsurprisingly, it took around 230 times longer for the list alternative to generate the hints (438.96 ms vs 1.92 ms). Wow.

My implementation of the tree is as follows. The API is quite straightforward, the “hot” methods are put and get_hints. I’ve stripped off the test suite for space reasons.

Usage example:

>>> tree = HintSearchTree()
>>> tree.put("nacho")
>>> tree.put("nachos")
>>> tree.put("nachete")
>>> tree.get_hints("nach")
['nachete', 'nacho', 'nachos']
>>> tree.get_hints("nacho")
['nacho', 'nachos']
>>> tree.delete("nacho")
>>> tree.get_hints("nacho")
['nachos']
>>> tree.count_words()
2
>>> tree.get_hints("n")
['nachete', 'nachos']
>>> tree.is_indexed("nachete")
True
>>> tree.is_indexed("nach")
False
>>> tree.empty()
False

class HintSearchTreeNode(object):
class HintSearchTreeNode(object):
  def __init__(self, parent=None, terminal=False):
    self._children = {}
    self._terminal = terminal
    self._parent = parent
 
  @property
  def children(self):
    return self._children
 
  @property
  def terminal(self):
    return self._terminal
 
  @terminal.setter
  def terminal(self, value):
    self._terminal = value
 
  @property
  def parent(self):
    return self._parent
 
class HintSearchTree(object):
  def __init__(self):
    self._root = HintSearchTreeNode()
 
  def put(self, word):
    """Adds a word to the tree."""
    # TODO: Sanitize 'word'
    if len(word) &gt; 0:
      self._put(self._root, word)
 
  def count_words(self):
    """Retrieves the number of indexed words in the tree."""
    return self._count_words(self._root)
 
  def is_indexed(self, word):
    """Returns True if 'word' is indexed."""
    node = self._find(self._root, word)
    return node is not None and node.terminal is True
 
  def get_hints(self, prefix):
    """Returns a list of words prefixed by 'prefix'."""
    return self._match_prefix(self._root, prefix)
 
  def delete(self, word):
    """Deletes 'word' (if exists) from the tree."""
    terminal = self._find(self._root, word)
    if terminal is not None:
      terminal.terminal = False
      self._prune(terminal.parent, word)
 
  def empty(self):
    """Returns True if the tree contains no elements."""
    return len(self._root.children) == 0
 
  def _put(self, node, word, depth=0):
    next_node = node.children.get(word[depth])
    if next_node is None:
      next_node = HintSearchTreeNode(parent=node)
      node.children[word[depth]] = next_node
    if len(word)-1 == depth:
      next_node.terminal = True
    else:
      self._put(next_node, word, depth+1)
 
  def _count_words(self, node):
    words = 1 if node.terminal is True else 0
    for k in node.children:
      words += self._count_words(node.children[k])
    return words
 
  def _match_prefix(self, node, prefix):
    terminal = self._find(node, prefix)
    if terminal is not None:
      return self._harvest_node(terminal, prefix)
    else:
      return []
 
  def _harvest_node(self, node, prefix, path=""):
    hints = []
    if node.terminal is True:
      hints.append(prefix + path)
    for k in node.children:
      hints.extend(self._harvest_node(node.children[k], prefix, path+k))
    return hints
 
  def _find(self, node, word, depth=0):
    if depth == len(word):
      return node
    else:
      child = node.children.get(word[depth])
      if child is not None:
        return self._find(child, word, depth+1)
      else:
        return None
 
  def _prune(self, node, word):
    if self._count_words(node.children[word[-1]]) == 0:
      del node.children[word[-1]]
      if len(node.children) == 0 and node.parent is not None \
          and node.terminal is not True:
        self._prune(node.parent, word[:-1])

The code is released in the public domain.

Exportar tu shell script a HTML, con sintaxis resaltada en colores es posible con la ayuda de Vim y este comando:

vim -f +"syn on" +"run! syntax/2html.vim" +"wq" +"q" hola.sh

Esta linea convierte hola.sh a hola.sh.html, el parámetro -f forza a Vim a permanecer en el primer plano, cosa que es útil si Vim va a ser utilizado por otro programa como un cliente de correos o en nuestro caso este script.

Activar el coloreado de sintaxis y ejecutar el plugin 2html.vim seria la segunda parte de este script. Aquí pasa algo peculiar, plugin 2html.vim divide la pantalla de Vim donde la parte superior es el código HTML y la inferior es el documento original. Seguidamente se le envía wq para salvar y salir de la primera ventana y se le envía q para que termine con la ventana restante.

La salida de esta linea seria hola.sh.html y luce así:

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>/tmp/hola.sh.html</title>
<meta name="Generator" content="Vim/7.4">
<meta name="plugin-version" content="vim7.4_v1">
<meta name="syntax" content="bash">
<meta name="settings" content="use_css,pre_wrap,no_foldcolumn,prevent_copy=">
<meta name="colorscheme" content="none">
<style type="text/css">
<!--
pre { white-space: pre-wrap; font-family: monospace; color: #000000; background-color: #ffffff; }
body { font-family: monospace; color: #000000; background-color: #ffffff; }
* { font-size: 1em; }
.Comment { color: #0000c0; }
.Constant { color: #c00000; }
.Statement { color: #af5f00; }
-->
</style>

<script type='text/javascript'>
<!--

-->
</script>
</head>
<body>
<pre id='vimCodeElement'>
<span class="Comment">#!/bin/bash</span>
<span class="Comment"># orvtech.com</span>
<span class="Statement">echo</span><span class="Constant"> </span><span class="Statement">&quot;</span><span class="Constant">Este script sera convertido</span><span class="Statement">&quot;</span>
<span class="Statement">echo</span><span class="Constant"> </span><span class="Statement">&quot;</span><span class="Constant">a HTML usando Vim</span><span class="Statement">&quot;</span>
</pre>
</body>
</html>
<!-- vim: set foldmethod=manual : -->

#!/bin/bash
# orvtech.com
echo "Este script sera convertido"
echo "a HTML usando Vim"

Incorporar este script en un bucle para que convierta todos los shell scripts a HTML es fácil, con un bucle for, lo podemos hacer, quedaría así:

for ARCHIVO in `ls *.sh`
  do echo -e "Convirtiendo $ARCHIVO a HTML"  vim -f +"syn on" +"run! syntax/2html.vim" +"wq" +"q" $ARCHIVO
done