Post-processing OPTICSxi Clustering

On a previous post, I expressed my concerns regarding the results of OPTICSxi clustering. Namely, I mentioned an “annoying” spike effect, that turns out massively almost at any simulation (so massively that it is almost a “feature”).

A post in StackOverflow, originated an useful exchange of ideas with the author of the ELKI framework. Namely he pointed out a “weakness” of the algorithm, that basis its partition on the reachability distance, which is not always a synonymous of “spatial closeness”. Literally, outliers that are standing in the middle of clusters, could be erroneous misinterpretated as belonging to a cluster or the other.

Since this is a problem on the partition algorithm, the solution could pass through improving the partition algorithm, using a different partition algorithm, or using a different cluster algorithm all together.  As a “quick fix” I opted to some cluster “post-processing”, in order to remove the outliers.

So my research question was: how to identify a point that is a spatial outlier?

I tried a couple of approaches that I will discuss now, as I think they may be useful for someone or generate an useful discussion.

Image

On the image above, the coloured points are outliers, according to our definition. One simple approach, would be to calculate the average path length, the average distance of one point, to all other points in the point cloud. Then we could test the distance of each point, and say if it is greater than a certain value, let us say 5,  we would consider it an outlier and remove it from the dataset. I found this approach actually yield good results, and was able to reduce the “spikes” that we see in this figure:

Image

To this results:

Image

The code that calculates the average distance for each point, is bellow:

public static double getAverageDistance(Coordinate coord, ModifiableDBIDs ids, HashMap dataMap){

double sum=0.0;

try{
for (DBIDIter iter = ids.iter();
iter.valid(); iter.advance())
sum+=coord.distance(dataMap.get(DBIDUtil.toString(iter)));
}
catch( Exception  exp) {
System.out.println( "Unexpected exception:" + exp.getMessage() );
}
return sum/ids.size();
}

From the point of view of “correctness” this algorithm suffers of the “bottleneck” of removing the outlier from the clusters (and thus converting it into “noise”). To avoid this, it would have to be tested if the point actually belongs to another cluster.

Apart from that, in terms of computation the algorithm is extremely “costly” , being the most costly part when it computes the distances from all points to all points, literally a matrix of NxN that can easily increase to huge numbers with a large cluster.

To avoid that, I tried a few different approaches. One was to calculate this distances for a part of the dataset. We know that by the way the algorithm is written, the outliers tend to “appear” either in the beginning or the end of the cluster. Having this in mind, I calculated the average based on the 60% “middle” values (n<20% and n>20%) and tested the condition for the first 20% and last 20% (this refinement was actually not needed as the “costly” part of the algorithm is the distance matrix and not the condition testing). I was not able to reach any reasonable results with this approach, either because the points were not ordered (which defeats the all purpose of my “slicing”) or because outliers were appearing outside these “classes” (i.e.: in the middle of the dataset).

The other approach that I tried was to work with the “final” polygon (the convex hull of the cluster), rather than the raw points. The polygon border has only a few points, when compared to the point cloud used to generate it. It is very easy and quick to identify the “outlier” in the polygon border; however removing it, will understandable result in a “strange polygon”, since the reality is: if that point would not be used to build the polygon, another point (that we don’t have right now!) would be used, and so the real geometry would not be this one. This is particularly noticeable when we see overlapping clusters (which don’t overlap anymore). After this experience, it became clear that the processing would have to be done in the cluster point dataset, before building the polygon.

I ended up with an algorithm that successfully removes the “spike” effects from OPTICSxi, but is rather costly in terms of time (more costly than OPTICSxi itself) and unfortunately this grows exponentially with the size of the dataset, which limits its application with big data.

UPDATE

The approach above was improved, by testing the distances against the 2 neighbours of each point (previous and next point) rather than against the entire matrix. With this hack, the running time of the algorithm was reduced to reasonable values, that don’t grow up so much with the size of the vector.

Advertisements

Encrypting Passwords on the Client Side

I recently implemented an authentication system, supported by a table on the database. In this table, I store the usernames and passwords.
The PostgreSQL “pgcrypto” module supports encryption of specific columns, but I wanted to avoid “bulking” my system with more add-ons; also, I was not sure about trusting the administrator of the database. For all these reasons, I decided to go for client-side encryption.

As I am using the QT API, my first guess was to look for encryption related functionality on this framework.

Since Qt 4.3., there is a class called QCryptographicHash, that provides a way to generate cryptographic hashes, from binary or text data. It supports a couple of “popular” message-digest algorithms such as MD5, MD4 or SHA1. This is interesting, but not usable in my case since “hashing” is a “one-way process”: you can create an hash from data, but not “decrypt” it back to the original value.

For the purpose of encrypting and decrypting data, there is the QCA library, which is based on SSL and therefore uses asymmetric cryptography. I had a quick look at this library and it looks pretty powerful; however, I was in a “quest” to avoid installing libraries, so I decided to look for another alternative. I also have to add that I am not in need to achieve “bullet proof” encryption, but solely to keep my data hidden from “curious eyes of users”, that don’t know much about programming or cryptography. *If you want strong data encryption, then QCA is the way to go*.

Finally I stumbled upon SimpleCrypt (Simple Encryption), a class developed by a Qt user to protect against “casual observation”: just what I was looking for.
SimpleCrypt provides symmetric encryption (the same key is used for encode and decoding the data). This key is a 32 bits quint64, built into the program, on the SimpleCrypt class.

SimpleCrypt crypto(Q_UINT64_C(5ebed0982db747741f47)); //some random number

It can be used to encode strings, as well as streams of binary data. More details about the encryption algorithm can be found here.

In my application, I use a QDataWidgetMapper, to map different widgets such as line boxes, to fields in the database. In the case of the “password” field, the mapping won’t work since the user will type and see a password that is different from the cyphertext stored in the database table. To prevent this, I need a “proxy” between the database and widget; this proxy is the QItemDelegate.
By reimplementing this class and apply it to my widgets, I can “force” the application to transform the values that it writes/reads in the database, by applying the encrypting algorithm.
This would be my the implementation for the item delegate:

#include "passwddelegate.h"
#include "simplecrypt.h"

SimpleCrypt crypto(Q_UINT64_C(5ebed0982db747741f47)); //some random number

PasswdDelegate::PasswdDelegate(QList<int> colsOthers, QList<int> colsText, QList<int> colsPass,QObject *parent):
    NullRelationalDelegate(colsOthers,colsText,parent), m_colsPass(colsPass)
{

}
void PasswdDelegate::setModelData(QWidget *editor, QAbstractItemModel *model, const QModelIndex &index) const
{
    if (m_colsOthers.contains(index.column()) || m_colsText.contains(index.column()) ){
        NullRelationalDelegate::setModelData(editor,model,index);
    }else if (m_colsPass.contains(index.column())){

            model->setData(index, crypto.encryptToString(editor->property("text").toString()));

    }
    else QSqlRelationalDelegate::setModelData(editor,model,index);
}

void PasswdDelegate::setEditorData(QWidget *editor, const QModelIndex &index) const
{
    if (m_colsOthers.contains(index.column()) || m_colsText.contains(index.column())){
        NullRelationalDelegate::setEditorData(editor,index);
    }else if (m_colsPass.contains(index.column())){

           editor->setProperty("text", crypto.decryptToString(index.data().toString()));

    }
    else QSqlRelationalDelegate::setEditorData(editor,index);
}

The method “setModelData” is called each time that we want to write values in the encrypted field, and it transforms the plain text password into a cyphertext, by applying the compiled key. The method “setEditorData” on the other hand, transforms the cyphertext in the database into plain-text, by applying the same key.

This is how the pasword looks to the user:

cypher1

And this is what is actually stored on the database:

cypher2

SimpleCrypt provides a simple method for “cyphering” user passwords in the database; it is not “unbreakable” but it is one extra level of security, and it is pretty easy to use, not requiring the installation of any extra libraries (you may download simplecrypt.h and cimplecrypt.cpp from here). The QItemDelegate is a class that allows us to control what the user “sees”, while still benefiting from the advantages of using a QDataWidgetMapper. In this case, it works as a charm.

Quick guide to Auditing a (postgreSQL) Database

According to Wikipedia, ‘Database auditing’ involves observing a database so as to be aware of the actions of database users. A bit like ‘spying on the user activity’.
This of course, could be useful, if you have a database with multiple users and store some sort of ‘confidential’ information.

spy

Previously I implemented some code to do this in SQL Server, that basically involved:

  • creating a table to store the audit information.
  • create an ‘encode scheme’ for this table (i.e.: how to distinguish ‘edits’ from ‘removes’, etc).
  •  creating triggers for every table that I wanted to audit, that do the ‘house-work’ (these were of course, generated from a script)

Moreover, these ‘encoded changes’ where exported to JSON, as they were the basis for a synchronization system that I implemented,

This was working fine, until I had to port the database to another RDBMS, which gave me the opportunity to rethink the structure that I had in place.
Instead of porting directly the code from T-SQL to PSQL, I did a little research and (gladly!) found out that Postgres had an ‘inbuilt’ support for audit. It took me about 10 minutes, to ‘get grips with it”, which is what I describe next.

First thing would be to create a table to store the ‘changes’. The PostgreSQL wiki, actually advises to use a different schema, which I I agree is a good idea.

-- create a schema named "audit"
CREATE schema audit;
REVOKE CREATE ON schema audit FROM public;

CREATE TABLE audit.logged_actions (
schema_name text NOT NULL,
table_name text NOT NULL,
user_name text,
action_tstamp timestamp WITH time zone NOT NULL DEFAULT current_timestamp,
action TEXT NOT NULL CHECK (action IN ('I','D','U')),
original_data text,
new_data text,
query text
) WITH (fillfactor=100);

REVOKE ALL ON audit.logged_actions FROM public;

The stored information is: the relevant schema and table names, the username (so it is important to enforce a user policy here) and a timestamp. Then we also have the ‘type’ of action, that is one of the following: insert, delete or update. This is not supporting triggers or selects, which I guess it’s ok. Then we have the previous value and current value. For inserts, we have a change from nothing to something; for deletes we have a change from something to nothing and from updates we have a change of something to something. In a way, you could figure out the change type from looking at these values, so the ‘action’ field is perhaps a bit redundant; but then, you would have to represent ‘nothing’ as a sort of special value (or keyword) and not an empty space (that could be misunderstood by an empty string).
Finally we have the ‘query’, which is the exact query that triggered this audit. Although this is also a bit redundant, since it could be reconstructed from the other values, it is not a bad idea since it allows to quickly see/reproduce exactly what happened.

Then we can create some indexes:

CREATE INDEX logged_actions_schema_table_idx
ON audit.logged_actions(((schema_name||'.'||table_name)::TEXT));

CREATE INDEX logged_actions_action_tstamp_idx
ON audit.logged_actions(action_tstamp);

CREATE INDEX logged_actions_action_idx
ON audit.logged_actions(action);

The next step is to create the trigger to ‘fill’ this table, on relevant actions:

CREATE OR REPLACE FUNCTION audit.if_modified_func() RETURNS TRIGGER AS $body$
DECLARE
v_old_data TEXT;
v_new_data TEXT;
BEGIN
/*  If this actually for real auditing (where you need to log EVERY action),
then you would need to use something like dblink or plperl that could log outside the transaction,
regardless of whether the transaction committed or rolled back.
*/

/* This dance with casting the NEW and OLD values to a ROW is not necessary in pg 9.0+ */

IF (TG_OP = 'UPDATE') THEN
v_old_data := ROW(OLD.*);
v_new_data := ROW(NEW.*);
INSERT INTO audit.logged_actions (schema_name,table_name,user_name,action,original_data,new_data,query)
VALUES (TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_old_data,v_new_data, current_query());
RETURN NEW;
ELSIF (TG_OP = 'DELETE') THEN
v_old_data := ROW(OLD.*);
INSERT INTO audit.logged_actions (schema_name,table_name,user_name,action,original_data,query)
VALUES (TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_old_data, current_query());
RETURN OLD;
ELSIF (TG_OP = 'INSERT') THEN
v_new_data := ROW(NEW.*);
INSERT INTO audit.logged_actions (schema_name,table_name,user_name,action,new_data,query)
VALUES (TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_new_data, current_query());
RETURN NEW;
ELSE
RAISE WARNING '[AUDIT.IF_MODIFIED_FUNC] - Other action occurred: %, at %',TG_OP,now();
RETURN NULL;
END IF;

EXCEPTION
WHEN data_exception THEN
RAISE WARNING '[AUDIT.IF_MODIFIED_FUNC] - UDF ERROR [DATA EXCEPTION] - SQLSTATE: %, SQLERRM: %',SQLSTATE,SQLERRM;
RETURN NULL;
WHEN unique_violation THEN
RAISE WARNING '[AUDIT.IF_MODIFIED_FUNC] - UDF ERROR [UNIQUE] - SQLSTATE: %, SQLERRM: %',SQLSTATE,SQLERRM;
RETURN NULL;
WHEN OTHERS THEN
RAISE WARNING '[AUDIT.IF_MODIFIED_FUNC] - UDF ERROR [OTHER] - SQLSTATE: %, SQLERRM: %',SQLSTATE,SQLERRM;
RETURN NULL;
END;
$body$
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = pg_catalog, audit;

The ‘TG_OP’ variable inside a trigger, is a string that tells us for which operation the trigger was fired (INSERT, UPDATE, or DELETE). The rest is fiddling around with the ‘old’ and ‘new’ values.

Actually this is all to it, regarding having the audit ‘structure’ in place for postgresql. To put it ‘in action’, auditing a table is as simple as this:

CREATE TRIGGER fr_frame_audit
AFTER INSERT OR UPDATE OR DELETE ON fr_frame
FOR EACH ROW EXECUTE PROCEDURE audit.if_modified_func();

In the example above, I created a trigger for auditing table ‘fr_frame’ (of course you can create a sp procedure to generate these statements, if you want to generate a trigger for auditing every table in the database…).
Then I went to table ‘fr_frame’ and deleted a row. It got stored like this:

"public";"fr_frame";"postgres";"2014-02-11 12:36:52.063113+00";"D";"(67,"bin frame",missing,missing,1,1,missing)";"";"DELETE FROM public.fr_frame WHERE id = '67'::integer"

Then I modified and added a row; it got stored like this:

"public";"fr_frame";"postgres";"2014-02-11 12:38:54.793902+00";"U";"(59,"Sampling Frame",missing,"Initial Sampling frame",1,2,missing)";"(59,"Sampling Frame","Sampling Frame","Initial Sampling frame",1,2,missing)";"UPDATE public.fr_frame SET nameeng='Sampling Frame'::text WHERE id = '59'::integer"
"public";"fr_frame";"postgres";"2014-02-11 12:39:14.290898+00";"I";"";"(68,test,test,test,1,2,test)";"INSERT INTO public.fr_frame(name, nameeng, description, id_cloned_previous_frame, id_source, comments) VALUES ('test'::text, 'test'::text, 'test'::text, '1'::integer, '2'::smallint, 'test'::text)"

As I said before, the encode of non-existing values (nothing) as an empty string is not the most accurate approach, but since we have more information to complement it, it works. Also, the fact that it is row-based rather than field-based (as I had in my implementation), originates the serializing arrays, which is not exactly normalized… on the other hand, I can accept it from the point of view of storage.

Overall I think I accept it as a good solution, measuring all the pros and cons, and high fives for being so simple to implement.

You can find this information (and more), in the PostgreSQL wiki.

Building Cross-plattform Applications (for real!)

I have been writing many posts about the Qt library, without making a proper introduction.
Qt is a cross-plattform C++ framework, that provides support for a lot of things such as UI, multi-threading, Graphics, etc. Since C++ itself does not have so many libraries built in as you would find on other libraries, it is a good idea to use something like this. There are other frameworks specific things such as graphics (for instance GTK+), but I don’t know of any as complete as Qt. The .NET framework is quite complete, but unlike Qt it is not cross-platform, neither it has a permissive GPL/LGPL license, so if you care about these two things it is clearly *not* an option.

Having said this, I am quite happy with the Qt libraries in the long run, even things are not as easy as they would seem.

The Windows and Linux environment are different enough, even if you stick to C++ and Qt.
In my Linux environment, I use g++ make (the default compiler), and let the system (aka package manager) to decide what is the most appropriated version. It actually takes care of all the environmental variables and I do not have to worry too much about setting a build environment, whether I use Qt creator (the “native” Qt IDE) or just the command line.
On the other hand, Windows has got a couple of options regarding compilers:

  • there is the mingGW compiler, which people use for portability (but I don’t particularly like it since you need to setup a whole set of tools, that are not native on Windows)
  • there is the native Microsoft Visual Studio toolchain, which is by excellence, a “Windows compiler”
  • there is the Qt creator “jom” compiler, which allows using multi core, but somehow I am a bit reluctant in using it, because I don’t see anybody using it outside Qt creator.

The thoughts and “suspicions” above, are nothing else but that: thoughts and suspicions. Since I have been programming in Windows for quite a while using Microsoft Visual Studio and I am quite happy about it, I just decided to use it in my Windows projects.

If you want to stick to basic configurations, the differences between Windows and Linux projects may be small, but as soon as you start to complicate them a bit it is not so simple anymore. Recently I decided to link my application statically, in order to ease deployment in a complete “user-proof” scenario. These required rebuilding Qt statically, in both OS.
After successfully compiling Qt, I tried to build my project on Linux using Qt creator and it “worked as a charm”. I did not have to change any of the environment variables, but only have to point to which version of Qt I wanted to use inside Qt creator. There were only a few things that I had to keep in mind:

  • My application uses a plugin, so I had to also link it statically (creating a .la)
  • There were some minor changes in the qmake project files, *both* of the plugin and the application, in order to compile them statically.
  • It is better to clean/rebuild the projects, so that we don’t run into the risk of having files *left* from a previous dynamic linking. From my experience, “make clean” is not so tidy, so I would recommend going inside the directories and remove the .obj, or any other intermediate files by hand…

This is the only change that I had to make in the project file of my designer plugin

dynamic linking:

CONFIG += debug_and_release

static linking:

CONFIG += release staticlib

(with static linking, there is generally no point on debugging, so I am generating only a release build)

Then I could compile my application, and link against the static versions of Qt and of this plugin, by slightly modifying the project file:

dynamic linking:

CONFIG += debug_and_release

static linking:

CONFIG += release static

And that was about it: I got a binary file with 26.9 MB, that I can take with me to any Ubuntu system 🙂

Now the Windows part was a bit more painful. First I could not get the Qt Creator to work, since it did not correctly pick up the VC+ settings, and complained about the static version of Qt not having been built with the same compiler I was trying to use. To speed up things, I decided to compile the application on the command line, more specifically inside the Visual studio shell, which is the same place where I compiled Qt.

I had some persistent linking errors regarding a DLL linkage with my plugin. I read somewhere that by default in Windows, qmake will attempt a dll linkage unless its explictly told otherwise; thus I added this flag to the DEFINES of my project files:

DEFINES += QT_NODLL

In Linux I did not run into such a problem. In any case, it did not solve the linking errors. After researching a bit more, I found out that to call a plugin statically, you have to invoke a specific macro on the “main” of the application, *and* include QtPlugin. Another thing that was not necessary in Linux…

#include <QtPlugin>;

Q_IMPORT_PLUGIN(catchinputctrl)

And, finally the plugin has to be *explicitly* added to the project file, in this way:

QTPLUGIN += catchinputctrl

I still had linking errors, this time regarding *not* finding the plugin library; the directive that I had in Linux did not worked, and I messed around a bit with the “-L” and “-l” options in LIBS but without success, so I ended up copying the .lib file to my project directory (a quick fix). After that, I could generate the binary, but not without some linking warnings:

Creating library ..\release\faocas.lib and object ..\release\faocas.exp
frmcatch.obj : warning LNK4217: locally defined symbol ??0CatchInputCtrl@@QAE@PA
VQWidget@@@Z (public: __thiscall CatchInputCtrl::CatchInputCtrl(class QWidget *)
) imported in function "public: void __thiscall Ui_FrmCatch::setupUi(class QWidg
et *)" (?setupUi@Ui_FrmCatch@@QAEXPAVQWidget@@@Z)
frmoperation.obj : warning LNK4049: locally defined symbol ??0CatchInputCtrl@@QA
E@PAVQWidget@@@Z (public: __thiscall CatchInputCtrl::CatchInputCtrl(class QWidge
t *)) imported
frmtrip.obj : warning LNK4049: locally defined symbol ??0CatchInputCtrl@@QAE@PAV
QWidget@@@Z (public: __thiscall CatchInputCtrl::CatchInputCtrl(class QWidget *))
imported
frmcatch.obj : warning LNK4217: locally defined symbol ??1CatchInputCtrl@@UAE@XZ
(public: virtual __thiscall CatchInputCtrl::~CatchInputCtrl(void)) imported in
function "public: virtual void * __thiscall CatchInputCtrl::`scalar deleting des
tructor'(unsigned int)" (??_GCatchInputCtrl@@UAEPAXI@Z)
frmoperation.obj : warning LNK4049: locally defined symbol ??1CatchInputCtrl@@UA
E@XZ (public: virtual __thiscall CatchInputCtrl::~CatchInputCtrl(void)) imported

frmtrip.obj : warning LNK4049: locally defined symbol ??1CatchInputCtrl@@UAE@XZ
(public: virtual __thiscall CatchInputCtrl::~CatchInputCtrl(void)) imported
mt.exe -nologo -manifest "release\faocas.intermediate.manifest" -outputr

The output dir contained my executable, as well as a exports library file and a copy of the lib. These are unnecessary, and I can happily copy my 12.2MB binary around, without having to ship anything else.

Some questions that remain in my mind:

  • Why are the binaries generated by Windows and Linux so different in size? (one almost *doubles* the size of the other)
  • Why is the static and dynamic linking in Windows so different?
  • Why is the static linking in Windows so different from the one in Linux, and why is this so undocumented? (does anybody use it at all??)
Static linked app in Windows

Static linked app in Windows

Static linked app in Linux

Static linked app in Linux

Cross-compiling, using Qt

On a previous post I described a simple example, of how-to cross compile an Windows application in a Linux environment.
Although this worked fine and raised my motivation, it is not a very useful example since the real world is much more complex. One of the complexities that I want to introduce is the use of Qt libraries, that are used thoroughly in my projects. I will now go in detail, through a slightly more complicated “Hello World”, where I make use of these libraries.

A key thing to understand, when cross compiling with Qt, is that you need to setup the development environment, both in the host and in the target OS (those are in this case: Ubuntu and Windows).

I started by downloading the Qt distribution, compiled for MinGW (of course, you may also decide to download the source codes and compile it yourself, which will add another complexity layer to this task…):

wget http://download.qt-project.org/official_releases/qt/4.8/4.8.5/qt-win-opensource-4.8.5-mingw.exe

Then I ran this with Wine, and followed the installation instructions:

wine qt-win-opensource-4.8.5-mingw.exe

Qt got installed under my “wine drive”, in the $HOME directory:

~/.wine/drive_c/Qt/4.8.5

As I mentioned before, you also need a working Qt environment for Linux. I had Qt 4.8.7 installed in my machine, and I stupidly thought I could use that; it turns out I can not. The Qt versions in Windows and Linux need to match! Since MinGW did not released a 4.8.7 version yet, I decided to download the Linux version of Qt 4.8.5, and installed it along with my current version (let us see what problems this may bring me in the future, if I keep developing in 4.8.7!).
If you did not understand my grump in the last paragraph, the only thing you need to know is that I downloaded Qt 4.8.5 for Linux, and installed it in:

/usr/local/Trolltech/Qt-4.8.5

The next step is to configure the mkspecs for cross-compiling; for that you need to navigate to /usr/local/Trolltech/Qt-4.8.5/mkspecs and create a new directory there. I called it “win32-x-g++”.

/usr/local/Trolltech/Qt-4.8.5/mkspecs/win32-x-g++

Inside this directory, you need to place a version of qmake.conf, with all the necessary information for cross compiling; mainly you need to pay attention to the paths of the Qt installed in Wine, and the minGW paths in Linux. Here is a working version for me; please amend it to reflect your paths.

Then I wrote a very simple Qt “Hello World”:

// Hello World in C++ for the Qt framework
#include 
#include 
int main(int argc, char *argv[])
{ 
	QApplication a(argc, argv); 
	QLabel l("Hello World!", 0); 
	l.setText("Test"); 
	l.setAlignment(Qt::AlignCenter); 
	l.resize(300, 200); 
	l.show();
 return(a.exec());
}

This is just a Window with “Hello World” written, but it is calling the Qt Libraries.
My Qt project file, looks like this (I generated it with qmake, and then added the relevant libraries to the Qt variable):

######################################################################
# Automatically generated by qmake (2.01a) Fri Dec 20 13:18:06 2013
######################################################################

TEMPLATE = app
TARGET = hello
QT += core gui
CONFIG += qt
DEPENDPATH += .
INCLUDEPATH += .

# Input
SOURCES += hello.cpp

I make a small parenthesis here, to note that the qmake version should be the one on:

/usr/local/Trolltech/Qt-4.8.5

If you do have more than one Qt version (like I do!), it is worth to set the $QTDIR variable to this directory, while cross compiling. You can check if this working as expected, by checking the path of qmake:

which qmake

If this points to /usr/local/Trolltech/Qt-4.8.5, then everything is ok; otherwise, please take a moment to export your $QTDIR.

To generate the makefiles, using the created mkspecs file above, you can use the following syntax:

qmake -spec win32-x-g++

Then you can compile the application, using make. If the compilation was successful, you should be able to run the executable “hello.exe”, using wine. When I attempted to do this, I had some errors that related to not finding some libraries. As in my previous example, I copied “mingwm10.dll” to the current directory, but I still had errors related to Wine not finding the Qt libraries.

I edited the QTDIR and PATH variables on Wine(Windows), by calling regedit.

regedit

Unfortunately that still did not do it for me. I thought a quick fix, for forcing the application to pick the libraries would be to copy the required DLL’s to the app path. Unfortunately that also did not worked 😦

err:module:import_dll Library libgcc_s_dw2-1.dll (which is needed by L"Z:\\home\\joana\\projects\\cross-compiling\\hello\\release\\QtCore4.dll") not found
err:module:import_dll Library QtCore4.dll (which is needed by L"Z:\\home\\joana\\projects\\cross-compiling\\hello\\release\\hello.exe") not found
err:module:import_dll Library libgcc_s_dw2-1.dll (which is needed by L"Z:\\home\\joana\\projects\\cross-compiling\\hello\\release\\QtCore4.dll") not found
err:module:import_dll Library QtCore4.dll (which is needed by L"Z:\\home\\joana\\projects\\cross-compiling\\hello\\release\\QtGui4.dll") not found
err:module:import_dll Library libgcc_s_dw2-1.dll (which is needed by L"Z:\\home\\joana\\projects\\cross-compiling\\hello\\release\\QtGui4.dll") not found
err:module:import_dll Library QtGui4.dll (which is needed by L"Z:\\home\\joana\\projects\\cross-compiling\\hello\\release\\hello.exe") not found
err:module:LdrInitializeThunk Main exe initialization for L"Z:\\home\\joana\\projects\\cross-compiling\\hello\\release\\hello.exe" failed, status c0000135

Then I thought, that even if I cannot run the executable with Wine, it does not necessarily mean that I did not generate it correctly (which is ultimately my purpose, with all of this!). So I went to a Windows guest in VirtualBox, and I managed to successfully run the application! 🙂

hello1

This was a nice outcome, however it is still necessary to do this for the entire project, tracking all required Qt dll’s, Boost, and any other needed libraries. It would probably be a good idea to consider static linking, since I would not have to ship the DLL’s, and would also avoid digging into the problem of “why Wine is not finding the shared libraries”. However, from my experience, static linking of the Qt framework is quite an onerous task, and in this case I would have to do it twice (for Linux and for Windows).

“Hello World” Cross-Compiling

I recently switched to work on a Linux environment, in a project where most of the potential users will be using Windows.
Theoretically that is not a problem, since I am sticking to standard C++, and a few other libraries that are fully cross-platform (e.g.: Qt[1], Boost[1])
Of course in practice, things are always slightly more complicated 🙂

Since my original project was created in Windows, using the nmake compiler, I still have a functional environment that I can use. However this solution requires me to:

  1. start Windows;
  2. checkout the latest source code;
  3. tweak the configuration for the Windows environment;

It is mainly 1. and 3. that bother me; I don’t start Windows that often, and there are quite a few things to tweak, when you change from GNU g++ to MS nmake[3].

There are many options to make things a bit “smoother”, and one of them would be to change the compiler to Jom[4] or MinGW, so that it could be use in both OS, with minimal tweaks. Another option would be to create a repository only for the source code, and leave the configuration files unchanged, both in Windows in Linux. These are all things that I want to explore, but for the moment I ceased to my “fascination” with cross-compiling, and decided to investigate it a bit further. Wouldn’t it be so cool to be able to generate a Windows binary, straight from Linux? It seems like the perfect option, if it was that easy!

To get a functional cross-compiling environment, you basically will need Wine[5] and MinGW[6]. In a nutshell, Wine is a is a compatibility layer capable of running Windows applications on several POSIX-compliant operating systems; the acronym says everything: “Wine Is Not an Emulator”. MinGW is a minimalist development environment for native Microsoft Windows applications. Installing these software in Ubuntu, is as easy as:

sudo apt-get install wine mingw32 mingw32-binutils mingw32-runtime

This should be it; so I started with a simple example that I grabbed from [7]:

#include 

int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
    LPSTR lpCmdLine, int nCmdShow)
{
  MessageBox(NULL,
    "Cette fenêtre prouve que le cross-compilateur est fonctionnel !",
    "Hello World", MB_OK);
  return 0;
}

I compiled it like this, using MingGW:

i586-mingw32msvc-g++ -o essai.exe essai.cpp

To execute the binary, you will need the mingw library: mingwm10.dll, that can be copied like this to the current directory:

gunzip -c /usr/share/doc/mingw32-runtime/mingwm10.dll.gz &gt; mingwm10.dll

Then the application can be launched with:

wine essai.execute

hello2

References:
[1] http://qt-project.org/
[2] http://www.boost.org/
[3] http://msdn.microsoft.com/en-us/library/dd9y37ha.aspx
[4] http://qt-project.org/wiki/jom
[5] http://www.winehq.org/
[6] http://mingw.org/
[7] http://retroshare.sourceforge.net/wiki/index.php/Ubuntu_cross_compilation_for_Windows