
October 1, 2018

Mongo security

Sooner or later, securing your database becomes inevitable. At least in my opinion... This was the first time I was securing a MongoDB instance, and while looking for information I came across this blog post. First, if you want to set up username & password authentication for your MongoDB instance, you will find the article really helpful. Second, you can take much more from the post... specifically, almost 600 TB of data from all around the world (if you wish, of course).

MongoDB security

So... I promised a remarkably big bundle of data. The good news is that getting it is as easy as copy-pasting a command into your console... if you have 600 TB of storage space. If you want to try this out, just read this analysis mentioned in the blog referenced above. The bad news is that almost none of those hundreds of terabytes out there are publicly accessible intentionally. The data comes from more than 30,000 completely unsecured MongoDB instances. It is natural to ask why: how come such a big number of MongoDB instances serves data to just anyone who asks for it?

First, it is important to note that 30,000 IS a big number. The main reason behind this global security neglect is simple: for a fairly long time, the default MongoDB configuration was left completely unsecured.

Security is usually the first argument against using Mongo, and it has been heavily criticised in that field (especially after a lovely global ransomware attack). But the MongoDB developers are definitely not the only ones responsible for the situation; in fact, those 30,000 instances are what 'nah, it'll be OK' looks like: thousands of people just didn't want to spend a while configuring even a simple authentication mechanism. The idea that your data is safe because it is not any kind of top-secret information, combined with a little bit of natural human laziness, delivers results (almost 600 TB of results).

After I realised how simple it is to set up basic authentication, I wondered why so many people haven't taken these simple steps. In my opinion, the biggest problem with data security is the perception of data: sadly, not everyone sees it as something valuable nowadays... but just as I would not leave my wallet or phone unattended, I would not do so with my data. The second important factor is our human nature: everyone has this 'nah, it'll be OK' in them. For me, the most important message is that setting up at least username & password authentication for your MongoDB is a small step for a developer, but a huge leap for the security of your data.
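To make that small step concrete, here is a minimal sketch of enabling username & password authentication. The user name, password and roles below are placeholders of my own, not values from the referenced article:

```javascript
// Run in the mongo shell against a still-unsecured instance (placeholder credentials).
var admin = db.getSiblingDB("admin");
admin.createUser({
  user: "admin",
  pwd: "change-me", // pick a real password
  roles: [{ role: "userAdminAnyDatabase", db: "admin" }]
});
// Then enable authorization in mongod.conf and restart mongod:
//   security:
//     authorization: enabled
// From now on, clients have to authenticate:
//   mongo -u admin -p change-me --authenticationDatabase admin
```

That is the whole ceremony for the basic setup; everything else (per-database users, finer-grained roles) builds on top of it.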

Our team
WRITTEN BY VÁCLAV

August 12, 2018

The curse of string distance

When you work on a larger project for some time, you may realise it doesn't perform as well as it used to. That is usually expected, but sometimes it just doesn't feel justified.

This was our case too, and because the performance of the project mattered, we had to get our hands dirty with profilers and debuggers. Of course, we found a few bottlenecks.

The cursed method

Pobody's nerfect, but bottlenecks are not what I want to write about today. I want to write about a cursed method which was with us right from the start, and which was becoming more and more demanding as we added more and more classes.

The curse started with an innocent need: let's have short field names in Mongo to keep the layout organized, and self-explanatory bean names in the application. Spring very helpfully provides the @Field annotation, which can be used to define a database field name different from the bean field name. And we used it generously; a quick search shows about 200 occurrences of the annotation.

Profiling showed that the application was spending an unhealthy amount of time in the method org.springframework.beans.PropertyMatches.calculateStringDistance. For every field whose name differed from the Mongo field name, the distance was calculated against every other bean field.

To make it worse, the application processes entities from the database, and we had to load about 100,000,000 of them. As you can imagine, the number of calculateStringDistance calls was pretty high. At the end of the day, we implemented a cache in PropertyMatches, but now we have to maintain our custom build of spring-beans, which is not something we want to do for the whole life span of the project.

The more we looked at the issue, the more we believed the call was not necessary. The distance was calculated for the exception message of PropertyReferenceException from the Spring Data MongoDB project. I would argue that preparing an exception message should be as lightweight as possible. Still, it can be justified if the message is helpful. This exception is caught in QueryMapper.getPath and the method returns null, so we cannot even see the content of the message that drags down the performance. However, the field is still, apparently, mapped correctly (this is where we stopped debugging) using the name from the @Field annotation. The very existence of @Field suggests the bean name will most likely differ from the Mongo field name, so the PropertyReferenceException shouldn't be needed in those cases.

Is this something that should be investigated in Spring Data MongoDB? Or is there something we should do on our side to prevent the issue? There are not many places in our code where we could do things differently, though. Our bean fields use the @Field annotation and we don't even have a custom bean converter. This is how we get the entity from the database:

MongoTemplate template;

// Body was omitted in the original post; a plausible sketch reads through the template's converter:
public OurEntity convertBsonDocument2OurEntity(Document entityObj) {
    return template.getConverter().read(OurEntity.class, entityObj);
}

It's hard to believe we are the only ones facing this issue. If you have an observation or a solution, feel free to use our comment section. We also opened the ticket DATAMONGO-1991 in the Spring Jira, and it would help if more people participated in the discussion; with more feedback, it could hopefully be resolved sooner.


OUR SCRUM MASTER

July 22, 2018

Null checking in MongoDB

When we started using MongoDB in our projects, I was very confused by aggregations. But I have written so many of them by now that I actually enjoy putting them together. At least I enjoy it more than writing SQL queries. Yet there is one thing I have to think through every time: how to do null checking and field existence checking.

This can be quite confusing, especially when you're coming from relational databases.

If we draw a parallel between a table and a collection, we see similarities between rows and documents, and between columns and fields. The biggest difference is that each document in the same collection can contain a very different set of fields. So not only can a field contain null, it may also not be there at all. Sometimes we want to differentiate between the two. How you do the checks also depends on the usage and on what you want to achieve.

Query

The syntax of many operators differs between queries and aggregations. Null checking in a query is rather simple: we can test any field against null. This query will find all accounts where the field accountId is null, but also those where the field doesn't exist at all.

db.accounts.find({ accountId : null })

To find accounts where accountId does exist and isn't null, we can use the operator $ne (not equals). The second query below, with the operator $eq, is equivalent to the plain null query above.

db.accounts.find({ accountId : {$ne : null} })
db.accounts.find({ accountId : {$eq : null} })

In case we want to check only existence and don't care about null, there is the operator $exists.

db.accounts.find({ accountId : { $exists: true} })
db.accounts.find({ accountId : { $exists: false} })

Aggregation

The $match stage used to filter documents works exactly the same way as a query.

{ $match: { accountId : null } }
{ $match: { accountId : {$ne : null}} }
{ $match: { accountId : {$eq : null}} }
{ $match: { accountId : {$exists : true} } }
{ $match: { accountId : {$exists : false} } }

So far so good. Here comes one tricky point. In some cases, you don't want to exclude documents from the pipeline but rather create a field whose value depends on existence or null. Using the previous knowledge, we could put together a $project stage with a null test in the condition.

{ $project: { accountIdFlag : {$cond : [{$eq : ["$accountId", null]}, 0, 1]} }}

My goal here is to create a field accountIdFlag which will be 0 if accountId is null or non-existent, and 1 if it contains a value. However, if you run this aggregation, the field will always be 1. We have to choose a different strategy in this case.

{ $project: { accountIdFlag : {$cond : [{$eq : [{ $ifNull: ["$accountId", null]} , null]}, 0, 1]} }}

If the expression in $ifNull evaluates to null or to a missing field, the replacement expression (here, null) is returned; otherwise the value itself is returned. We test the result against null, and this time we get the correct flag: 0 if the field is null or missing, 1 if not.
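To make the behaviour easier to see, here is a plain-JavaScript sketch (ordinary code, not shell or driver syntax; the function name is made up) of what the $ifNull + $eq combination computes for a single document:

```javascript
// Mimics { $cond: [ { $eq: [ { $ifNull: ["$accountId", null] }, null ] }, 0, 1 ] }
function accountIdFlag(doc) {
  // $ifNull: the field's value when present, otherwise the fallback (null),
  // so null and "missing" collapse into the same value
  const value = "accountId" in doc ? doc.accountId : null;
  // $eq against null then decides the flag
  return value === null ? 0 : 1;
}

console.log(accountIdFlag({ accountId: "ACC-1" })); // 1
console.log(accountIdFlag({ accountId: null }));    // 0
console.log(accountIdFlag({}));                     // 0
```

The key point is that $ifNull collapses "missing" and null into one value before the $eq test runs.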

At the end of the day, null checking isn't hard. But the use case above shows it can sometimes be tricky, so I hope this article makes it clear for those who struggle with null checks in MongoDB.


OUR SCRUM MASTER

July 7, 2018

Does a developer have to have a pet project?

Having a pet project (a project you develop in your free time) can be a lot of fun. Give me a free week and a choice between spending the whole time on the beach or working on my own stuff, and I will probably take the latter. Don't get me wrong, who doesn't like vacation? But as the idle time grows, I get bored and the itch to work on my pet projects grows with it. Yet among software developers there is an unhealthy assumption that everyone needs to have a pet project. There are even hiring managers who won't hire you if you don't have one. It's a badge showing you're committed to your profession. Right?

Well, not really. The mere existence of this unspoken yet important rule can suck all the fun out of it. When you're pushed to choose and finish a project, it can feel like work, with the difference that you aren't paid for it. Can't your previous work experience and the interview itself speak for you better? A surgeon doesn't operate on hamsters in his free time to prove something to someone. An artist needs a portfolio because he can't just come to an interview and spend several hours drawing, or prove his abilities through a talk. It's harder for software developers to maintain a portfolio. You most probably aren't allowed to take code from your company that represents your skills. Even if you could, a method, a class or a few lines of code out of context say nothing about you. So having a pet project makes sense, and it makes even more sense when you're fresh out of school. But let's be honest: your first contact with a hiring manager is through your CV, not a GitHub repository. And even the most impressive pet project won't overshadow the interview.

"Hey, you answered all my questions well and you worked for 5 years in a big and well-known company. Sadly, your GitHub account is empty. I hope the time spent with your kids was more important. Not hired."

Working overtime and then working again at home (in front of a PC) is just not healthy. It's not good for your eyes, back, blood pressure or blood sugar level. Don't feel bad about not working after work. Yes, having a pet project can be a lot of fun. It allows you to learn new technologies you wouldn't get to meet in your job. There is a great sense of ownership, as in most cases you are the owner of the project. By working, for example, on a computer game, you can learn a totally different skill set, like sound design, music composition or 3D modeling (please throw away the stereotype that a software developer cannot do art). Not necessarily something useful for, let's say, a FinTech developer, but it's still a lot of fun, and that's what matters if you ask me.

As always, everything is about compromise, this topic included. Not having a pet project can be a disadvantage, but spending all your free time on one isn't good either. Instead of having a bunch of unfinished ambitious projects, try to come up with one that is small and enjoyable. Maybe even useful. Do you like playing games? Try to learn a game engine and participate in a game jam: an event where individual developers or small groups have to design and make a game in a very limited time (often a few days). My favorite type of game-related development is extending a game engine with plugins. Their scope is usually small, it's something that solves my own issue, and it can be useful to many more people. Even a small desktop utility is better than nothing. On the other hand, as praiseworthy as contributing to community-developed open source projects is, it's not very presentable. Try to have at least one project developed solely by yourself.

To conclude my thoughts in a tl;dr manner: don't feel bad for not coding in your free time, if your pet project doesn't bring you joy, throw it away, and think about your mental and physical health.


OUR SCRUM MASTER

June 21, 2018

Visual client for MongoDB database?

Was looking for a visual client for the MongoDB database worth it? Definitely!

Fortunately, I do not belong to those developers who are satisfied with just a console. It might be because I only started to use Linux actively two years ago (since then, Linux has even invaded my personal PC in the incarnation of Elementary OS). Or maybe it's because I have been a developer for a relatively short time. Either way, a significant number of views of the "How to close Vim" article on Stack Overflow were caused by me.

The moment I reached the point of working with the MongoDB database, I was sure I had to find a visual client. Since I like to use and develop FOSS software, I searched in those waters. The waters are deep and wild, but I soon found a great solution that I have already been using for some time: Robo 3T, which was at that time called Robomongo.

It does not offer much more than the Mongo shell, and it might be said that it is just a graphical extension of it. Robo 3T shows you all databases and collections in a list, numbers the rows and provides automatic query completion. That makes it easy to work with multiple collections at the same time, and its connection manager lets you define an SSH tunnel directly, which is quite useful. But you can't expect more advanced features. If you want to export to CSV, you have to write your own JavaScript function and let Robo 3T run it. Some time ago, I also ran into small problems with a too-small font when the client was running across multiple screens.
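To illustrate that point, here is a minimal sketch of such a helper (the function and field names are made up, not a Robo 3T API); you could feed it the result of db.accounts.find().toArray() and print the returned string:

```javascript
// Hypothetical helper: flatten documents into CSV lines for a fixed list of fields.
function toCsv(docs, fields) {
  const header = fields.join(",");
  const rows = docs.map(doc =>
    fields.map(f => (f in doc ? doc[f] : "")).join(",")
  );
  return [header].concat(rows).join("\n");
}

console.log(toCsv([{ _id: 1, accountId: "A" }, { _id: 2 }], ["_id", "accountId"]));
// _id,accountId
// 1,A
// 2,
```

Nothing sophisticated (no quoting of commas inside values, for instance), which is exactly why a client with built-in export saves time.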

Still, despite those little things, it's a clever helper. I am just a bit worried about its future, because the project has gone under 3T Software Labs, which works on Studio 3T. Unlike Studio 3T, Robo 3T is open source, and in my experience the best OSS projects usually have just a few main developers who lead the direction in which the project ends up. Here we reach our pain point. If an open source project is an anarchy, it indeed doesn't end up well. Especially when those developers are at the same time working on a competing commercial solution. The question arises: "Will these developers be really serious about implementing new features?!"

So far I have the impression that 3T Software Labs wants to make Robo 3T a free and poorer alternative to Studio 3T, one which will also serve as an advertisement for their commercial solution.

Of course, I'm realistic and I'm aware that the development of more advanced features costs money. In the end, I decided to try Studio 3T and haven't switched since. It indeed offers a lot, and much more than we have needed in our projects so far.

But let's have a look at the features that made my work with Mongo easier.

  1. The most crucial one is the aggregation interface, on which you can build (what a surprise) aggregations! Studio 3T offers a good compromise between a graphical representation and a text representation. The interface is not too cluttered, so composing an aggregation does not become a clicking hell. On the one hand, you still have to write the content of the individual stages manually; on the other hand, it is useful to have each stage in its own tab, because you can quickly change their order or view the output of a single stage in the chain. When you're finished, you can simply export your aggregation to text. If for some reason you have to make changes and find out you are still not done, you can just as easily import it back.
  2. Another frequently used feature in our company is the export of collections, aggregations or queries in different formats. Certainly, I would expect export and import from any client, but not all of them can handle it, and not all of them manage it as well as Studio 3T. So far, it hasn't happened that I was unable to export the content exactly as I needed it. That is real power: you don't waste time looking for a workaround or for another client that does it well.
  3. I was also very attracted by a new feature which, unfortunately, I haven't had the chance to use yet, even though it would be great for us: the automatic conversion of aggregations to different languages. JavaScript, Java, Python and C# are all supported. If your software consists of components written in different languages, this feature is invaluable. Just select the target language on the tab and it's done!

Do you also use database clients on your projects? If not, find some time to try one out. It's worth it and you will not regret it. Of course, one client varies from another, but at least in our case, Studio 3T has made things way easier for us. What's more, I'm finally having fun with aggregations.


OUR SCRUM MASTER