Thursday, February 28, 2013

Rebalancing your HDFS (Hadoop) cluster

Are you stuck in a scenario where the replication factor is not correct, or not what you expect it to be?

You can rebalance your HDFS replication. Here is what you can do:

Suppose your desired replication factor is 2, but some files are showing a replication factor of 1, 2, or 3, and you want all of them at 2.

Just increase the replication factor and then decrease it again, recursively, for your HDFS root file system.

For example, suppose you need a replication factor of 2 but some files are at 1, and Hadoop is not automatically re-replicating those blocks. You can increase the replication factor to 3 and then decrease it back to 2.

Use the following commands to increase and then decrease the replication factor.

Increasing:

hadoop dfs -setrep -R 3 /      -----> this will increase the replication factor of all files to 3 and replicate them automatically. Once you have enough replicas, you can decrease the replication factor to stabilise the cluster as needed.

Decreasing:

hadoop dfs -setrep -R 2 /     ------>  this will set the replication factor to 2 recursively for your HDFS root partition.

You can apply this method to a single file or a specific folder too.
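For example, to change the replication factor of just one file or one folder instead of the whole file system, a sketch might look like this (the paths are placeholders, and the commands assume a running HDFS cluster):

```shell
# Bump a single file to 3 replicas (no -R needed for one file)
hadoop dfs -setrep 3 /user/hadoop/data/part-00000

# Recursively set a specific folder back to 2 replicas
hadoop dfs -setrep -R 2 /user/hadoop/data
```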

If you have over- and under-utilized nodes in the Hadoop cluster, you can run the balancer (found in the bin directory) to balance your cluster.
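For instance, from the Hadoop installation directory you can start the balancer like this; the -threshold argument (the allowed percentage spread of disk utilization between datanodes) is optional, and 10 here is just an example value:

```shell
# Run the balancer until the cluster is balanced within 10%
bin/hadoop balancer -threshold 10

# Stop a running balancer later if needed
bin/stop-balancer.sh
```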

NOTE: you should have enough free space on your DFS for replication, because increasing the replication factor will consume additional space.

Wednesday, February 27, 2013

Import all tables from an RDBMS using Sqoop

Sqoop can import all the tables from a database with a single command:

sqoop import-all-tables --connect jdbc:mysql://<servername>/<databasename>
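A more complete invocation might look like the following sketch; the host, database name, credentials, and warehouse directory are all placeholders you would replace with your own:

```shell
# Import every table from the 'mydb' database into HDFS,
# one directory per table under /user/hive/warehouse
sqoop import-all-tables \
  --connect jdbc:mysql://dbserver.example.com/mydb \
  --username dbuser \
  --password dbpass \
  --warehouse-dir /user/hive/warehouse \
  -m 1
```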

"The command could not be located because '/usr/bin' is not included in the PATH environment variable" error

If you are getting this error:

Symptoms :

1. You are not able to log in; the login screen keeps returning to itself.
2. You press Ctrl+Alt+[F1-F7], issue some command, and it throws this error.

Solution:

First check "echo $PATH" and see whether it has entries like:

/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/bin:/sbin

If those entries are not in the PATH variable, there is a problem.

So let's start:

Log in with your user and, on a terminal, type /usr/bin/vi /etc/environment (or /usr/bin/vim /etc/environment)

and add

PATH="/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/bin:/sbin"

and restart the machine.
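If you need a working shell before rebooting, a temporary fix for the current session is to export a sane PATH directly; the edit to /etc/environment above is what makes it permanent:

```shell
# Temporary fix for the current shell session only
export PATH="/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/bin:/sbin"

# Verify that the entries are back
echo "$PATH"
```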


Sunday, February 10, 2013

Difference between MySQL INT(1) or INT(10)


Hello Guys !!

Here, I would like to discuss the differences between MySQL INT(1), INT(10), etc.

In short, it really doesn't matter. 
I know I'm not alone in having thought that it affected the size of the data field. An unsigned INT has a max value of 4294967295 whether it is INT(1) or INT(10), and it will use 4 bytes of storage either way. 

So, what does the number in the brackets mean? It pretty much comes down to display; it's called the display width. The display width is a number from 1 to 255. You can set the display width if you want all of your integer values to “appear” at the same width. If you enable ZEROFILL on the column, a value of 0 will be displayed as 0 for INT(1) and as 0000000000 for INT(10). 

There are 5 main integer data types, and you should choose each one on its own merits. Based on the data you expect (or in some cases hope) to hold, you should use the correct data type. If you don't ever expect a value above 127 or below -128, you should use a TINYINT. This will only use 1 byte of storage, which may not seem like much of a difference from the 4 used by an INT, but as soon as you start to store more and more data, the effect on speed and space will be noticeable. 

Anyway, I thought I should share my new-found knowledge of the display width with everyone, because it will save me from thinking I'm optimising things by changing from 10 to 5, ha ha.

  Illustration :-
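As a sketch, you can see the display width in action with ZEROFILL from the mysql client; the database, table, and login here are hypothetical and assume a local MySQL server:

```shell
# Names and credentials are placeholders for illustration only
mysql -u root test <<'SQL'
CREATE TABLE width_demo (a INT(1) ZEROFILL, b INT(10) ZEROFILL);
INSERT INTO width_demo VALUES (42, 42);
-- a displays as 42 (value wider than the width prints in full),
-- b displays as 0000000042 (padded to the display width of 10)
SELECT a, b FROM width_demo;
DROP TABLE width_demo;
SQL
```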

Friday, February 1, 2013

Phoenix: A SQL layer over HBase 'We put the SQL back in the NoSQL'

 

Phoenix is a SQL layer over HBase, delivered as a client-embedded JDBC driver, powering the HBase use cases at Salesforce.com. Phoenix targets low-latency queries (milliseconds), as opposed to batch operation via map/reduce. To see what's supported, go to our language reference guide, and read more on our wiki.
