All the questions that scared me, now I am trying to scare them, so that they can't scare others :)
Friday, September 4, 2015
Wednesday, September 2, 2015
Install GNOME Desktop Ubuntu
The following commands install and enable the GNOME desktop GUI on the Ubuntu Server edition, if needed.
sudo apt-get install ubuntu-gnome-desktop
sudo service gdm restart
Tuesday, September 1, 2015
org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
Add the following entries to yarn-site.xml if they are not already present, then restart the YARN service.
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
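To see whether the entries are already present before editing, a quick check along these lines may help. This is only a minimal sketch: the function name has_shuffle_service is made up for illustration, and the location of yarn-site.xml depends on the installation (the usage line assumes $HADOOP_CONF_DIR is set).

```shell
# Hypothetical helper: succeeds if the given yarn-site.xml lists
# mapreduce_shuffle under yarn.nodemanager.aux-services.
has_shuffle_service() {
    conf_file="$1"
    # -A1 also prints the line after the <name> match,
    # which is where the <value> element normally sits.
    grep -A1 '<name>yarn.nodemanager.aux-services</name>' "$conf_file" \
        | grep -q 'mapreduce_shuffle'
}

# Usage (the path is an assumption; adjust it to your installation):
# has_shuffle_service "$HADOOP_CONF_DIR/yarn-site.xml" && echo present || echo missing
```

The check assumes the <value> line directly follows the <name> line, as in the entries above.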
Sunday, August 30, 2015
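Enable passwordless sudo for a user
Adding the following line to /etc/sudoers lets the user run sudo without being prompted for a password, for example when commands are issued remotely from other machines. Replace <username> with the actual account name.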
<username> ALL=(ALL:ALL) NOPASSWD: ALL
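A syntax error in /etc/sudoers can lock sudo out entirely, so a common, more cautious approach is to put the rule in its own file under /etc/sudoers.d and validate it with visudo. The sketch below assumes that approach; the helper name sudoers_rule and the user name hadoopuser are placeholders, not part of the original post.

```shell
# Hypothetical helper: print the passwordless-sudo rule for a user name.
sudoers_rule() {
    printf '%s ALL=(ALL:ALL) NOPASSWD: ALL\n' "$1"
}

# Usage sketch (needs root; "hadoopuser" is a placeholder):
# sudoers_rule hadoopuser | sudo tee /etc/sudoers.d/hadoopuser >/dev/null
# sudo chmod 0440 /etc/sudoers.d/hadoopuser
# sudo visudo -cf /etc/sudoers.d/hadoopuser   # syntax check only
```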
Monday, August 24, 2015
Comparison between RDBMS and Map-Reduce
| | RDBMS | Map-Reduce |
|---|---|---|
| Scalability | Scale up | Scale out |
| Data size | GB | PB and more |
| Read/Write | Batch and interactive | Batch |
| Update type | Write many, read many | Write once, read many |
| Integrity | High | Low |
| Structure | Structured data; schema first, write later | Unstructured data; write first, schema later |
| Query | SQL | NoSQL, with SQL support through add-on tools |
| Response time | Faster for small data, slower as size increases | Faster for large data in comparison |

Note: This gap is gradually closing as new Hadoop sub-projects are developed. They provide an abstraction layer on top of the Map-Reduce and YARN frameworks that accepts SQL and converts it into Map-Reduce/YARN jobs.
Tuesday, June 16, 2015
Extract All Tar Files in a directory in Linux
The command below first lists the files with the tar.gz extension. awk then picks out the file name, which is column 9 of the ls -lrth output, and the NF>2 condition skips blank or short lines. Finally, tar -xvzf extracts each file whose name is held in the variable $i.
The same pattern can be adapted for other bulk operations, such as renaming all files with a specific extension. Note that it breaks on file names containing spaces, because both awk and the for loop split on whitespace.
for i in `ls -lrth *.tar.gz |awk 'NF>2 {print $9}'`; do tar -xvzf $i; done
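As an alternative that avoids parsing ls output, the loop can iterate over a shell glob directly, which also copes with spaces in file names. This is a sketch rather than the original author's method; the function name extract_all is hypothetical.

```shell
# Hypothetical helper: extract every .tar.gz archive found in a directory
# (defaults to the current one) into that same directory.
extract_all() {
    dir="${1:-.}"
    for archive in "$dir"/*.tar.gz; do
        # When nothing matches, the glob is left as a literal string; skip it.
        [ -e "$archive" ] || continue
        tar -xzf "$archive" -C "$dir"
    done
}

# Usage:
# extract_all            # archives in the current directory
# extract_all /tmp/pkgs  # archives in another directory
```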
Thursday, June 4, 2015
All about the Hadoop Balancer
Hadoop Balancer:
This tool balances disk usage across the Hadoop cluster. Over time, some nodes in the cluster can become over-utilized or under-utilized: newly added nodes start out under-utilized, while a cluster with too few nodes ends up with over-utilized ones.
Only one balancer instance can run against a cluster at a time. Balancing speed depends on the bandwidth each datanode is allowed to use for balancing, and raising that limit increases network usage considerably.
Running this tool requires administrator rights on the Hadoop cluster.
Syntax of the balancer:
bin/start-balancer.sh [-threshold <threshold>]
Here, start-balancer.sh resides in the bin directory of the Hadoop folder. The threshold parameter sets the balancing target: it is a percentage of disk capacity, and each node's utilization may differ from the overall cluster utilization by at most this amount. The default is 10% if no threshold is passed.
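To make the threshold concrete: a node counts as balanced when its utilization is within the threshold of the cluster-wide utilization. The sketch below only illustrates that rule with whole-number percentages; node_is_balanced is a hypothetical helper, not part of Hadoop.

```shell
# Hypothetical helper: succeeds if a node's utilization is within
# the threshold of the cluster utilization.
# Arguments are whole-number percentages; the threshold defaults to 10.
node_is_balanced() {
    node_util="$1"
    cluster_util="$2"
    threshold="${3:-10}"
    diff=$(( node_util - cluster_util ))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "$diff" -le "$threshold" ]
}

# Usage:
# node_is_balanced 55 50 && echo "within default threshold"
# node_is_balanced 20 50 || echo "candidate for rebalancing"
```

For instance, with a cluster at 50% utilization, running bin/start-balancer.sh -threshold 5 would aim to bring every node into the 45% to 55% range.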
This process transfers blocks between nodes, which generates network activity, so it must be used cautiously on a production cluster: it can result in missing-block errors or delayed responses from the cluster.
This process can be stopped at any time, if required, using the following command: