Technical troubleshooting and more

Posts

Vertica - Drop node / remove node on virtual environment when machine is already terminated

In our case, I was trying to add 5 nodes to a 10-node Vertica cluster. While adding the new nodes we ran into file system problems and were forced to terminate the new machines. We fixed the VM image and recreated the nodes in order to scale out the Vertica cluster again.

During this process we kept getting an error message like "node x already exists" when running the add-nodes command, even though we had removed the nodes from admintools.conf and from the cluster using:

    /opt/vertica/sbin/install_vertica --remove-hosts

Finally, we found a solution that was very helpful in our case and that we were not able to find in the Vertica documentation: the drop node command.

    DROP NODE <node_name>;

Leftovers of the first scale-out run had probably stayed in the database metadata itself. I guess that when we terminated the new nodes after the first try, we didn't notice that one of the nodes was still part of the cluster...
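The leftover-node check described above can be sketched in Python. This is only an illustration under stated assumptions, not a Vertica tool: the function name `leftover_drop_statements` and the sample node names are hypothetical, and the sketch assumes you have already collected the node names the database still knows about (for example by querying its node list) and the node names whose machines actually exist.

```python
def leftover_drop_statements(catalog_nodes, live_nodes):
    """Return DROP NODE statements for catalog nodes that have no live machine.

    catalog_nodes: node names still registered in the database metadata.
    live_nodes:    node names whose machines actually exist.
    Both lists are supplied by the caller; nothing is queried here.
    """
    live = set(live_nodes)
    stale = sorted(name for name in set(catalog_nodes) if name not in live)
    return [f"DROP NODE {name};" for name in stale]


if __name__ == "__main__":
    # Hypothetical names: node0011 was terminated but is still in the catalog.
    catalog = ["v_vdb_node0001", "v_vdb_node0002", "v_vdb_node0011"]
    live = ["v_vdb_node0001", "v_vdb_node0002"]
    for statement in leftover_drop_statements(catalog, live):
        print(statement)
```

The statements are only printed, so they can be reviewed before anything is run against the database.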

VNETPERF error - Unable to connect to host

When running the Vertica vnetperf utility you may see the errors below, even though there is no firewall between the servers, the communication is OK, no network errors are observed, and the Vertica DB is up and running on these hosts:

    [Connector Thread 172.16.8.11 ] Couldn't connect to 172.16.8.11 (family 2, attempt 0): Connection refused; errno=111 (Connection refused)
    [Connector Thread 172.16.8.10 ] Couldn't connect to 172.16.8.10 (family 2, attempt 0): Connection refused; errno=111 (Connection refused)
    [Connector Thread 172.16.8.12 ] Couldn't connect to 172.16.8.12 (family 2, attempt 0): Connection refused; errno=111 (Connection refused)
    [Connector Thread 172.16.8.11 ] Could not find anything to connect to for 172.16.8.11; errno=111 (Connection refused)
    [Connector Thread 172.16.8.10 ] Could not find anything to connect to for 172.16.8.10; errno=111 (Connection refused)
    [Connector Thread 172.16.8.12 ] Could not find anything to connect to for 172.16.8.12; err...
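When the output is long, it can help to reduce it to the list of hosts that refused the connection. The following is a small Python sketch that does this for messages in the format quoted above; the function name `refused_hosts` is hypothetical, and the pattern is based only on the lines shown here, so other vnetperf message formats may need a different expression.

```python
import re

# Matches the "Couldn't connect to <ip> ... errno=<n>" messages quoted above.
REFUSED = re.compile(
    r"\[Connector Thread (?P<host>\d+\.\d+\.\d+\.\d+)\s*\]\s+"
    r"Couldn't connect to (?P=host) .*?errno=(?P<errno>\d+)"
)


def refused_hosts(output):
    """Map each host that failed to connect to the errno that was reported."""
    hosts = {}
    for match in REFUSED.finditer(output):
        hosts[match.group("host")] = int(match.group("errno"))
    return hosts


if __name__ == "__main__":
    sample = (
        "[Connector Thread 172.16.8.11 ] Couldn't connect to 172.16.8.11 "
        "(family 2, attempt 0): Connection refused; errno=111 (Connection refused)\n"
        "[Connector Thread 172.16.8.10 ] Couldn't connect to 172.16.8.10 "
        "(family 2, attempt 0): Connection refused; errno=111 (Connection refused)\n"
    )
    for host, errno_value in sorted(refused_hosts(sample).items()):
        print(host, errno_value)
```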

How to change Vertica DB node IP to use a different subnet

Let's say, for example, that you have 3 Vertica nodes and the interconnect communication subnet is 10.0.0.X. Now, for some reason, you want the internal communication between the nodes to use another interface on this subnet: 192.168.1.X. Follow this step-by-step guide to accomplish that.

-----------------------------------------------------------------

First, change the network to use point-to-point mode:

1. Run:

    select set_control_mode('pt2pt');
    select reload_spread(true);

   Check in the catalog directory that spread.conf was modified to point-to-point mode.
2. Edit controlmode in admintools.conf:

    controlmode = pt2pt

3. Use admintools to distribute the database configuration and metadata to all cluster nodes.
4. For spread to pick up the new configuration, you need to restart the Vertica DB.

For each Vertica node, run these steps to change the IP used by the interconnect:

1. Stop Vertica on the node.
2. Change the hostname through vsql: alter no...
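Before touching the nodes, it can be useful to write down which new address each node will get. The following Python sketch computes that mapping with the standard `ipaddress` module. It assumes both subnets are /24 networks (the text only says 10.0.0.X and 192.168.1.X) and that each node keeps its host part; the function name `remap_address` and the three sample addresses are hypothetical.

```python
import ipaddress


def remap_address(addr, old_net, new_net):
    """Move addr from old_net to new_net while keeping the same host part."""
    old = ipaddress.ip_network(old_net)
    new = ipaddress.ip_network(new_net)
    ip = ipaddress.ip_address(addr)
    if old.prefixlen != new.prefixlen:
        raise ValueError("old and new subnets must have the same prefix length")
    if ip not in old:
        raise ValueError(f"{addr} is not in {old_net}")
    # Offset of the host inside the old subnet, re-applied to the new subnet.
    offset = int(ip) - int(old.network_address)
    return str(new.network_address + offset)


if __name__ == "__main__":
    # Hypothetical addresses for a 3-node cluster; /24 is an assumption.
    for node_ip in ["10.0.0.11", "10.0.0.12", "10.0.0.13"]:
        print(node_ip, "->", remap_address(node_ip, "10.0.0.0/24", "192.168.1.0/24"))
```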

v_vdb_node0003: VX001/3381 : Failed to install new catalog

After rebooting the Vertica cluster I ran into this error in the startup.log of one of the Vertica nodes:

    {
      "node" : "v_vdb_node0003",
      "stage" : "Database Halted",
      "text" : "@v_vdb_node0003: VX001/3381: Failed to install new catalog\n\tLOCATION: doInstallAndJoin, /scratch_a/release/svrtar3866/vbuild/vertica/Transaction/TransAPI.cpp:2886",
      "timestamp" : "2018-01-17 08:32:40.880"
    }

After wondering what could have gone wrong and trying to start the node in force mode without any success, I saw this little message in the vertica.log file:

    2018-01-17 08:30:23.210 Spread Client:0x9b39800 <ERROR> @v_vdb_node0003: {doInstallAndJoin} 42501/2812: Could not add location [/vertica_data2]: Permission denied

Then I realized that this specific node had another STORAGE LOCATION. After remounting the /vertica_data2 file system and starting the node, everything became normal :-)
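The manual search through vertica.log can be sketched in Python as follows. This is a rough illustration, not part of Vertica: the function names `failed_locations` and `describe_path` are hypothetical, and the pattern is based only on the single log line quoted above. The second helper reports whether a path exists, is a mount point, and is writable by the current user, which are the things worth checking for a storage location that could not be added.

```python
import os
import re

# Based on the one log line quoted in this post; other formats may differ.
LOCATION_ERROR = re.compile(
    r"Could not add location \[(?P<path>[^\]]+)\]: (?P<reason>.+)"
)


def failed_locations(log_text):
    """Return (path, reason) pairs for storage locations that failed to load."""
    return [
        (match.group("path"), match.group("reason").strip())
        for match in LOCATION_ERROR.finditer(log_text)
    ]


def describe_path(path):
    """Basic checks on a storage location path, run as the current user."""
    return {
        "exists": os.path.isdir(path),
        "is_mount": os.path.ismount(path),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    line = (
        "2018-01-17 08:30:23.210 Spread Client:0x9b39800 <ERROR> "
        "@v_vdb_node0003: {doInstallAndJoin} 42501/2812: "
        "Could not add location [/vertica_data2]: Permission denied"
    )
    for path, reason in failed_locations(line):
        print(path, reason, describe_path(path))
```

Run it as the same OS user that runs the database, otherwise the writable check says little.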

MongoDB and Jade -> sending ReplicaSet to Jade

This is how it should look: app.js index.js replicaSet.jade result:

@v_vdb_node0005: VX001/2973: Data consistency problems found; startup aborted

While trying to start one of the Vertica nodes you may face a data consistency problem. From vertica.log:

    <PANIC> @v_vdb_node0005: VX001/2973: Data consistency problems found; startup aborted
    HINT: Check that all file systems are properly mounted. Also, the --force option can be used to delete corrupted data and recover from the cluster
    LOCATION: mainEntryPoint, /scratch_a/release/svrtar5575/vbuild/vertica/Basics/vertica.cpp:1613

So... don't PANIC :-)

Solution: restart the problematic node with the force flag, which repairs the corrupted data from the buddy nodes.

    [dbadmin ]$ /opt/vertica/bin/admintools -t restart_node -d $db_name -s $host --force

And the result:

    *** Restarting nodes for database vdb ***
    restart host node0005 with catalog v_vdb_node0005_catalog
    issuing multi-node restart
    Start...
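If the restart is driven from a script, the command line above can be assembled in Python so that the force flag is always an explicit choice. This is a minimal sketch: the function name `restart_node_command` is hypothetical, the admintools options are exactly the ones used in the command above, and the database and host names come from the sample output. The sketch only builds and prints the command; it does not run it.

```python
import shlex

ADMINTOOLS = "/opt/vertica/bin/admintools"


def restart_node_command(db_name, host, force=False):
    """Build the argument list for restarting one node, optionally with --force."""
    argv = [ADMINTOOLS, "-t", "restart_node", "-d", db_name, "-s", host]
    if force:
        # --force deletes corrupted data on the node before it recovers.
        argv.append("--force")
    return argv


if __name__ == "__main__":
    argv = restart_node_command("vdb", "node0005", force=True)
    # Printed for review; pass argv to subprocess.run() as dbadmin to execute.
    print(" ".join(shlex.quote(arg) for arg in argv))
```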

mount.nfs: backgrounding

If you face this kind of error with your remote NFS:

    [root@Vertica000 ~]# mount /files/application/Rremote3
    mount.nfs: backgrounding "10.0.0.2:/files/application/remoteFiles"
    mount.nfs: mount options: "bg,hard,nointr,rsize=65536,wsize=65536,tcp,actimeo=0,vers=3,timeo=600,addr=10.0.0.2"

look for the problem in the log file:

    cat /var/log/messages | grep mount
    mount to NFS server '10.0.0.2' failed: timed out, retrying

Solution: in most cases, the problem is with iptables on the destination server. Log in as root to the destination server (10.0.0.2 in my case) and type this command:

    iptables --flush

Then go back to your origin server and try to remount the problematic NFS file system. Of course, this assumes the NFS server was installed and is functioning properly. Good luck.
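With several NFS mounts on one machine, the grep above can be replaced by a short Python sketch that lists each NFS server that timed out. The function name `timed_out_servers` is hypothetical, and the pattern is based only on the log message quoted above, so it may need adjusting for other distributions or NFS versions.

```python
import re

# Based on the message quoted above from /var/log/messages.
NFS_TIMEOUT = re.compile(
    r"mount to NFS server '(?P<server>[^']+)' failed: timed out"
)


def timed_out_servers(log_lines):
    """Return the NFS servers that timed out, in order of first appearance."""
    servers = []
    for line in log_lines:
        match = NFS_TIMEOUT.search(line)
        if match and match.group("server") not in servers:
            servers.append(match.group("server"))
    return servers


if __name__ == "__main__":
    sample = [
        "mount to NFS server '10.0.0.2' failed: timed out, retrying",
        "mount to NFS server '10.0.0.2' failed: timed out, retrying",
    ]
    print(timed_out_servers(sample))
```

To run it against the real log, pass an open file object for /var/log/messages instead of the sample list.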