Learner Knowledge Base: Python, Spark, Hadoop, HBase, Kubernetes

Hadoop

What is meant by “Tablet server is quiescing” in Apache Kudu

By Jerry Richard, May 26, 2023

“Tablet server is quiescing” is a state seen when the Kudu tablet server is down or not responding and the Kudu…

Hadoop

Resolve the “Memory limit exceeded” issue in Impala

By Jerry Richard, May 26, 2023

“Memory limit exceeded” usually happens when the query has reached its maximum memory limit and is unable to allocate any more memory.

Hadoop

Resolve “TTransportException: MaxMessageSize reached” in Hive

By Jerry Richard, May 26, 2023

Symptoms: The “TTransportException: MaxMessageSize reached” exception causes Hive query failures when the query tries to access a table that has…

Hadoop

Resolve “Slow BlockReceiver write packet to mirror”

By Jerry Richard, May 26, 2023

The “Slow BlockReceiver write packet to mirror” error usually indicates an unhealthy node in your Hadoop cluster or if the node…

Hadoop

How to find and delete files older than X days in HDFS

By Jerry Richard, May 26, 2023

Find and delete files older than X days in HDFS

Hadoop

How to get a specific column from command output in Linux

By Jerry Richard, May 26, 2023

The awk command is used in Linux to get the value of a specific column from a command's output or…

Hadoop

Solr TTL – Auto-Purging Solr Documents

By Jerry Richard, May 26, 2023

In this blog, we will learn about auto-purging, the importance of TTL (Time-To-Live), and how to remove documents automatically…

Hadoop

How to Recover Standby Namenode (Bootstrap Standby Namenode)

By Jerry Richard, May 26, 2023

There are scenarios where we are unable to bring back the standby NameNode due to a disk crash, OS…

Hadoop

Kafka CLI Command Cheat Sheet

By Jerry Richard, May 26, 2023

This article covers the Kafka CLI commands used to create and list topics and to start…

Hadoop

Resolve “Orphan region in HDFS: Unable to load .regioninfo from table” in HBase

By Jerry Richard, May 26, 2023 / July 19, 2023

“Orphan region in HDFS: Unable to load .regioninfo from table” usually happens when “.regioninfo” is unavailable under the HDFS table…

Hadoop

How to increase CPU Utilization in Linux (CPU Spike)

By Jerry Richard, May 26, 2023

There are scenarios where we need to test or replicate an issue in the cluster that requires a CPU spike…

Hadoop

Deleting Documents in Apache Solr manually

By Jerry Richard, May 26, 2023

Often, we need to remove collection data manually, for example when it occupies a lot of space in the…

Hadoop

How to Add or Delete Replicas in Solr Collection

By Jerry Richard, May 26, 2023

In this article, we will learn about Solr Search and how to add or delete a replica in the Solr…

Hadoop

Resolve “org.apache.hadoop.hive.ql.lockmgr.LockException(No record of transaction could be found)”

By Jerry Richard, May 26, 2023

Symptom: The Hive/Tez job fails with the following error message: Error while compiling statement: FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(No record of…

Hadoop

“IPC’s epoch X is less than the last promised epoch Y”

By Jerry Richard, May 26, 2023

You will see the “IPC’s epoch X is less than the last promised epoch Y” error message in both NameNode and…

Hadoop

Apache Ozone vs HDFS

By Jerry Richard, May 26, 2023

Apache Ozone vs HDFS

Hadoop

How to Create a Hive table with Ozone Filesystem

By Jerry Richard, May 26, 2023

Apache Hadoop Ozone is a distributed object store for Hadoop designed for scaling to trillions of objects and providing a…

Hadoop

How to Connect to Apache Phoenix Shell from Terminal

By Jerry Richard, May 26, 2023 / July 19, 2023

Apache Phoenix allows you to run SQL queries on top of HBase tables. We can use JDBC or the SQLLine command-line utility to access Phoenix.

Hadoop

How to connect to Impala using Beeline (Direct and via Apache Knox)

By Jerry Richard, May 26, 2023

There are multiple ways to connect to Impala. In this post, we are going to discuss connecting to Impala using…

Hadoop

Resolve “MetaException(message:Timeout when executing method: get_partitions” error in Hive

By Jerry Richard, May 26, 2023

“MetaException(message:Timeout when executing method: get_partitions” occurs when your query is unable to get the metadata.

Python

How to connect to Apache Hive from Python

By Jerry Richard, May 26, 2023

First of all, let’s start with some background information. Apache Hive is a data warehousing tool that allows you to…

Python

Create Your First Python Game: Rock, Paper, Scissors

By Jerry Richard, May 26, 2023

What is Rock Paper Scissors and how do you play it? It’s a game played by two people with their hands,…

Python

Generate random Words or Letters in Python

By Jerry Richard, May 26, 2023

Have you ever needed to generate random words or letters in Python? Check out this article to learn different ways…

Python

What is Randomization in Python – Explained with Coin Flip Example

By Jerry Richard, May 26, 2023

What is Randomization? Randomization means making something unpredictable, with no rules. But in practice, making a program unpredictable…

Python

Python program to iterate through two lists in parallel

By Jerry Richard, May 26, 2023

Have you ever needed to work with two lists of data in Python, and wanted to process them at the…
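As a minimal sketch of the topic named in this post's title, one common way to walk two lists in step is the built-in zip(), with itertools.zip_longest() when the lists differ in length. The list contents and variable names here are illustrative assumptions, not taken from the article.

```python
from itertools import zip_longest

# Illustrative sample data (hypothetical, not from the article).
names = ["Ana", "Ben", "Cy"]
scores = [90, 85]

# zip() pairs items by position and stops at the shorter list.
pairs = list(zip(names, scores))

# zip_longest() keeps going to the end of the longer list,
# filling the missing values with `fillvalue`.
padded = list(zip_longest(names, scores, fillvalue=None))

for name, score in pairs:
    print(name, score)
```

Note that zip() silently drops the unmatched tail of the longer list, so zip_longest() is the safer choice when no item may be skipped.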

Python

How are custom classes created in Python?

By Jerry Richard, May 26, 2023

Custom classes in Python allow you to create your own data types with specific properties and behaviors. It will make…

Python

Does Python have a Block Scope? (Scope in Python explained)

By Jerry Richard, May 26, 2023

Python doesn’t have block scope. If you come from a background in programming languages like C, C++, or Java, in…
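A small sketch of what the absence of block scope looks like in practice. The function and variable names are illustrative assumptions, not from the article.

```python
def block_scope_demo():
    """Show that names bound inside a loop or if-block stay visible afterwards."""
    for i in range(3):
        inner = i * 2  # bound inside the loop body
    if inner > 0:
        flag = "set inside if"  # bound inside the if-block
    # Neither `for` nor `if` created a new scope, so all three names
    # are still accessible here, within the same function scope.
    return i, inner, flag


print(block_scope_demo())
```

In C, C++, or Java, a variable declared inside the loop body would not be usable after the loop. In Python, only functions, classes, modules, and comprehensions introduce a new scope.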

Python

TypeError: a bytes-like object is required, not str error

By Jerry Richard, May 26, 2023

The “TypeError: a bytes-like object is required, not str” error occurs in Python when we try to use a string where a…
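One way this error can be reproduced, and one possible fix, is sketched below. The sample data and the helper name split_fields are hypothetical and chosen only for illustration.

```python
# Bytes, such as data read from a socket or a file opened in "rb" mode.
data = b"alpha,beta"

try:
    # Passing a str separator to a bytes method triggers the TypeError.
    data.split(",")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)


def split_fields(raw):
    """Decode the bytes once, then work with str (hypothetical helper)."""
    return raw.decode("utf-8").split(",")


fields = split_fields(data)
print(fields)
```

An alternative is to keep everything as bytes and pass a bytes separator, as in data.split(b","). Which side to convert depends on what the rest of the program expects.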

Python

How to Calculate the Average of Numbers in Python

By Jerry Richard, May 26, 2023

In the day-to-day activities of a programmer, calculating the average of a set of numbers is one of the…

Python

How to Find Factors of a Number in Python

By Jerry Richard, May 26, 2023

In mathematics, factors play an important role in various operations such as finding prime numbers, calculating greatest common divisors, and…

Python

FIX – Consider using the --user option

By Jerry Richard, May 26, 2023

“Consider using the --user option” is a common message that Python users see when they hit permission issues while installing Python…

Python

Python Find in List – How to Find the Index of an Item in a List (with example)

By Jerry Richard, May 26, 2023

To find the index of an item in a list, we can use the index() method, which takes a single…
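A short sketch of list.index() in use. The sample list and the helper find_index are illustrative assumptions, not part of the article.

```python
fruits = ["apple", "banana", "cherry", "banana"]

# index() returns the position of the first matching item.
first_banana = fruits.index("banana")


def find_index(items, target):
    """Return the index of target, or -1 if it is absent (hypothetical helper)."""
    try:
        return items.index(target)
    except ValueError:
        # index() raises ValueError when the item is not in the list.
        return -1


print(first_banana, find_index(fruits, "kiwi"))
```

Wrapping the call in try/except avoids a crash on missing items. Checking with `target in items` first also works, at the cost of scanning the list twice.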

Python

Python Program to Print all Prime Numbers in an Interval

By Jerry Richard, May 26, 2023

Printing all prime numbers in an interval is a common problem in mathematics. It involves finding all prime numbers within…

Python

Defining and Calling a Function in Python

By Jerry Richard, May 26, 2023

What is a function? It’s a block of statements that performs a specific task, which can be later reused any…

Python

FizzBuzz Interview Question in Python

By Jerry Richard, May 26, 2023

FizzBuzz is one of the top Python interview questions, asked to check how the candidate arrives at the logic for…

Python

Difference between List and String in Python

By Jerry Richard, May 26, 2023

Python has several built-in data types to store and manipulate values. Two of the most commonly used data types in…

Python

How to Create a Random Decimal Number Within a Range Using the random Module

By Jerry Richard, May 26, 2023

Generating random decimal numbers is a common task in many programming projects. In Python, you can easily create a random…

Python

How to Create a Love Calculator using lower() and count() Functions in Python

By Jerry Richard, May 26, 2023

This blog will help you learn about functions like lower() and count(); using them, we will be creating a…

Python

How to Make a Tip Calculator in Python

By Jerry Richard, May 26, 2023

Let’s learn together: this blog helps you write Python code that will calculate tips for a restaurant bill and share…

Python

How to read a file and write it to another file in Python

By Jerry Richard, May 26, 2023

Python has built-in functions to open, read, and write a file and perform various activities around it. Let’s learn about the functions and how…

Spark

Resolve “Job aborted due to stage failure” in Spark

By Jerry Richard, June 22, 2023

When it comes to troubleshooting Spark issues, one thing you get used to is knowing what the error exactly…

Spark

Resolve “Could not find CoarseGrainedScheduler” in Spark

By Jerry Richard, June 21, 2023

In this article, we will understand and learn about the CoarseGrainedScheduler and why we are encountering this error in the…

Spark

FIX – TypeError: an integer is required (got type bytes)

By Jerry Richard, June 19, 2023

In this article, we will learn about the “TypeError: an integer is required (got type bytes)” that occurs in PySpark…

Spark

How to Save DataFrame as a CSV File in Spark

By Jerry Richard, June 15, 2023

Spark provides a lot of APIs to save a DataFrame in multiple formats like CSV, Parquet, Hive tables, etc. In this…

Spark

How to Save Spark DataFrame directly to Hive

By Jerry Richard, May 31, 2023

You may have encountered a situation where you wanted to do some manipulation on a Spark DataFrame and…

Spark

Resolve the “Container killed by YARN for exceeding memory limits” Issue in Hive, Tez, and Spark jobs

By Jerry Richard, May 26, 2023

“Container killed by YARN for exceeding memory limits” usually happens when the JVM usage goes beyond the YARN container memory…

Spark

Why Spark/MR is not considering UTF-8 encoding

By Jerry Richard, May 26, 2023

Reading/writing a UTF-8 enabled file: Sometimes we encounter issues in which Spark returns non-ASCII characters in the wrong format….

Spark

How to read and write Excel files with Spark

By Jerry Richard, May 26, 2023

Apache Spark is a powerful data processing framework. Commonly, Spark is used to process data stored in various formats, including…

Spark

Difference between groupByKey and reduceByKey in Spark

By Jerry Richard, May 26, 2023

groupByKey and reduceByKey are two different operations that transform RDDs (Resilient Distributed Datasets). What is the difference…

Spark

Understanding the Spark stack function for pivoting data

By Jerry Richard, May 26, 2023

Hello! If you’re into big data processing, you’ve probably heard of Spark, right? It’s a popular distributed computing framework used…

Spark

How to set Apache Spark Executor and Driver Memory/Cores (pyspark and spark-submit)

By Jerry Richard, May 26, 2023

In many cases, we need to increase the driver/executor memory/cores to improve performance or to avoid out-of-memory issues.

Spark

Spark Driver in Apache Spark and Where does the spark driver run?

By Jerry Richard, May 26, 2023

The driver is the process that starts the Spark context or session in Spark, which helps in communicating with resource managers and runs tasks in…

Spark

What are broadcast variables in Spark

By Jerry Richard, May 26, 2023

Broadcast variables are commonly used by Spark developers to optimize their code for better performance. This article will provide a…

Spark

Script to collect thread dump (Jstack)

By Jerry Richard, May 26, 2023

jstack is a command-line tool that captures the thread dump of a Java process. Using the thread…

Spark

Handling Data Skew in Apache Spark Application

By Jerry Richard, May 26, 2023

What is data skew? Let’s take a basic example of “CONSTRUCTION WORKERS”. In the above example, skew happened due to…

Spark

How to Run the Spark history server locally

By Jerry Richard, May 26, 2023

A step-by-step guide to setting up the Spark history server locally (Mac or Windows). This helps to debug…

Spark

Difference between DataFrame, Dataset, and RDD in Spark

By Jerry Richard, May 26, 2023

Short history of Spark: Spark was created at Berkeley back in 2009; an evolution of the MapReduce concepts…

Spark

What is the difference between Cache and Checkpoint in Spark

By Jerry Richard, May 26, 2023

Spark is a data processing framework that helps to process data faster. It uses in-memory processing and multiple nodes to run…

Spark

Resolve “Task serialization failed: java.lang.StackOverflowError” in Spark

By Jerry Richard, May 26, 2023

“Task serialization failed: java.lang.StackOverflowError” usually happens when the JVM encounters a situation where it is unable to create a…

Spark

How to Enable Kerberos Debugging for Spark Application and “hdfs dfs” Command

By Jerry Richard, May 26, 2023

Kerberos debugging involves enabling the debug log level for the Krb5LoginModule at the JVM level. This would help us to…

kubernetes

How to Become Certified Kubernetes Administrator & Developer (CKA, CKAD)

By Jerry Richard, June 29, 2023

I am writing this article from personal experience and hope it will help you become a CKA & CKAD (cheers). Before starting with the tips,…

kubernetes

Learn Kubernetes Deployments and how to record changes

By Jerry Richard, May 29, 2023

There are two ways to create a deployment in Kubernetes: the imperative way and the declarative way. This has been explained…

kubernetes

How to View Kubernetes Pod Logs (With Docker logging Examples)

By Jerry Richard, May 26, 2023

Viewing Kubernetes pod logs is an essential task for debugging and troubleshooting issues with applications running in a Kubernetes cluster….

kubernetes

How to Create and Edit a pod in Kubernetes

By Jerry Richard, May 26, 2023

We can create pods and other objects (like deployments, services, etc.) using the imperative or declarative method. Check here to…

kubernetes

How to Backup and Restore Kubernetes cluster manually

By Jerry Richard, May 26, 2023

Kubernetes is a powerful tool for container orchestration, but like any complex system, it is susceptible to failures and data…

kubernetes

How to run a command inside a Kubernetes Container/Pod

By Jerry Richard, May 26, 2023

Kubernetes is an open-source container orchestration platform that automates container deployment, scaling, and management. In Kubernetes, a pod is the…

kubernetes

How to Format the Output of “kubectl” Command

By Jerry Richard, May 26, 2023

By default, the output from the “kubectl” command is human-readable, but it can be further formatted based on our needs.

kubernetes

How to Update the Image in Kubernetes Deployment

By Jerry Richard, May 26, 2023

We can update the image of a Kubernetes deployment by simply running the “kubectl set image” command with the updated image.

kubernetes

How to get the YAML file from Kubernetes objects (Pod, Deployment, Services, and combined)?

By Jerry Richard, May 26, 2023

Using the “-o yaml” option with the “kubectl get” command will get you the latest YAML file of currently deployed objects.

kubernetes

Resolve “node has conditions: [DiskPressure]”

By Jerry Richard, May 26, 2023 / May 31, 2023

You might have faced “DiskPressure” error messages in the Kubernetes cluster, which result in pod/container eviction. In this article,…

kubernetes

How to list all running pod names in Kubernetes

By Jerry Richard, May 26, 2023

List the pod names in Kubernetes

kubernetes

What’s the difference between ClusterIP, NodePort, and LoadBalancer service types in Kubernetes?

By Jerry Richard, May 26, 2023

What are Kubernetes Services? A service is another Kubernetes object, just like a pod (a pod is the smallest unit…

kubernetes

Declarative vs Imperative way of creating Kubernetes objects

By Jerry Richard, May 26, 2023

There are 2 ways to create an object in a Kubernetes cluster, either imperative or declarative. Let’s see a few…

kubernetes

Useful commands during Kubernetes certification (CKA,CKAD)

By Jerry Richard, May 26, 2023

Know the shortcuts: Kubernetes certification is basically a practical, scenario-based exam. Creating shortcuts would help you to save a lot…

kubernetes

Kubernetes: Use and where to find KubeConfig File

By Jerry Richard, May 26, 2023

Use of the kubeconfig file: To access a Kubernetes cluster, we need to be aware of the kube-apiserver and where it…

kubernetes

Authentication vs Authorization: What’s the Difference (Kubernetes)

By Jerry Richard, May 26, 2023

Authentication: Let’s start with the different types of users who will be trying to access the cluster (we will use…

kubernetes

How to create a Dockerfile

By Jerry Richard, May 26, 2023

A Dockerfile is a text file written in a specific format. Dockerfile: [INSTRUCTION] [ARGUMENT] FROM python:3.6 <- Start from…

kubernetes

How to create our own Docker Image

By Jerry Richard, May 26, 2023

Why would we need it in the first place? Because if you can’t find a command or service, which…

kubernetes

How can I keep a pod/container running on Kubernetes?

By Jerry Richard, May 26, 2023

In general, containers are meant to exit on completion. Basically, a container will perform the task assigned to it and exit on completion (…

kubernetes

What is Helm Chart in Kubernetes

By Jerry Richard, May 26, 2023

What is Helm? Consider Helm as a package and release manager. Let’s talk about the current difficulty in deploying applications…
