Welcome to the Learner Knowledge Base Community
What is meant by “Tablet server is quiescing” in Apache Kudu
“Tablet server is quiescing” is a state in which the Kudu tablet server is down or not responding, and the Kudu…
Resolve the “Memory limit exceeded” issue in Impala
“Memory limit exceeded” usually happens when a query reaches its maximum memory limit and is unable to allocate any more memory.
Resolve “TTransportException: MaxMessageSize reached” in Hive
Symptoms: The “TTransportException: MaxMessageSize reached” exception causes Hive query failures when a query tries to access a table that has…
Resolve “Slow BlockReceiver write packet to mirror”
The “Slow BlockReceiver write packet to mirror” error usually indicates an unhealthy node in your Hadoop cluster, or occurs if the node…
How to find and delete files older than X days in HDFS
Find and delete files older than X days in HDFS
How to get a specific column from command output in Linux
The awk command is used in Linux to get the value of a specific column from command output or…
Solr TTL – Auto-Purging Solr Documents
In this blog, we will learn about auto-purging, the importance of TTL (Time-To-Live), and how to remove documents automatically…
How to Recover Standby Namenode (Bootstrap Standby Namenode)
There are scenarios where we are unable to bring back the standby NameNode due to a disk crash, OS…
Kafka CLI Command Cheat Sheet
This article helps you learn the Kafka CLI commands used to create and list topics and to start…
Resolve “Orphan region in HDFS: Unable to load .regioninfo from table” in HBase
“Orphan region in HDFS: Unable to load .regioninfo from table” usually happens when the “.regioninfo” file is unavailable under the HDFS table…
How to increase CPU Utilization in Linux (CPU Spike)
There are scenarios where we need to test or replicate an issue in the cluster that requires a CPU spike…
Deleting Documents in Apache Solr manually
Often, we need to remove collection data manually, for example when it occupies a lot of space in the…
How to Add or Delete Replicas in Solr Collection
In this article, we will learn about Solr search and how to add or delete a replica in the Solr…
Resolve “org.apache.hadoop.hive.ql.lockmgr.LockException(No record of transaction could be found)”
Symptom: A Hive/Tez job fails with the following error message: Error while compiling statement: FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(No record of…
“IPC’s epoch X is less than the last promised epoch Y”
You will see the “IPC’s epoch X is less than the last promised epoch Y” error message in both the NameNode and…
Apache Ozone vs HDFS
How to Create a Hive table with Ozone Filesystem
Apache Hadoop Ozone is a distributed object store for Hadoop designed for scaling to trillions of objects and providing a…
How to Connect to Apache Phoenix Shell from Terminal
Apache Phoenix allows you to run SQL queries on top of HBase tables. We can use JDBC or the sqlline command-line utility to access Phoenix.
How to connect to Impala using Beeline (Direct and via Apache Knox)
There are multiple ways to connect to Impala. In this post, we are going to discuss connecting to Impala using…
Resolve “MetaException(message:Timeout when executing method: get_partitions” ERROR in Hive
The “MetaException(message:Timeout when executing method: get_partitions” error occurs when your query is unable to get the metadata information.
How to connect to Apache Hive from Python
First of all, let’s start with some background information. Apache Hive is a data warehousing tool that allows you to…
Create Your First Python Game: Rock, Paper, Scissors
What is Rock Paper Scissors, and how do you play it? It's a game played by two people with their hands,…
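The heart of the game is deciding who wins a round. Below is a minimal sketch of that rule, assuming the standard rules (rock beats scissors, scissors beats paper, paper beats rock); the names `BEATS` and `decide_winner` are hypothetical and not taken from the post.

```python
import random

# Standard rules: each choice maps to the choice it defeats.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def decide_winner(player: str, computer: str) -> str:
    """Return 'player', 'computer', or 'draw' for a single round."""
    player, computer = player.lower(), computer.lower()
    if player not in BEATS or computer not in BEATS:
        raise ValueError("choices must be rock, paper, or scissors")
    if player == computer:
        return "draw"
    return "player" if BEATS[player] == computer else "computer"

# The computer picks its move at random.
computer_choice = random.choice(list(BEATS))
print("Computer chose:", computer_choice)
print("Result:", decide_winner("rock", computer_choice))
```

Storing the rules in a dictionary keeps the winner check to a single lookup instead of a long chain of if/elif comparisons.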
Generate random Words or Letters in Python
Have you ever needed to generate random words or letters in Python? Check out this article to learn different ways…
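As a quick illustration, here is one way to do both with the standard `random` and `string` modules. This is a sketch; the helper names `random_letters` and `random_word` and the sample word list are hypothetical.

```python
import random
import string
from typing import Sequence

def random_letters(length: int) -> str:
    """Return a string of `length` letters drawn from a-z and A-Z."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))

def random_word(words: Sequence[str]) -> str:
    """Return one word chosen at random from a caller-supplied list."""
    return random.choice(words)

print(random_letters(8))
print(random_word(["python", "spark", "kubernetes"]))
```

`random.choices(string.ascii_lowercase, k=8)` returns eight letters in a single call, with repeats allowed. `random.sample` picks items without repetition.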
What is Randomization in Python – Explained with Coin Flip Example
What is randomization? Randomization means making something unpredictable, without fixed rules. But in the real world, making a program unpredictable…
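A coin flip can be simulated by choosing one of two outcomes with equal probability. The sketch below uses `random.choice`. The helper name `flip_coin` and the count of 1000 flips are illustrative only.

```python
import random
from collections import Counter

def flip_coin() -> str:
    """Return 'Heads' or 'Tails' with equal probability."""
    return random.choice(["Heads", "Tails"])

# Over many flips the counts should be roughly equal, but they vary from run to run.
results = Counter(flip_coin() for _ in range(1000))
print(results)
```

Python's `random` module is a pseudo-random generator. Calling `random.seed(...)` with a fixed value first makes the sequence reproducible, which helps when testing.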
Python program to iterate through two lists in parallel
Have you ever needed to work with two lists of data in Python and wanted to process them at the…
How are custom classes created in Python?
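The built-in `zip()` is the usual way to walk two lists side by side. The sketch below also shows `itertools.zip_longest` for lists of unequal length. The sample lists are made up for illustration.

```python
from itertools import zip_longest

names = ["Alice", "Bob", "Carol"]
scores = [85, 92, 78]

# zip() pairs items by position and stops at the end of the shorter list.
for name, score in zip(names, scores):
    print(f"{name}: {score}")

# zip_longest() continues to the end of the longer list, filling gaps with fillvalue.
pairs = list(zip_longest(["a", "b", "c"], [1, 2], fillvalue=None))
print(pairs)  # [('a', 1), ('b', 2), ('c', None)]
```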
Custom classes in Python allow you to create your own data types with specific properties and behaviors. It will make…
Does Python have a Block Scope? (Scope in Python explained)
Python doesn’t have block scope. If you come from a background in languages like C, C++, or Java, in…
TypeError: a bytes-like object is required, not 'str'
The “TypeError: a bytes-like object is required, not 'str'” error occurs in Python when we try to use a string where a…
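A short demonstration of the difference: names assigned inside `if` and `for` blocks remain visible after the block ends. A function body, by contrast, creates its own local scope. The variable and function names here are illustrative.

```python
# New scopes come from function, class, and module bodies (plus comprehensions),
# not from if/for/while blocks.
if True:
    message = "defined inside an if block"

for i in range(3):
    last_square = i * i

print(message)          # still accessible after the if block
print(i, last_square)   # the loop variable survives the loop: 2 4

def make_local():
    local_value = 10    # exists only inside this function call
    return local_value

make_local()
print("local_value" in globals())  # False
```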
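A minimal reproduction: calling a `bytes` method with a `str` argument raises this error. The fix is to keep both sides as bytes or to decode the bytes to `str` first. The sample data and the UTF-8 encoding are assumptions for illustration.

```python
data = b"hello world"  # bytes, e.g. read from a file opened in "rb" mode or from a socket

try:
    data.split(" ")  # str separator used on a bytes object
except TypeError as exc:
    print("TypeError:", exc)

# Fix 1: use a bytes separator so both sides are bytes.
parts = data.split(b" ")

# Fix 2: decode the bytes to str first, then use str operations.
words = data.decode("utf-8").split(" ")
print(parts, words)
```

The same idea applies in reverse. When writing to a binary file or socket, call `.encode("utf-8")` on the string first.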
How to Calculate the Average of Numbers in Python
In the day-to-day activities of a programmer, calculating the average of a set of numbers is one of the…
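Two common approaches are sketched below: dividing `sum()` by `len()`, and calling `statistics.mean` from the standard library. The sample numbers are arbitrary.

```python
from statistics import mean

numbers = [10, 20, 30, 45]

# Manual approach: sum divided by count, guarding against an empty list.
average = sum(numbers) / len(numbers) if numbers else 0.0

# Standard-library approach: statistics.mean raises StatisticsError on empty input.
print(average, mean(numbers))  # 26.25 26.25
```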
How to Find Factors of a Number in Python
In mathematics, factors play an important role in various operations such as finding prime numbers, calculating greatest common divisors, and…
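One common way to list factors is trial division up to the square root of n, since divisors come in pairs. This is a sketch of that approach, which may differ from the loop the post uses. The function name `factors` is hypothetical.

```python
import math
from typing import List

def factors(n: int) -> List[int]:
    """Return all positive factors of n in ascending order."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    small, large = [], []
    # Divisors come in pairs (i, n // i), so checking up to sqrt(n) is enough.
    for i in range(1, math.isqrt(n) + 1):
        if n % i == 0:
            small.append(i)
            if i != n // i:          # avoid listing a square root twice
                large.append(n // i)
    return small + large[::-1]

print(factors(36))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```

A simpler loop over every i from 1 to n gives the same result, but it does about n checks instead of about sqrt(n).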
FIX – Consider using the --user option
“Consider using the --user option” is a common message that Python users see when they face permission issues while installing Python…
Python Find in List – How to Find the Index of an Item in a List (with example)
To find the index of an item in a list, we can use the index() method, which takes a single…
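A short sketch of `list.index()` in practice, including its optional start position, the `ValueError` raised for missing items, and an `enumerate()` pattern for finding every match. The list contents are illustrative.

```python
fruits = ["apple", "banana", "cherry", "banana"]

# index() returns the position of the first matching item.
print(fruits.index("banana"))     # 1

# An optional start (and end) position restricts where the search looks.
print(fruits.index("banana", 2))  # 3

# index() raises ValueError when the item is absent, so catch it (or check with "in").
try:
    fruits.index("mango")
except ValueError:
    print("mango is not in the list")

# To get every matching position, use enumerate().
all_positions = [i for i, item in enumerate(fruits) if item == "banana"]
print(all_positions)              # [1, 3]
```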
Python Program to Print all Prime Numbers in an Interval
Printing all prime numbers in an interval is a common problem in mathematics. It involves finding all prime numbers within…
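A straightforward approach checks each number in the interval with trial division. This sketch assumes both bounds are inclusive. The function names are hypothetical.

```python
from typing import List

def is_prime(n: int) -> bool:
    """Trial division: n is prime if no integer from 2 to sqrt(n) divides it."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def primes_in_interval(lower: int, upper: int) -> List[int]:
    """Return all primes p with lower <= p <= upper."""
    return [n for n in range(lower, upper + 1) if is_prime(n)]

print(primes_in_interval(10, 50))
```

For very large intervals, a Sieve of Eratosthenes avoids repeating the divisibility checks for every number.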
Defining and Calling a Function in Python
What is a function? It’s a block of statements that performs a specific task and can later be reused any…
FizzBuzz Interview Question Python
FizzBuzz is one of the top Python interview questions, asked to check how the candidate arrives at the logic for…
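Assuming the standard rules (multiples of 3 print "Fizz", multiples of 5 print "Buzz", and multiples of both print "FizzBuzz"), one common solution looks like this. The function name `fizzbuzz` is illustrative.

```python
def fizzbuzz(n: int) -> str:
    """Return 'FizzBuzz', 'Fizz', 'Buzz', or the number itself as a string."""
    # Check the combined case first; otherwise multiples of 15 would stop at "Fizz".
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

for number in range(1, 16):
    print(fizzbuzz(number))
```

Interviewers usually watch for the order of the checks. Testing 15 (or 3 and 5 together) before the single cases is the key logical step.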
Difference between List and String in Python
Python has several built-in data types to store and manipulate values. Two of the most commonly used data types in…
How to Create a Random Decimal Number Within a Range Using the Random Module
Generating random decimal numbers is a common task in many programming projects. In Python, you can easily create a random…
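With the standard `random` module, `random.uniform(a, b)` is the direct way to get a float in a range. Scaling `random.random()` is an equivalent alternative. The bounds below are arbitrary examples.

```python
import random

# random.uniform(a, b) returns a float N with a <= N <= b
# (b may or may not be included, depending on floating-point rounding).
value = random.uniform(1.5, 5.5)

# Round for display, e.g. to two decimal places.
print(round(value, 2))

# Alternative: scale random.random(), which returns a float in [0.0, 1.0).
low, high = 10.0, 20.0
scaled = low + (high - low) * random.random()
print(scaled)
```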
How to Create a Love Calculator using lower() and count() Functions in Python
This blog will help you learn about functions like lower() and count(); using them, we will be creating a…
How to Make a Tip Calculator in Python
Let’s learn together: this blog helps you write Python code that calculates the tip for a restaurant bill and share…
How to read a file and write it to another file in Python
Python has built-in functions to open, read, and write files and to perform various related operations. Let’s learn about these functions and how…
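The arithmetic behind a tip calculator fits in one function: add the tip percentage to the bill, then divide by the number of people. This is a sketch only. The function name `split_bill` and the example figures are hypothetical, and the post's exact prompts and rounding may differ.

```python
def split_bill(bill: float, tip_percent: float, people: int) -> float:
    """Return each person's share of the bill plus tip, rounded to two decimals."""
    if people < 1:
        raise ValueError("people must be at least 1")
    total = bill * (1 + tip_percent / 100)
    return round(total / people, 2)

# Example: a 150.00 bill with a 12% tip, split between 5 people.
print(split_bill(150.00, 12, 5))  # 33.6
```

In an interactive version, the three arguments would come from `input()` calls converted with `float()` and `int()`.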
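The basic pattern is to open the source for reading and the destination for writing, then copy the contents across. The sketch below uses `with` blocks so both files close automatically. To keep the demo self-contained, it works in a temporary directory, and the file names are made up.

```python
import os
import tempfile

def copy_text_file(source_path: str, destination_path: str) -> None:
    """Read source_path and write its contents to destination_path."""
    # "with" closes both files automatically, even if an error occurs.
    with open(source_path, "r", encoding="utf-8") as src, \
         open(destination_path, "w", encoding="utf-8") as dst:
        for line in src:  # stream line by line instead of loading the whole file
            dst.write(line)

# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "input.txt")
    destination = os.path.join(tmp, "output.txt")
    with open(source, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")
    copy_text_file(source, destination)
    with open(destination, encoding="utf-8") as f:
        copied = f.read()

print(copied)
```

For a plain byte-for-byte copy with no processing, `shutil.copyfile(source, destination)` does the same job in one call.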
Resolve “Job aborted due to stage failure” in Spark
When it comes to troubleshooting Spark issues, one thing you get used to is knowing what the error exactly…
Resolve “Could not find CoarseGrainedScheduler” in Spark
In this article, we will understand and learn about the CoarseGrainedScheduler and why we are encountering this error in the…
FIX – TypeError: an integer is required (got type bytes)
In this article, we will learn about the “TypeError: an integer is required (got type bytes)” that occurs in PySpark…
How to Save DataFrame as a CSV File in Spark
Spark provides a lot of APIs to save DataFrame to multiple formats like CSV, Parquet, Hive tables, etc. In this…
How to Save Spark DataFrame directly to Hive
You may have encountered a situation where you wanted to do some manipulation on a Spark DataFrame and…
Resolve the “Container killed by YARN for exceeding memory limits” Issue in Hive, Tez, and Spark jobs
“Container killed by YARN for exceeding memory limits” usually happens when JVM memory usage goes beyond the YARN container memory…
Why Spark/MR is not considering UTF-8 encoding
Reading/writing UTF-8 encoded files: Sometimes, we may encounter issues in which Spark returns non-ASCII characters in the wrong format…
How to read and write Excel files with Spark
Apache Spark is a powerful data processing framework. Commonly, Spark is used to process data stored in various formats, including…
Difference between groupByKey and reduceByKey in Spark
groupByKey and reduceByKey are two different operations that help transform RDDs (Resilient Distributed Datasets). What is the difference…
Understanding the Spark stack function for pivoting data
Hello! If you’re into big data processing, you’ve probably heard of Spark, right? It’s a popular distributed computing framework used…
How to set Apache Spark Executor and Driver Memory/Cores (pyspark and spark-submit)
In many cases, we need to increase driver/executor memory or cores to improve performance or to avoid Out of Memory issues.
Spark Driver in Apache Spark: Where Does the Spark Driver Run?
The driver is the process that starts the Spark context or session in Spark; it communicates with the resource manager and runs tasks in…
What are broadcast variables in Spark
Broadcast variables are commonly used by Spark developers to optimize their code for better performance. This article will provide a…
Script to collect thread dump (Jstack)
jstack is a command-line tool that helps capture the thread dump of a Java process. Using the thread…
Handling Data Skew in Apache Spark Application
What is data skew? Let’s take a basic example of “CONSTRUCTION WORKERS”. In the above example, skew happened due to…
How to Run the Spark history server locally
Sharing a step-by-step guide to setting up the Spark history server locally (Mac or Windows). This helps to debug…
Difference between DataFrame, Dataset, and RDD in Spark
Short history of Spark: Spark was created at Berkeley back in 2009 as an evolution of the MapReduce concepts…
What is the difference between Cache and Checkpoint in Spark
Spark is a data processing framework that helps process data faster. It uses in-memory processing and multiple nodes to run…
Resolve “Task serialization failed: java.lang.StackOverflowError” in Spark
“Task serialization failed: java.lang.StackOverflowError” usually happens when the JVM encounters a situation where it is unable to create a…
How to Enable Kerberos Debugging for Spark Application and “hdfs dfs” Command
Kerberos debugging involves enabling the debug log level for the Krb5LoginModule at the JVM level. This helps us to…
How to Become Certified Kubernetes Administrator & Developer (CKA, CKAD)
I'm writing this article from my personal experience and hope it will help you become a CKA and CKAD (cheers!). Before starting with the tips,…
Learn Kubernetes Deployments and how to record changes
There are two ways to create a deployment in Kubernetes: the imperative way and the declarative way. This has been explained…
How to View Kubernetes Pod Logs (With Docker logging Examples)
Viewing Kubernetes pod logs is an essential task for debugging and troubleshooting issues with applications running in a Kubernetes cluster…
How to Create and Edit a pod in Kubernetes
We can create pods and other objects (like deployments, services, etc.) using the imperative or declarative method. Check here to…
How to Backup and Restore Kubernetes cluster manually
Kubernetes is a powerful tool for container orchestration, but like any complex system, it is susceptible to failures and data…
How to run a command inside a Kubernetes Container/Pod
Kubernetes is an open-source container orchestration platform that automates container deployment, scaling, and management. In Kubernetes, a pod is the…
How to Format the Output of “kubectl” Command
By default, the output of the “kubectl” command is easily readable by humans, but it can be further formatted based on our needs.
How to Update the Image in Kubernetes Deployment
We can update the image of a Kubernetes deployment by simply running the “kubectl set image” command with the updated image
How to get the YAML file from Kubernetes objects (Pod, Deployment, Services, and combined)?
Using the “-o yaml” option with the “kubectl get” command will get you the latest YAML of the currently deployed objects.
Resolve “node has conditions: [DiskPressure]”
You might have seen “DiskPressure” error messages in a Kubernetes cluster, which result in pod/container eviction. In this article,…
How to list all running pod names in Kubernetes
List the pod names in Kubernetes
What’s the difference between ClusterIP, NodePort, and LoadBalancer service types in Kubernetes?
What are Kubernetes Services? A service is just another Kubernetes object, like a pod (a pod is the smallest unit…
Declarative vs Imperative way of creating Kubernetes objects
There are 2 ways to create an object in a Kubernetes cluster, either imperative or declarative. Let’s see a few…
Useful commands during Kubernetes certification (CKA, CKAD)
Know the shortcuts: The Kubernetes certification is a practical, scenario-based exam. Creating shortcuts will help you save a lot…
Kubernetes: Use and where to find KubeConfig File
Use of the kubeconfig file: To access a Kubernetes cluster, we need to know about the kube-apiserver and where it…
Authentication vs Authorization: What’s the Difference (Kubernetes)
Authentication: Let’s start with the different types of users who will be trying to access the cluster (we will use…
How to create a Dockerfile
A Dockerfile is a text file in a specific format: [INSTRUCTION] [ARGUMENT]. For example: FROM python:3.6 <- Start from…
How to create our own Docker Image
Why would we need it in the first place? Because if you can’t find a command or service which…
How can I keep a pod/container running on Kubernetes?
In general, containers are meant to exit on completion. Basically, a container performs the task assigned to it and exits on completion…
What is Helm Chart in Kubernetes
What is Helm? Think of Helm as a package and release manager. Let’s talk about the current difficulty in deploying applications…