MapReduce Data Skewness Handling: A Systematic Literature Review

@article{Irandoost2019MapReduceDS,
  title={MapReduce Data Skewness Handling: A Systematic Literature Review},
  author={Mohammad Amin Irandoost and Amir Masoud Rahmani and Saeed Setayeshi},
  journal={International Journal of Parallel Programming},
  year={2019},
  pages={1-44},
  url={https://api.semanticscholar.org/CorpusID:59159411}
}

Mohammad Amin Irandoost, A. Rahmani, S. Setayeshi
Published in International Journal of… 23 January 2019
Computer Science

Through this review, it was concluded that there are important parameters that have not been examined in MapReduce data skewness handling approaches.

11 Citations

MapReduce Node Failures Systematic Literature Reviews Network Failures Parameters

Learning automata-based algorithms for MapReduce data skewness handling

Mohammad Amin Irandoost, A. Rahmani, S. Setayeshi

Computer Science

The Journal of Supercomputing

2019

LAP is based on cluster combination and performs well when the data skewness degree is low, while TCAP, on the other hand, has the advantage of considering network topology and balancing network traffic cost in the shuffle phase.

Historical data based approach to mitigate stragglers from the Reduce step of MapReduce in a heterogeneous Hadoop cluster

K. Bawankule, R. Dewang, A. Singh

Computer Science, Engineering

Cluster Computing

2022

The proposed HDRTS policy significantly reduces the Reduce task execution time for reduce-input-heavy jobs, by nearly 25% to 37%, and mitigates computation skew and stragglers in the Reduce phase of MapReduce in heterogeneous environments.

Design Strategies for Handling Data Skew in MapReduce Framework

Avinash Potluri, S. Bhattu, N. V. N. Kumar, R. Subramanyam

Computer Science

Inventive Computation Technologies

2019

This study gives a solution to address the issue of skew and to minimize the cost of communication in a network using two policies for addressing skew, and proposes two algorithms that study the trade-offs between the two strategies.

Historical data based approach for straggler avoidance in a heterogeneous Hadoop cluster

K. Bawankule, R. Dewang, A. Singh

Computer Science, Engineering

Journal of Ambient Intelligence and Humanized…

2021

A historical data based data placement (HDBDP) policy is proposed to balance the workload among heterogeneous nodes based on their computing capabilities, improving Map task data locality and reducing job turnaround time in the heterogeneous Hadoop environment.

Dynamic Load Balancing in Stream Processing Pipelines Containing Stream-Static Joins

J. Marić, K. Pripužić, M. Antonić, D. Škvorc

Computer Science

Electronics

2023

This paper presents the first solution designed specifically to handle data skew in the context of joining streaming and static data; it uses a state-of-the-art policy to monitor data load, recognize load imbalance, and dynamically redistribute partitions to achieve optimal load balance.

Scheduling Spark Tasks With Data Skew and Deadline Constraints

Haihua Guam, Xiaoping Li, Zhipeng Lu

Computer Science

IEEE Access

2021

A modified scheduling architecture is developed based on the characteristics of the considered problem, and a Spark task scheduling algorithm is proposed that considers both data skew and deadline constraints.

Frequent pattern mining algorithms in fog computing environments: A systematic review

Ahmad Fadaei Tehrani, Mahdi Sharifi, A. Rahmani

Computer Science, Engineering

Concurr. Comput. Pract. Exp.

2022

A systematic review of frequent pattern mining algorithms in fog computing is presented to investigate data mining algorithms that focus on handling massive datasets, together with a technical taxonomy covering transaction-centric, item-centric, distributed, and parallel topics.

A parallel text clustering method using Spark and hashing

Mohamed Aymen Ben HajKacem, Chiheb-Eddine Ben N'cir, Nadia Essoussi

Computer Science

Computing

2021

A new parallel text clustering method based on the Spark framework and hashing is proposed, which integrates the divide-and-conquer approach and implements a new document hashing strategy; results show the effectiveness of the proposed method compared to existing ones in terms of running time and clustering accuracy.

Resource matching mechanisms in cluster computing: a systematic literature review

Mostafa Vakili Fard, A. Sahafi, A. Rahmani, P. Mashhadi

Computer Science

IET Softw.

2020

A methodical resource allocation survey is presented, covering resource management system architecture, a categorization of mechanisms, and the open challenges and issues.

Handling Data Skew in MapReduce Cluster by Using Partition Tuning

The healthcare industry has generated large quantities of data, and analyzing these data has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we previously proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs into virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that the PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the nati…

A YARN-based Energy-Aware Scheduling Method for Big Data Applications under Deadline Constraints

Fatemeh Shabestari, A. Rahmani, Nima Jafari Navimipour, S. Jabbehdari

Computer Science, Engineering
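The two-stage idea described for PTSH (first scatter keys over many fine-grained virtual partitions, then recombine those into the real reducer partitions) can be sketched as follows. This is a minimal illustration, not PTSH itself: it assumes per-key record counts are already known, uses CRC32 as a stand-in for the framework's hash function, and recombines with a greedy largest-first (LPT) heuristic. The helper names are hypothetical.

```python
import heapq
import zlib


def stable_hash(key: str) -> int:
    # CRC32 instead of Python's hash(), whose str hashing is randomized per run.
    return zlib.crc32(key.encode("utf-8"))


def virtual_partition_loads(key_counts, num_virtual):
    """Stage 1 (hypothetical helper): hash each key into one of num_virtual
    virtual partitions and sum the record counts per virtual partition."""
    loads = [0] * num_virtual
    for key, count in key_counts.items():
        loads[stable_hash(key) % num_virtual] += count
    return loads


def recombine(loads, num_reducers):
    """Stage 2 (hypothetical helper): assign virtual partitions, largest first,
    to the currently least-loaded reducer. Returns (assignment, reducer_loads),
    where assignment[v] is the reducer chosen for virtual partition v."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    assignment = [0] * len(loads)
    reducer_loads = [0] * num_reducers
    order = sorted(range(len(loads)), key=lambda v: loads[v], reverse=True)
    for v in order:
        load, r = heapq.heappop(heap)
        assignment[v] = r
        reducer_loads[r] = load + loads[v]
        heapq.heappush(heap, (reducer_loads[r], r))
    return assignment, reducer_loads
```

For example, `recombine([5, 4, 3, 3, 3], 2)` places the five virtual partitions on two reducers with loads 8 and 10, whereas a fixed one-stage assignment cannot adapt to the observed loads. Note that a single very hot key still lands in one virtual partition, so this scheme alone cannot split it.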

Journal of Grid Computing

2022

An Energy-efficient Deadline-aware Scheduling Algorithm based on the Moth-Flame Optimization algorithm (EDSA-MFO) is proposed to minimise energy consumption and execute the application within a given soft deadline.

61 References

An improved partitioning mechanism for optimizing massive data analysis using MapReduce

Kenn Slagter, Ching-Hsien Hsu, Yeh-Ching Chung, Daqiang Zhang

Computer Science, Engineering

The Journal of Supercomputing

2013

An improved partitioning algorithm that improves load balancing and memory consumption is proposed via an improved sampling algorithm and partitioner, and experiments show that the proposed algorithm is faster, more memory efficient, and more accurate than the current implementation.
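Sampling-based partitioners of this kind generally follow one pattern: draw a sample of the input keys, pick split points from the sorted sample, and route each record by binary search over those split points (roughly what Hadoop's `InputSampler` plus `TotalOrderPartitioner` do). The sketch below shows that generic scheme under the assumption of uniform random sampling over an in-memory key list; it is not the improved sampler from the paper, and the function names are hypothetical.

```python
import bisect
import random


def choose_split_points(keys, num_partitions, sample_size, seed=0):
    """Hypothetical helper: sample keys uniformly at random (keys must be
    non-empty) and return num_partitions - 1 split points taken at evenly
    spaced positions of the sorted sample."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_partition(key, split_points):
    """Partition index = number of split points <= key (binary search),
    so partition i holds only keys smaller than those of partition i + 1."""
    return bisect.bisect_right(split_points, key)
```

With keys `0..999`, four partitions, and a full sample, the split points are `[250, 500, 750]` and each partition receives 250 keys. Smaller samples save memory but make the split points, and therefore the balance, less accurate, which is the trade-off the paper's improved sampler targets.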

LIBRA: Lightweight Data Skew Mitigation in MapReduce

Qi Chen, Jinyu Yao, Zhen Xiao

Computer Science

IEEE Transactions on Parallel and Distributed…

2015

MapReduce is an effective tool for parallel data processing. One significant issue in practical MapReduce applications is data skew: the imbalance in the amount of data assigned to each task. This…

FP-Hadoop: Efficient processing of skewed MapReduce jobs

M. Liroz-Gistau, Reza Akbarinia, D. Agrawal, P. Valduriez
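To make this definition of data skew concrete, the sketch below assigns records to reducers with a plain `hash(key) mod R` rule and reports the largest reducer load divided by the mean load. The max-to-mean ratio is an assumed, commonly used skew measure, not one taken from the paper; CRC32 stands in for the framework's hash function, and the data are synthetic.

```python
import zlib


def reducer_loads(records, num_reducers):
    """Count how many records each reducer receives under hash(key) mod R."""
    loads = [0] * num_reducers
    for key in records:
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return loads


def skew_ratio(loads):
    """Largest load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean
```

With 700 records of one hot key and 300 distinct other keys on 4 reducers, one reducer receives at least 700 of the 1,000 records, so the ratio is at least 2.8 whatever hash function is used. This is why many skew-handling approaches also consider splitting the records of a large key when application semantics allow.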

Computer Science, Engineering

Inf. Syst.

2016

Map-Balance-Reduce: An improved parallel programming model for load balancing of MapReduce

Jianjiang Li, Yajun Lighting, Jian Pan, Peng Zhang, Wei Chen, Lizhe Wang

Computer Science

Future Gener. Comput. Syst.

2020

Improving MapReduce performance through data placement in heterogeneous Hadoop clusters

Jiong Xie, Shu Yin, X. Qin

Computer Science, Engineering

2010 IEEE International Symposium on Parallel…

2010

The problem of how to place data across nodes so that each node has a balanced data processing load is addressed, and it is shown that ignoring the data-locality issue in heterogeneous environments can noticeably reduce MapReduce performance.

Automatic Task Re-organization in MapReduce

Zhenhua Guo, M. Pierce, G. Fox, Mo Zhou
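The core of capacity-aware placement can be illustrated by distributing a file's blocks in proportion to each node's measured computing speed, so that faster nodes hold, and can process locally, more data. The sketch below assumes speeds are known as relative ratios and uses largest-remainder rounding so the block counts sum exactly to the total; it is an illustration, not the placement algorithm of the paper, and the node names are hypothetical.

```python
def blocks_per_node(num_blocks, speeds):
    """Split num_blocks across nodes proportionally to speeds
    (dict: node name -> relative speed), using largest-remainder rounding."""
    total = sum(speeds.values())
    quotas = {n: num_blocks * s / total for n, s in speeds.items()}
    counts = {n: int(q) for n, q in quotas.items()}  # floor of each quota
    leftover = num_blocks - sum(counts.values())
    # Hand the remaining blocks to nodes with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    return counts
```

For 10 blocks on nodes with speed ratios 3:2:1, this yields 5, 3, and 2 blocks. A uniform placement would instead give each node about 3 blocks, leaving the fast node idle while the slow node works through non-local data.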

Computer Science

2011 IEEE International Conference on Cluster…

2011

A modified shortest-job-first strategy is proposed, which theoretically minimizes job turnaround time when combined with task splitting, and breaks through the concurrency limit resulting from fixed task granularity.

A Scalable and Memory Efficient Sampling Mechanism for Partitioning in MapReduce

Kenn Slagter, Ching-Hsien Hsu, Yeh-Ching Chung
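As a rough illustration of why task splitting helps shortest-job-first scheduling, the sketch below orders jobs by size, splits each job into tasks of at most `chunk` work units, and hands every task to whichever slot frees up first. The job sizes, the chunk size, the assumption that all jobs arrive at time 0, and plain list scheduling are illustrative choices; the paper's actual strategy is not reproduced here.

```python
import heapq


def sjf_with_splitting(job_sizes, num_slots, chunk):
    """Schedule jobs shortest-first; each job is split into tasks of at most
    `chunk` work units. Returns {job: completion time}, all jobs arriving at 0."""
    slots = [0.0] * num_slots  # min-heap of times at which each slot is free
    heapq.heapify(slots)
    finish = {}
    for job, size in sorted(job_sizes.items(), key=lambda kv: kv[1]):
        remaining = size
        done = 0.0
        while remaining > 0:
            work = min(chunk, remaining)
            start = heapq.heappop(slots)  # earliest free slot
            end = start + work
            heapq.heappush(slots, end)
            done = max(done, end)
            remaining -= work
        finish[job] = done
    return finish
```

With two slots and jobs of size 2 and 8, running the large job as a single task finishes it at time 8, while splitting it into tasks of size 2 lets both slots work on it and finishes it at time 6. This is the concurrency limit imposed by fixed task granularity that the abstract refers to.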

Computer Science

International Journal of Parallel Programming

2013

An adaptive sampling mechanism for total order partitioning, which reduces memory waste during partitioning by using a trie-based sampling mechanism (ATrie), is proposed, and experiments show that the proposed mechanism is more adaptive and more memory efficient than previous implementations.

Improving MapReduce performance by balancing skewed loads

Yuanquan Fan, Weiguo Wu, Yunlong Xu, Chen Heng

Computer Science

China Communications

2014

To improve MapReduce performance in heterogeneous clusters, a novel load balancing approach in the reduce phase is proposed, which distributes work evenly among reduce tasks and improves MapReduce performance with little overhead.

MRSIM: Mitigating Reducer Skew In MapReduce

Lei Chen, W. Lu, Xiaoping Che, Weiwei Xing, Liqiang Wang, Yong Yana

Computer Science

2017 31st International Conference on Advanced…

2017

A load balancing strategy called MRSIM is proposed, which makes full use of the shuffle phase in MapReduce and introduces a load feedback mechanism, further improving the cluster's performance when running complex applications.

Load balancing in MapReduce on homogeneous and heterogeneous clusters: an in-depth review

M. Kargar, Meysam Vakili

Computer Science, Engineering

Int. J. Commun. Networks Distributed Syst.

2015

This paper studies the effects of two main factors, data locality and data skew, on homogeneous and heterogeneous clusters in Hadoop MapReduce.
