DSRS: Estimation and Forecasting of Journal Influence in the Science and Technology Domain via a Lightweight Quantitative Approach

Authors: Snehanshu Saha, Neelam Jangid, Archana Mathur, Anand M

The evaluation of journals based on their influence is of interest for numerous reasons. Various methods of computing a score have been proposed for measuring the scientific influence of scholarly journals. Typically, computing any of these scores involves compiling the citation information pertaining to the journal under consideration. This incurs significant overhead, since the article citation information of not only the journal under consideration but also of other journals for the most recent few years needs to be stored. Our work is motivated by the idea of developing a computationally lightweight approach that does not require any data storage, yet yields a score that is useful for measuring the importance of journals. In this paper, a regression-analysis-based method is proposed to calculate the Journal Influence Score. The proposed model is validated using historical data from the SCImago portal. The results show that the error between the rankings obtained using the proposed method and the SCImago Journal Rank is small, indicating that the proposed approach is a feasible and effective method of calculating the scientific impact of journals.
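
The regression idea in this abstract can be sketched in a few lines. The snippet below is only an illustration under stated assumptions, not the authors' model: the two predictors, the synthetic training values, and the helper names `fit_linear` and `predict` are all invented for the example.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]  # [intercept, coef_1, ..., coef_n]


def predict(coef, x):
    """Score one journal from its predictor values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Synthetic data (assumption): (h_index, cites_per_doc) -> score
X = [(10, 1.0), (20, 1.5), (35, 3.0), (50, 2.0), (80, 4.5)]
y = [0.1 + 0.02 * h + 0.5 * c for h, c in X]

coef = fit_linear(X, y)
score = predict(coef, (40, 2.5))
```

Once the coefficients are fitted, scoring a new journal needs only its own indicator values, which is the sense in which such an approach avoids storing citation data.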

Journal Influence Factor Calculation

Input parameters: H-Index; Total Docs. (Current Year); Total Refs.; Total Cites (3 years); Citable Docs. (3 years); Cites / Doc. (2 years)

Output: Journal Influence Factor


ScientoBASE: A Framework and Model for Computing Scholastic Indicators of non-local influence of Journals via Native Data Acquisition algorithms

Authors: Gouri Ginde, Snehanshu Saha, Archana Mathur, Sukrit Venkatagiri

Defining and measuring internationality as a function of the influence diffusion of scientific journals is an open problem. There exists no metric to rank journals based on the extent or scale of internationality. Measuring internationality is qualitative, vague, open to interpretation and limited by vested interests. With the tremendous increase in the number of journals in various fields and the unflinching desire of academics across the globe to publish in "international" journals, it has become an absolute necessity to evaluate, rank and categorize journals based on internationality. In the current work, the authors define internationality as a measure of influence that transcends geographic boundaries. The authors raise concerns about unethical practices in the process of journal publication whereby the scholarly influence of a select few is artificially boosted, primarily by resorting to editorial manoeuvres. To counter the impact of such tactics, the authors propose a new method that defines and measures internationality by eliminating such local effects when computing the influence of journals. A new metric, the Non-Local Influence Quotient (NLIQ), is proposed as one such parameter for internationality computation, along with another novel metric, the Other-Citation Quotient, defined as the complement of the ratio of self-citations to total citations. In addition, SNIP and the International Collaboration Ratio are used as two other parameters. As these journal parameters are not readily available in one place, algorithms to scrape these metrics are written and documented as part of the current manuscript. The Cobb-Douglas production function is used as a model to compute JIMI (Journal Internationality Modeling Index). The current work elucidates the metric acquisition algorithms while presenting arguments in favor of the suitability of the proposed model. The acquired data is corroborated by different supervised learning techniques.
As part of future work, the authors present a bigger picture, RAGIS (Reputation And Global Influence Score), which will be computed to facilitate the formation of clusters of journals of high, moderate and low internationality.
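
As a rough illustration of the two formulas named in this abstract, the sketch below computes the Other-Citation Quotient as one minus the self-citation ratio and then combines four inputs with a generic Cobb-Douglas form. The input values, the equal elasticities, the scale factor, and the function names are assumptions made for the example; the abstract does not state the fitted parameters.

```python
import math


def other_citation_quotient(self_citations, total_citations):
    """Complement of the ratio of self-citations to total citations."""
    if total_citations <= 0:
        raise ValueError("total_citations must be positive")
    return 1.0 - self_citations / total_citations


def cobb_douglas(inputs, elasticities, scale=1.0):
    """Generic Cobb-Douglas form: scale * prod(x_i ** alpha_i)."""
    if len(inputs) != len(elasticities):
        raise ValueError("one elasticity is required per input")
    return scale * math.prod(x ** a for x, a in zip(inputs, elasticities))


ocq = other_citation_quotient(self_citations=30, total_citations=200)

# Hypothetical values for NLIQ, OCQ, SNIP and International Collaboration Ratio,
# with equal elasticities chosen only for illustration.
jimi = cobb_douglas([0.6, ocq, 1.2, 0.4], [0.25, 0.25, 0.25, 0.25])
```

With elasticities that sum to one, as in this example, the index scales proportionally when all four inputs are scaled together.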


RREF: Reference nesting in Scientific Literature and Scholastic Diversity Score: A graph mining approach

Authors: Gouri Ginde, Snehanshu Saha, Aditya Agarwal, Arijit Mukherjee, Archana Mathur

Citation network analysis of scholarly articles and journals has already been explored in depth, and the subtlety of the differences between citations and references has also been recognized. The articles listed under the references section of an article contribute to the citation count of the referenced article. Analyzing the citations of an article, or of the journal containing that article, is a bottom-up approach that provides varying degrees of information, such as patterns of their spread and influence in the academic world. Analysis of references, in contrast, provides a top-down approach. The reference network is represented as a graph; the nodes of this graph represent articles, and the directed connections between the nodes represent the referenced relationship, forming a Nested Reference Network (NRN). Reference network analysis can help in exploring the history of any famous or influential article of a journal. Identifying important articles in the reference network of such an article, using graph theory, helps pinpoint the path-breaking articles that contributed to the evolution of the subject or domain. Text analysis on the keywords of the reference networks of many highly cited articles of a scholar is used to generate a readership profile for that scholar. Further, text analysis and natural language processing are used to introduce and compute the Scholastic Diversity Score, a novel concept. An interface that provides diversity, readership profile and history information mined through graph theory and text analysis should be a handy tool for young researchers looking for a range of background material in the early stages of their research careers. This tool can be easily scaled up in future using the Neo4j graph database for data storage and mining. The Scholastic Diversity Score is a potentially rich discovery from data that may turn out to be inspirational and could feature prominently in the Scientometrics literature in future.
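
The graph representation described in this abstract can be sketched with a plain adjacency dictionary. The article labels, the depth limit, and the helper names below are hypothetical; the sketch only shows a top-down walk over nested references and a simple count of how often each article is referenced inside the resulting sub-network.

```python
from collections import deque

# Hypothetical reference lists: article -> articles in its references section
references = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "F": [],
}


def nested_references(graph, root, max_depth):
    """Breadth-first walk; returns {article: nesting depth} below the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested nesting level
        for ref in graph.get(node, []):
            if ref not in depth:
                depth[ref] = depth[node] + 1
                queue.append(ref)
    del depth[root]
    return depth


def in_degree(graph, nodes):
    """Count how often each article is referenced within the sub-network."""
    counts = {n: 0 for n in nodes}
    for src in nodes:
        for ref in graph.get(src, []):
            if ref in counts:
                counts[ref] += 1
    return counts


nrn = nested_references(references, "A", max_depth=2)
hubs = in_degree(references, ["A", *nrn])
```

Articles with a high count in `hubs` are candidates for the influential background articles the abstract refers to; a real analysis would likely use richer graph measures than in-degree.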


Visualisation of massive data from scholarly Article and Journal Database: A Novel Scheme.

Author: Gouri Ginde

Publishing scholarly articles and getting cited has become a way of life for academicians. These scholarly publications shape the career growth of the authors and also the growth of countries, continents and technological domains. Author affiliations, country and other information about an author, coupled with data analytics, can provide useful and insightful results. However, massive and complete data is required to perform this research. Google Scholar, a comprehensive and free repository of scholarly articles, has been used as the data source for this purpose. Data scraped from Google Scholar, when stored as a graph and visualized in the form of nodes and relationships, can reveal otherwise concealed information, such as an evident domain shift of an author, the spread of an author’s research domains, the prediction of emerging domains and sub-domains, and the detection of journal- and author-level citation cartel behaviours. The data from the graph database is also used in the computation of scholastic indicators for the journals. Finally, an econometric model, the Cobb-Douglas model, is used to compute the journal’s Modeling Internationality Index based on these scholastic indicators.


Evaluating the Effect of Selfishness on Flooding Based DTN Routing Algorithms

Authors: Sobin C C, Vaskar Raychoudhury

Delay Tolerant Networks (DTNs) are sparse mobile networks in which a complete end-to-end path may not exist. Routing is challenging in such networks because frequent network disconnections result in frequent changes in network topology. Most existing DTN routing algorithms assume that all nodes participate honestly in message delivery. But in many real scenarios, nodes are selfish and not willing to forward messages further in the network. We observed that in many real scenarios nodes are neither all altruistic nor all selfish, and we classified the selfishness present in the network into individual selfishness and social selfishness. In the case of individual selfishness, a node may behave selfishly because of some external reason, such as limited buffer space or limited power. In the case of social selfishness, a node forwards messages only to nodes that are friends, have similar interests, or belong to the same community. In this paper, we have analyzed the impact of both individual and social selfishness on existing flooding-based routing algorithms such as Epidemic routing and Spray and Wait routing, and proposed a method to detect and resolve selfishness to improve their routing performance.
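
To make the two kinds of selfishness concrete, here is a toy replay of Epidemic-style flooding over a fixed contact sequence. It is a simplified sketch and not the paper's evaluation setup or its detection method: the contact list, the node names, and the rule that an individually selfish relay never forwards (not even to the destination) are assumptions made for the example.

```python
def willing(holder, peer, source, selfish, friends):
    """Decide whether `holder` hands a copy of the message to `peer`."""
    if holder == source:
        return True  # the source always forwards its own message
    if holder in selfish:
        return False  # individual selfishness: never relay
    if holder in friends:
        return peer in friends[holder]  # social selfishness: friends only
    return True  # altruistic node


def epidemic_delivery(contacts, source, destination, selfish=frozenset(), friends=None):
    """Replay time-ordered contacts; return the delivering contact index or None."""
    friends = friends or {}
    carriers = {source}
    for step, (a, b) in enumerate(contacts):
        for holder, peer in ((a, b), (b, a)):
            if holder in carriers and peer not in carriers:
                if willing(holder, peer, source, selfish, friends):
                    carriers.add(peer)
        if destination in carriers:
            return step
    return None


# Hypothetical contact trace: two relay paths from S to D
contacts = [("S", "R1"), ("R1", "D"), ("S", "R2"), ("R2", "D")]

all_altruistic = epidemic_delivery(contacts, "S", "D")
r1_selfish = epidemic_delivery(contacts, "S", "D", selfish={"R1"})
r1_social = epidemic_delivery(contacts, "S", "D", friends={"R1": {"R2"}})
both_selfish = epidemic_delivery(contacts, "S", "D", selfish={"R1", "R2"})
```

In this trace a single selfish relay delays delivery until the second path is available, and delivery fails when both relays are selfish.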


Big Data acquisition, preparation and analysis using Apache Software Foundation Projects

Authors: Gouri Ginde, Rahul Aedula, Snehanshu Saha, Archana Mathur, Sudeepa Roy Dey, Gambhire Swati Sampatrao, BS Daya Sagar

Challenges in Big Data analysis include data inconsistency, incompleteness, scalability, timeliness and data security. The most fundamental challenge is the existing computer architecture. For several decades, the latency gap between multi-core CPUs and mechanical hard disks has grown every year, making the challenges of data-intensive computing harder to overcome (Hey, Tansley, & Tolle, 2009). A systematic and general approach to these problems with a scalable architecture is required. Most big data is unstructured or of complex structure, which is hard to represent in rows and columns. A good candidate for a large design space can efficiently solve the big data problem in different disciplines. The book chapter highlights two specific objectives: (1) to introduce an efficient model, SVD, for complex computer experiments arising in Big Data, which can be used across different scientific disciplines, and (2) to introduce optimization techniques and tools for handling big data problems.


An analytical model of prominence dynamics

Authors: Swati Routh, Snehanshu Saha, Atul Bhat, Sundar MN

A solar prominence is an intriguing but poorly understood magnetic structure of the solar corona. Convective motions in the photosphere and sub-photosphere may be responsible for generating the magnetic fields that support long-lived quiescent solar prominences. The dynamics of solar prominences has been the subject of a large number of studies. We developed an analytic model to analyze the nature of the dynamics of these quiescent solar prominences.


Exposition on Random Forest: A stock example

Authors: Suryoday Basak, Snehanshu Saha

In random forests, decision tree learners are constructed by randomly selecting m out of M features and n out of N attributes. Here, we illustrate the working of random forests by randomly considering 20 samples from the data set as the training set and 5 samples as the test set; the training and test sets are mutually exclusive.
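
The sampling described here can be sketched with the standard library alone. The snippet assumes a hypothetical data set of 25 samples and 6 features, and it reads the second random draw as a bootstrap sample of training rows, which is the usual random forest construction; that reading, the seed, and the helper name `draw_tree_inputs` are assumptions rather than details from the text. No trees are actually trained.

```python
import random

rng = random.Random(0)  # fixed seed so the split is reproducible

N, M = 25, 6  # hypothetical data set: 25 samples, 6 features
indices = list(range(N))
rng.shuffle(indices)
train_idx, test_idx = indices[:20], indices[20:]  # mutually exclusive sets


def draw_tree_inputs(train_idx, n_features, m, rng):
    """Pick the rows and feature columns that one tree would be grown on."""
    rows = [rng.choice(train_idx) for _ in train_idx]  # with replacement
    cols = sorted(rng.sample(range(n_features), m))    # m distinct features
    return rows, cols


# Inputs for a small forest of 5 trees, each seeing 2 of the 6 features
forest_inputs = [draw_tree_inputs(train_idx, M, m=2, rng=rng) for _ in range(5)]
```

Each tree would then be fitted on its own `rows` and `cols`, and the forest's prediction for a test sample would be the majority vote of the trees.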


Early Prediction of LBW Cases via Minimum Error Rate classifier: A Statistical Machine Learning Approach

Low Birth Weight (LBW) acts as an indicator of sickness in newborn babies. LBW is closely associated with infant mortality as well as various health outcomes later in life. Various studies show a strong correlation between maternal health during pregnancy and the child’s birth weight. This manuscript exploits machine learning techniques to gain useful information from health indicators of pregnant women for early detection of potential LBW cases. The forecasting problem has been reformulated as a classification problem between LBW and NOT-LBW classes using the Bayes minimum error rate classifier, rendering LBW detection a binary machine classification problem. Expectedly, the proposed model achieved an accuracy of 96.77%. Indian health care data was used to construct decision rules to be extrapolated to predictive health care in smart cities. A screening tool based on the decision model is developed to assist health care professionals in Obstetrics and Gynecology (OBG).

Index Terms: Low Birth Weight (LBW), Smart health informatics, Minimum error rate classifier, Predictive analytics, Machine Learning (ML), Feature Ranking.
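
The minimum error rate rule assigns a sample to the class with the largest posterior probability. The sketch below illustrates the rule on a single feature with Gaussian class-conditional densities; the Gaussian assumption, the one-dimensional feature, the function names, and the numbers are all invented for illustration and are not the paper's data or feature set.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_bayes(samples):
    """samples: {label: [feature values]} -> {label: (prior, mean, variance)}."""
    total = sum(len(v) for v in samples.values())
    return {label: (len(v) / total, mean(v), pvariance(v))
            for label, v in samples.items()}


def log_posterior(x, prior, mu, var):
    """Log prior plus log Gaussian likelihood (shared evidence term omitted)."""
    return (math.log(prior)
            - 0.5 * math.log(2 * math.pi * var)
            - (x - mu) ** 2 / (2 * var))


def classify(model, x):
    """Minimum error rate rule: pick the class with the largest posterior."""
    return max(model, key=lambda label: log_posterior(x, *model[label]))


# Synthetic one-dimensional feature values (not real health data)
synthetic = {
    "LBW": [1.0, 1.2, 0.8, 1.1],
    "NOT-LBW": [2.0, 2.3, 1.9, 2.2, 2.1],
}
model = fit_gaussian_bayes(synthetic)
low = classify(model, 0.9)
high = classify(model, 2.1)
```

Because the evidence term is the same for both classes, comparing log prior plus log likelihood is enough to pick the class with the larger posterior.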


Revenue Forecasting in Technological Services: Evidence from Large Data Centers

Authors: Jyotirmoy Sarkar, Bidisha Goswami, Saibal Kar, Snehanshu Saha

The global dependence on data centers has grown phenomenally in recent times. The demand for storage and maintenance of data is not limited to facilitators of information technology only, but has spread to all forms of businesses and service providers, private, public and individual. The economic scope and performance of the data centers, however, seem to be little discussed in the related literature. The rising cost of power supply, the crunch in storage space owing to high property prices, the difficulty of acquiring land for industrial use in various countries, etc., translate into important adjustment costs for data centers. Since the growth of business and competition leads to lower per-unit prices, the rising costs pose considerable difficulty in arriving at the optimal revenue for large data centers. We utilize the Constant Elasticity of Scale functions to derive conditions for cost minimization and revenue forecasting for large data centers. The results mainly show that for constant elasticity of scale production functions, the revenue and profit are maximized at low levels of elasticity. Firms can still cope with rising costs because the market for data center operations is fairly concentrated. Further, this paper offers factor analysis in order to identify the precise contribution of each factor input to the overall cost function. The operational management of large data centers has important outcomes in view of the considerable externality associated with it.


Fair Resource Allocation: Load and Cost Elasticity Defined Game Theoretic Approach

Authors: Sujata Gaddemane, Snehanshu Saha, Bidisha Goswami, Sumana Sinha

Cloud computing is a key methodology for sharing resources. The multi-tenancy feature of the cloud enables efficient resource sharing among multiple users simultaneously. While the resource sharing is efficient, there is a possibility of performance degradation due to the load imbalance created by the nature of resource allocation. Given an option, users are likely to be attracted to servers with lower unit cost, which can increase the load on such servers and thus result in poor performance. This in turn leads to higher response times, resulting in increased average cost to the cloud users. The objective of this project is to optimize resource allocation in a cloud environment using the mechanisms defined by game theory. In the proposed method, the cost charged per user is calculated based not only on the unit cost of the server but also on the current load of the server. Thus the proposed model ensures that users are charged optimally and that load imbalance among the servers after the allocation is minimal. The Cobb-Douglas production function is used for computing the cost incurred by each client. Multiple experiments are carried out, which show that the load imbalance factor among the servers after the allocation is less than 1 with the proposed method.
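
A cost that depends on both unit cost and current load can be illustrated with a two-input Cobb-Douglas form. The sketch below is only a guess at the shape of such a function: the elasticities `alpha` and `beta`, the scale factor, the server values, and the function name are assumptions, and the game-theoretic allocation itself is not shown.

```python
def client_cost(unit_cost, load, alpha=0.5, beta=0.5, scale=1.0):
    """Two-input Cobb-Douglas form: scale * unit_cost**alpha * load**beta."""
    return scale * unit_cost ** alpha * load ** beta


# Hypothetical servers: s1 is cheaper per unit but heavily loaded
servers = {
    "s1": {"unit_cost": 1.0, "load": 0.9},
    "s2": {"unit_cost": 1.5, "load": 0.3},
}

costs = {name: client_cost(**params) for name, params in servers.items()}
cheapest = min(costs, key=costs.get)
```

With these numbers the heavily loaded low-price server ends up costlier than the lightly loaded one, which is the kind of signal that would steer new clients away from overloaded servers.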


Internet Data Center

A data center is a facility that houses thousands of computing systems to run IT-enabled business services without interruption. An Internet Data Center is defined by the infrastructure to store, process and analyze information and to provide services through the internet as dictated by business needs, while minimizing disruptions and obstacles related to information systems in the process. As IT operations have scaled globally, the importance of the Internet Data Center has grown manifold. A data center is now an integral part of enterprise organizations, beyond reasonable doubt. Enterprises need to handle astronomical amounts of data generated every day. This is not a recent phenomenon, but in an increasingly competitive and open market, most enterprises have to use the data and create new platforms and solutions. These, in turn, keep the enterprises in healthy financial shape and help maintain the customer base. Efficient customer-facing analytics solutions have emerged in many enterprises, predominantly due to the leverage data centers offer.


A Study of Revenue Cost Dynamics in Large Data Centers: A Factorial Design Approach

Authors: Gambhire Swati Sampatrao, Sudeepa Roy Dey, Bidisha Goswami, Sai Prasanna M. S., Snehanshu Saha

Revenue optimization of large data centers is an open and challenging problem. The intricacy of the problem is due to the presence of too many parameters posing as costs or investment. This paper proposes a model to optimize the revenue in cloud data center and analyzes the model, revenue and different investment or cost commitments of organizations investing in data centers. The model uses the Cobb-Douglas production function to quantify the boundaries and the most significant factors to generate the revenue. The dynamics between revenue and cost is explored by designing an experiment (DoE) which is an interpretation of revenue as function of cost/investment as factors with different levels/fluctuations. Optimal elasticity associated with these factors of the model for maximum revenue are computed and verified. The model response is interpreted in light of the business scenario of data centers.


SCIENTOMETRICS - A STUDY OF SCIENTIFIC PARAMETERS AND METRICS

Authors: Roy Dey, Archana Mathur, Gambhire Swati Sampatrao, Sandesh Sanjay Gade, Sai Prasanna M S

The term “Scientometrics” emerges from two significant words: Science and Metrics. It is concerned with the metrics used for quantitative analysis of a researcher’s contribution to various scientific domains. Scholarly publications are an effective medium for communicating scientific knowledge; they provide a platform to propagate research output within and across domains. Thus, there arises a need to discover parameters that can measure a researcher’s contribution to his or her field. The most significant metric for measuring the impact of a scientific work is citations. Citation indexes are used as scientometric tools to measure the research output of authors, articles and journals. This book chapter explores many such scientific parameters at both the journal and author level. Further, the authors make an earnest attempt to use them to measure the INTERNATIONALITY of peer-reviewed journals. They claim that the existing parameters alone are not sufficient for the evaluation of internationality, and they explore new parameters for computing an unbiased index at both the journal and author level.


Machine Learning Approaches for Supernovae Classification

Authors: Surbhi Agrawal, Kakoli Bora, Swati Routh

In this chapter, the authors discuss a few machine learning techniques and their application to supernovae classification. Supernovae are of various types, mainly categorized into two important types. Here, the focus is on the classification of Type-Ia supernovae. Astronomers use Type-Ia supernovae as “standard candles” to measure distances in the Universe. Classification of supernovae in the absence of spectra is a major concern for astronomers. By applying different machine learning techniques to the data set, the authors check how well supernovae can be classified using these techniques. The data set used is available in Riess et al. 2007 (astro-ph/0611572).


DESIGN OF ASSISTIVE SPELLER MACHINE BASED ON BRAIN COMPUTER INTERFACING

Author: Suryoday Basak

Machine Learning (ML) has assumed a central role in data assimilation and data analysis in the last decade. Many methods exist that cater to different kinds of data-centric applications in terms of complexity and domain. Machine Learning methods have been derived from classical Artificial Intelligence (AI) models but are far more reliant on statistical methods. However, ML is much broader than inferential statistics. Recent advances in computational neuroscience have identified the Electroencephalography (EEG) based Brain Computer Interface (BCI) as one of the key agents for a variety of medical and non-medical applications. However, efficiency in analysing EEG signals is tremendously difficult to achieve for three reasons: size of data, extent of computation and poor spatial resolution. The book chapter discusses the Machine Learning based methods employed by the author to classify EEG signals for potentials observed at varying levels of a subject’s attention, measured using a NeuroSky Mindwave Mobile. It reports challenges faced in developing BCIs based on available hardware, signal processing methods and classification methods.
