R is a programming language and free software environment for statistical computing and graphics, developed by Ross Ihaka and Robert Gentleman in 1993 and supported by the R Foundation for Statistical Computing. It offers an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series analysis, and statistical inference, to name a few. Most R libraries are written in R itself. The language is widely used among statisticians and data miners for developing statistical software and analyzing data. R is trusted not only in academia: many large companies also use it, including Uber, Google, Airbnb, and Facebook.
By some estimates, roughly half of all data scientists use R for data mining and statistical analysis, and it is a language of choice within the rather nebulous “Big data” industry you keep hearing about. R includes built-in functions and variables designed to make statistical analysis easier, and it also provides graphics tools that produce publication-quality data visualizations. R is highly extensible, and many packages exist to address specific data analysis tasks and problems. It owes part of its popularity to its open-source status, which means that anyone can use R and have access to world-class statistical analysis tools. R is designed to work on virtually any platform and runs on Unix, Linux, Windows, and macOS systems.
To install R, follow the link below.
To install RStudio, follow the link below.
Packages are collections of **R** functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. **R** comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.
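To make the distinction between the library, installed packages, and loaded packages concrete, here is a minimal sketch using only base R functions. The variable names (`lib_dirs`, `installed`, `attached`) are illustrative and not part of the original walkthrough.

```r
# Directories (libraries) where R looks for installed packages
lib_dirs <- .libPaths()
print(lib_dirs)

# Names of every package installed in those libraries
installed <- rownames(installed.packages())
print(head(installed))

# Packages currently attached to (loaded into) this session
attached <- search()
print(attached)
```

A package can appear in `installed` without appearing in `attached`; that is the situation in which you need to load it before use.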
    install.packages("tm")         # for text mining
    install.packages("wordcloud")  # word-cloud generator
The “tm” package is used for text mining, and the “wordcloud” package is used to generate the word cloud.
The library() function loads installed packages into the session.
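The loading step might look like the sketch below. It assumes the packages were installed as shown earlier. Loading RColorBrewer explicitly is an addition here (it is installed as a dependency of wordcloud and provides the color palettes used later), and the `requireNamespace()` guard is an optional safety check so the snippet reports missing packages instead of stopping with an error.

```r
# Packages used in this walkthrough; RColorBrewer is listed explicitly
# because it provides brewer.pal() for the color palette.
pkgs <- c("tm", "wordcloud", "RColorBrewer")

# Check which of them are installed, without attaching anything yet
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach the installed ones to the session
for (p in pkgs[available]) {
  library(p, character.only = TRUE)
}

# Report anything that still needs install.packages()
if (any(!available)) {
  message("Not installed: ", paste(pkgs[!available], collapse = ", "))
}
```

In an interactive session where you know the packages are installed, calling `library(tm)` and `library(wordcloud)` directly does the same job.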
Choose the text file, such as your article, for which you want to generate a word cloud.
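One way to read the file into a character vector named `text` is sketched below. So that the snippet runs on its own, it first writes a small sample file; the file contents and the `path` variable are purely illustrative. For your own article, you could instead call `readLines(file.choose())` to pick the file interactively.

```r
# Create a small sample file so the example is self-contained
# (illustrative content; replace with your own article).
path <- tempfile(fileext = ".txt")
writeLines(c("R is a language for statistical computing.",
             "R is widely used for data analysis."), path)

# Read the file line by line; each line becomes one element of `text`
text <- readLines(path, warn = FALSE)
print(text)
```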
    words <- Corpus(VectorSource(text))
    inspect(words)  # View
Load the text as a corpus and inspect the words in it.
    # Convert the text to lower case
    words <- tm_map(words, content_transformer(tolower))
    # Remove numbers
    words <- tm_map(words, removeNumbers)
    # Remove common English stopwords
    words <- tm_map(words, removeWords, stopwords("english"))
    # Specify your own stopwords as a character vector
    words <- tm_map(words, removeWords, c("the", "is"))
    # Remove punctuation
    words <- tm_map(words, removePunctuation)
    # Eliminate extra white spaces
    words <- tm_map(words, stripWhitespace)
Before processing the corpus, we need to clean it, for example by removing stop words, punctuation, and extra whitespace.
    tdm <- TermDocumentMatrix(words)
    m <- as.matrix(tdm)
    v <- sort(rowSums(m), decreasing = TRUE)  # total count per word, most frequent first
    d <- data.frame(word = names(v), freq = v)
    head(d, 10)  # ten most frequent words
Counting the frequency of words in a document.
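To see what this counting step computes, here is a rough base R sketch of the same idea on two sample sentences. It is not the tm implementation: the sample text, the simple letters-only tokenizer, and the names `sample_text`, `tokens`, `freq`, and `d_base` are all assumptions made for illustration. Note also that tm's `TermDocumentMatrix()` by default drops terms shorter than three characters, so its counts can differ from this sketch.

```r
# Illustrative input; not part of the tutorial's dataset
sample_text <- c("R is great for text mining",
                 "Text mining with R is fun")

# Lower-case the text and split on anything that is not a letter
tokens <- unlist(strsplit(tolower(sample_text), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]  # drop empty strings left by the split

# Count occurrences of each word, most frequent first
freq <- sort(table(tokens), decreasing = TRUE)

# Same shape as the word/freq data frame used for the word cloud
d_base <- data.frame(word = names(freq), freq = as.integer(freq))
print(d_base)
```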
    set.seed(1)
    wordcloud(words = d$word, freq = d$freq, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.15,
              colors = brewer.pal(8, "Dark2"))
R is free and open-source, making it possible for anyone to have access to world-class statistical analysis tools. It is used widely in academia and the private sector and is one of the most popular languages for statistical analysis today. Learning R isn’t easy; if it were, data scientists wouldn’t be in such high demand. However, there is no shortage of quality resources you can use to learn R if you’re willing to put in the time and effort.