Neural networks, natural language processing, and earnings call transcripts

In this post I explain some of the steps I took to train a neural network on earnings call transcripts, which I gathered from Seeking Alpha. I first used a web scraper to get the text, then reduced the words to their stem terms (to cut down the number of features) and treated them as categorical variables. The target variable is the change in the stock price after the transcript was released.

Libraries and set-up

As always, we first load all the needed libraries and set up our Amazon S3 bucket (this is not required; I only use it to deploy the model once it is built). We also load the NRC lexicon from the syuzhet package's data file, which we will use to reduce the dimensionality of our input variables by pruning out ‘not important’ terms, meaning terms that do not appear in the lexicon.

library(quantmod)
library(knitr)
library(tm)
library(quanteda)
library(tidyverse)
library(rvest)
library(pins)

b = board_s3(
  '',
  region = 'us-west-1',
  access_key = '',
  secret_access_key = ''
)

## NRC lexicon
download.file('https://raw.githubusercontent.com/mjockers/syuzhet/master/R/sysdata.rda',destfile = 'syu.dic.rda')
load('syu.dic.rda')
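
To make the pruning idea concrete before it is applied to real transcripts, here is a minimal base-R sketch. The counts matrix and the three-word `toy_lexicon` are made up for illustration; they only stand in for the document-feature matrix and the lexicon's word column used later in this post.

```r
# Toy document-term counts: 2 documents x 5 terms (made-up numbers)
counts <- matrix(
  c(3, 0, 1, 7, 2,
    1, 4, 0, 9, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"),
                  c("Good", "risk", "growth", "the", "quarter"))
)

# Hypothetical mini-lexicon standing in for the lexicon's word column
toy_lexicon <- c("good", "risk", "growth")

# Keep only the columns whose lower-cased name appears in the lexicon
keep <- tolower(colnames(counts)) %in% toy_lexicon
pruned <- counts[, keep, drop = FALSE]

print(pruned)
```

The `drop = FALSE` argument keeps the result a matrix even if only one column survives the filter.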

Text-mining functions

We also set up a couple of functions that we will need to scrape the web articles.

countwords = function(str1){
  lengths(gregexpr("\\W+", str1)) + 1
}

## Gets text from a link
getText = function(link){
  try({
    link %>% 
      read_html() %>% 
      html_nodes('p') %>% 
      html_text() %>% 
      paste(collapse = '. ') %>% 
      str_replace_all('\\.\\.','\\.')
  })
}

## Gets title and date from an article given a link
getInfo = function(url){
  try({
    ## download the page once and reuse it for every lookup
    page = read_html(url)
    spans = page %>% html_nodes('span')
    attrs = spans %>% html_attrs()
    ## the date is the text of the first span whose attribute names mention 'data' exactly once
    date = (spans %>% html_text())[match(1, (sapply(attrs, names) %>% str_extract_all('data') %>% sapply(length)))]
    ## the title is the page's h1 heading
    t = page %>% 
      html_nodes('h1') %>% 
      html_text() 
    return(cbind(t,date))})
}
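
`countwords` is only used as a rough length filter, and it helps to know how rough. The sketch below repeats the definition so that it runs on its own and tries it on a few made-up strings. Because it counts runs of non-word characters and adds one, trailing punctuation and single-word strings are overcounted.

```r
# Same definition as above, repeated so this snippet runs on its own
countwords = function(str1){
  lengths(gregexpr("\\W+", str1)) + 1
}

examples = c(
  "revenue grew this quarter",   # 3 separators -> 4
  "revenue grew this quarter.",  # the final '.' is a 4th separator -> 5
  "revenue"                      # no separator: gregexpr returns -1 (length 1) -> 2
)

counts = countwords(examples)
print(counts)
```

For a cutoff in the hundreds of words, being off by one or two makes no practical difference.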

Scraping & data-wrangling

We can now use the functions we built to fetch the data from articles with IDs in the range 4478500:4479741. I have added some commentary to the code to explain what it does.
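
The least obvious step in the listing below is how a call date is matched to a pair of trading days. Here is a self-contained sketch of that lookup with `findInterval`, using made-up opening prices in place of the `getSymbols` download. The helper name `open_change` is mine and is not part of the original code.

```r
# Day of the month for nine consecutive trading days in January 2022
# (the 15th to 17th were a weekend plus a market holiday) and made-up opening prices
dates = c(10, 11, 12, 13, 14, 18, 19, 20, 21)
open  = c(100, 102, 101, 105, 104, 110, 108, 109, 111)

# Hypothetical helper: relative change in the opening price between the last
# trading day on or before `day` and the trading day that follows it
open_change = function(day, dates, open){
  k = findInterval(day, dates)   # index of the last trading day <= day
  # no trading day before `day`, or no following day inside the window
  if (k < 1 || k >= length(dates)) return(NA_real_)
  (open[k + 1] - open[k]) / open[k]
}

open_change(12, dates, open)   # day 12 -> day 13: (105 - 101) / 101
open_change(16, dates, open)   # non-trading day: falls back to day 14 -> day 18
open_change(25, dates, open)   # after the window: NA
```

`findInterval` needs `dates` sorted in increasing order, so a window of day-of-month values that spans a month boundary (for example 28, 31, 1, 2) would not work with this approach.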

vec = paste0("https://seekingalpha.com/article/",4478500:4479741)
info = vec %>% lapply(getInfo)

##keep only the pages where both a title and a date were found
info2 = c()
web = c()
for(i in seq_along(info)){

  if(length(info[[i]]) == 2){
    web = c(web , i )
    info2 = rbind(info2 ,info[[i]])
  }

}

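## find the titles that contain a ticker in parentheses, e.g. "(AAPL)"
ind = info2[,1] %>% 
  str_which('[(][A-Z]{1,5}[)]')

## get names of tickers: take the first ticker in each title so that
## t stays aligned with the rows selected by ind
t = info2[ind,1] %>% 
  str_extract('[(][A-Z]{1,5}[)]') %>%
  str_remove_all('\\(|\\)')

df = cbind( info2[ind,1:2], t)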

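##transform into stem terms and keep the transcripts longer than 500 words
txt = vec[web][ind] %>% sapply(getText) %>% sapply( stemDocument)
nwords = sapply( txt, countwords)
ind = which(nwords>500) %>% unname  ## ind now indexes txt (and the rows of df)

##document-feature matrix restricted to terms that appear in the lexicon
##(tokens() first: recent quanteda versions no longer accept raw text in dfm())
mat = dfm(tokens(txt[ind]))
mat = mat[,((mat %>% colnames() %>% tolower()) %in% syuzhet_dict[,1])] 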

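##get the day of the month from the date text (assumed to look like "Jan. 14, 2022");
##articles without a January 2022 date give NA
adj =  (df[,2] %>% 
          str_extract('Jan.+2022') %>% 
          str_remove_all('Jan..|......$') %>% 
          parse_number())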

##get the relative change in opening price after the transcript was released
prices = c()
for ( i in seq_along(adj)){
  entry = NA
  ##skip missing dates (which would otherwise break the comparison) and the 14th
  if(!is.na(adj[i]) && adj[i]!=14){
    try({
      ##last 9 trading days available at the time the code is run
      symb = getSymbols(df[i,3],auto.assign = FALSE) %>% tail(n = 9)
      dates = symb %>% time %>% lubridate::day()
      ##k = last trading day on or before the call; column 1 is the opening price
      k = findInterval( adj[i], dates )
      entry = (symb[k+1,1][[1]] - symb[k,1][[1]]) / symb[k,1][[1]]
    })
  }
  prices = c(prices, entry)
}
##define input and output variables
predictor = mat[!is.na(prices[ind]),] %>% as.matrix()
target = prices[ind] %>% na.omit() %>% scales::rescale()

##scale each input column to [0, 1]; a constant column becomes 0.5 throughout
for( i in seq_len(ncol(predictor))){
  predictor[,i] = predictor[,i]%>% scales::rescale()
}

pin_write(b, predictor, 'nlpnn.data')

## EXAMPLE OF INPUT #####################################################
predictor %>% head %>% knitr::kable()
[Output omitted: the first six rows of predictor. Each row is named by its article URL (https://seekingalpha.com/article/4478513, 4478514, 4478587, 4478791, 4478806 and 4478807) and each column is one of the retained stem terms (good, limit, like, ...). All values lie between 0 and 1. The table is too wide to reproduce here.]
000000.00.000.00.000.00000000.00000000.00000000.00.00.000000001.0000000000.00000.00.000000000000.00000000.00000000.0000.2510.5110.50.20.08333330.250.22222220.510.5110.14285710.333333310.05882350.510.333333310.02631580.5110.33333330.50.510.5110.20.50.2110.511100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.50.50.50.50.500000000000000000000000000000000000000000000000000000.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.50.5
##EXAMPLE OF OUTPUT ###################################################

target %>% head %>% knitr::kable()
x
0.4044899
0.1757081
0.0458230
0.6111343
0.5286285
0.5072060

Neural network

Now that we have a defined data set to train on, we can build a neural network and try to predict the price change for a new URL.

library(keras)

new.url = 'https://seekingalpha.com/article/4479741-chr-hansen-holding-s-chyhy-ceo-mauricio-graber-on-q1-2022-results-earnings-call-transcript'
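## Tokenize explicitly before building the document-feature matrix
## (recent quanteda versions no longer accept raw text in dfm())
new.mat = getText(new.url) %>% stemDocument() %>% tokens() %>% dfm()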

predictor = predictor[,colnames(predictor) %in% colnames(new.mat)] 
(predictor %>% dim)[1] == (target %>% length)
## [1] TRUE
model = keras_model_sequential() %>% 
  layer_dense(units= ncol(predictor), activation="relu", input_shape= ncol(predictor)) %>% 
  layer_dense(units= (ncol(predictor) %>% sqrt %>% floor), activation = "relu") %>% 
  layer_dense(units=1, activation="linear")

model %>% compile(
  loss = "mse",
  optimizer =  "adam", 
  metrics = list("mean_absolute_error")
)

model %>% summary()
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense_2 (Dense)                     (None, 160)                     25760       
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 12)                      1932        
## ________________________________________________________________________________
## dense (Dense)                       (None, 1)                       13          
## ================================================================================
## Total params: 27,705
## Trainable params: 27,705
## Non-trainable params: 0
## ________________________________________________________________________________
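The parameter counts in the summary can be checked by hand: a dense layer with n inputs and m units has n × m weights plus m biases. Here is a small sketch in base R, where `dense_params` is a helper name made up for this illustration and the 160 features are taken from the summary output:

```r
# Hypothetical helper: number of trainable parameters in a dense layer
dense_params = function(n_in, n_out){
  n_in * n_out + n_out   # weights plus one bias per unit
}

n_features = 160                       # ncol(predictor), as reported in the summary
n_hidden   = floor(sqrt(n_features))   # 12 units in the second layer

params = c(
  dense_params(n_features, n_features),  # first layer
  dense_params(n_features, n_hidden),    # second layer
  dense_params(n_hidden, 1)              # output layer
)
params
## [1] 25760  1932    13
sum(params)
## [1] 27705
```

With roughly 27 thousand parameters, the network is large relative to the number of transcripts, which is one reason to keep an eye on the validation loss during training.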
history <- model %>% fit(
  predictor, 
  target, 
  epochs = 600,
  batch_size = 10, 
  validation_split = 0.2
)

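tags = colnames(predictor)
toBpred  = rep(0, length(tags))

## Match by feature name so each count lands in the column position
## the model was trained on (the column order of new.mat may differ)
shared = tags[tags %in% colnames(new.mat)]
toBpred[match(shared, tags)] = as.numeric(as.matrix(new.mat[, shared]))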
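The idea behind building the input vector is that the new document has to be expressed in exactly the columns, and the column order, used for training. Here is a toy sketch in base R. The helper `align_features` and the stem terms are made up for this illustration and are not part of the original analysis:

```r
# Hypothetical helper: place the term counts of a new document into the
# column order used for training. Terms unseen in training are dropped;
# training terms missing from the new document stay at zero.
align_features = function(new_counts, train_terms){
  out = setNames(rep(0, length(train_terms)), train_terms)
  shared = intersect(train_terms, names(new_counts))
  out[shared] = new_counts[shared]
  out
}

train_terms = c('growth', 'margin', 'revenu')          # toy training columns
new_counts  = c(revenu = 3, guidanc = 1, growth = 2)   # toy document, different order

aligned = align_features(new_counts, train_terms)
aligned
## growth margin revenu 
##      2      0      3
```

Matching by name rather than by position is what keeps each count attached to the right input unit when the two matrices list their terms in different orders.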

Result

We can rescale the output back to the original scale of the price changes to see the expected price change.

    scales::rescale(
      predict(model, toBpred %>% matrix(nrow = 1)),
      to = c( min(prices %>% na.omit()), max(prices %>% na.omit())),
      from = 0:1
    )
##             [,1]
## [1,] 0.004079509
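The rescaling step is the inverse of a min-max transformation. Here is a minimal sketch in base R, assuming the target was obtained by mapping the observed price changes onto [0, 1], which is what the `from = 0:1` argument suggests. The values in `prices_toy` and the helper `unscale` are made up for this illustration:

```r
# Toy price changes, made up for this illustration
prices_toy = c(-0.12, 0.03, NA, 0.08, -0.02)

lo = min(prices_toy, na.rm = TRUE)
hi = max(prices_toy, na.rm = TRUE)

# Forward: map the observed values onto [0, 1]
scaled = (prices_toy - lo) / (hi - lo)

# Inverse: map a value on the [0, 1] scale back to the original scale
unscale = function(p, lo, hi){
  p * (hi - lo) + lo
}

mid_point = unscale(0.5, lo, hi)      # midpoint of the observed range
round_trip = unscale(scaled, lo, hi)  # recovers prices_toy
```

Because the output layer is linear, a prediction can fall slightly below 0 or above 1, in which case the rescaled value lies outside the range of price changes seen in training.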
