When XS Meets AI, Part 2: Preparing Data with XS

By | 2017-10-27

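The previous post used an example to show how program trading has traditionally worked: we write our trading logic as a script, define the decision process explicitly, and backtest it on historical data to see its win rate. From that we decide whether to use the strategy, which instruments to apply it to, and under what conditions to enter and exit. All of the decision logic is expressed with and, or, not, and xor, which is usually called a rule-based approach. Today I want to work through a concrete example: use XS to list the factors I believe affect a stock's price, prepare enough samples, and then use a Python AI module to see whether a multilayer perceptron can predict the direction of the next move.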

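First, let me explain how I think about the problem.

As I see it, an AI algorithm takes known inputs x1, x2, ..., xn and builds a function f whose output Yf is as close as possible to the true value Yt.

Yf = f(x1, x2, ..., xn)

minimize |Yf - Yt|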
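
To make the idea of "as close as possible" concrete, here is a minimal Python sketch. It assumes the mean absolute error as the measure of closeness (the post does not name one), and the data and the two candidate functions `f_a` and `f_b` are made up for illustration.

```python
def mean_abs_error(y_pred, y_true):
    """Average absolute gap between predictions Yf and true values Yt."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)

# Toy data, made up for illustration: one input x and the true output Yt.
xs = [1.0, 2.0, 3.0, 4.0]
y_true = [2.0, 4.0, 6.0, 8.0]

# Two hypothetical candidates for f; "learning" means preferring the one
# whose output Yf lies closer to Yt.
candidates = {
    "f_a": lambda x: 1.5 * x,
    "f_b": lambda x: 2.0 * x,
}
errors = {
    name: mean_abs_error([f(x) for x in xs], y_true)
    for name, f in candidates.items()
}
best = min(errors, key=errors.get)
print(errors, best)
```

A real model searches over far more than two candidates, but the selection criterion is the same.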

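Building an AI model therefore involves several tasks:

1. Decide on the input features x1, x2, ..., xn

2. Decide what answer we are looking for (that is, what the output is and what we want to predict)

3. Prepare data the computer can learn from

4. Build the model

5. Decide how to measure predictive ability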

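For example, I want to use the data available after yesterday's close to predict whether a given stock will close higher today than it did the day before.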
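
The pairing of yesterday's data with today's outcome can be sketched in plain Python. This is an illustration under my own assumptions: `build_samples` is a hypothetical helper, and the closes and feature rows are invented.

```python
def build_samples(closes, features):
    """Pair day t-1's features with day t's label (1 if the close rose, else 0)."""
    samples = []
    for t in range(1, len(closes)):
        label = 1 if closes[t] > closes[t - 1] else 0
        samples.append((features[t - 1], label))
    return samples

# Invented data: four daily closes and one feature row per day.
closes = [100.0, 101.0, 100.5, 102.0]
features = [[0, 1], [1, 1], [0, 0], [1, 0]]

samples = build_samples(closes, features)
for row, label in samples:
    print(row, label)
```

Each sample uses only information known before the day being predicted, which is what keeps the label from leaking into the features.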

First, we have to decide which data affect the next day's move.

In this example I used three kinds of data:

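  1. The stock-behavior metrics I introduced yesterday: each is recorded as 1 when it exceeds its 20-day average by a set percentage, and 0 otherwise
  2. The relative positions of a candlestick's open, high, low, and close
  3. The values of several commonly used technical indicators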

Following this approach, the XS script I wrote to assemble the data is as follows:

variable:v1(0),v2(0),v3(0),v4(0),v5(0),v6(0),v7(0),v8(0),v9(0),v10(0)
,v11(0),v12(0),v13(0),v14(0),v15(0),v16(0),v17(0),v18(0),v19(0),v20(0);
variable:v21(0),v22(0),v23(0),v24(0),v25(0),v26(0),v27(0),v28(0),v29(0),v30(0);
var:y(0);

//Record 1 when a field is unusually high relative to its norm, otherwise 0; the 0/1 value is the feature for that stock behavior
input:day(20);
input:ratio(30);
variable:count(0),x(0);
value1=GetField("總成交次數","D");
value2=average(value1,day);
value3=GetField("強弱指標");
value5=GetField("外盤均量");
value6=average(value5,day);
value7=GetField("主動買力");
value8=average(value7,day);
value9=GetField("開盤委買");
value10=average(value9,day);
value11=GetField("資金流向");
value12=average(value11,day);
value13=countif(value3>1,day);
value14=average(value13,day);//average number of days stronger than the market
value16=GetField("法人買張");
count=0;
if value1>value2*(1+ratio/100)
then v1=1
else v1=0;
if value13>value14*(1+ratio/100)//number of days stronger than the market
then v2=1
else v2=0;
if value5>value6*(1+ratio/100)
then v3=1
else v3=0;
if value7>value8*(1+ratio/100)
then v4=1
else v4=0;
if value9>value10*(1+ratio/100)
then v5=1
else v5=0;
if truerange> average(truerange,20)//true range
then v6=1
else v6=0;
if truerange<>0
then begin
if close<=open
then
value15=(close-low)/truerange*100
else
value15=(open-low)/truerange*100;//strength of buying support
end;
if value15>average(value15,day)*(1+ratio/100)
then v7=1
else v7=0;
if volume<>0
then value17=value16/volume*100;//institutional buying as a percentage of volume
if value17>average(value17,10)*(1+ratio/100)
then v8=1
else v8=0;

if value11>average(value11,10)*(1+ratio/100)
then v9=1
else v9=0;
value18=countif(close>=close[1]*1.02,5);//number of days in the last 5 with a gain of at least 2%
if value18>=2 
then v10=1
else v10=0;
value19=GetField("融資買進張數");
value20=GetField("融券買進張數");
value21=(value19+value20);
value22=average(value21,day);
if value21<value22*0.9 //retail long-side indicator
then v11=1
else v11=0;
if close*1.2<close[30]
then v12=1
else v12=0;
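//Turn the differences among a candlestick's open, high, low, and close into six features
v13=(close-open)/close*100;//change from open to close
v14=(close-low)/close*100;//close above the low
v15=(high-close)/close*100;//high above the close
v16=(high-low)/close*100;//full range of the bar
v17=(high-open)/close*100;//high above the open
v18=(open-low)/close*100;//open above the low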

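//Use whether the one-day gain exceeds 2.5% as a feature
if close>close[1]*1.025
then v19=1
else v19=0;
//Use the combined move over the last two days (as a price ratio) as a feature
v20=close/close[2];
//Use the values of several common technical indicators as features too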
variable:rsv1(0),k1(0),d1(0);
stochastic(9,3,3,rsv1,k1,d1);
 
 v21=k1;
 
input: period(20,"計算區間");
value1=rateofchange(close,period);
//rate of change over the period
value2=arctangent(value1/period*100);
//angle of the rise
 
v22=value2;
 

input: Length1(14, "期數"), Threshold(25, "穿越值");
variable: pdi_value(0), ndi_value(0), adx_value(0);
DirectionMovement(Length1, pdi_value, ndi_value, adx_value);

v23=pdi_value;

input: FastLength(12, "DIF短期期數"), SlowLength(26, "DIF長期期數"), MACDLength(9, "MACD期數");
variable: difValue(0), macdValue(0), oscValue(0);
MACD(weightedclose(), FastLength, SlowLength, MACDLength, difValue, macdValue, oscValue);
 
v25=difvalue;
 
v26= momentum(close,10);
 
value26=rsi(close,12);
 v27=value26;
 v24=linearregslope(value26,6);

input:Length2(20); //calculation period
variable:u1(0),u2(0),u3(0),u4(0),u5(0),u6(0);
LinearReg(close, Length2, 0, u1, u2, u3, u4);
//20-day linear regression on the close
{u1: slope, u4: predicted value}
u5=rsquare(close,u4,20);//R-squared between the close and the regression value
 v28=u5;
 v29=u1;

value11=GetField("投信買賣超");
input:day1(8);
v30=countif(value11>0,day1);
 
//Define the output value Y
if close>close[1]
then y=1
else
y=0;

Print(file("C:\Users\lee\.spyder-py3\f301.log"),
 numtostr(v1[1], 0), ",",
 numtostr(v2[1], 0), ",",
 numtostr(v3[1], 0), ",",
 numtostr(v4[1], 0), ",",
 numtostr(v5[1], 0), ",",
 numtostr(v6[1], 0), ",",
 numtostr(v7[1], 0), ",",
 numtostr(v8[1], 0), ",",
 numtostr(v9[1], 0), ",",
 numtostr(v10[1], 0), ",",
 numtostr(v11[1], 0), ",",
 numtostr(v12[1], 0), ",",
 numtostr(v13[1], 2), ",",
 numtostr(v14[1], 2), ",",
 numtostr(v15[1], 2), ",",
 numtostr(v16[1], 2), ",",
 numtostr(v17[1], 2), ",",
 numtostr(v18[1], 2), ",",
 numtostr(v19[1], 0), ",",
 numtostr(v20[1], 2), ",",
 numtostr(v21[1], 2), ",",
 numtostr(v22[1], 2), ",",
 numtostr(v23[1], 2), ",",
 numtostr(v24[1], 2), ",",
 numtostr(v25[1], 2), ",",
 numtostr(v26[1], 2), ",",
 numtostr(v27[1], 2), ",",
 numtostr(v28[1], 2), ",",
 numtostr(v29[1], 2), ",",
 numtostr(v30[1], 2), ",",
 numtostr(y,0));




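I ran this script in the Strategy Radar (策略雷達) on the stock 台郡 (Flexium).

This produces a text file. I converted it to a CSV file and added a header row, ready to serve as test data for the Python AI module.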

[Image: csvdata]
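
The text-to-CSV conversion described above can be sketched with the Python standard library. This is an illustration, not the steps actually used for the post: `log_to_csv` is a hypothetical helper, the header names `v1` to `v30` plus `yy` are taken from the Python listing later in the post, and I am assuming each log line holds 31 comma-separated values, possibly with spaces around the commas.

```python
import csv
import io

# Header assumed from the later listing: 30 features plus the label column 'yy'.
HEADER = [f"v{i}" for i in range(1, 31)] + ["yy"]

def log_to_csv(log_text):
    """Turn comma-separated log lines into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for line in log_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == len(HEADER):  # skip blank or malformed lines
            writer.writerow(fields)
    return out.getvalue()

# One made-up log line: 30 feature values followed by the label.
sample_log = " , ".join(["0"] * 30 + ["1"]) + "\n\n"
csv_text = log_to_csv(sample_log)
print(csv_text)
```

To work on a real file, read its text with `open(path).read()` and write the returned string to a `.csv` file.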

Next, the Python multilayer perceptron code I used is as follows:

import numpy as np
import pandas as pd

# Read the CSV data
df = pd.read_csv('f303.csv')
df.head()

# Select the columns we need
cols_2d = df[['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9', 'v10', 'v11',
              'v12', 'v13', 'v14', 'v15', 'v16', 'v17', 'v18', 'v19', 'v20', 'v21',
              'v22', 'v23', 'v24', 'v25', 'v26', 'v27', 'v28', 'v29', 'v30', 'yy']]
cols_2d.head()

# Features, X
X = cols_2d[['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9', 'v10', 'v11',
             'v12', 'v13', 'v14', 'v15', 'v16', 'v17', 'v18', 'v19', 'v20', 'v21',
             'v22', 'v23', 'v24', 'v25', 'v26', 'v27', 'v28', 'v29', 'v30']]
X.head()

# Label, y
y = cols_2d['yy']
y.head()

# Split into training and test data
from sklearn.model_selection import train_test_split

X_train_o, X_test_o, y_train, y_test = train_test_split(X, y, test_size = 0.3) # 70% training, 30% test

# Check the row counts after the split
print('total:', len(cols_2d), 'train X:', len(X_train_o), 'test X:', len(X_test_o), 'train y:', len(y_train), 'test y:', len(y_test))
# Normalize every column of X into the 0-1 range
from sklearn import preprocessing

minmax_scale = preprocessing.MinMaxScaler(feature_range = (0, 1))
X_train = minmax_scale.fit_transform(X_train_o)  # learn min/max from the training data
X_test = minmax_scale.transform(X_test_o)        # reuse the training min/max on the test data

X_train[:10]
X_test[:10]
# Build the MLP (multilayer perceptron) model
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()

model.add(Dense(units = 120, input_dim = 30, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 100, kernel_initializer = 'uniform', activation = 'relu'))
#model.add(Dropout(0.35))
model.add(Dense(units = 80, kernel_initializer = 'uniform', activation = 'relu'))
#model.add(Dropout(0.35))
model.add(Dense(units = 70, kernel_initializer = 'uniform', activation = 'relu'))
#model.add(Dropout(0.35))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

train_history = model.fit(x = X_train, y = y_train, validation_split = 0.1, epochs = 30, batch_size = 30, verbose = 2)

# Evaluate on the test data
scores = model.evaluate(x = X_test, y = y_test)

# Prediction accuracy
scores[1]
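
The normalization step in the listing rescales each column into the 0-1 range. Here is a minimal pure-Python sketch of the same idea, assuming the minimum and maximum are learned from the training rows only and then reused on the test rows. `fit_min_max` and `apply_min_max` are hypothetical helpers and the numbers are invented.

```python
def fit_min_max(rows):
    """Learn each column's minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Rescale rows using previously learned column minimums and maximums."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi != lo else 0.0  # constant column maps to 0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled

# Invented rows: two feature columns.
train = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
test = [[2.0, 40.0]]

mins, maxs = fit_min_max(train)
train_scaled = apply_min_max(train, mins, maxs)
test_scaled = apply_min_max(test, mins, maxs)
print(train_scaled, test_scaled)
```

A test value can land outside 0-1 when it falls outside the training range. That is expected, and it keeps the test set from influencing how the training data was scaled.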

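With this procedure we can find a set of parameters that, using these features to predict whether the stock rises or falls the next day, reaches an accuracy of 66%.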
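
One way to read an accuracy figure like this, offered as my own suggestion and not as part of the original experiment, is to compare it with the accuracy of always predicting the more common label. `majority_baseline_accuracy` is a hypothetical helper and the labels below are invented, not the post's data.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common 0/1 label."""
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)

# Invented test labels for illustration only.
y_test_example = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
baseline = majority_baseline_accuracy(y_test_example)
print(f"baseline accuracy: {baseline:.2f}")
```

If up days and down days are roughly balanced in the test set, the baseline sits near 50%, and the gap between it and the model's score is the part the features actually explain.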

[Image: accuracy sample]

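This example shows how XS can serve as a platform for preparing feature data before any AI computation. AI is a broad and deep field, and using it for investment decisions means clearing many hurdles. XS looks like it can help with extracting features and preparing the test data.

As for the modeling itself, we will just have to keep studying on our own.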